ABSTRACT
Evaporation is a basic element in the hydrological cycle that plays a vital role in a region's water balance. In this paper, the Wild Horse Optimizer (WHO) algorithm was used to optimize long short-term memory (LSTM) and support vector regression (SVR) to estimate daily pan evaporation (Ep). Primary meteorological variables including minimum temperature (Tmin), maximum temperature (Tmax), sunshine hours (SSH), relative humidity (RH), and wind speed (WS) were collected from two synoptic meteorological stations with different climates which are situated in Fars province, Iran. One of the stations is located in Larestan city with a hot desert climate and the other is in Abadeh city with a cold dry climate. The partial mutual information (PMI) algorithm was utilized to identify the efficient input variables (EIVs) on Ep. The results of the PMI algorithm proved that the Tmax, Tmin, and RH for Larestan station and also the Tmax, Tmin, and SSH for Abadeh station are the EIVs on Ep. The results showed the LSTM–WHO hybrid model for both stations can ameliorate the daily Ep estimation and it can also reduce the estimation error. Therefore, the LSTM–WHO hybrid model was proposed as a powerful model compared to standalone models in estimating daily Ep.
HIGHLIGHTS
Optimize long short-term memory (LSTM) and support vector regression (SVR) by the Wild Horse Optimizer (WHO) algorithm to estimate daily pan evaporation.
Using the partial mutual information (PMI) algorithm for recognition of the efficient input variables on pan evaporation.
The LSTM–WHO hybrid model was proposed as a powerful model compared to standalone models in estimating daily pan evaporation.
INTRODUCTION
Evaporation is a phenomenon in which water turns into vapor (Majhi et al. 2020). It is the first way of returning water to the hydrological cycle (Liu et al. 2004; Landeras et al. 2008). Evaporation is an important part of the hydrological cycle which affects the region's water balance, hydrological modeling, irrigation system design, and agricultural production (Gundalia & Dholakia 2013; Moazenzadeh et al. 2018; Guan et al. 2020; Mohamadi et al. 2020; Allawi et al. 2021). Millions of cubic meters of water are lost annually due to the process of evaporation and plenty of salt and solute which leads to water quality reduction (Havens & Ji 2018). As the shortage of water resources is always a serious problem in the world, the accurate estimation of evaporation is considered as one of the main processes of water loss which has been emphasized by researchers (Samii et al. 2023). The accurate estimation of evaporation takes a leading role in calculating water balance, designing irrigation systems, managing water resources and sustainable development (Piri et al. 2016; Dou & Yang 2018; Kumar et al. 2021). Therefore, the accurate estimation of evaporation in dry regions with a shortage of water resources seems necessary and it provides useful information in drought conditions (Malik et al. 2020a; Seifi & Riahi 2020).
Generally, evaporation is measured in direct and indirect ways. The pans are considered as one of the most common direct methods used for evaporation measurement in regions (Warnaka & Pochop 1998; Wang & Dickinson 2012; Ehteram et al. 2022). Using pans for direct measurement of evaporation is not possible in regions without meteorological stations where the establishment of the station requires costs and facilities. Instead, empirical formulas and analytical methods are used for evaporation estimation (Wang et al. 2019). Relative humidity, air temperature, wind speed, solar radiation, atmospheric pressure, dissolved solute, and latitude and longitude all have an impact on evaporation and should be taken in mind when we want to estimate the evaporation (Alizadeh et al. 2017; Tao et al. 2018).
Basically, experimental methods belong to a specific geographical region and they require specific data which is impossible in some cases (Khosravi et al. 2019). On the other hand, it is practically impossible to model evaporation through experimental methods due to the physical complexity, non-linear nature and many affecting elements on it, (Singh & Xu 1997). Nowadays regarding the limitations, new methods have been invented to estimate evaporation. In recent decades, artificial intelligence (AI) technique as a new approach, has been successfully applied to water resources studies (Diop et al. 2018; Qasem et al. 2019; Ali et al. 2020; Malik et al. 2020b; Tur & Yontem 2021).
This technique has demonstrated its ability to predict many hydrological variables such as streamflow (Adnan et al. 2021a), drought (Parisouj et al. 2020), rainfall (Salih et al. 2020; Adnan et al. 2021b), groundwater (Rahman et al. 2020; Mosavi et al. 2021) and evapotranspiration (Alizamir et al. 2020; Makwana et al. 2023). The main reason for using this technique is the high ability to establish non-linear relationships between input and output variables. In the first phase, this technique receives data as input in different forms such as numbers, speeches, texts, images etc. Consequently, it processes data through various algorithms. After the processing phase, the AI technique provides an output for the input. Finally, the result will be evaluated through analysis, discovery and feedback. This loop continues until the desired result is attained. Neural networks, machine learning, cognitive computing, deep learning, computer vision and natural language processing can be considered as main components of AI. AI technique has been accepted by researchers due to the high-speed data processing. Although the usage of the AI technique has been providing accurate and fast estimation of hydrological variables, its accuracy highly relies on the user's knowledge and the perception of AI (Razavizadeh & Dargahian 2018).
Many algorithms such as Particle Swarm Optimization (PSO), Genetic Algorithm (GA), Differential Evolution (DE), Bat Algorithm (BA), Gray Wolf Optimizer (GWO), Harris Hawks Optimization (HHO), Giza Pyramids Construction (GPC) etc. made hybrid with AI technique in order to augment the capabilities of it. These algorithms are classified as meta-heuristic algorithms which are used for optimization. They have become popular in recent years due to their higher efficiencies, optimal value estimation of model parameters and high accuracy of estimation. Ehteram et al. (2022) used an ANN model as well as several algorithms such as Firefly Algorithm (FFA), Capuchin Search Algorithm (CSA), GA, and Sine Cosine Algorithm (SCA) to predict how much water would evaporate each day for seven synoptic stations in Iran with different climates. Then, they used the Inclusive Multiple Model (IMM) for forecasting the evaporation. Guan et al. (2020) used the SVR model combined with the Krill Herd Algorithm (SVR-KHA) to estimate daily Ep in three Iranian meteorological stations with humid climates. Various combinations of meteorological input variables including Tmax, RH, rainfall, SSH, WS, solar radiation, and vapor pressure were considered for Ep modeling. Feng et al. (2018) suggested an Extreme Learning Machine (ELM), optimized ANN models with PSO algorithm (ANN–PSO) and GA algorithm (ANN–GA) for estimating Ep in different climate regions of China. Bhattarai et al. (2023) used LSTM and Gaussian process regression (GPR) models for forecasting monthly Ep at two meteorological stations, one with a tropical climate in Hialeah, Florida and the other with a Mediterranean climate in Markley Cove, California.
The innovative aspect of this study is to use of the new WHO algorithm for the optimization of LSTM and SVR parameters with the goal of improving the estimation of daily Ep in different climates.
MATERIALS AND METHODS
Study area
Fars province is one of the southern provinces of Iran, which is situated at latitude 27°1′ to 31°42′ N and longitude 50°34′ to 55°44′ E and covers an area of 122,272 km2. This province with 37 cities has diverse climates due to the uneven distribution of precipitation. This province contains of 67.4% range, 20.4% forest and 12.2% desert. It has a mean temperature of 18.9 °C, annual rainfall of 286.8 mm and annual evaporation of 2,553.4 mm. The cities of Larestan, in the south of Fars province, and Abadeh in the north were selected as the case study. Larestan city is located in 351 km from the center of the province. This city is spread between latitude 27°19′ to 28°11′ N and longitude 53°22′ to 55°44′ E with an area of 10,740 km2. The mean temperature, annual rainfall, and annual evaporation of this city are respectively, 23.9 °C, 213 mm, and 3,307 mm. According to the Emberger climate classification system, Larestan city has a hot desert climate.
Data collection and preparation
For modeling the daily Ep of both cities, the data of synoptic meteorological stations of these cities were collected on a daily scale. Larestan meteorological station is located at latitude 27°40′12″ N, longitude 54°22′29″ E and an altitude of 792 m. In such manner, Abadeh meteorological station is situated at latitude 31°11′54″ N, longitude 52°36′42″ E and the altitude of 2030 m. The distance between the two stations is 611 km. The collected data included Ep, SSH, Tmax, Tmin, WS, and RH. In this study, 3,654 data sets and 2,483 data sets were used respectively for Larestan and Abadeh stations during the years 2012–2022. Due to the missing data in Abadeh station for some years, the number of data is less than that in Larestan station. Table 1 shows the statistical characteristics of the data used in both meteorological stations. The variables of Tmin, Tmax, RH, WS, SSH were considered as primary input variables and the daily Ep considered as the output variable. The PMI algorithm was used to identify the EIVs on Ep among the five primary input variables in both stations.
Station . | Statistical criteria . | Ep (mm) . | Tmin (°C) . | Tmax (°C) . | SSH (h) . | RH (%) . | WS (m/s) . |
---|---|---|---|---|---|---|---|
Larestan | Min | 0.2 | −4.2 | 6.8 | 0 | 6 | 0 |
Max | 22.3 | 32.4 | 47.4 | 12.9 | 96.5 | 14.5 | |
Mean | 8.94 | 16.21 | 32.67 | 9.67 | 40.75 | 1.51 | |
Median | 8.8 | 16.6 | 33.8 | 10.2 | 37 | 1.38 | |
SD | 4.79 | 8.23 | 9.02 | 2.44 | 15.86 | 0.97 | |
CV% | 53.57 | 50.77 | 27.6 | 25.21 | 38.93 | 64.41 | |
Abadeh | Min | 0.3 | −6.6 | 4.2 | 0 | 6.5 | 0 |
Max | 18.2 | 29.7 | 39.3 | 12.7 | 87.5 | 10.67 | |
Mean | 8.87 | 11.5 | 27.41 | 9.67 | 26.78 | 3.06 | |
Median | 9.1 | 12.1 | 28.6 | 10.4 | 21.5 | 2.88 | |
SD | 3.7 | 5.8 | 6.63 | 2.59 | 14.82 | 1.28 | |
CV% | 41.68 | 50.46 | 24.19 | 26.77 | 55.33 | 41.75 |
Station . | Statistical criteria . | Ep (mm) . | Tmin (°C) . | Tmax (°C) . | SSH (h) . | RH (%) . | WS (m/s) . |
---|---|---|---|---|---|---|---|
Larestan | Min | 0.2 | −4.2 | 6.8 | 0 | 6 | 0 |
Max | 22.3 | 32.4 | 47.4 | 12.9 | 96.5 | 14.5 | |
Mean | 8.94 | 16.21 | 32.67 | 9.67 | 40.75 | 1.51 | |
Median | 8.8 | 16.6 | 33.8 | 10.2 | 37 | 1.38 | |
SD | 4.79 | 8.23 | 9.02 | 2.44 | 15.86 | 0.97 | |
CV% | 53.57 | 50.77 | 27.6 | 25.21 | 38.93 | 64.41 | |
Abadeh | Min | 0.3 | −6.6 | 4.2 | 0 | 6.5 | 0 |
Max | 18.2 | 29.7 | 39.3 | 12.7 | 87.5 | 10.67 | |
Mean | 8.87 | 11.5 | 27.41 | 9.67 | 26.78 | 3.06 | |
Median | 9.1 | 12.1 | 28.6 | 10.4 | 21.5 | 2.88 | |
SD | 3.7 | 5.8 | 6.63 | 2.59 | 14.82 | 1.28 | |
CV% | 41.68 | 50.46 | 24.19 | 26.77 | 55.33 | 41.75 |
Identification and estimation of PMI algorithm
Among the non-linear algorithms for choosing EIVs for models based on data processing is the PMI algorithm. Sharma (2000) created the PMI-based input selection (PMIS) algorithm that is being presented here in order to find EIVs in hydrological models. Every iteration of the PMI algorithm looks at two variables: an input (C) and an output (Y). Then, it finds the value of PMI by considering the output variable by maximizing Cs (assuming that Cs varies from C). The confidence bounds derived from the distribution created by a bootstrap loop serve as the foundation for the statistical idea that PMI estimates for Cs. Cs is added to S (the set of chosen input variables), and the method is stopped when there are no more significant inputs. At that point, the input is deemed significant.
In which n is a sample of n observations from (x, y), and f is the density derived from those observations. Equation (2) indicates that the technique used to estimate the marginal and joint PDFs has a significant impact on the precise and efficient estimation of MI. The Akaike information criterion (AIC), which is detailed below, is used to halt the PMI method.
Akaike information criterion
Development of models
Long short-term memory model
Recurrent neural networks (RNNs) have been enhanced to include long-term dependency learning or LSTM. A primary issue with RNN networks is their short-term memory, which eventually causes gradients to explode and vanish (Bengio et al. 1994). However, the LSTM model has found a solution to this issue. The cell state is the primary component of the LSTM network. The LSTM can insert new data or extract existing data from the cell state. Gate-like structures will carry out this process. These gates are thought of as the information's input channel. They are made up of a point-to-point multiplication operator and a sigmoid function. A number between 0 and 1 that represents the proportion of input that should be delivered to the output phase is the sigmoid function's output. When the value is 1, all input should be transmitted to the output; when the value is 0, no information should be given to the output.
Parameter . | Value . |
---|---|
Bach size | 50 |
Activation function | ReLU |
Learning rate | 0.01 |
Dropout rate | 0.2 |
Iterations | 500 |
Network weights optimizer | Adam |
Parameter . | Value . |
---|---|
Bach size | 50 |
Activation function | ReLU |
Learning rate | 0.01 |
Dropout rate | 0.2 |
Iterations | 500 |
Network weights optimizer | Adam |
Support vector regression model
is the weight vector, b is the bias and is the non-linear transform function or kernel function.
Parameter . | Value . |
---|---|
Penalty parameter | 10 |
Kernel function | RBF |
Allowable error | 0.02 |
Parameter . | Value . |
---|---|
Penalty parameter | 10 |
Kernel function | RBF |
Allowable error | 0.02 |
WHO algorithm
denotes the percentage of stallions in the population and is used as a control parameter.
Parameter . | Value . |
---|---|
Stallions' percentage | 0.2 |
dimensions | 30 |
Crossover percentage | 0.13 |
Number of horses | 100 |
Iterations number | 500 |
Parameter . | Value . |
---|---|
Stallions' percentage | 0.2 |
dimensions | 30 |
Crossover percentage | 0.13 |
Number of horses | 100 |
Iterations number | 500 |
Rank of model . | Value of GenA . |
---|---|
Perfect | 1 |
Excellent | |
Good | |
Poor and unsuitable for simulation purposes |
Rank of model . | Value of GenA . |
---|---|
Perfect | 1 |
Excellent | |
Good | |
Poor and unsuitable for simulation purposes |
Parameters . | Tmin . | Tmax . | RH . | SSH . | WS . | Ep . |
---|---|---|---|---|---|---|
Tmin | 1 | |||||
Tmax | 0.923** | 1 | ||||
RH | −0.522** | −0.729** | 1 | |||
SSH | 0.316** | 0.539** | −0.650** | 1 | ||
WS | 0.368** | 0.237** | −0.108** | −0.009 | 1 | |
Ep | 0.864** | 0.910** | −0.705** | 0.524** | 0.289** | 1 |
Parameters . | Tmin . | Tmax . | RH . | SSH . | WS . | Ep . |
---|---|---|---|---|---|---|
Tmin | 1 | |||||
Tmax | 0.923** | 1 | ||||
RH | −0.522** | −0.729** | 1 | |||
SSH | 0.316** | 0.539** | −0.650** | 1 | ||
WS | 0.368** | 0.237** | −0.108** | −0.009 | 1 | |
Ep | 0.864** | 0.910** | −0.705** | 0.524** | 0.289** | 1 |
**Correlation is significant at the 0.01 level.
Data normalization
Model evaluation criteria
Generalization ability (GenA)
El Bilali et al. (2022) ranked the models for simulation purposes in four categories as perfect, excellent, good and poor which has been displayed in Table 5.
RESULTS AND DISCUSSION
In this study, the new WHO algorithm was applied to train two SVR and LSTM models for estimating daily Ep. For this purpose, the daily data of two meteorological stations were used during the years 2012 to 2022 in Fars province, Iran. One of the stations is situated in Larstan City with a hot desert climate and the other is Abadeh City with a cold dry climate. The data were gathered from two stations including Tmin, Tmax, RH, WS and SSH as primary input variables of the models and daily Ep was considered as the output of the models.
Tables 6 and 7 display Pearson correlation coefficient values between the main meteorological parameters measured in the Larestan and Abadeh stations, respectively. These tables show a strong correlation (say > 0.6) between daily EP and Tmax, Tmin and RH for the two studied stations. This correlation is negative for RH while positive for Tmax and Tmin.
Parameters . | Tmin . | Tmax . | RH . | SSH . | WS . | Ep . |
---|---|---|---|---|---|---|
Tmin | 1 | |||||
Tmax | 0.875* | 1 | ||||
RH | −0.526** | −0.764** | 1 | |||
SSH | 0.314** | 0.558** | −0.647** | 1 | ||
WS | 0.146** | −0.011 | −0.028 | 0.049* | 1 | |
Ep | 0.788** | 0.813** | −0.650** | 0.522** | 0.174** | 1 |
Parameters . | Tmin . | Tmax . | RH . | SSH . | WS . | Ep . |
---|---|---|---|---|---|---|
Tmin | 1 | |||||
Tmax | 0.875* | 1 | ||||
RH | −0.526** | −0.764** | 1 | |||
SSH | 0.314** | 0.558** | −0.647** | 1 | ||
WS | 0.146** | −0.011 | −0.028 | 0.049* | 1 | |
Ep | 0.788** | 0.813** | −0.650** | 0.522** | 0.174** | 1 |
**Correlation is significant at the 0.01 level.
*Correlation is significant at the 0.05 level.
The PMI algorithm was utilized to identify the EIVs on daily Ep. The results based on the AIC as the stopping condition of the PMI algorithm are shown in Table 8. In Table 8, iteration, variable, I(x;y), MC–I*(95), MC–I*(99), and AIC indicate the number of repetitions of the PMI algorithm, name of the variable, value of PMI for each variable, 95% range of the critical value of MI, 99% range of the critical value of MI, and numerous value of AIC for each variable, respectively.
Station . | Iteration . | Variable . | I(x;y) . | MC–I*(95) . | MC–I*(99) . | AIC . |
---|---|---|---|---|---|---|
Larestan | 0 | Tmax | 0.878 | 0.024 | 0.026 | −6,685.1 |
1 | RH | 0.051 | 0.024 | 0.026 | −6,799.8 | |
2 | Tmin | 0.050 | 0.024 | 0.026 | −7,027.3 | |
3 | SSH | 0.028 | 0.024 | 0.026 | −6,986.5 | |
4 | LogWS | 0.016 | 0.024 | 0.026 | −6,538.6 | |
Abadeh | 0 | Tmax | 0.566 | 0.030 | 0.033 | −2,690.9 |
1 | Tmin | 0.072 | 0.030 | 0.033 | −2,830.1 | |
2 | SSH | 0.096 | 0.030 | 0.033 | −3,089.9 | |
3 | logRH | 0.036 | 0.030 | 0.033 | −2,997.7 | |
4 | LogWS | 0.027 | 0.030 | 0.033 | −2,593.2 |
Station . | Iteration . | Variable . | I(x;y) . | MC–I*(95) . | MC–I*(99) . | AIC . |
---|---|---|---|---|---|---|
Larestan | 0 | Tmax | 0.878 | 0.024 | 0.026 | −6,685.1 |
1 | RH | 0.051 | 0.024 | 0.026 | −6,799.8 | |
2 | Tmin | 0.050 | 0.024 | 0.026 | −7,027.3 | |
3 | SSH | 0.028 | 0.024 | 0.026 | −6,986.5 | |
4 | LogWS | 0.016 | 0.024 | 0.026 | −6,538.6 | |
Abadeh | 0 | Tmax | 0.566 | 0.030 | 0.033 | −2,690.9 |
1 | Tmin | 0.072 | 0.030 | 0.033 | −2,830.1 | |
2 | SSH | 0.096 | 0.030 | 0.033 | −3,089.9 | |
3 | logRH | 0.036 | 0.030 | 0.033 | −2,997.7 | |
4 | LogWS | 0.027 | 0.030 | 0.033 | −2,593.2 |
According to the PMI algorithm, EIVs are those whose AIC value indicates a decreasing trend. The AIC values of Tmax, RH, and Tmin variables in the Larestan station display a decreasing trend (Table 8). This point is also true for Abadeh station with Tmax, Tmin, and SSH variables. Therefore, the EIVs on daily Ep in Larestan station are Tmax, RH and Tmin and for Abadeh station are Tmax, Tmin and SSH. It is significant to point out that the usage of the PMI algorithm leads to reducing the time needed for recognition of the EIVs.
The values of RMSE, NSE, and R2 for models taken during the training and testing phases are shown in Table 9 for both stations. The LSTM–WHO hybrid model in both stations has the most accuracy and performance for estimating daily Ep in the training and testing phases. LSTM–WHO hybrid model for Larestan station gives RMSE (1.151/1.099 mm), NSE (0.941/0.950), R2 (0.943/0.952) and for Abadeh station gives RMSE (1.135/1.163 mm), NSE (0.904/0.896), R2 (0.912/0.902) for the training/testing phases (see bolded values in Table 9). Also, the LSTM standalone model has poor accuracy for daily Ep estimation. The LSTM model in Larestan station shows RMSE (1.950/1.797 mm), NSE (0.832/0.865) and R2 (0.840/0.825) and in Abadeh station gives RMSE (1.843/2.114 mm), NSE (0.748/0.658), R2 (0.770/0.681) for the training/testing phases.
Station . | Model . | Training . | Testing . | ||||
---|---|---|---|---|---|---|---|
RMSE (mm) . | NSE . | R2 . | RMSE (mm) . | NSE . | R2 . | ||
Larestan | SVR | 1.952 | 0.831 | 0.848 | 1.702 | 0.879 | 0.905 |
LSTM | 1.950 | 0.832 | 0.840 | 1.797 | 0.865 | 0.825 | |
SVR–WHO | 1.876 | 0.844 | 0.846 | 1.474 | 0.923 | 0.926 | |
LSTM–WHO | 1.151 | 0.941 | 0.943 | 1.099 | 0.950 | 0.951 | |
Abadeh | SVR | 1.769 | 0.767 | 0.768 | 1.890 | 0.725 | 0.748 |
LSTM | 1.843 | 0.748 | 0.770 | 2.114 | 0.658 | 0.681 | |
SVR–WHO | 1.390 | 0.857 | 0.873 | 1.419 | 0.846 | 0.857 | |
LSTM–WHO | 1.135 | 0.904 | 0.912 | 1.163 | 0.897 | 0.902 |
Station . | Model . | Training . | Testing . | ||||
---|---|---|---|---|---|---|---|
RMSE (mm) . | NSE . | R2 . | RMSE (mm) . | NSE . | R2 . | ||
Larestan | SVR | 1.952 | 0.831 | 0.848 | 1.702 | 0.879 | 0.905 |
LSTM | 1.950 | 0.832 | 0.840 | 1.797 | 0.865 | 0.825 | |
SVR–WHO | 1.876 | 0.844 | 0.846 | 1.474 | 0.923 | 0.926 | |
LSTM–WHO | 1.151 | 0.941 | 0.943 | 1.099 | 0.950 | 0.951 | |
Abadeh | SVR | 1.769 | 0.767 | 0.768 | 1.890 | 0.725 | 0.748 |
LSTM | 1.843 | 0.748 | 0.770 | 2.114 | 0.658 | 0.681 | |
SVR–WHO | 1.390 | 0.857 | 0.873 | 1.419 | 0.846 | 0.857 | |
LSTM–WHO | 1.135 | 0.904 | 0.912 | 1.163 | 0.897 | 0.902 |
The scatter plots of Figures 5 and 6 (right side) reveal that the accuracy of the SVR–WHO hybrid model for estimating maximum and minimum values of daily Ep is not as good as the LSTM–WHO hybrid model. However, its performance is acceptable compared to LSTM and SVR standalone models. The most distance between the regression line and the 1:1 line belongs to the LSTM model for two stations. The evaluation criteria also confirm it.
These results show the high power of the WHO algorithm by finding optimal values for parameters of standalone models which appeared for the LSTM–WHO hybrid model. These results show that standalone models are easier and faster but, in some cases, they are unable to model the complicated processes. Qian et al. (2020) believe that standalone models compared to hybrid ones, have less ability to couple and process the non-linear problems. They stated that hybrid models can be used as an efficient method to solve complicated non-linear problems. This research indicated that although the LSTM model is an efficient model for solving complicated and non-linear problems its performance is weak compared to hybrid models. The same fact is true for the SVR model. Training this model with the WHO algorithm augments its efficiency for estimating daily Ep values. Therefore, the WHO algorithm has been able to increase the accuracy of SVR and LSTM models by finding optimal values for the parameters of models.
Table 10 shows a literature review on daily Ep modeling by using different AI methods. The comparison between the evaluation criteria of the developed LSTM–WHO (Table 8) with the evaluation criteria of the models suggested by different researchers (Table 10) shows that the LSTM–WHO model has high accuracy and good performance compared to other models for daily EP estimation.
Previous studies . | Models . | Suggested model . | Input variables . | R2 . | NSE . |
---|---|---|---|---|---|
Keshtegar et al. (2019) | MLPNN, RSM, SVR-RSM | SVR-RSM | Tmax, Tmin, H%, U2, α | 0.912 | 0.910 |
Majhi et al. (2020) | MLANN, LSTM | LSTM | Tmax, Tmin, RHI, RHII, WS, BSS | 0.800 | 0.773 |
Seifi & Soroush (2020) | ANN, ANN–GA, ANN-WOA, ANN-GWO | ANN–GA | Tmin, Tmax, RH, n, U2 | 0.790 | 0.780 |
Malik et al. (2021) | VR-WOA, SVR-SHO, SVR-SSA, SVR-PSO, SVR-MVO, and PM | SVR-SSA | Tmin, Tmax, RHmax, RHmin, Us, Rs | 0.905 | 0.815 |
Kushwaha et al. (2021) | SVM, RT, REPTree, RSS | SVM | Tmin, RH, SSH, WS | 0.874 | 0.865 |
Malik et al. (2023) | SVR-GA, SVR-GWO | SVR-GWO | Tmin, Tmax, RHmax, RHmin, Us, R | 0.790 | 0.787 |
Previous studies . | Models . | Suggested model . | Input variables . | R2 . | NSE . |
---|---|---|---|---|---|
Keshtegar et al. (2019) | MLPNN, RSM, SVR-RSM | SVR-RSM | Tmax, Tmin, H%, U2, α | 0.912 | 0.910 |
Majhi et al. (2020) | MLANN, LSTM | LSTM | Tmax, Tmin, RHI, RHII, WS, BSS | 0.800 | 0.773 |
Seifi & Soroush (2020) | ANN, ANN–GA, ANN-WOA, ANN-GWO | ANN–GA | Tmin, Tmax, RH, n, U2 | 0.790 | 0.780 |
Malik et al. (2021) | VR-WOA, SVR-SHO, SVR-SSA, SVR-PSO, SVR-MVO, and PM | SVR-SSA | Tmin, Tmax, RHmax, RHmin, Us, Rs | 0.905 | 0.815 |
Kushwaha et al. (2021) | SVM, RT, REPTree, RSS | SVM | Tmin, RH, SSH, WS | 0.874 | 0.865 |
Malik et al. (2023) | SVR-GA, SVR-GWO | SVR-GWO | Tmin, Tmax, RHmax, RHmin, Us, R | 0.790 | 0.787 |
Gena and uncertainties
The GenA of the models in this study based on the classification method of El Bilali et al. (2022) is shown in Table 11. Also, in this table, the logarithmic values of the confidence intervals with a significance level of 95%, and the values of the upper limit (UL) and lower limit (LL) have been calculated for the studied models. All the developed models in both stations of Larestan and Abadeh have an excellent GenA. These values show that the models are well trained and the simulation process is well trained on the input data.
Station . | Model . | UL95% . | LL95% . | GenA . | Rank . |
---|---|---|---|---|---|
Larestan | SVR | 2.12 | 2.04 | 0.871 | Excellent |
LSTM | 2.17 | 2.10 | 0.921 | Excellent | |
SVR–WHO | 2.15 | 2.08 | 0.785 | Excellent | |
LSTM–WHO | 2.13 | 2.05 | 0.955 | Excellent | |
Abadeh | SVR | 2.15 | 2.07 | 1.068 | Excellent |
LSTM | 2.15 | 2.09 | 1.147 | Excellent | |
SVR–WHO | 1.99 | 1.88 | 1.021 | Excellent | |
LSTM–WHO | 2.03 | 1.92 | 1.025 | Excellent |
Station . | Model . | UL95% . | LL95% . | GenA . | Rank . |
---|---|---|---|---|---|
Larestan | SVR | 2.12 | 2.04 | 0.871 | Excellent |
LSTM | 2.17 | 2.10 | 0.921 | Excellent | |
SVR–WHO | 2.15 | 2.08 | 0.785 | Excellent | |
LSTM–WHO | 2.13 | 2.05 | 0.955 | Excellent | |
Abadeh | SVR | 2.15 | 2.07 | 1.068 | Excellent |
LSTM | 2.15 | 2.09 | 1.147 | Excellent | |
SVR–WHO | 1.99 | 1.88 | 1.021 | Excellent | |
LSTM–WHO | 2.03 | 1.92 | 1.025 | Excellent |
CONCLUSIONS
One of the most important climatic and hydrological variables is evaporation. The accurate estimation of evaporation especially in arid and semi-arid areas that encounter with shortage of water resources, is necessary for water management and agricultural activities. In this study for the first time, the WHO algorithm was examined to optimize SVR and LSTM models in two synoptic meteorological stations in Fars province, Iran. One of the stations is situated in Larestan city with a hot desert climate and the other is Abadeh city with cold dry climate. The accuracy of four models such as SVR, LSTM, SVR–WHO and LSTM–WHO for daily EP modeling was examined. PMI algorithm used to identify EIVs on daily Ep. The accuracy of the developed models was evaluated based on the RMSE, NSE, and R2 statistical criteria, the graphic criteria of violin plot and confidence intervals. Moreover, the GenA of the developed models for both stations were checked and its classification was also done. PMI algorithm results showed that Tmax, RH and Tmin variables and Tmax, Tmin and SSH variables are EIVs on daily EP in Larestan and Abadeh stations, respectively. The results of this study showed a positive application of the WHO algorithm in improving the accuracy of individual models for estimating the daily Ep. It is concluded that the LSTM–WHO hybrid model has the best performance in daily Ep estimation for both stations based on statistical criteria. Therefore, the LSTM–WHO hybrid model will be chosen as the superior model. In addition, the comparison of the violin plot of the LSTM–WHO hybrid model with the violin plot of the measured values confirms the results of the statistical criteria. The results of drawing confidence intervals for the selected model revealed that the estimated EP values are within the confidence intervals with a significance level of 95%. Also, the results of the GenA for the developed models showed that they have an excellent GenA which approved their good training of models in both stations. It should be noted that GenA of the LSTM–WHO hybrid model should be carried out in the studied area within a short distance with the same weather conditions. Therefore, the necessary actions e.g., calibration and re-evaluation should be taken in case the model will be used in other areas with different weather conditions. The limitation of this study was non-continuity in the time-series of data because of the missing data during some years. In the future, it will be suggested to use other new optimizer algorithms to optimize the parameters related to deep learning and learning machine models for estimating Ep and compare the results with the current study.
ACKNOWLEDGEMENTS
The authors would like to express their appreciation to the meteorological bureau of Fars Province for providing the information.
AUTHOR CONTRIBUTIONS
M.S. and H.F. wrote the main manuscript text and M.A.A. generated the code of the used models. All authors reviewed the manuscript.
FUNDING
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
ETHICAL APPROVAL
All authors certify that they have ethical conduct required by the journal.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.