To enhance water use efficiency in agriculture, accurately estimating plant water consumption is essential. This study examines the ability of artificial neural networks (ANN) and M5 decision tree models to estimate reference evapotranspiration (ET0). Data from Shahrekord, Farokhshahr, and Saman stations (2004–2013), including temperature, humidity, wind speed at two meters, and sunshine hours, were used. ET0 calculated with the FAO Penman–Monteith model served as the benchmark for evaluating both models. In the first phase, eight scenarios were designed to estimate evapotranspiration from different combinations of climatic data. Evaluation metrics such as RMSE, MAE, and correlation coefficient (R) were calculated. Results showed that the ANN model, with a feedforward network and Levenberg–Marquardt algorithm, outperformed the M5 decision tree model at all stations: (RMSE = 0.30, MAE = 0.23, R = 0.99) at Shahrekord airport, (RMSE = 0.34, MAE = 0.26, R = 0.99) at Farokhshahr, and (RMSE = 0.31, MAE = 0.24, R = 0.99) at Saman. In the second phase, both models were trained with Farokhshahr data and tested with data from the other stations. Results again showed the superior performance of the ANN model. Overall, the study concluded that the ANN model outperforms the M5 model in data-scarce conditions, making it a better tool for water resource management.

  • The Penman–Monteith model is widely used in the evaluation of evaporation and transpiration.

  • The ability of the M5 Tree model and the artificial neural network model to generalize in estimating reference evapotranspiration in different climates was assessed.

  • The performance of the neural network is better than the M5 Tree model.

Evapotranspiration (ET) is a critical parameter within the hydrological cycle, playing a pivotal role in designing efficient irrigation systems and determining optimal irrigation patterns. Accurate estimation of ET can significantly reduce water consumption, underscoring its importance in sustainable water management practices. As an integral component of the hydrological cycle, ET prediction is essential for the effective management of water resources. Reference evapotranspiration (ET0) serves as a fundamental metric in various applications, including hydrological balance assessments, irrigation system optimization, crop performance simulations, and comprehensive water resource planning (Makwana et al. 2023). Globally, more than 60% of precipitation is lost through ET processes (Sebastian et al. 2023).

ET encompasses multiple processes, such as water evaporation from open surfaces, transpiration through vegetation, and sublimation of snow and ice (Cuxart & Boone 2020). The measurement of ET presents a considerable challenge due to the difficulty of isolating these individual components, which are often measured collectively in hydrological studies. Plant water requirements can be determined using meteorological methods based on energy balance and vapor transfer principles; however, these techniques are typically expensive and labor-intensive. A practical alternative involves using lysimeters, which estimate ET through controlled measurements of herbaceous water balance. Despite their accuracy, lysimeters are also costly and time-consuming. To address these limitations, numerous empirical equations have been developed to estimate ET using meteorological data. Approximately 50 methods are available, each with varying data requirements. Among these, the Penman–Monteith FAO model is recognized as the global standard for ET calculation, particularly in scenarios lacking lysimetric data (Koç 2022). This model has consistently demonstrated superior accuracy compared to other empirical methods (Jensen et al. 1990). However, it relies on detailed and precise meteorological inputs, which may not always be readily accessible.

Given the complexity of ET, which is influenced by numerous climatic and environmental factors, leveraging data from meteorological stations is crucial for developing models that require minimal input variables. Mathematical models that incorporate measured meteorological parameters as predictors – such as artificial neural networks (ANNs) – offer a cost-effective and reliable solution. ANNs and similar machine learning approaches have been extensively employed in predictive studies and have shown promising results in estimating ET with high accuracy (Bakhtiari et al. 2022).

Machine learning models, including radial basis function neural networks (RBNNs), ANNs, generalized regression neural networks (GRNNs), and adaptive neuro-fuzzy inference systems (ANFISs), have been extensively utilized for ET0 estimation. These approaches have demonstrated their widespread applicability at both global and regional levels due to their practicality, reliability, and low error margins (Algretawee & Alshama 2021; Bakhtiari et al. 2022; Makwana et al. 2023).

One study developed and compared artificial intelligence (AI) models to estimate daily ET0 using limited meteorological data, such as maximum and minimum temperature (Tmax, Tmin), relative humidity (RH), wind speed (WS), and sunshine duration (BSS). Models including ANN, Extreme Learning Machine (ELM), M5 Tree, and Multiple Linear Regression (MLR) were trained using data from the Sardarkrushinagar station in Gujarat, India. Validation against the Penman–Monteith FAO-56 equation revealed that the ANN model achieved superior accuracy compared to ELM, M5 Tree, and MLR. Additionally, the study highlighted the importance of selecting relevant input variables to minimize error and enhance model performance (Makwana et al. 2023). Kisi (2007a) modeled ET0 using Multilayer Perceptron (MLP) and ANN models, which outperformed empirical methods such as the Penman, Hargreaves, and Turc models. Similarly, Kim & Kim (2008) validated GRNN models for ET0 estimation, while Kumar et al. (2009) demonstrated the robustness of ANN models under drought conditions, surpassing the Blaney–Criddle and FAO radiation methods. Additional research by Hamid et al. (2011) and Huo et al. (2012) reaffirmed the high accuracy of neural networks compared to empirical models.

Studies in Iran also underscore the effectiveness of data-driven models for ET0 estimation. For instance, Bakhtiari et al. (2022) evaluated ANN, ANFIS, Support Vector Machine (SVM), and the M5 Tree model in the southern Caspian Sea region using climatic data spanning 1991–2020. ANFIS achieved the highest accuracy when employing a full set of climatic variables, outperforming other models. Moreover, ANFIS demonstrated reliability even with fewer input variables, making it a recommended tool for regions with limited data availability. Further research in Turkey compared M5 Tree, ANN, ANFIS, and Neuro-Fuzzy models using climatic data from Central Anatolia. ANFIS consistently outperformed the other models in terms of mean absolute error (MAE) and root mean square error (RMSE), particularly when optimized subsets of variables were used (Keshtegar et al. 2018). A similar evaluation in the United States confirmed the superiority of data-driven techniques like ANFIS and SVM over traditional empirical equations, with ANFIS delivering the highest correlation coefficients and minimal error rates (Üneş et al. 2020).

Over the years, various studies have explored neural networks and machine learning techniques for ET0 estimation across different regions. For instance, Kim et al. (2014) applied ANNs in East Asia, while Falamarzi et al. (2014) integrated neural networks with wavelets in Australia. Gocic et al. (2015) employed ANNs and SVM in Serbia, analyzing data collected over three decades. Deo & Sahin (2015) demonstrated the applicability of ANNs in East Australia, incorporating meteorological parameters and precipitation indices. M5 Tree models, while less commonly used in hydrology, have also shown potential in specific applications, such as forecasting daily river flow in the Sohu Stream, Turkey (Sattari et al. 2013) and converting evaporation data to ET0 in semi-arid regions (Rahimikhoob et al. 2013; Rahimikhoob 2014). These studies collectively highlight the effectiveness of machine learning models, particularly ANNs and ANFIS, for ET0 estimation under diverse climatic conditions. While traditional empirical methods remain relevant, data-driven approaches offer superior accuracy and reliability, especially in regions with limited meteorological data.

ET plays a pivotal role in water resource management, particularly in designing efficient irrigation systems and optimizing agricultural practices. Accurate estimation of ET is essential for addressing water scarcity challenges and ensuring sustainable water use. Traditional methods, while accurate, are often costly and time-consuming, highlighting the need for alternative approaches. The primary objective of this study is to assess the performance of ANN and M5 Tree models in modeling local ET. Specifically, this research aims to estimate ET0 at two neighboring stations using models trained on data from a single station, providing insights into the feasibility of applying data-driven models in regions with limited meteorological data.

Study area

Shahrekord, located between 50°49′22′′ to 50°53′44′′ longitude and 32°18′22′′ to 32°21′50′′ latitude, is the capital of Chaharmahal and Bakhtiari province. The region is characterized by a dry and cold climate, as classified by the Emberger Profile. The city's average annual temperature is 11.5 °C, with December being the coldest month and July the warmest.

This study utilized daily climatic data over a 10-year period, including maximum and minimum temperatures, mean RH, wind speed measured at a 2-m height, and sunshine duration. The raw data were initially organized and processed in Excel for editing and preparation. Statistical analyses were performed using SPSS 16.0 to identify and eliminate outliers, ensuring the reliability and accuracy of the dataset for further analysis (Table 1).

Penman–Monteith FAO model

This model is widely regarded as one of the most reliable methods for estimation and is extensively employed by experts in the field. It is based on a reference grass crop with a height of 12 cm and a radiation reflection coefficient of 23%. The method utilizes the ET0 equation, serving as the standard against which other methods are calibrated. The mathematical relationship of this model is expressed in Equation (1):
$$ET_0 = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T + 273}\,U_2\,(e_a - e_d)}{\Delta + \gamma\,(1 + 0.34\,U_2)} \tag{1}$$
In this relationship, ET0 is the reference evapotranspiration (mm/day); Rn is the net radiation at the crop surface (MJ m–2 d–1); G is the soil heat flux (MJ m–2 d–1); γ is the psychrometric constant (kPa °C–1); T is the mean daily air temperature (°C); U2 is the wind speed at a height of 2 m (m/s); ea is the saturation vapor pressure (kPa); ed is the actual vapor pressure (kPa); and Δ is the slope of the saturation vapor pressure curve (kPa °C–1) (Allen et al. 1998).
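As a rough illustration of how the FAO Penman–Monteith equation is evaluated for a single day, the following Python sketch computes ET0 from already-derived quantities. The function names are hypothetical, the slope Δ is computed from mean temperature with the standard FAO-56 vapor pressure relations, and the input values are illustrative rather than taken from the study stations.

```python
import math


def saturation_vapor_pressure(t_c):
    """Saturation vapor pressure (kPa) at air temperature t_c (deg C)."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def slope_vapor_pressure_curve(t_c):
    """Slope of the saturation vapor pressure curve, Delta (kPa per deg C)."""
    return 4098.0 * saturation_vapor_pressure(t_c) / (t_c + 237.3) ** 2


def et0_penman_monteith(rn, g, t_mean, u2, ea, ed, gamma):
    """Reference evapotranspiration (mm/day).

    rn, g  : net radiation and soil heat flux (MJ m-2 d-1)
    t_mean : mean daily air temperature (deg C)
    u2     : wind speed at 2 m (m/s)
    ea, ed : saturation and actual vapor pressure (kPa), as named in the text
    gamma  : psychrometric constant (kPa per deg C)
    """
    delta = slope_vapor_pressure_curve(t_mean)
    numerator = (0.408 * delta * (rn - g)
                 + gamma * 900.0 / (t_mean + 273.0) * u2 * (ea - ed))
    denominator = delta + gamma * (1.0 + 0.34 * u2)
    return numerator / denominator


# Illustrative inputs for one mild summer day (not data from this study).
et0 = et0_penman_monteith(rn=13.28, g=0.0, t_mean=16.9, u2=2.078,
                          ea=1.997, ed=1.409, gamma=0.0666)
print(f"ET0 = {et0:.2f} mm/day")  # about 3.9 mm/day
```

For daily time steps the soil heat flux G is commonly taken as zero, which is why it is set to 0.0 in the example call.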

ANN model

Neural networks are a cornerstone of computational intelligence, offering key properties essential for various scientific and engineering applications. These properties include function approximation, parallel processing, learning capabilities, and generalization. Additionally, neural networks are notable for their flexibility, as they do not impose strict assumptions on the input data, allowing data to follow any statistical distribution (Civco & Wanug 1994). The computational efficiency of neural networks is another critical feature, closely tied to the underlying model architecture. Neural network structures can be categorized into two primary types: feedforward networks and recurrent networks (Benediktsson et al. 1990). Feedforward networks, such as the widely used MLP, lack feedback loops, whereas recurrent networks, such as Hopfield networks, incorporate feedback mechanisms. For comprehensive theoretical details on ANNs, the work of Haykin (1998) serves as a key reference.

Feedforward networks typically consist of one or more hidden layers comprising sigmoid neurons, followed by an output layer of linear neurons. The presence of multiple layers with nonlinear activation functions enables these networks to model both linear and nonlinear relationships between input and output data. In multilayer networks, the weight matrix is determined by the number of layers and connections between neurons.
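The layer structure described above can be sketched as a single forward pass. This is a minimal illustration, assuming one hidden layer with tangent-sigmoid (tanh) neurons and one linear output neuron; the weights and inputs are arbitrary placeholders, not values from the trained networks in this study.

```python
import math


def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: tanh hidden layer followed by one linear output neuron.

    inputs   : list of input values (e.g. normalized climatic variables)
    w_hidden : one weight list per hidden neuron
    b_hidden : one bias per hidden neuron
    w_out    : one output weight per hidden neuron
    b_out    : output bias
    """
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Placeholder example: 5 inputs, 2 hidden neurons (all numbers hypothetical).
x = [0.6, 0.3, 0.4, 0.5, 0.7]
w_h = [[0.2, -0.1, 0.4, 0.0, 0.3], [-0.3, 0.5, 0.1, 0.2, -0.2]]
b_h = [0.1, -0.1]
w_o = [1.5, -0.8]
b_o = 0.2
print(forward(x, w_h, b_h, w_o, b_o))
```

Training then amounts to adjusting `w_hidden`, `b_hidden`, `w_out`, and `b_out` so that the output matches the target values as closely as possible.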

Once initialized, the weights and biases of the network are iteratively adjusted during training to minimize the performance function. This process enables the network to perform tasks such as function approximation (nonlinear regression), pattern recognition, and classification. A schematic representation of an ANN is provided in Figure 1, illustrating the structure and flow of information within the network.
Figure 1

A schematic view of the ANN.

In this study, a multilayer feedforward network was utilized. Such networks are considered universal function approximators, capable of approximating any function with a finite number of discontinuities to an arbitrary degree of accuracy, provided the hidden layer contains a sufficient number of neurons. ANNs consist of layers of nodes, including an input layer, one or more hidden layers, and an output layer. In this study, the neurons in the input layer included minimum and maximum temperature, WS at a height of two meters, average RH, and sunshine hours. The output layer neuron represented ET calculated using the FAO Penman–Monteith model. By removing specific input data, eight scenarios were designed to calculate ET0. Additionally, the neurons in the hidden layer were optimized using a trial-and-error method, which is discussed in detail in the results section.

Since each input variable has its own range and units, the values should be scaled to a common range before training so that no single variable restricts the adjustment of the weights (Kisi 2008). To achieve this, Equation (2) was used to normalize the data for further analysis (Rahimikhoob 2008).
$$x_n = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{2}$$

In this regard, xn is the normalized value; x is the original value; xmin is the minimum value of each input variable; and xmax is the maximum value of that variable.
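A minimal sketch of this normalization step, assuming plain min–max scaling of each variable to the range [0, 1]; the function name and the sample temperatures are hypothetical.

```python
def min_max_normalize(values):
    """Scale a list of numbers to [0, 1] using its own minimum and maximum."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("cannot normalize a constant series")
    return [(x - x_min) / (x_max - x_min) for x in values]


# Hypothetical daily maximum temperatures (deg C).
tmax = [12.0, 18.5, 25.0, 31.5]
normalized = min_max_normalize(tmax)
print(normalized)
```

Each input variable would be normalized separately, so that temperature, humidity, wind speed, and sunshine hours all enter the network on the same scale.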

M5 Tree model

The M5 Tree model, first proposed by Quinlan in 1992, offers several distinct advantages over other learning process models. Its simplicity and straightforward implementation eliminate the need for trial-and-error procedures. Additionally, the model excels in handling missing data and performs effectively on large datasets. Building upon the concept of regression trees introduced by Breiman et al. in 1984, the M5 Tree model combines decision tree methodologies with linear regression, resulting in a powerful tool for predicting numeric values.

The M5 Tree model is structured as a binary decision tree, where each terminal node utilizes a linear regression equation to generate predictions. The construction of the tree involves two main phases. In the initial phase, the algorithm selects the most suitable input parameter to split the data, thereby forming a standard decision tree structure (Figure 2). Each selected parameter is divided at a specific split point, determined by various selection criteria. The resulting nodes are then categorized into two groups based on their similarities.
Figure 2

(a) Example of the M5 Tree model, splitting the input space x1 × x2 by M5 model tree algorithm and (b) diagram of model tree with six linear regression models at the leaves.

The splitting criterion in the M5 Tree model is based on the standard deviation of the data categories at each node, which serves as a measure of the node's error. The algorithm evaluates the potential reduction in this error caused by the selected test attributes. The reduction in standard deviation, which reflects the improvement in the model's accuracy, is calculated using Equation (3). This systematic approach allows the M5 Tree model to deliver efficient and accurate predictions across diverse datasets.
$$SDR = sd(T) - \sum_{i} \frac{|T_i|}{|T|}\, sd(T_i) \tag{3}$$
Here, SDR is the standard deviation reduction; T is the set of samples that reach the node; Ti is the subset of samples that have the i-th outcome of the potential test; and sd is the standard deviation (Wang & Witten 1997; Rahimikhoob et al. 2013). The M5 Tree model repeatedly splits at each node until the deviation of the samples from their mean is reduced to nearly zero, resulting in a large tree with numerous branches and nodes. Such a large and complex tree is not practical, so in the second phase the overgrown tree is pruned: surplus branches are replaced with leaves, each of which holds a linear regression equation. In effect, the technique subdivides the input space and establishes a linear relationship within each subdivision. For more information on the M5 Tree model, refer to Quinlan (1992).
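To make the splitting criterion concrete, the sketch below computes the standard deviation reduction for every candidate split of one input variable and returns the best one. It is a simplified illustration, not the Weka implementation: it uses the population standard deviation, considers only binary splits at midpoints, and the sample data are invented.

```python
import statistics


def sd_reduction(parent, subsets):
    """Standard deviation reduction: sd(T) - sum(|Ti|/|T| * sd(Ti))."""
    n = len(parent)
    weighted = sum(len(s) / n * statistics.pstdev(s) for s in subsets)
    return statistics.pstdev(parent) - weighted


def best_split(x, y):
    """Return (split_point, sdr) maximizing the reduction for one input x."""
    pairs = sorted(zip(x, y))
    targets = [t for _, t in pairs]
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no split point between equal x values
        split = (pairs[i][0] + pairs[i - 1][0]) / 2.0
        sdr = sd_reduction(targets, [targets[:i], targets[i:]])
        if sdr > best[1]:
            best = (split, sdr)
    return best


# Invented example: temperature (deg C) versus ET0 (mm/day).
temperature = [5, 10, 15, 25, 30, 35]
et0_values = [1.0, 1.2, 1.1, 5.0, 5.2, 5.1]
split_point, reduction = best_split(temperature, et0_values)
print(split_point, round(reduction, 2))
```

In this toy data set the ET0 values fall into two tight clusters, so the split between 15 °C and 25 °C removes almost all of the spread at the node.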

Performance evaluation criteria

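To assess the effectiveness of the methods used in this study, statistical indicators such as the mean absolute error (MAE), RMSE, and correlation coefficient (R) were calculated based on Equations (4), (5), and (6).
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|P_i - O_i\right| \tag{4}$$
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)^2} \tag{5}$$
$$R = \frac{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)\left(P_i - \bar{P}\right)}{\sqrt{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2\,\sum_{i=1}^{n}\left(P_i - \bar{P}\right)^2}} \tag{6}$$
Here, Oi is the reference (Penman–Monteith) value, Pi is the value predicted by the model, Ō and P̄ are their respective means, and n is the number of observations.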

In this study, the Penman–Monteith ET method is used as the benchmark for assessing the accuracy of various ET estimation methods. To evaluate model performance, three statistical metrics are employed. The MAE quantifies the average deviation between observed and predicted values, with lower values closer to zero indicating higher model accuracy. Similarly, the RMSE provides a measure of the model's precision, where smaller values indicate reduced errors and improved performance. Finally, the correlation coefficient (R) evaluates the strength of the relationship between observed and predicted values, with higher R values signifying a stronger correlation and superior model performance.
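The three metrics can be computed with a few lines of Python. The sketch below uses the standard definitions of MAE, RMSE, and the Pearson correlation coefficient; the observed and predicted series are invented for illustration.

```python
import math


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error."""
    squared = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def correlation(observed, predicted):
    """Pearson correlation coefficient R."""
    o_mean = sum(observed) / len(observed)
    p_mean = sum(predicted) / len(predicted)
    cov = sum((o - o_mean) * (p - p_mean) for o, p in zip(observed, predicted))
    o_var = sum((o - o_mean) ** 2 for o in observed)
    p_var = sum((p - p_mean) ** 2 for p in predicted)
    return cov / math.sqrt(o_var * p_var)


# Invented ET0 values (mm/day): benchmark versus model estimate.
benchmark = [2.0, 3.0, 4.0, 5.0]
estimate = [2.1, 2.9, 4.2, 4.8]
print(mae(benchmark, estimate), rmse(benchmark, estimate),
      correlation(benchmark, estimate))
```

Because RMSE squares the errors before averaging, it penalizes large individual deviations more heavily than MAE, which is why both are usually reported together.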

To develop the neural network model, the values of maximum temperature, minimum temperature, sunshine duration, mean RH, and WS at a height of two meters were used as input data. In the absence of lysimeter measurements, ET0 values calculated with the Penman–Monteith equation were utilized as output (target) data for the model. The dataset was divided into 75% for training and 25% for testing. A feedforward neural network architecture was implemented, employing the Levenberg–Marquardt training algorithm. A sigmoid activation function was used in the hidden layer, while a linear function was applied in the output layer. The number of neurons in the hidden layer was optimized using a trial-and-error approach (Kisi 2007b; Jain et al. 2008).

The optimization process involved running the network with varying neuron counts in the hidden layer, ranging from 5 to 20. Statistical performance metrics were calculated for each configuration, and a ranking system was applied to identify the optimal network structure. In this system, each performance index was scored based on its effectiveness, with higher scores assigned to optimal values and lower scores to suboptimal ones. For instance, in the case of the correlation coefficient, values closer to one received higher scores, while those farther from one were assigned lower scores. The optimal configuration of neurons in the hidden layer for the Farokhshahr station is provided as an example in Table 2.
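One possible reading of this ranking system is sketched below. The exact scoring rule is not given in the text, so the scheme here (the best configuration for each metric receives the most points, and points are summed across metrics) is an assumption, and the metric values are hypothetical.

```python
def rank_configurations(results):
    """Rank hidden-layer sizes by summed per-metric scores.

    results maps neuron count -> (rmse, mae, r).
    Returns (ordered neuron counts, best first; score per neuron count).
    """
    n = len(results)
    scores = {neurons: 0 for neurons in results}
    # Lower is better for RMSE and MAE; higher is better for R.
    for index, higher_is_better in ((0, False), (1, False), (2, True)):
        ordered = sorted(results, key=lambda k: results[k][index],
                         reverse=higher_is_better)
        for position, neurons in enumerate(ordered):
            scores[neurons] += n - position  # best gets n points, worst 1
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


# Hypothetical test-set metrics for four hidden-layer sizes.
trial_results = {
    5: (0.45, 0.36, 0.981),
    10: (0.37, 0.29, 0.989),
    15: (0.39, 0.30, 0.987),
    20: (0.41, 0.32, 0.985),
}
ranking, scores = rank_configurations(trial_results)
print(ranking[0], scores)
```

With these invented numbers the 10-neuron network collects the highest total score, illustrating how a single configuration can be selected when several metrics are compared at once.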

Table 1

Characteristics of the studied stations

| Station name | Elevation above sea level | Longitude | Latitude |
|---|---|---|---|
| Farokhshahr | 2,073 m | 50.93 E | 32.30 N |
| Shahrekord airport | 2,050 m | 50.84 E | 32.29 N |
| Saman station | 2,075 m | 50.87 E | 32.44 N |

Table 2

Optimal configuration of neurons in the hidden layer at the Farokhshahr station

| Number of neurons | Scenario |
|---|---|
| 10 | |

Increasing the number of neurons in the hidden layer did not result in a significant reduction in network errors. The neural network model was implemented using MATLAB (R2012b), and the statistical results of the model are detailed in Table 3. For the data mining algorithm, Weka software was employed, with the Penman–Monteith ET serving as the output variable. Input variables included maximum and minimum temperature, average RH, sunshine duration, and WS at a height of two meters, tested across various scenarios. The dataset was divided into 75% for training and 25% for testing. The statistical performance of the data mining algorithm is summarized in Table 4. According to multiple studies, a minimum of four years of data is required to effectively model ET (Falamarzi et al. 2014).

The results of ET modeling using ANNs, as outlined in Table 3, demonstrate that the tangent sigmoid activation function in the hidden layer is particularly effective in accurately capturing this phenomenon. Various scenarios were developed with different input variables. In the best-performing scenario, the lowest error was observed at the airport station (RMSE = 0.30), followed by the Saman station (RMSE = 0.31) and the Farokhshahr station (RMSE = 0.34).

Table 3

The results of ANNs

| Scenario | RMSE (Farokhshahr) | RMSE (Airport) | RMSE (Saman) | MAE (Farokhshahr) | MAE (Airport) | MAE (Saman) | R (Farokhshahr) | R (Airport) | R (Saman) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.34 | 0.3 | 0.31 | 0.26 | 0.23 | 0.24 | 0.99 | 0.99 | 0.99 |
| 2 | 0.47 | 0.46 | 0.42 | 0.39 | 0.38 | 0.37 | 0.99 | 0.99 | 0.99 |
| 3 | 0.73 | 0.68 | 0.89 | 0.56 | 0.53 | 0.65 | 0.98 | 0.98 | 0.98 |
| 4 | 0.38 | 0.34 | 0.46 | 0.30 | 0.25 | 0.33 | 0.99 | 0.99 | 0.99 |
| 5 | 0.85 | 0.81 | 1.01 | 0.66 | 0.62 | 0.73 | 0.98 | 0.98 | 0.98 |
| 6 | 0.57 | 0.54 | 0.63 | 0.47 | 0.43 | 0.49 | 0.99 | 0.99 | 0.99 |
| 7 | 0.77 | 0.74 | 0.98 | 0.62 | 0.57 | 0.77 | 0.99 | 0.98 | 0.98 |
| 8 | 1.03 | 0.97 | 1.27 | 0.78 | 0.72 | 0.96 | 0.97 | 0.98 | 0.97 |

Table 4

Results of M5 algorithms statistical analysis

| Scenario | RMSE (Farokhshahr) | RMSE (Airport) | RMSE (Saman) | MAE (Farokhshahr) | MAE (Airport) | MAE (Saman) | R (Farokhshahr) | R (Airport) | R (Saman) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.36 | 0.32 | 0.34 | 0.28 | 0.25 | 0.27 | 0.98 | 0.99 | 0.99 |
| 2 | 0.51 | 0.47 | 0.47 | 0.41 | 0.38 | 0.38 | 0.97 | 0.98 | 0.98 |
| 3 | 0.74 | 0. 04 | 0.91 | 0.58 | 0.55 | 0.67 | 0.94 | 0.95 | 0.95 |
| 4 | 0.41 | 0.36 | 0.46 | 0.32 | 0.37 | 0.35 | 0.98 | 0.99 | 0.98 |
| 5 | 0.84 | 0.8 | 0.99 | 0.64 | 0.63 | 0.71 | 0.92 | 0.94 | 0.98 |
| 6 | 0.59 | 0.54 | 0.49 | 0.48 | 0.43 | 0.63 | 0.96 | 0.97 | 0.97 |
| 7 | 0.79 | 0.77 | | 0.63 | 0.6 | 0.78 | 0.93 | 0.94 | 0.93 |
| 8 | 0.99 | 0.97 | 1.3 | 0.76 | 0.72 | | 0.89 | 0.91 | 0.89 |

To evaluate the influence of input parameters on ET modeling, an input variable elimination approach was employed (Figure 3). This method involved running the model with all input variables and then iteratively removing one variable at a time. The parameter that caused the largest increase in error upon removal was deemed the most sensitive, while the least impactful parameters had minimal effect on model accuracy when excluded. Eliminating input variables generally resulted in decreased model accuracy and increased error. These findings align with those of Makwana et al. (2023), who emphasized that selecting appropriate input variables not only minimizes error but also improves the correlation between dependent and independent variables. Their study, which compared AI models for estimating daily ET0 using limited inputs, found that ANN models outperformed ELM, M5 Tree, and MLR models based on performance metrics. Similarly, another study by Yassin et al. (2016) compared the performance of ANN and Gene Expression Programming (GEP) for estimating daily ET0 across 13 weather stations in arid conditions. The results indicated that ANN models consistently outperformed GEP models. The findings of the current study are consistent with previous research, reaffirming the effectiveness of ANN models in ET modeling, particularly in terms of accuracy and reliability.
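The elimination procedure described above can be sketched as a simple loop. In the sketch, `evaluate` stands in for whatever routine trains a model on a given set of inputs and returns its test RMSE; here it is replaced by a toy function with made-up penalties, so the numbers are purely illustrative and the variable names are hypothetical.

```python
def rank_by_elimination(all_inputs, evaluate):
    """Rank inputs by the error increase caused by removing each one.

    evaluate(inputs) must return an error measure (e.g. RMSE) for a model
    built on the given list of input names.
    """
    baseline = evaluate(all_inputs)
    increases = {}
    for variable in all_inputs:
        reduced = [v for v in all_inputs if v != variable]
        increases[variable] = evaluate(reduced) - baseline
    # Most sensitive variable (largest error increase) first.
    return sorted(increases.items(), key=lambda item: item[1], reverse=True)


# Toy stand-in for model training: made-up error penalties per missing input.
HYPOTHETICAL_PENALTY = {"tmax": 0.40, "tmin": 0.10, "wind": 0.30,
                        "sunshine": 0.15, "rh": 0.04}


def toy_evaluate(inputs):
    missing = set(HYPOTHETICAL_PENALTY) - set(inputs)
    return 0.30 + sum(HYPOTHETICAL_PENALTY[name] for name in missing)


sensitivity = rank_by_elimination(list(HYPOTHETICAL_PENALTY), toy_evaluate)
for name, increase in sensitivity:
    print(f"{name}: +{increase:.2f}")
```

In practice `toy_evaluate` would be replaced by a function that retrains the ANN or M5 Tree model on the reduced input set and scores it on the held-out test data.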
Figure 3

Scatter diagram of ET using the neural network and the Penman–Monteith FAO method (Scenario A) (a: Farokhshahr, b: Airport, and c: Saman).

The results presented in Table 3 indicate that, among the scenarios with four input variables, Scenario 4, which incorporates maximum and minimum temperatures, WS, and sunshine hours, achieved the lowest error (RMSE: 0.38, 0.34, and 0.46) when compared to the FAO Penman–Monteith model across the evaluated stations. Similarly, among the scenarios utilizing three input variables, Scenario 6, which includes maximum and minimum temperatures along with WS, showed the lowest error (RMSE: 0.57, 0.54, and 0.63) in comparison to the FAO Penman–Monteith model.

The ET modeling results using the M5 Tree model demonstrated high accuracy within the study region. This model uses simple linear relationships to estimate ET0 based on key parameters such as temperature, humidity, sunshine, and WS. Additionally, the M5 Tree model offers an effective algorithm for imputing missing data on various time scales, including daily and monthly intervals. Table 4 presents the various configurations of the M5 Tree model. The best configuration gave the highest correlations and the lowest errors: RMSE = 0.36, 0.32, and 0.34 for the Farokhshahr, airport, and Saman stations, respectively, when validated against the Penman–Monteith method (Figure 4).
Figure 4

Scatter diagram of ET using the M5 Tree model and the Penman–Monteith FAO method (Scenario A) (a: Farokhshahr, b: Airport, and c: Saman).

According to the findings in Table 4, among the scenarios with four input variables, Scenario 4 (maximum and minimum temperature, WS, and sunshine hours) again demonstrated the lowest error (RMSE: 0.41, 0.36, and 0.46) across the stations compared to the FAO Penman–Monteith model. Similarly, for three input variables, Scenario 6 (maximum and minimum temperature with WS) produced the lowest error (RMSE: 0.59, 0.54, and 0.49) when evaluated against the FAO Penman–Monteith model.

Numerous studies have utilized the M5 Tree model with results aligning closely to those of the present study. These findings confirm that the M5 Tree model effectively estimates ET with reduced errors, demonstrating its suitability as a reliable tool for such applications (Üneş et al. 2020; Bakhtiari et al. 2022).

The difference in model accuracy between the airport station and the Farokhshahr station is consistent with the variations observed across the models. However, the changes in the Saman station model are more pronounced. Among the six models evaluated in this scenario, the one with the fewest input parameters is considered the most optimal. This is due to its comparable performance to other scenarios with more input variables, while offering greater practicality in terms of application.

The distribution of Penman–Monteith ET values against those calculated using the tree-based model and the artificial neural network model is illustrated in Figures 3 and 4. In the second phase of the study, both models were trained using data from the Farokhshahr station and subsequently tested against data from the other stations. The comparative results are detailed in Table 5.

Table 5

Statistical results of M5 Tree model and neural network generalization

| Station | RMSE (Neural network) | RMSE (M5 Tree model) | MAE (Neural network) | MAE (M5 Tree model) | R (Neural network) | R (M5 Tree model) |
|---|---|---|---|---|---|---|
| Saman | 0.39 | 0.42 | 0.32 | 0.32 | 0.99 | 0.99 |
| Airport | 0.33 | 0.35 | 0.26 | 0.27 | 0.99 | 0.98 |

Given the limited availability of necessary data and the lack of measurements for certain variables at most stations, intelligent models and data mining techniques offer a more effective alternative to empirical methods in data-scarce environments. The results indicate that by utilizing the most influential meteorological variables, such as those identified in this study, the ANN model and the M5 Tree model can achieve comparable accuracy to models proposed by other researchers, without requiring a wide range of variables affecting ET.

Sensitivity analysis and error evaluation of each scenario revealed that maximum temperature had the greatest influence on ET, while average RH exhibited the least impact in this region. Accurate ET estimation in this area relies primarily on temperature and WS data (Figures 3 and 4). The findings further demonstrated that the ANN model provided satisfactory accuracy for estimating ET0 in this region, aligning with the results of similar studies (Algretawee & Alshama 2021).

This study investigated the generalization capabilities of ANN models and decision tree models in estimating ET0. In the first phase, ANN and M5 decision tree models were assessed for their performance under scenarios with limited climatic data. The primary objective was to identify the most influential variables for accurately estimating ET0 in data-scarce conditions. The results demonstrated that the ANN model consistently outperformed the M5 Tree model across all three stations: Shahrekord airport, Farokhshahr, and Saman. Sensitivity analysis highlighted the critical importance of temperature and WS data for accurate ET0 estimation in this region. In the second phase, the models were trained using data from the Farokhshahr station and tested against data from the other two stations. This phase further confirmed the superior accuracy of the ANN model compared to the M5 Tree model in estimating ET0. For future research, it is recommended to expand the evaluation of these models by incorporating additional intelligent and data mining approaches, testing under diverse climatic conditions across more stations, and comparing results with nearby stations. Furthermore, if reliable lysimetric data become available, they could serve as a valuable benchmark for validating the accuracy of these models.

Our sincere gratitude goes to the management of the Meteorological Office of Chaharmahal and Bakhtiari province for providing the data used in this study.

The authors declare there is no conflict.

All relevant data are included in the paper or its Supplementary Information.

Algretawee H. & Alshama G. (2021) 'Modeling of evapotranspiration (ETo) in a medium urban park within a megacity by using Artificial Neural Network (ANN) model', Period Polytech Civ Eng, 65 (4), 1260–1268.
Allen R. G., Pereira L. S., Raes D. & Smith M. (1998) 'Crop evapotranspiration guidelines for computing crop water requirements', FAO Irrigation and Drainage, Paper No. 56. Food and Agriculture Organization of the United Nations, Rome.
Bakhtiari B., Mohebbi-Dehaghani A. & Qaderi K. (2022) 'Comparative analysis of data-driven methods for daily reference evapotranspiration estimation of Southern Caspian Sea', Meteorol Appl, 29 (4), 16. doi:10.1002/met.2091.
Benediktsson J. A., Swain P. H. & Ersoy O. K. (1990) 'Neural network approaches versus statistical methods in classification of multisource remote sensing data', IEEE Trans Geosci Remote Sensing, 28 (4), 540–551.
Breiman L., Friedman J. H., Olshen R. A. & Stone C. J. (1984) Classification and Regression Trees. Belmont, CA: Wadsworth.
Civco D. L. & Wanug Y. (1994) 'Classification of multispectral, multitemporal, multisource spatial data using artificial neural networks', Congress on Surveying and Mapping. USA.
Cuxart J. & Boone A. A. (2020) 'Evapotranspiration over land from a boundary-layer meteorology perspective', Boundary-Layer Meteorol, 177 (2), 427–459.
Falamarzi F., Palizdan N., Feng Hung Y. & Shui Lee T. (2014) 'Estimating evapotranspiration from temperature and wind speed data using artificial and wavelet neural networks (WNNs)', Agric Water Manag, 140, 26–36.
Gocic M., Motamedi S., Shamshirband S., Petkovic D., Sudheer C., Hashim R. & Arif M. (2015) 'Soft computing approaches for forecasting reference evapotranspiration', Comput Electron Agric, 113, 164–173.
Hamid Z. A., Moghaddamnia A., Maryam B. V., Adel G. & Kisi O. (2011) 'Performance evaluation of ANN and ANFIS models for estimating garlic crop evapotranspiration', J. of Irr. Drain. Eng., 137 (5), 280–286.
Haykin S. (1998) Neural Networks: A Comprehensive Foundation, 2nd edn. Upper Saddle River, NJ: Prentice-Hall. pp. 26–32.
Jensen M. E., Burman R. D. & Allen R. G. (1990) Evapotranspiration and Irrigation Water Requirements. ASCE Manuals and Reports on Engineering Practices No. 70. New York, NY: ASCE. 360 pp.
Keshtegar B., Kisi O., Ghohani Arab H. & Zounemat-Kermani M. (2018) 'Subset modeling basis ANFIS for prediction of the reference evapotranspiration', Water Resources Management, 32, 1101–1116.
Kim S., Singh V. P., Seo Y. & Kim H. S. (2014) 'Modeling nonlinear monthly evapotranspiration using soft computing and data reconstruction techniques', Water Resour Manag, 28 (1), 185–206.
Kumar M., Raghuwanshi N. S. & Singh R. (2009) 'Development and validation of GANN model for evapotranspiration estimation', ASCE J Hydrol Eng, 14 (2), 131–140.
Makwana J. J., Tiwari M. K. & Deora B. S. (2023) 'Development and comparison of artificial intelligence models for estimating daily reference evapotranspiration from limited input variables', Smart Agr Technol, 3, 100115. https://doi.org/10.1016/j.atech.2022.100115.
Quinlan J. R. (1992) 'Learning with continuous classes', Proceedings of the 5th Australian Joint Conference on Artificial Intelligence, World Scientific, pp. 343–348.
Rahimikhoob A. (2008) 'Artificial neural network estimation of reference evapotranspiration from pan evaporation in a semi-arid environment', Irrigation Science, 27, 35–39.
Sattari M. T., Pal M., Apaydin H. & Ozturk F. (2013) 'M5 model tree application in daily river flow forecasting in Sohu Stream, Turkey', Water Resour, 40 (3), 233–242.
Üneş F., Kaya Y. Z. & Mamak M. (2020) 'Daily reference evapotranspiration prediction based on climatic conditions applying different data mining techniques and empirical equations', Theor Appl Climatol, 141, 763–773. https://doi.org/10.1007/s00704-020-03225-0.
Wang Y. & Witten I. H. (1997) 'Induction of model trees for predicting continuous classes', Proceedings of the Poster Papers of the European Conference on Machine Learning. Prague: University of Economics, Faculty of Informatics and Statistics.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).