ABSTRACT
In drought-prone regions like Australia, accurately assessing evaporation rates is essential for effectively managing and maximising the use of precious water resources and reservoirs. Current estimates show that evaporation reduces Australia's open water lake capacity by about 40% annually. With climate change, this water loss is expected to become an even greater concern. This study investigates a transformer-based neural network (TNN) to estimate monthly evaporation in three Australian locations. The models were trained and tested using monthly weather data spanning from 2009 to 2022. Input parameters were chosen based on Pearson's correlation coefficient values to identify the most impactful combinations. The developed TNN model was compared with two widely used empirical methods, namely Thornthwaite and Stephens and Stewart. The TNN model's impressive accuracy in evaporation prediction, attributed to its unique self-attention mechanism, suggests its promising potential for future use in evaporation forecasting. Additionally, the study revealed an intriguing result: Despite using the same input datasets, the TNN model surpassed traditional methods, achieving an average improvement of 18% in prediction accuracy. The TNN prediction model accurately predicts water loss (average R² = 0.970), supports irrigation management and agricultural planning and offers financial benefits to farming and related industries.
HIGHLIGHTS
A self-attention mechanism based on transformers has been developed to forecast pan evaporation.
Utilise meteorological data to anticipate evaporation rates in three distinct Australian regions.
Precise forecasts of evaporation can enhance water resource management practices.
The transformer model stands out as an effective tool for predicting water loss attributed to evaporation.
ABBREVIATIONS
- AI: artificial intelligence
- ANN: artificial neural network
- DL: deep learning
- Ep: pan evaporation
- MAE: mean absolute error
- ML: machine learning
- NSE: Nash–Sutcliffe efficiency
- PCC: Pearson correlation coefficient
- R2: coefficient of determination
- RH: relative humidity
- RMSE: root mean square error
- Rs: solar radiation
- SS: Stephens and Stewart
- WS: wind speed
- Ta: mean air temperature
- TH: Thornthwaite
- Tmax: maximum air temperature
- Tmin: minimum air temperature
- TNN: transformer neural network
INTRODUCTION
Background
Anticipating evaporation is essential for efficient irrigation and water management, as well as for optimising water usage and agricultural forecasting, because evaporation is among the primary drivers of the hydrological cycle. Furthermore, an increased rate of evaporation is a sign of rising temperatures (Chen et al. 2018). Thus, monitoring and managing water resources necessitates vigilant observation of evaporation patterns (Jasmine et al. 2022). Evaporation significantly lowers water levels in lakes and reservoirs and depletes the overall water budget, so evaporation losses must be projected before water resource policies are put into practice and irrigation systems are planned. Evaporation rates are governed by vapour pressure differentials and heat availability; these decisive elements are in turn influenced by meteorological factors such as humidity, solar radiation, wind speed, air pressure, and air temperature (Fan et al. 2018). Other criteria, including location, climate type, seasonal effects, and time of day, are also strongly correlated with these factors. As a result, evaporation is a complicated phenomenon with highly non-linear properties (Abed et al. 2022a).
Evaporation is projected using both direct and indirect techniques, such as the evaporation pan, the Penman approach, water balance, energy balance, and mass transfer (Wu et al. 2020a). Kisi et al. (2016) showed that the evaporation pan method is the most widely favoured approach, chosen for its cost-effectiveness and simplicity; it also offers an accurate evaluation of variations in evaporation (Kahler & Brutsaert 2006). The present work aims to project pan evaporation (Ep) with a precision comparable to real evaporation. Approaches based on weather data linked to the energy budget, the water budget, and experimental evaporation equations have also been employed for Ep estimation (Wang et al. 2016). For instance, in a survey of existing literature and methods to gauge evaporation, McJannet et al. (2008) pinpointed 'combination methods' as the most suitable for calculating evaporation from open water in an Australian locale using the available data. The chosen combination method applied the Penman–Monteith technique with a modification accounting for variations in heat storage within the water body to adjust the energy available for evaporation. The method has also been tested in other countries: Bontempo Scavo et al. (2021) evaluated it in Italy alongside further experimental expressions. However, linear modelling methods do not adequately capture the subtle stochastic aspects of the evaporative process, which can result in very significant inaccuracies (Abed et al. 2021a). Furthermore, because the performance of empirical models inherently varies across circumstances, calibrating their parameters to specific zones becomes indispensable, adding a further layer of complexity to their use.
The insufficient performance of these methods has spurred scientists to explore alternative approaches such as artificial intelligence (AI) for accurately assessing evaporation levels (Abed et al. 2022b). In recent decades, AI has demonstrated its effectiveness in addressing numerous complex engineering challenges by performing tasks that traditionally necessitate human intelligence, including problem-solving, pattern recognition, and decision-making, often at scales and speeds far exceeding human capabilities (Haykin & Lippmann 1994).
Literature review
Various water engineering and environmental concerns have been addressed using AI techniques including artificial neural networks (ANNs), extreme learning machines (ELMs), adaptive neuro-fuzzy inference systems (ANFIS), and gene expression programming (GEP) (Khairan et al. 2022). These AI methods are user-friendly, dependable, and capable of faithfully mimicking complex non-linear processes (Abed et al. 2010). The application of AI to estimate various hydrological metrics has been studied extensively (Al-Mukhtar 2021), and scientists have determined that ANNs produce more precise estimates than conventional methods (Ditthakit et al. 2022). As a result, several engineering research domains have successfully adopted AI-based modelling tools.
Through the application of AI techniques and various optimisation methods, numerous studies have explored predicting pan evaporation rates, given the challenges associated with the practical and theoretical measurement methodologies mentioned earlier (Kumar et al. 2021; Ikram et al. 2022; Kisi et al. 2022; Adnan et al. 2023). A majority of these works primarily emphasised determining the generalisation capabilities of AI frameworks across different climates, since climatic features are stochastic and non-stationary (Abed et al. 2023b). They provided specific machine learning (ML) models built on varying input sets comprising prevalent climatic factors such as temperature, wind velocity, humidity, vapour pressure, sunshine, and solar radiation (Abed et al. 2021b). For example, Keskin & Terzi (2006) employed ANN and Penman frameworks to devise a predictive evaporation modelling system using numerous meteorological factors as ML inputs; they reported that ML outperformed the Penman method for Ep prediction. Kişi (2013) devised evolutionary neural networks to forecast monthly Ep, and the results suggested that the developed approaches offered excellent accuracy compared with empirical methods. Deo et al. (2016) assessed monthly evaporation and associated water loss using three ML approaches: multivariate adaptive regression splines (MARS), ELM, and the relevance vector machine (RVM); RVM proved the most successful of the three when meteorological variables were used as predictors. Falamarzi et al. (2014) assessed the ability of ANNs and wavelet-based ANNs to predict evaporation on a daily scale from temperature and wind speed, finding that both approaches predicted evaporation accurately. Overall, these reports indicate the superiority of AI learning techniques for estimating Ep under different climatic scenarios.
Specifically, deep learning (DL) algorithms have been extensively employed across various engineering research fields to create precise and dependable predictions (Arif et al. 2022). DL is a subset of ML using advanced neural networks to analyse and learn from large datasets, enabling machines to recognise patterns, make predictions, and perform complex tasks autonomously.
DL techniques, which employ enhanced neural networks with multiple layers, are well suited to time series data and therefore enable advanced approaches to estimating Ep; their accuracy has driven their rising popularity among AI approaches in both business and scientific domains (Hu et al. 2018). Recurrent neural networks (RNNs), a foundational element of DL, are particularly well suited to predicting and analysing time series data because they can store and use memory from previous network states (Chang et al. 2016). However, although a conventional RNN can capture trends in time series data, retaining long-range dependencies between variables is a concern owing to exploding and vanishing gradients (Bengio et al. 1994). Under these two critical challenges, network training can produce zero or impractically large network weights. In practical terms, improved network training requires retaining essential information while omitting redundant or unneeded data across network states.
Recently, researchers have effectively employed attention-based approaches for time series estimation. The self-attention (intra-attention) mechanism is the foundation of the transformer architecture, which has seen widespread use in recent years (Shao et al. 2019). Transformers were initially introduced for machine translation by Vaswani et al. (2017) and have since shown a noteworthy ability to generalise to critical tasks such as sequence processing and computer vision. A transformer network does not suffer from the vanishing gradient problem common to recurrent networks, and it can attend to all past data points regardless of the temporal distance between them. Because the transformer processes inputs in parallel rather than sequentially, it is faster than recurrent networks and can identify long-term dependencies without stepping through the sequence. Overall, transformer networks rely on self-attention, which overcomes specific challenges of recurrent and convolutional sequence-to-sequence approaches (Wang et al. 2021). Beyond speed and efficiency, the transformer also offers better interpretability: whereas RNNs can be difficult to interpret owing to their sequential processing, transformer outputs are simpler and more transparent, making model predictions comparatively easy to understand and regulate. Transformer models have been used successfully for various time series estimation tasks, outperforming many prediction methods, and further studies have enhanced recurrent DL networks by incorporating self-attention mechanisms. For instance, Wu et al. (2020b) devised a deep transformer approach to forecast influenza-like illness that outperformed Long Short-Term Memory (LSTM) and sequence-to-sequence approaches; its self-attention mechanism yielded better estimation performance than sequence-to-sequence systems using linear attention. Given its strength in handling the intricacies of time series data, the transformer has the potential to be an incredibly powerful simulation tool, whereas sequence approaches face significant difficulty in replicating the complex dynamics of such data, and the vanishing gradient problem prevents RNNs from forecasting effectively over longer time frames.
The literature validates that AI, paired with appropriate algorithms, can precisely simulate evaporation for many regions and offer excellent outcomes compared with comparatively sophisticated conventional techniques (Lu et al. 2018). However, devising a trustworthy, efficient, and generalised prediction approach remains difficult owing to the complex non-linear phenomena underlying evaporation. Of the numerous ML techniques utilised in recent times, DL frameworks have shown broad promise in addressing prediction challenges and are known to outperform relatively sophisticated methods. Specifically, the literature indicates that among DL models, attention-based models have been employed for estimating time series data with a noteworthy ability to address the challenges of convolutional sequence-to-sequence and RNN approaches. This study offers a potent approach to evaporative losses by employing a transformer model with the self-attention mechanism to estimate Ep in Australia. Successful development would be beneficial for managing water resources to support sustainable farming.
Objectives
The aim of this study is to investigate the use, predictability, and precision of transformer neural network (TNN) approaches for processing climatological data from 2009 to 2022 and estimating monthly Ep magnitudes for three regions in Australia. The predictive precision of the technique is evaluated through several input patterns to obtain maximum precision. The performance of the TNN technique is compared with two common empirical methods, namely Thornthwaite and Stephens and Stewart, based on the same input data. Model efficiency metrics are processed and assessed based on statistical performance indicators deemed appropriate for estimating evaporation rates. In addition, an adequate assessment was performed in this study to establish that TNN modelling is a reliable approach for predicting evaporation, a process that is especially important for agricultural management and water resource management.
STUDY AREA AND DATA
Study area
Data description
Monthly data covering the period from 1 January 2009 to 31 October 2022 were obtained from the Bureau of Meteorology. Relative humidity (RH), solar radiation (Rs), mean, minimum, and maximum air temperatures (Ta, Tmin, Tmax), wind speed (WS), and pan evaporation (Ep) were recorded for the three stations. Summary statistics of the monthly weather records for the three stations are given in Table 1, which displays Xmin, Xmax, Xmean, Sx, Cv, and Cx: the minimum, maximum, mean, standard deviation, coefficient of variation, and skewness, respectively, of the studied meteorological parameters. The table indicates that Ep was lowest in Brisbane, whereas Alice Springs recorded the highest Ep. This pattern could be associated with site differences in RH, which is inversely associated with evaporation: Brisbane and Alice Springs recorded the highest and lowest RH values, respectively. Adelaide, in contrast, logged the maximum skewness. Skewness is positive when a distribution is asymmetric, with values spread further above the mean than below it.
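As a minimal sketch (not the authors' processing pipeline), the summary statistics reported in Table 1 can be reproduced as follows; the sample values below are illustrative stand-ins, not station data.

```python
# Sketch: computing Xmin, Xmax, Xmean, Sx, Cv (%), and Cx (skewness)
# for one meteorological series, as in Table 1.
import numpy as np

def summary_stats(x):
    """Return (Xmin, Xmax, Xmean, Sx, Cv, Cx) for a 1-D sample."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    sd = x.std(ddof=1)                       # sample standard deviation Sx
    cv = 100.0 * sd / mean                   # coefficient of variation (%)
    n = x.size
    # Adjusted Fisher-Pearson skewness coefficient Cx
    skew = (n / ((n - 1) * (n - 2))) * np.sum(((x - mean) / sd) ** 3)
    return x.min(), x.max(), mean, sd, cv, skew

ep = [2.5, 6.1, 9.0, 12.4, 15.0, 9.2]        # hypothetical monthly Ep values (mm)
xmin, xmax, xmean, sx, cv, cx = summary_stats(ep)
```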
| Station | Dataset | Unit | Xmin | Xmax | Xmean | Sx | Cv | Cx |
|---|---|---|---|---|---|---|---|---|
| Alice Springs | Tmax | °C | 17.36 | 39.16 | 29.8 | 5.97 | 20 | −0.37 |
| | Tmin | °C | 3.69 | 23.22 | 13.8 | 6.1 | 44.06 | −0.18 |
| | RH | % | 18.2 | 70.05 | 42.7 | 12.65 | 29.63 | 0.05 |
| | WS | m/s | 2.69 | 4.91 | 3.7 | 0.52 | 13.86 | 0.11 |
| | Rs | MJ m⁻² | 13.8 | 31.4 | 22.7 | 4.84 | 21.27 | −0.09 |
| | Ep | mm | 2.5 | 15.09 | 9 | 3.46 | 38.5 | −0.07 |
| Brisbane | Tmax | °C | 19.9 | 30.93 | 25.3 | 2.95 | 11.64 | −0.09 |
| | Tmin | °C | 7.68 | 23.29 | 15.8 | 4.38 | 27.61 | −0.09 |
| | RH | % | 57.8 | 79.96 | 70.12 | 4.19 | 5.97 | −0.4 |
| | WS | m/s | 3.22 | 7.13 | 4.5 | 3.65 | 80.49 | 10.22 |
| | Rs | MJ m⁻² | 10.13 | 30.1 | 19.3 | 5.02 | 25.98 | 0.25 |
| | Ep | mm | 2.46 | 9.16 | 5.5 | 1.82 | 32.88 | 0.16 |
| Adelaide | Tmax | °C | 14.25 | 31.86 | 22.04 | 4.93 | 22.37 | 0.10 |
| | Tmin | °C | 5.5 | 19.24 | 12.18 | 3.52 | 28.92 | 0.11 |
| | RH | % | 46.11 | 77.01 | 62.6 | 8.09 | 12.91 | 0.07 |
| | WS | m/s | 3.25 | 6.37 | 4.87 | 0.59 | 12.2 | −0.16 |
| | Rs | MJ m⁻² | 7.08 | 32.67 | 18.1 | 7.14 | 39.36 | 0.17 |
| | Ep | mm | 1.49 | 10.56 | 5.33 | 2.66 | 49.9 | 0.25 |
METHODOLOGY
TNN model input combination scenarios
Choosing appropriate predictors is critical to devising a robust predictive model (Abed et al. 2023a, 2023b); this study therefore assessed different sets of meteorological input variables to develop the suggested TNN input–output model and enhance its predictive characteristics, with the expectation of providing a better real-world understanding of the input parameters and their impact on estimated evaporation in the area (Abed et al. 2021a). Several conscious decisions lay behind selecting these sets. First, to enable comparison, the TNN model's inputs were selected according to the essential weather factors in the two suggested conventional frameworks (Stephens and Stewart and Thornthwaite). Additionally, predictors were selected after examining the Pearson correlation coefficient (PCC) (Freedman et al. 2007). The PCC is a statistical measure of the linear association, or correlation, between two continuous variables; because it is based on covariance, it is widely regarded as a suitable measure of correlation (Hauke & Kossowski 2011). It conveys both the magnitude and the direction of an association: two variables may be positively or negatively correlated, and a PCC of 0 indicates no correlation. More information about PCC interpretation and ranges can be found in Abed et al. (2022b). This work used the PCC to determine the meteorological variables exhibiting the most significant effect on evaporation estimates; the results are shown in Table 2.
| | Tmax | Tmin | RH | WS | Rs | Ep |
|---|---|---|---|---|---|---|
| PCC matrix for the Alice Springs station | | | | | | |
| Tmax | 1 | | | | | |
| Tmin | 0.89 | 1 | | | | |
| RH | −0.55 | −0.21 | 1 | | | |
| WS | 0.57 | 0.64 | −0.29 | 1 | | |
| Rs | 0.80 | 0.65 | −0.57 | 0.56 | 1 | |
| Ep | 0.91 | 0.77 | −0.85 | 0.67 | 0.70 | 1 |
| PCC matrix for the Brisbane station | | | | | | |
| Tmax | 1 | | | | | |
| Tmin | 0.94 | 1 | | | | |
| RH | 0.31 | 0.54 | 1 | | | |
| WS | 0.20 | 0.19 | 0.08 | 1 | | |
| Rs | 0.73 | 0.63 | −0.18 | −0.05 | 1 | |
| Ep | 0.82 | 0.71 | −0.86 | 0.46 | 0.84 | 1 |
| PCC matrix for the Adelaide station | | | | | | |
| Tmax | 1 | | | | | |
| Tmin | 0.94 | 1 | | | | |
| RH | −0.89 | −0.78 | 1 | | | |
| WS | 0.09 | 0.19 | −0.14 | 1 | | |
| Rs | 0.83 | 0.74 | −0.81 | 0.28 | 1 | |
| Ep | 0.94 | 0.88 | −0.92 | 0.39 | 0.90 | 1 |
Table 2 illustrates that RH, WS, Rs, Tmax, and Tmin were all significantly correlated with Ep, implying that they might be essential for estimating evaporation from each station's data. Specifically, at every station, Tmax and RH were most strongly associated with Ep; hence, Tmax and RH were included in all input sets to improve the accuracy of Ep forecasts. Previous studies have likewise revealed that Tmax, Tmin, RH, WS, and Rs are the prime factors affecting evaporation (Abed et al. 2022b).
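The correlation screening behind Table 2 can be sketched as follows; this is an illustrative outline using hypothetical stand-in values, not the station records.

```python
# Minimal sketch of Pearson correlation screening of predictors against Ep.
import numpy as np

def pcc(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))

tmax = [30.1, 25.4, 20.2, 18.7, 24.9, 29.3]   # hypothetical Tmax (°C)
ep   = [9.8, 6.2, 3.1, 2.5, 5.9, 9.1]         # rises and falls with Tmax
rh   = [40.0, 55.0, 70.0, 75.0, 58.0, 43.0]   # moves opposite to Ep

print(round(pcc(tmax, ep), 2))   # strong positive correlation with Ep
print(round(pcc(rh, ep), 2))     # strong negative correlation with Ep
```

Ranking predictors by the absolute value of such coefficients is one simple way to decide which variables to keep in every input combination.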
Correspondingly, the present work considered seven distinct input combinations for building the TNN model (Table 3). The climate dataset was partitioned chronologically into two parts: the first 80% of the record was used to train (calibrate) the model, while the remaining 20% was used to test (validate) it. To verify the robustness of this partitioning, the TNN models also employed the k-fold cross-validation technique. Cross-validation assesses the performance of AI models by resampling the data into multiple subsets, each of which serves in turn as a test set while the remaining subsets act as training sets, so the model is always evaluated on unseen data that better represents real-world scenarios. The data are first divided into k equal-sized subsets; during each iteration, k − 1 subsets are used for training and the remaining subset for testing. This cycle repeats k times, with each subset used exactly once as the test set, and the results from all iterations are averaged to provide a more reliable estimate of the model's performance. On this basis, this research conducts a comprehensive assessment of AI capabilities alongside empirical frameworks for estimating Ep magnitudes at a monthly timescale for three areas across Australia.
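The partitioning scheme just described can be sketched as follows; this is an illustrative outline rather than the authors' exact pipeline, and the 166-month count is an assumption based on the January 2009 to October 2022 record.

```python
# Sketch: 80/20 chronological split plus k-fold index generation.
import numpy as np

def chronological_split(n_samples, train_frac=0.8):
    """Earliest records for training, latest for testing."""
    cut = int(n_samples * train_frac)
    return np.arange(cut), np.arange(cut, n_samples)

def kfold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    folds = np.array_split(np.arange(n_samples), k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

# ~166 monthly records span January 2009 to October 2022
train_idx, test_idx = chronological_split(166)
```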
| Model | Scenario of inputs |
|---|---|
| T.N.N-1 | Ta |
| T.N.N-2 | Ta, Rs |
| T.N.N-3 | RH, Tmax |
| T.N.N-4 | RH, Tmax, Tmin |
| T.N.N-5 | Tmax, Tmin, Rs, RH |
| T.N.N-6 | Tmax, Tmin, Rs, WS, RH |
| T.N.N-7 | Tmax, Tmin, Rs, WS, RH, Ep |
Empirical models
This study used the Stephens and Stewart and Thornthwaite methods as the two conventional benchmarks. The decision to employ the Stephens and Stewart technique conforms to the recommendations of Sudheer et al. (2002), who, after thoroughly inspecting 23 widely used techniques for estimating evaporation, concluded that the Stephens and Stewart model was the most effective of all. Moreover, both are extensively employed approaches (Rosenberry et al. 2004), given the small number of meteorological inputs they require and the availability of the data.
Stephens and Stewart (SS)
Thornthwaite (TH)
where Ta symbolises the mean monthly air temperature (°C).
The symbols d and N stand for the number of days in a month and the number of theoretical sunshine hours, respectively.
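The equations of the two empirical models were not recoverable in this copy. For reference, the commonly cited forms of the two models are given below; these are assumptions based on the wider literature, not reproduced from this paper, and the unit conventions are noted in the comments.

```latex
% Stephens & Stewart, commonly cited form (assumed: E_p in mm/day,
% T_a in degrees Fahrenheit, R_s in cal cm^{-2} day^{-1}):
E_{p}^{SS} = 25.4\left(0.0082\,T_a - 0.19\right)\frac{R_s}{1500}

% Thornthwaite monthly estimate (mm/month), in this paper's symbols
% (d = days in the month, N = theoretical sunshine hours, T_a in Celsius):
E_{p}^{TH} = 16\,\frac{d}{30}\,\frac{N}{12}\left(\frac{10\,T_a}{I}\right)^{a},
\qquad I = \sum_{i=1}^{12}\left(\frac{T_{a,i}}{5}\right)^{1.514}

% with the exponent
a = 6.75\times10^{-7}I^{3} - 7.71\times10^{-5}I^{2}
    + 1.792\times10^{-2}I + 0.49239
```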
Self-attention transformer model for Ep prediction
To address the constraints of recurrent and convolutional sequential methods, the transformer framework incorporates a self-attention mechanism. Self-attention lets the network decide which data are essential to the encoding of the current token by selectively preserving only the most relevant information from earlier tokens; in other words, the attention mechanism computes the latent-space correspondence between the encoder and the decoder. Because recurrence is removed, positional encodings must be added to the inputs and outputs; this positional information allows the transformer to handle input and output sequences in a time-step-aware manner. Each encoding layer of the transformer consists of two components: multi-head self-attention and a feed-forward layer. The attention mechanism establishes a one-to-one correspondence between time-specific moments; attention layers are inspired by aspects of human attention, but fundamentally they compute a weighted mean. Three inputs are fed to the attention layer: queries, keys, and values. Each sub-layer includes residual connections followed by layer normalisation. The purpose of multiple heads is analogous to employing different Convolutional Neural Network (CNN) filters: just as each filter extracts different latent characteristics from the input, each attention head captures a different latent feature, and the outputs of all heads are concatenated. Unlike recurrent networks, the transformer does not suffer from the vanishing gradient problem and can refer to any earlier point in the sequence regardless of its distance.
This characteristic allows the transformer to detect long-term dependencies. Furthermore, unlike RNNs, the transformer requires no sequential computation, enabling faster, parallel processing; because inputs are not assessed sequentially, the vanishing gradient issue is inherently avoided. RNNs, by contrast, perform poorly on long-term forecasts, whereas transformers maintain direct connections to every previous timestep, allowing information to traverse long sequences. This does introduce a new issue: the cost of full self-attention grows quadratically with the length of the input, so the self-attention mechanism is relied upon to screen out non-essential information.
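The core operation described above, scaled dot-product self-attention, can be sketched in a few lines; the shapes and random values below are illustrative only and do not reflect the study's actual model configuration.

```python
# Minimal sketch of scaled dot-product self-attention.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model). Returns context vectors and attention weights."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # every step attends to every step
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 8))                  # e.g. 12 monthly inputs, 8 features
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, w = self_attention(X, Wq, Wk, Wv)
```

Because the weight matrix connects every timestep to every other in a single step, information about a distant month reaches the current encoding without passing through a recurrent chain, which is why the vanishing gradient problem does not arise.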
Performance evaluation
Choosing appropriate performance indicators is essential, as each possesses its own characteristics, and understanding the properties of every statistical indicator helps in interpreting how a model performs. Thus, this research used several numerical metrics to assess the models' predictive performance, described as follows:
- (1) Coefficient of determination (R2): R2 denotes how well the estimated and actual outputs match; its values range from 0 to 1 (inclusive). A value of zero denotes a purely stochastic fit, while a value of one signifies a perfect fit. Because R2 is widely used, model comparison becomes easier and more consistent.
- (2) Root mean square error (RMSE): the square root of the mean of the squared differences between the real and estimated values. RMSE is broadly employed for evaluating regression performance, is simple to compute, and penalises large errors heavily, which is often considered desirable.
- (3) Mean absolute error (MAE): the mean of the absolute differences between the real and predicted values. Unlike squared-error metrics such as the mean squared error, MAE does not disproportionately penalise large errors caused by outliers.
- (4) Nash–Sutcliffe efficiency (NSE): compares the amount of residual variance (noise) to the variability of the observed data. It is often utilised in hydrologic modelling because it normalises accuracy, making results easier to interpret.
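The four metrics can be written out explicitly; the sketch below follows their standard definitions (the paper's own equation layout was not recoverable here), with R2 taken as the squared Pearson correlation, and uses hypothetical observed and predicted values.

```python
# Standard definitions of the four evaluation metrics (sketch).
import numpy as np

def r2(obs, pred):
    """Coefficient of determination as the squared Pearson correlation."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    oc, pc = obs - obs.mean(), pred - pred.mean()
    return float(np.sum(oc * pc) ** 2 / (np.sum(oc ** 2) * np.sum(pc ** 2)))

def rmse(obs, pred):
    """Root mean square error: penalises large errors heavily."""
    diff = np.asarray(obs, float) - np.asarray(pred, float)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(obs, pred):
    """Mean absolute error: less sensitive to outliers than squared errors."""
    diff = np.asarray(obs, float) - np.asarray(pred, float)
    return float(np.mean(np.abs(diff)))

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - residual variance / observed variance."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(1.0 - np.sum((obs - pred) ** 2)
                 / np.sum((obs - obs.mean()) ** 2))

obs  = [9.0, 5.5, 5.3, 7.2, 8.1]   # hypothetical observed Ep (mm)
pred = [8.8, 5.9, 5.1, 7.0, 8.4]   # hypothetical model output (mm)
```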
RESULTS AND DISCUSSION
Monthly Ep prediction using empirical models
Two conventional models, one temperature-based and one radiation-based, were employed for estimating Ep on a monthly scale. Table 4 shows the R2, MAE, RMSE, and NSE values of the two methods against the measured data. As the numerical findings in Table 4 indicate, the radiation-based SS model demonstrated greater prediction precision than the temperature-based model: across the three stations, it achieved the lowest RMSE values (0.393, 0.398, and 0.383) and the highest R2 values (0.657, 0.667, and 0.627). With the Thornthwaite model, RMSE values increased by an average of 21%, while the corresponding R2 values decreased by an average of 31%. These performance values clearly imply that the radiation-based model surpassed the Thornthwaite model, presumably because the inclusion of solar radiation generally improves on estimates that rely on temperature alone (Abed et al. 2021a).
| Station | Model | R2 | RMSE | MAE | NSE |
|---|---|---|---|---|---|
| Alice Springs | Stephens and Stewart | 0.657 | 0.393 | 0.318 | 0.676 |
| | Thornthwaite | 0.453 | 0.510 | 0.410 | 0.454 |
| Brisbane | Stephens and Stewart | 0.667 | 0.398 | 0.308 | 0.668 |
| | Thornthwaite | 0.472 | 0.503 | 0.408 | 0.473 |
| Adelaide | Stephens and Stewart | 0.627 | 0.383 | 0.302 | 0.628 |
| | Thornthwaite | 0.411 | 0.479 | 0.373 | 0.412 |
Estimation of monthly Ep using the TNN model
To demonstrate the robustness of the TNN model for evaporation prediction, this section presents a complete analysis of the empirical outcomes derived from the experiments with this model. The designed TNN model was used for monthly Ep forecasting at three sites in Australia: Alice Springs, Adelaide, and Brisbane. In agricultural and hydrological contexts, the monthly evaporation rate is essential for calculating water budgets and determining crop water requirements. Table 5 shows a significant difference in the accuracy of the Ep forecast depending on the input combination; the best numerical metrics are highlighted in bold. Prediction accuracy improved when the full climatological dataset (Ep, RH, Tmin, Tmax, Rs, and WS) from all sites was used rather than input combinations with incomplete data. The current findings show that the accuracy of the prediction models increases as the number of input variables increases, in line with a previous study by Abed et al. (2022b). Four input combinations were adequate to attain good agreement in monthly Ep prediction. When only RH and Tmax were used, the TNN model's prediction accuracy was insufficient at all stations, demonstrating that advanced capabilities like AI may not enhance predictive performance when only a limited number of meteorological inputs is available. When five input parameters were employed, adequate results were attained, and using Ep as an additional input brought a slight further enhancement in prediction accuracy.
| Station | Model | R² | RMSE (mm) | MAE (mm) | NSE |
|---|---|---|---|---|---|
| Alice Springs | TNN-1 | 0.660 | 0.148 | 0.122 | 0.661 |
| | TNN-2 | 0.769 | 0.121 | 0.100 | 0.770 |
| | TNN-3 | 0.830 | 0.105 | 0.078 | 0.831 |
| | TNN-4 | 0.882 | 0.087 | 0.067 | 0.883 |
| | TNN-5 | 0.912 | 0.075 | 0.055 | 0.913 |
| | TNN-6 | 0.935 | 0.065 | 0.050 | 0.936 |
| | TNN-7 | **0.966** | **0.047** | **0.039** | **0.967** |
| Brisbane | TNN-1 | 0.677 | 0.149 | 0.125 | 0.678 |
| | TNN-2 | 0.805 | 0.116 | 0.097 | 0.806 |
| | TNN-3 | 0.847 | 0.102 | 0.082 | 0.848 |
| | TNN-4 | 0.876 | 0.093 | 0.075 | 0.877 |
| | TNN-5 | 0.895 | 0.088 | 0.058 | 0.888 |
| | TNN-6 | 0.934 | 0.074 | 0.048 | 0.925 |
| | TNN-7 | **0.960** | **0.051** | **0.039** | **0.963** |
| Adelaide | TNN-1 | 0.717 | 0.153 | 0.130 | 0.728 |
| | TNN-2 | 0.795 | 0.133 | 0.114 | 0.795 |
| | TNN-3 | 0.874 | 0.103 | 0.086 | 0.875 |
| | TNN-4 | 0.925 | 0.080 | 0.065 | 0.926 |
| | TNN-5 | 0.943 | 0.070 | 0.058 | 0.944 |
| | TNN-6 | 0.954 | 0.061 | 0.052 | 0.955 |
| | TNN-7 | **0.982** | **0.035** | **0.027** | **0.985** |
The R² values for the TNN model at the three studied stations were 0.966 for Alice Springs, 0.982 for Adelaide, and 0.960 for Brisbane. An R² of 1 corresponds to a perfect fit, so with values close to 1 at all three stations, the TNN model demonstrated the strongest correlation between observed and estimated Ep and superior performance in predicting Ep values. The findings also show that MAE and RMSE decreased as model accuracy increased. The TNN model performed best on these metrics, with MAE values of 0.039, 0.039, and 0.027 mm and RMSE values of 0.047, 0.051, and 0.035 mm for Alice Springs, Brisbane, and Adelaide, respectively; the smaller these values, the better the performance. The NSE, a further measure of model efficiency, was likewise close to 1 for all studied regions, with NSE ≥ 0.963 observed for the transformer model at every station. Overall, this assessment provides convincing evidence of the TNN model's significant potential for predicting Ep at all the study locations in Australia.
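The four evaluation metrics discussed above (R², RMSE, MAE, and NSE) can be computed as in the minimal sketch below, which uses their standard definitions rather than the authors' original code:

```python
import numpy as np

def evaluate(obs, pred):
    """Goodness-of-fit metrics: R-squared, RMSE, MAE, Nash-Sutcliffe efficiency."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    resid = obs - pred
    rmse = float(np.sqrt(np.mean(resid**2)))          # root mean square error
    mae = float(np.mean(np.abs(resid)))               # mean absolute error
    # NSE: 1 minus residual sum of squares over observed variance about the mean
    nse = 1.0 - float(np.sum(resid**2) / np.sum((obs - obs.mean())**2))
    r = np.corrcoef(obs, pred)[0, 1]                  # Pearson correlation
    return {"R2": float(r**2), "RMSE": rmse, "MAE": mae, "NSE": nse}

# Illustrative values only, not data from the study
obs = np.array([3.1, 4.0, 5.2, 6.8, 7.5])
pred = np.array([3.0, 4.2, 5.0, 6.9, 7.4])
print(evaluate(obs, pred))
```

A perfect prediction yields R² = NSE = 1 and RMSE = MAE = 0, which is why values near those limits in Tables 5 and 6 indicate strong performance.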
Comparison of TNN and empirical models
Table 6 compares two empirical models for forecasting monthly Ep with TNN models that use the same input combinations. For the Rs and Ta input combination, the radiation-based Stephens and Stewart model yielded the lowest prediction accuracy at all stations, with R² values of 0.657, 0.667, and 0.627, versus the corresponding TNN model. With Ta as the sole input, the TNN model likewise achieved markedly higher prediction accuracy than the temperature-based Thornthwaite model. The results in Table 6 show that the TNN model significantly outperforms the empirical models: it accurately predicted monthly Ep even when restricted to the same input parameters, owing to its ability to model non-linear and complex relationships. This advantage is likely due to the TNN's self-attention mechanism, which can detect hidden patterns, suggesting that the transformer architecture is a more powerful approach for evaporation prediction.
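For reference, the two empirical benchmarks can be sketched as below. The coefficients follow commonly cited forms of the Thornthwaite and Stephens-Stewart equations and should be checked against the original publications; the unit conventions (monthly mean temperature in °C, solar radiation in langleys per day) and the omission of Thornthwaite's daylight-length correction are assumptions of this sketch:

```python
import numpy as np

def thornthwaite_pet(t_monthly_c):
    """Thornthwaite monthly PET (mm/month), commonly cited form,
    without the daylight/month-length correction factor."""
    t = np.maximum(np.asarray(t_monthly_c, float), 0.0)  # sub-zero months give zero PET
    I = float(np.sum((t / 5.0) ** 1.514))                # annual heat index
    a = 6.75e-7 * I**3 - 7.71e-5 * I**2 + 1.792e-2 * I + 0.49239
    return 16.0 * (10.0 * t / I) ** a

def stephens_stewart(t_c, rs_ly_day):
    """Stephens-Stewart radiation-based evaporation (mm/day), commonly cited form
    with temperature converted to degF and radiation in langleys/day."""
    t_f = np.asarray(t_c, float) * 9.0 / 5.0 + 32.0
    return (0.0082 * t_f - 0.19) * (np.asarray(rs_ly_day, float) / 1500.0) * 25.4
```

For example, `thornthwaite_pet` takes the twelve monthly mean temperatures of a year and returns twelve monthly estimates, while `stephens_stewart(25.0, 450.0)` gives a single daily estimate; both depend only on the Ta (and, for the latter, Rs) inputs listed in Table 6.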
| Input combination | Station/Model | R² | MAE (mm) | RMSE (mm) | NSE |
|---|---|---|---|---|---|
| Alice Springs | | | | | |
| Ta, Rs | Stephens and Stewart | 0.657 | 0.393 | 0.318 | 0.676 |
| | TNN-2 | 0.769 | 0.100 | 0.121 | 0.770 |
| Ta | Thornthwaite | 0.453 | 0.510 | 0.410 | 0.454 |
| | TNN-1 | 0.660 | 0.122 | 0.148 | 0.661 |
| Brisbane | | | | | |
| Ta, Rs | Stephens and Stewart | 0.667 | 0.398 | 0.308 | 0.668 |
| | TNN-2 | 0.805 | 0.097 | 0.116 | 0.806 |
| Ta | Thornthwaite | 0.472 | 0.503 | 0.408 | 0.473 |
| | TNN-1 | 0.677 | 0.125 | 0.149 | 0.678 |
| Adelaide | | | | | |
| Ta, Rs | Stephens and Stewart | 0.627 | 0.383 | 0.302 | 0.628 |
| | TNN-2 | 0.795 | 0.114 | 0.133 | 0.795 |
| Ta | Thornthwaite | 0.411 | 0.479 | 0.373 | 0.412 |
| | TNN-1 | 0.717 | 0.130 | 0.153 | 0.728 |
Discussion
Transformers are regarded as a state-of-the-art DL approach. Their advent transformed the use of attention by substituting the unique self-attention mechanism for convolution and recurrence. In time series forecasting, transformers do not process their inputs sequentially, which avoids the vanishing gradient issue that degrades the long-term prediction performance of RNNs. In the present research, this self-attention mechanism was applied to the evaporation prediction domain. The developed TNN model performed well in predicting Ep at all three chosen stations, and its efficacy is evident in the consistency and high accuracy of its predictions across sites. This can be attributed to the self-attention mechanisms integral to the transformer's architecture and demonstrates the potential of intra-attention to enhance the performance of DL predictive models.
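The self-attention operation described above can be illustrated with a minimal NumPy sketch: a single scaled dot-product attention head with random projection matrices, shown for illustration only and not the study's trained architecture:

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over a sequence of monthly feature vectors.
    x: (seq_len, d_model); wq/wk/wv: (d_model, d_k) projection matrices."""
    q, k, v = x @ wq, x @ wk, x @ wv
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                     # pairwise month-to-month relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over key positions
    return weights @ v                                  # attention-weighted mixture of values

rng = np.random.default_rng(1)
seq_len, d_model, d_k = 12, 6, 4  # e.g. 12 months, 6 meteorological features
x = rng.normal(size=(seq_len, d_model))
wq, wk, wv = [rng.normal(size=(d_model, d_k)) for _ in range(3)]
out = self_attention(x, wq, wk, wv)
print(out.shape)  # (12, 4)
```

Because each output row mixes information from every position in the sequence at once, no gradient has to flow step by step through time, which is the property that sidesteps the vanishing gradient issue noted above.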
As noted in the literature, current methods for predicting open water evaporation differ in applicability between and, in some cases, within regions; no single method is best in general. Evaporation prediction models face limitations due to variability in environmental factors such as wind speed, humidity, and solar radiation (Shiri & Kişi 2011). They may also struggle to maintain accuracy across diverse geographic regions and lack precision for specific local conditions. In addition, model assumptions and simplifications can affect reliability, especially in complex terrain or changing climate scenarios (Abed et al. 2022b). The transformer model, strengthened by its self-attention component, is therefore a promising technique for Ep prediction. Future research could extend the analysis to a wider range of regions with different weather conditions. DL has far-reaching implications for managing water resource systems and irrigation, as it can be used to monitor monthly variations in evaporation. However, some challenges must be considered when developing DL models, including their computational demands, since training large-scale models with extensive datasets requires substantial computational resources, and their need for large amounts of training data during the calibration phase.
The Ep modelling approach demonstrated in this research provides a reliable estimate of water losses caused by evaporation, which is essential for managing water resources efficiently. Multiplying Ep values by the surface area of a water storage offers an effective, science-based method for assessing evaporative water loss, a major factor in determining the available water volume. This calculation simplifies the task of estimating the total water available for irrigation and enables the use of intelligent irrigation schedules, which help reduce unnecessary water loss and make irrigation more efficient. This research therefore suggests that applying the TNN model to predict Ep carries considerable economic benefits for farmers, especially in areas suffering from drought, water scarcity, or other hydrological imbalances. It also offers hydrologists valuable insight into incorporating soft computing into the analysis of non-stationary and non-linear hydrological variables.
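The depth-times-area calculation described here is straightforward; a minimal sketch, assuming Ep in millimetres and surface area in hectares (the function name and unit choices are ours):

```python
def evaporative_loss_megalitres(ep_mm, surface_area_ha):
    """Volume lost to evaporation: depth (mm) times surface area (ha).
    1 mm of depth over 1 ha equals 10 m^3, and 1 ML equals 1000 m^3."""
    return ep_mm * surface_area_ha * 10.0 / 1000.0

# e.g. 250 mm of monthly evaporation from a 120 ha reservoir
print(evaporative_loss_megalitres(250.0, 120.0))  # 300.0 (ML)
```

Feeding predicted monthly Ep into such a calculation yields the volumetric loss figures that irrigation schedules and water budgets require.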
CONCLUSION
The objective of this study was to develop a TNN predictive model for monthly evaporation losses and compare it to other empirical models, such as the temperature-based model (Thornthwaite) and the radiation-based model (Stephens and Stewart). The effectiveness of the DL model was assessed by predicting Ep using monthly data from three weather stations located in Australia: Alice Springs, Brisbane, and Adelaide. The monthly Ep from 2009 to 2022 was used as time series data for training (calibration) and testing (validation) of the designed model. The PCC was used to choose the appropriate input parameters (predictors) for the TNN model in terms of Ep forecasting. Conventional evaluation metrics were used to determine the effectiveness of each model.
The investigation led to the following findings:
1. The developed TNN model demonstrated a remarkable level of accuracy when used to forecast monthly Ep values at each of the sites selected in this study.
2. All stations showed the highest prediction accuracy when using models based on comprehensive meteorological datasets, as opposed to models using limited data.
3. The developed TNN model proved to be superior to the empirical methods. Furthermore, when using the same set of inputs as those methods, the accuracy of TNN's monthly Ep projections was significantly improved.
4. To create an Ep prediction model that is highly reliable and widely applicable, the technique's effectiveness should be evaluated across various regions in Australia and globally. Testing it in diverse geographical contexts is a promising avenue for future research and would build a comprehensive understanding of its capabilities, paving the way for robust and versatile advances in this field.
ACKNOWLEDGEMENT
The author would like to thank the Australian Government Research Training Program Scholarship (RTP) for its support.
AUTHOR CONTRIBUTIONS
M.A.: methodology, formal analysis, visualization, and writing – review and editing. M.A.I. and A.N.A.: writing – review and editing and supervision. Y.F.H.: writing – review and editing.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.