Abstract
Accurate prediction of pan evaporation (PE) is one of the crucial factors in water resources management and planning in agriculture. In this research, two hybrid models, self-adaptive time-frequency methodology, ensemble empirical mode decomposition (EEMD) coupled with support vector machine (EEMD-SVM) and EEMD model tree (EEMD-MT), were employed to forecast monthly PE. The EEMD-SVM and EEMD-MT were compared with single SVM and MT models in forecasting monthly PE, measured between 1975 and 2008, at Siirt and Diyarbakir stations in Turkey. The results were evaluated using four assessment criteria, Nash–Sutcliffe Efficiency (NSE), root mean square error (RMSE), performance index (PI), Willmott's index (WI), and Legates–McCabe's index (LMI). The EEMD-MT model respectively improved the accuracy of MT by 36 and 44.7% with respect to NSE and WI in the testing stage for the Siirt station. For the Diyarbakir station, the improvements in results were less spectacular, with improvements in NSE (1.7%) and WI (2.2%), respectively, in the testing stage. The overall results indicate that the proposed pre-processing technique is very promising for complex time series forecasting and further studies incorporating this technique are recommended.
INTRODUCTION
Evaporation is probably the most complicated and difficult parameter to forecast among all the elements of the hydrological cycle due to the complex interactions among the hydrologic components of water surface, land, atmosphere process, and vegetation. Hence, proper forecasting of evaporation is a significant issue in water resources management and agriculture, particularly in semi-arid and arid areas. Because of temperature differences, this hydrological phenomenon is a nonlinear process which occurs in nature (Irmak et al. 2002; Trajkovic & Kolakovic 2010; Tezel & Buyukyildiz 2016; Malik et al. 2017). There are many parameters influencing pan evaporation (PE) rates, including air temperature (TA), solar radiation (SR), sunshine hours (HS), relative humidity (RH) and wind speed (WS). Although accurate estimation of PE is of high importance, especially in studies related to integrated water resources management, the effects of various weather parameters on pan evaporation in different regions are still not well understood.
In general, there are two main approaches, namely direct and indirect procedures, for calculating and forecasting evaporation. Direct measurements of PE are widely applied for predicting evaporation. Indirect methods such as energy-budget, water budgets, and Penman–Monteith techniques are based on various climate variables (e.g. TA, WS and SR), which are not easily accessible. It is not possible nowadays to extract a mathematical relationship that includes all of the physical processes due to the nonlinearity and complexity of the evaporation process. Several researchers therefore recommend the use of models (Tezel & Buyukyildiz 2016; Wang et al. 2017; Tan et al. 2018) that do not require a priori knowledge of the underlying physical processes or excessive data. Such models are considered to be more adequate (adaptable) in real-decision support systems compared to the physical-based methods (Kisi 2007; Ch et al. 2014; Deo & Şahin 2015). Considering that evaporation is highly nonlinear, these techniques are able to map the nonlinearities associated with evaporation and its input parameters to a high degree of accuracy.
In the last decade, machine learning (ML) methods such as Adaptive Neuro Fuzzy Inference System (ANFIS), artificial neural network (ANN), model tree (MT), gene expression programming (GEP), extreme learning machine (ELM) and support vector machine (SVM) have been successfully employed for solving a wide range of environmental and water engineering problems (Adamowski & Karapataki 2010; Chua et al. 2011; Sattar 2013; Ch et al. 2014; He et al. 2014; Ebtehaj et al. 2015; Deo et al. 2016; Najafzadeh et al. 2016; Li et al. 2017; Mouatadid & Adamowski 2017; Rezaie-Balf & Kisi 2017; Rezaie-Balf et al. 2017a, 2017b; Yu et al. 2017). Many studies have also been conducted using ML methods for predicting pan evaporation (Tabari et al. 2012; Lin et al. 2013; Kim et al. 2015; Tezel & Buyukyildiz 2016; Kisi et al. 2016; Malik et al. 2017; Wang et al. 2017), although the development of powerful approaches and methods with high levels of reliability to achieve accurate forecasts remains a major challenge. This is because time series PE data are generally highly nonlinear and seasonal. Under such situations, applying the raw data to the model directly may not produce accurate results if the data are highly nonlinear. In this context, using a preprocessing technique may enhance the model's performance (Wu et al. 2010). In recent years, researchers using data pre-processing tools such as principal component analysis (PCA), continuous wavelet transform (CWT), moving average (MA), wavelet multi-resolution analysis (WMRA), maximum entropy spectral analysis (MESA), and singular spectrum analysis (SSA) have been successful in improving forecasting accuracy (Hu et al. 2007; Kişi 2009; Sang et al. 2009; Wu et al. 2009; Wu & Chau 2011; Nourani et al. 2013; Sang et al. 2013; Rezaie-Balf et al. 2017a, 2017b; Tiwari & Adamowski 2017; Wen et al. 2017).
More recently, new noise assisted data analysis techniques, empirical mode decomposition (EMD) and ensemble empirical mode decomposition (EEMD), were pioneered by Huang et al. (1998) and Wu & Huang (2009), respectively. The EEMD approach is an advanced form of traditional decomposition techniques (e.g. wavelet decomposition and Fourier decomposition), and is an intuitive, empirical and self-adaptive data processing tool which was developed for non-linear and non-stationary time series. Recently, several successful applications have been presented which utilized EEMD to forecast time series phenomena such as rainfall and runoff (Karthikeyan & Kumar 2013; Latifoglu et al. 2015; Wang et al. 2015; Zhao & Chen 2015; Barge & Sharif 2016), water quality parameters (Fijani et al. 2019), soil moisture (Prasad et al. 2018), and electricity demand (Al-Musaylh et al. 2018). For instance, Napolitano et al. (2011) carried out a study to explore several aspects of ANN in forecasting daily stream flow time series using EMD. They showed that using EMD analysis, the ANN can have a higher prediction accuracy and reliability. Likewise, Sang et al. (2012) presented a hybrid EMD and maximum entropy spectral analysis method. They proved that this method, in addition to improving the period identification, also can improve overall period identification by being able to discriminate period, trend, and noise. Wang et al. (2013) applied EEMD as an adaptive data analysis tool for decomposing annual rainfall time series in modeling rainfall-runoff process using a SVM model. They employed particle swarm optimization (PSO) to calculate optimal SVM control parameters. Similar to previous investigations, they showed their proposed model's efficiency and capability in rainfall-runoff forecasting. Moreover, a four-stage model, EEMD-based radial basis function neural network (RBFN) and linear neural network (LNN), was developed by Di et al. (2014) for hydrological time series forecasting. They considered six datasets and showed that their model outperformed the RBFNN, EMD-RBFNN, and EMD-WA-RBFNN-LNN methods.
The overall goal of the current study is to propose a new hybrid model to improve the monthly PE forecasting. The SVM and MT techniques were investigated and improvements of the models' accuracy were tested by coupling them with the EEMD procedure. The data used are the monthly PE data recorded at Siirt and Diyarbakir stations, located in Turkey. The rest of the present research is organized as follows. The next section provides a case study and data analysis, followed by a section describing the methodology of EEMD and each forecasting method (i.e. MT and SVM). Next, the statistical indicators applied in this study are provided. The penultimate section represents the forecasting results of the hybrid models and the comparison results, and the final section consists of a summary and conclusion of outline results.
STUDY AREA AND DATASET
To validate the performance of the proposed approach, the monthly climatic data and PE from Siirt station (latitude 37° 55′ N, longitude 41° 56′ E, and altitude 896 m) and Diyarbakir station (latitude 37° 54′ N, longitude 40° 13′ E, and altitude 649 m), operated by the Turkish Meteorological Organization (TMO) were utilized. All the climatic data were obtained from TMO. It is the only legal organization that provides meteorological information in Turkey. TMO measures pan evaporation, which are not publicly available, using Standard A pans (World Meteorological Organization standards). The 187 datasets of climatic variables comprising wind speed (WS), temperature (TA), relative humidity (RH) and solar radiation (SR) of the aforementioned stations consisting of 33 years (1975–2008) are described in Table 1. Furthermore, data were not available for January, February, March and December and about 5% of the whole data were missing for May and April. Data were directly used in this study without any pre-processing (e.g. filling in missing ones). Seventy-five per cent of the whole data was used for training of the applied models and the obtained models were tested by using the remaining 25% data.
Descriptive statistics of the meteorological data at Siirt and Diyarbakir stations
Statistical parameters . | Meteorological variables with unit . | ||||
---|---|---|---|---|---|
T (oc) . | RS (MJ/m2) . | Rh (%) . | U2 (m/s) . | PE (mm/day) . | |
Siirt station (training data set) | |||||
Minimum | 10.1 | 216.16 | 17.9 | 0.7 | 1.4 |
Maximum | 33.5 | 859.12 | 72.3 | 2.8 | 15.7 |
Mean | 23.34 | 585.25 | 39.99 | 1.67 | 7.89 |
Sx | 6.058 | 152.61 | 12.36 | 0.45 | 3.34 |
Csx | –0.37 | –0.58 | 0.48 | 0.13 | 0.008 |
Kx | –0.94 | –0.405 | –0.52 | –0.79 | –0.83 |
Siirt station (testing data set) | |||||
Minimum | 10.4 | 321.14 | 25.6 | 0.6 | 2 |
Maximum | 32.3 | 688.38 | 69.2 | 1.4 | 16.6 |
Mean | 24.72 | 547.22 | 40.04 | 1.11 | 9.55 |
Sx | 5.69 | 116.48 | 11.62 | 0.19 | 3.68 |
Csx | –0.48 | –0.57 | 0.77 | –0.76 | –0.06 |
Kx | –0.65 | –0.91 | –0.21 | 0.19 | –0.81 |
Diyarbakir station (training data set) | |||||
Minimum | 8 | 163.41 | 13.4 | 0.2 | 1.2 |
Maximum | 33.4 | 917.65 | 81.2 | 4.9 | 20.2 |
Mean | 23.21 | 643.05 | 42.27 | 2.51 | 8.38 |
Sx | 6.472 | 176.21 | 15.01 | 1.01 | 4.04 |
Csx | –0.37 | –0.66 | 0.48 | –0.51 | 0.12 |
Kx | –0.96 | –0.37 | –0.88 | –0.24 | –0.81 |
Diyarbakir station (testing data set) | |||||
Minimum | 10.3 | 415.35 | 10.7 | 1.1 | 3.6 |
Maximum | 32.6 | 890.11 | 79.3 | 4 | 16.8 |
Mean | 24.52 | 694.96 | 33.61 | 2.43 | 10.03 |
Sx | 6.33 | 144.52 | 17.93 | 0.69 | 3.76 |
Csx | –0.32 | –0.39 | 0.93 | 0.21 | –0.14 |
Kx | –1.07 | –0.97 | 0.33 | –0.35 | –1.21 |
10.3 | 415.35 | 10.7 | 1.1 | 3.6 |
Statistical parameters . | Meteorological variables with unit . | ||||
---|---|---|---|---|---|
T (oc) . | RS (MJ/m2) . | Rh (%) . | U2 (m/s) . | PE (mm/day) . | |
Siirt station (training data set) | |||||
Minimum | 10.1 | 216.16 | 17.9 | 0.7 | 1.4 |
Maximum | 33.5 | 859.12 | 72.3 | 2.8 | 15.7 |
Mean | 23.34 | 585.25 | 39.99 | 1.67 | 7.89 |
Sx | 6.058 | 152.61 | 12.36 | 0.45 | 3.34 |
Csx | –0.37 | –0.58 | 0.48 | 0.13 | 0.008 |
Kx | –0.94 | –0.405 | –0.52 | –0.79 | –0.83 |
Siirt station (testing data set) | |||||
Minimum | 10.4 | 321.14 | 25.6 | 0.6 | 2 |
Maximum | 32.3 | 688.38 | 69.2 | 1.4 | 16.6 |
Mean | 24.72 | 547.22 | 40.04 | 1.11 | 9.55 |
Sx | 5.69 | 116.48 | 11.62 | 0.19 | 3.68 |
Csx | –0.48 | –0.57 | 0.77 | –0.76 | –0.06 |
Kx | –0.65 | –0.91 | –0.21 | 0.19 | –0.81 |
Diyarbakir station (training data set) | |||||
Minimum | 8 | 163.41 | 13.4 | 0.2 | 1.2 |
Maximum | 33.4 | 917.65 | 81.2 | 4.9 | 20.2 |
Mean | 23.21 | 643.05 | 42.27 | 2.51 | 8.38 |
Sx | 6.472 | 176.21 | 15.01 | 1.01 | 4.04 |
Csx | –0.37 | –0.66 | 0.48 | –0.51 | 0.12 |
Kx | –0.96 | –0.37 | –0.88 | –0.24 | –0.81 |
Diyarbakir station (testing data set) | |||||
Minimum | 10.3 | 415.35 | 10.7 | 1.1 | 3.6 |
Maximum | 32.6 | 890.11 | 79.3 | 4 | 16.8 |
Mean | 24.52 | 694.96 | 33.61 | 2.43 | 10.03 |
Sx | 6.33 | 144.52 | 17.93 | 0.69 | 3.76 |
Csx | –0.32 | –0.39 | 0.93 | 0.21 | –0.14 |
Kx | –1.07 | –0.97 | 0.33 | –0.35 | –1.21 |
10.3 | 415.35 | 10.7 | 1.1 | 3.6 |
Xmin, Xmax, Xmean, Sx, Csx, and Kx denote the minimum, maximum, mean, standard deviation, skewness, and kurtosis, respectively.
The location of the study area is illustrated in Figure 1. In this area, average annual precipitation is between 400 and 800 mm, falling mainly in the winter months of December to January. The mean values of the pan evaporation during spring and summer (main crop growing seasons) at Siirt and Diyarbakir are 8.81 and 7.47 mm, respectively. Summers are hot and dry and the annual temperature ranges from 16.0 to 46.8 °C (Keshtegar & Kisi 2017). According to the de Martonne (1926) aridity index, both Siirt and Diyarbakir stations have a dry, semiarid (semi-desert) climate.
MODELS USED
Two ML algorithms, SVM and MT, were developed and coupled with EEMD as a data pre-processing procedure for forecasting monthly PE in Siirt and Diyarbakir. In this section, the description of EEMD, SVM and MT are briefly presented.
Ensemble empirical mode decomposition (EEMD)
The EEMD method, proposed by Wu & Huang (2009), is an empirical procedure used to represent a nonlinear and non-stationary signal from original data. This data pre-processing method is an improvement of the EMD, self-adaptive time-frequency transformation procedure, and does not rely on process information. EMD is used mainly for decomposing the original time series data into a finite and low number of oscillatory modes depending on the local characteristic time scale (Huang et al. 1998). The oscillatory modes are revealed by intrinsic mode function (IMFs) components, embedded in the data. These component signals are represented as a sum of zero-mean well behaved fast and slow oscillation modes (IMFs) which have two requirements (Huang et al. 1998; Wu & Huang 2009): (i) Throughout the whole length of data, the number of zero-crossings and the number of extrema must either be equal or at least differ by one; and (ii) At any point, the mean value of the local maxima and minima envelope is zero. According to these properties, some meaningful IMFs can be well defined. Generally, an IMF indicates a simple oscillatory mode, compared to simple harmonic function. Based on this definition, a shifting process of original time series can be briefly expressed as follows (Huang et al. 1998).
Step 1: Identify all extrema (local maxima and minima) points of the given time series y(t);
Step 2: Connect by spline interpolation, to create an upper envelope of local maxima points emax(t), and a lower envelope emin(t) of all minima points;
Step 5: If h(t) satisfies the two properties of IMF as indicated by a predefined stopping criterion, h(t) is indicated as the first IMF (written as c1(t) where 1 is its index); If h(t) is not an IMF, y(t) is replaced with h(t) and iterate steps 1–4 until h(t) satisfies the two conditions of IMFs.
Support vector machine (SVM)
Support vector machines, firstly presented by Cortes & Vapnik (1995), is an effective tool for solving nonlinear problems by relating input variables to a defined output. The least square SVM is an advanced type of SVM, developed by Suykens & Vandewalle (1999), and is used for chaotic time series prediction. The SVM on the basis of the structural risk minimization (SRM) principle, minimizes the expected error in order to reduce the over-fitting. In SVM, in order to separate the data patterns, the input data are mapped into a higher dimensional feature space.



















Model tree (MT)
The MT is a state-of-the-art hierarchical algorithm, presented by Quinlan (1992), for providing the relation between input–output parameters. This algorithm is used to solve the problem by dividing the data space into several sub-problems (sub-spaces) and builds piecewise linear regression models for each sub-domain. The data records, in classification trees, are classified by sorting the tree from root node to some leaf nodes of the tree. The MT is based on the well-known CART algorithm (Breiman et al. 1984), which deals with continuous-class learning attributes. This algorithm is known to be one of the most efficient methods to present physically meaningful insights for a given phenomenon (Kisi et al. 2016). Figure 2 illustrates the tree-building process, within four linear regression models and the knowledge extracted from the structure for corresponding sub-domains. Furthermore, a general tree structure of MT approach is illustrated in Figure 2(b). In this figure, if X1 and X2 are less than 3 and 1.5, respectively, the fifth model should be considered as a form of Y = a0 + a1X1 + a2X2.
Splitting the input space and prediction by the model tree for a new data record: (a) splitting of the input space (X1 × X2) by the M5p model tree algorithm; (b) predicting a new data record by the model tree.
Splitting the input space and prediction by the model tree for a new data record: (a) splitting of the input space (X1 × X2) by the M5p model tree algorithm; (b) predicting a new data record by the model tree.
STATISTICAL PERFORMANCE INDICATORS
In the present research, to assess the EEMD-SVM, EEMD-MT, SVM and MT models, several error indicators were used:
- 1.Nash–Sutcliffe Efficiency (NSE): This criterion is taken into account to evaluate the ability of hydrological models. The highest value of unity demonstrates a perfect fit between observed and predicted PE where a negative value means that the model performs worse than the arithmetic mean of the time series (Nash & Sutcliffe 1970):
- 2.Root mean square error (RMSE): RMSE shows the standard deviation of the forecasting errors or residuals. RMSE varies from zero for perfect estimates to large positive values for poor estimates (Hyndman & Koehler 2006):
- 3.Performance index (PI): PI values vary from 0 to +∞, with lower values showing better accuracy. A PI value close to zero (e.g. PI ≤ 0.2) indicates that the model predicts the observed values well (Gandomi & Roke 2015):
- 4.Willmott's index of agreement (WI): WI is a standardized measure for model prediction error and ranges from 0 to 1. The values close to 0 show poor accuracy while the values close to 1 reveal good performance (Willmott et al. 2012):




MODELS APPLICATION AND RESULTS ANALYSIS
Models implementation based on EEMD for PE forecasting
The main goal of the EEMD-MT/SVM models is to forecast monthly PE at different stations in Turkey. According to Figure 3, the workflow of the proposed approach contains three main steps to enhance the performance of proposed forecasting techniques that can be expressed as follows:
In Step 1, the EEMD procedure is firstly utilized to decompose the all original input/output time series y(t) into several IMF components Ci(t) (i = 1, 2, 3, … , n) and one residual component rn(t).
Next, for each extracted IMF component and the residual component (for example IMF1), the SVM and MT are established as forecasting models to simulate the decomposed IMF and residual components, and to predict each component by the same sub-series (IMF1) of four input variables, respectively.
In Step 3, the forecasted values of all extracted IMF and residue components by the SVM and MT models are aggregated to generate the modelled PE time series.
To summarize, the hybrid EEMD-SVM/MT forecasting tools establish the idea of ‘decomposition and ensemble’. The decomposition is used to simplify the forecasting process, and the ensemble is utilized to formulate a consensus forecasting on the original time series.
Forecasting of monthly PE using proposed models at Siirt station
Next, the accuracy of the proposed models in forecasting of monthly PE was assessed at both training and testing stages (Table 2). Based on the comparison between SVM and MT approaches, the MT is found to perform better than the SVM in terms of statistical metrics during training and testing stages at Siirt station. The quantitative evaluation results in Table 2 also show that EEMD-MT has a lower value of RMSE (0.765 mm) and PI (0.079 mm) compared to the EEMD-SVM model in the training stage for the Siirt station. Moreover, the EEMD-MT provided more accurate results in terms of NSE (0.89), WI (0.969), and LMI (0.699) than the SVM and MT.
Evaluation metrics of the models in the training and testing stages at Siirt station
Models . | Statistical error indices . | ||||
---|---|---|---|---|---|
NSE . | RMSE (mm/day) . | PI (mm/day) . | WI . | LMI . | |
Total available data in training stage | |||||
SVM | 0.842 | 1.304 | 0.126 | 0.942 | 0.672 |
MT | 0.901 | 1.047 | 0.110 | 0.962 | 0.724 |
EEMD-SVM | 0.929 | 0.887 | 0.093 | 0.978 | 0.774 |
EEMD-MT | 0.947 | 0.765 | 0.079 | 0.986 | 0.792 |
Total available data in testing stage | |||||
SVM | 0.463 | 2.663 | 0.137 | 0.814 | 0.341 |
MT | 0.653 | 2.140 | 0.101 | 0.893 | 0.492 |
EEMD-SVM | 0.796 | 1.641 | 0.069 | 0.934 | 0.591 |
EEMD-MT | 0.893 | 1.183 | 0.051 | 0.969 | 0.699 |
Models . | Statistical error indices . | ||||
---|---|---|---|---|---|
NSE . | RMSE (mm/day) . | PI (mm/day) . | WI . | LMI . | |
Total available data in training stage | |||||
SVM | 0.842 | 1.304 | 0.126 | 0.942 | 0.672 |
MT | 0.901 | 1.047 | 0.110 | 0.962 | 0.724 |
EEMD-SVM | 0.929 | 0.887 | 0.093 | 0.978 | 0.774 |
EEMD-MT | 0.947 | 0.765 | 0.079 | 0.986 | 0.792 |
Total available data in testing stage | |||||
SVM | 0.463 | 2.663 | 0.137 | 0.814 | 0.341 |
MT | 0.653 | 2.140 | 0.101 | 0.893 | 0.492 |
EEMD-SVM | 0.796 | 1.641 | 0.069 | 0.934 | 0.591 |
EEMD-MT | 0.893 | 1.183 | 0.051 | 0.969 | 0.699 |
The bold numbers represent the values of performance criteria for the best fitted models.
In the testing stage, with respect to the measured value of PE, it can be seen from Table 2, in terms of NSE and WI, that the EEMD-MT and EEMD-SVM models are able to model PE at Siirt station well. The SVM model has the highest error in estimating PE (RMSE = 2.663 mm; PI = 0.137 mm) compared to the other models. In addition, the EEMD-MT model improves the MT model with 44.71 and 49.50% reductions in term of RMSE and PI, respectively, and a 36, 8 and 42% increase in NSE, WI, and LMI, respectively.
Comparison of the modelled PE is made against measured values in the scatter plots (Figure 5) for both training and testing stages. The figure illustrates the superiority of the EEMD-MT model during the test stage for Siirt station. The superiority of the EEMD-MT model over the SVM, MT, and EEMD-SVM models is also clear from Figure 6. These results support our hypothesis that the EEMD method is suitable for decomposing monthly PE time series, and the idea of ‘decomposition and ensemble’ is feasible and the proposed EEMD-SVM/MT models can overcome drawbacks of the individual SVM and MT models by generating a synergetic effect in forecasting. Thus the monthly PE data decomposed by using the EEMD procedure can improve the forecast results at Siirt.
Scatter plots of observed vs. estimated for proposed models in training and testing stages at Siirt station.
Scatter plots of observed vs. estimated for proposed models in training and testing stages at Siirt station.
Time series plot of proposed models in testing stage at Siirt station.
Forecasting of monthly PE using proposed models at Diyarbakir station
Similar to the methodology used for Siirt station, four proposed models (i.e. SVM, MT, EEMD-SVM, and EEMD-MT) were developed for the purpose of forecasting monthly PE at Diyarbakir station. Specifically, SVM and MT were adopted as benchmark models and the results compared with the conjunction models which used EEMD to pre-process the original monthly PE data, adopting the phase-space reconstruction method to design the input vectors. The training and testing data sets were identical across all models. Table 3 indicates the performance results for the four models in terms of the four aforementioned statistical error metrics. The results show that the hybrid techniques have the lowest RMSE and PI values and the highest NSE and WI, outperforming the SVM and MT models, at both training and testing stages. In the training stage, the comparison of the SVM and MT models shows that the MT model has better accuracy (NSE = 0.882, WI = 0.953, and LMI = 0.755) in forecasting of PE; however, these models' results are significantly improved by coupling with the EEMD technique. The EEMD-MT approach outperformed all other models (Table 3). By integrating the EEMD with SVM, RMSE and PI were reduced to 1.270 and 0.138 mm in the training stage, respectively. It can be seen from Table 3 that by coupling with EEMD, the RMSE and PI of single MT were reduced by 25.9–28.9 and 19.4–75.5% in the training and testing stages, respectively. Improvements in the forecast results were approximately 5.2, 6.1 and 4.5% respectively, for NSE, WI and LMI values at the training stage, and 1.7, 2.2 and 11.6%, respectively, in the testing stage.
Evaluation metrics of the models in the training and testing stages at Diyarbakir station
Models . | Statistical error indices . | ||||
---|---|---|---|---|---|
NSE . | RMSE (mm/day) . | PI (mm/day) . | WI . | LMI . | |
Total available data in training stage | |||||
SVM | 0.866 | 1.521 | 0.196 | 0.925 | 0.711 |
MT | 0.882 | 1.451 | 0.176 | 0.953 | 0.755 |
EEMD-SVM | 0.901 | 1.270 | 0.138 | 0.972 | 0.773 |
EEMD-MT | 0.928 | 1.075 | 0.125 | 0.981 | 0.789 |
Total available data in testing stage | |||||
SVM | 0.869 | 1.462 | 0.149 | 0.958 | 0.703 |
MT | 0.907 | 1.362 | 0.139 | 0.963 | 0.712 |
EEMD-SVM | 0.902 | 1.268 | 0.138 | 0.973 | 0.733 |
EEMD-MT | 0.923 | 1.098 | 0.034 | 0.985 | 0.795 |
Models . | Statistical error indices . | ||||
---|---|---|---|---|---|
NSE . | RMSE (mm/day) . | PI (mm/day) . | WI . | LMI . | |
Total available data in training stage | |||||
SVM | 0.866 | 1.521 | 0.196 | 0.925 | 0.711 |
MT | 0.882 | 1.451 | 0.176 | 0.953 | 0.755 |
EEMD-SVM | 0.901 | 1.270 | 0.138 | 0.972 | 0.773 |
EEMD-MT | 0.928 | 1.075 | 0.125 | 0.981 | 0.789 |
Total available data in testing stage | |||||
SVM | 0.869 | 1.462 | 0.149 | 0.958 | 0.703 |
MT | 0.907 | 1.362 | 0.139 | 0.963 | 0.712 |
EEMD-SVM | 0.902 | 1.268 | 0.138 | 0.973 | 0.733 |
EEMD-MT | 0.923 | 1.098 | 0.034 | 0.985 | 0.795 |
The bold numbers represent the values of performance criteria for the best fitted models.
The measured and forecasted values of PE during training and testing stages are compared in Figures 7 and 8 for the four models. It can be seen from Figure 8 that SVM and MT models are able to follow the general trend of measured PE quite well, although these models are not able to match the extreme high and low values accurately, especially at the latter part of the testing period. In fact, the SVM and MT models under-predict PE for the testing data from 2006 onwards, whereas the EEMD-SVM/MT models are able to provide accurate estimates during this period of time (with the exception of 2008 where it is noticed that PE is under-predicted by all models). In general, the results of this analysis illustrate that the proposed EEMD-SVM/MT models are able to obtain better results than the SVM/MT models, shown by the improvements in different evaluation measures.
Scatter plots of observed vs. estimated for proposed models in training and testing stages at Diyarbakir station.
Scatter plots of observed vs. estimated for proposed models in training and testing stages at Diyarbakir station.
Time series plot of proposed models in testing stage at Diyarbakir station.
Models assessment at spring and summer seasons
To further evaluate the performance of the proposed models, the results were assessed in the spring and summer seasons only, since these are the main crop growing periods. Hence, further analysis was conducted for the period from May to October for 1975–2008 in both stations. Table 4 illustrates the results of the EEMD-SVM and EEMD-MT models in forecasting PE from May to October for Siirt station. As can be seen in Table 4, the EEMD-MT has a lower error in terms of RMSE and PI for all months compared to the EEMD-SVM model. It is notable that the models' accuracy (NSE, WI and LMI) from May to July reduces but then increases for the months of August, September, and October.
Results of EEMD-SVM and EEMD-MT models for PE forecasting at spring and summer seasons in Siirt station
Models . | . | Statistical error indices . | ||||
---|---|---|---|---|---|---|
Month . | NSE . | RMSE (mm/day) . | PI (mm/day) . | WI . | LMI . | |
EEMD-SVM | May | 0.639 | 0.849 | 0.079 | 0.888 | 0.527 |
June | 0.546 | 1.319 | 0.0732 | 0.786 | 0.485 | |
July | 0.428 | 1.357 | 0.062 | 0.748 | 0.453 | |
August | 0.377 | 1.311 | 0.063 | 0.691 | 0.396 | |
September | 0.570 | 0.736 | 0.048 | 0.849 | 0.568 | |
October | 0.301 | 1.045 | 0.112 | 0.762 | 0.459 | |
EEMD-MT | May | 0.704 | 0.768 | 0.069 | 0.908 | 0.690 |
June | 0.655 | 1.149 | 0.0621 | 0.858 | 0.621 | |
July | 0.724 | 0.943 | 0.041 | 0.895 | 0.636 | |
August | 0.684 | 0.933 | 0.043 | 0.864 | 0.562 | |
September | 0.642 | 0.671 | 0.043 | 0.882 | 0.656 | |
October | 0.423 | 0.801 | 0.0653 | 0.827 | 0.596 |
Models . | . | Statistical error indices . | ||||
---|---|---|---|---|---|---|
Month . | NSE . | RMSE (mm/day) . | PI (mm/day) . | WI . | LMI . | |
EEMD-SVM | May | 0.639 | 0.849 | 0.079 | 0.888 | 0.527 |
June | 0.546 | 1.319 | 0.0732 | 0.786 | 0.485 | |
July | 0.428 | 1.357 | 0.062 | 0.748 | 0.453 | |
August | 0.377 | 1.311 | 0.063 | 0.691 | 0.396 | |
September | 0.570 | 0.736 | 0.048 | 0.849 | 0.568 | |
October | 0.301 | 1.045 | 0.112 | 0.762 | 0.459 | |
EEMD-MT | May | 0.704 | 0.768 | 0.069 | 0.908 | 0.690 |
June | 0.655 | 1.149 | 0.0621 | 0.858 | 0.621 | |
July | 0.724 | 0.943 | 0.041 | 0.895 | 0.636 | |
August | 0.684 | 0.933 | 0.043 | 0.864 | 0.562 | |
September | 0.642 | 0.671 | 0.043 | 0.882 | 0.656 | |
October | 0.423 | 0.801 | 0.0653 | 0.827 | 0.596 |
In Diyarbakir station, the forecasted PE values in both the spring and summer seasons with the EEMD-SVM/MT models have a similar trend to the Siirt station (Table 5). Also, it can be said that the EEMD-SVM has inadequate accuracy in most of the months and it could just be a drawback of this model. Moving to a more detailed analysis on seasonal forecasting of the proposed models, Figure 9 illustrates RMSE and PI indices along May–October for the aforementioned period. As can be seen in both stations, the EEMD-MT model has lower error compared to EEMD-SVM. This indicates that the EEMD-SVM model may not provide a complete solution to every case. This also shows that accurate prediction of monthly PE is a complex issue.
Results of EEMD-SVM and EEMD-MT models for PE forecasting at spring and summer seasons in Diyarbakir station
Models . | . | Statistical error indices . | ||||
---|---|---|---|---|---|---|
Month . | NSE . | RMSE (mm/day) . | PI (mm/day) . | WI . | LMI . | |
EEMD-SVM | May | 0.677 | 0.802 | 0.068 | 0.924 | 0.723 |
June | 0.664 | 1.159 | 0.092 | 0.927 | 0.715 | |
July | 0.183 | 1.498 | 0.085 | 0.812 | 0.562 | |
August | 0.380 | 0.973 | 0.055 | 0.825 | 0.593 | |
September | 0.026 | 1.039 | 0.082 | 0.760 | 0.463 | |
October | 0.535 | 0.820 | 0.106 | 0.849 | 0.617 | |
EEMD-MT | May | 0.805 | 0.623 | 0.053 | 0.953 | 0.753 |
June | 0.523 | 1.381 | 0.073 | 0.832 | 0.622 | |
July | 0.620 | 0.848 | 0.035 | 0.892 | 0.662 | |
August | 0.652 | 0.729 | 0.029 | 0.896 | 0.678 | |
September | 0.601 | 0.666 | 0.038 | 0.903 | 0.692 | |
October | 0.866 | 0.440 | 0.037 | 0.969 | 0.783 |
Models . | . | Statistical error indices . | ||||
---|---|---|---|---|---|---|
Month . | NSE . | RMSE (mm/day) . | PI (mm/day) . | WI . | LMI . | |
EEMD-SVM | May | 0.677 | 0.802 | 0.068 | 0.924 | 0.723 |
June | 0.664 | 1.159 | 0.092 | 0.927 | 0.715 | |
July | 0.183 | 1.498 | 0.085 | 0.812 | 0.562 | |
August | 0.380 | 0.973 | 0.055 | 0.825 | 0.593 | |
September | 0.026 | 1.039 | 0.082 | 0.760 | 0.463 | |
October | 0.535 | 0.820 | 0.106 | 0.849 | 0.617 | |
EEMD-MT | May | 0.805 | 0.623 | 0.053 | 0.953 | 0.753 |
June | 0.523 | 1.381 | 0.073 | 0.832 | 0.622 | |
July | 0.620 | 0.848 | 0.035 | 0.892 | 0.662 | |
August | 0.652 | 0.729 | 0.029 | 0.896 | 0.678 | |
September | 0.601 | 0.666 | 0.038 | 0.903 | 0.692 | |
October | 0.866 | 0.440 | 0.037 | 0.969 | 0.783 |
Error indices in term of RMSE and PI for seasonal forecasting; (a), (b) related to Siirt station and (c), (d) related to Diyarbakir station.
Error indices in term of RMSE and PI for seasonal forecasting; (a), (b) related to Siirt station and (c), (d) related to Diyarbakir station.
CONCLUSION
Predicting a hydro-meteorological time series is often complicated and frustrating. The series may appear completely random; a volatile sequence showing no signs of predictability. Time series are full of patterns and relationships. Decomposition aims to identify and separate them into distinct components, each with specific properties and behavior. In this study, the EEMD procedure was applied to tackle the trends and random behavior of time series data which may impact on accuracy of PE forecasting. This research attempted to improve SVM and MT models in the prediction of pan evaporation by coupling with the EEMD procedure. The PE time series from two climatic stations in Turkey, Siirt and Diyarbakir, were utilized for validation of the proposed coupled models. Firstly, the SVM and MT models alone were applied to forecast monthly PE. In order to enhance the forecasting accuracy of the typical models, coupled EEMD-SVM and EEMD-MT models were then used. For the coupled models, the original time series data were decomposed into eight IMFs and one residual for PE modeling process. Input variables included TA, RH, WS, and SR. The performance of the proposed models was assessed in terms of NSE, RMSE, PI, WI, and LMI in the training and testing phases. At Siirt station, the EEMD-MT model provided more accurate results in terms of NSE (0.89), WI (0.97), and LMI (0.70) than the SVM (NSE = 0.46, WI = 0.81, and LMI = 0.34), MT (NSE = 0.65, WI = 0.89, and LMI = 0.49), and EEMD-SVM (NSE = 0.80, WI = 0.93, and LMI = 0.60) in the testing stage. For the Diyarbakir station, the highest PE forecasting accuracy was achieved with the EEMD-MT model (NSE = 0.92, WI = 0.98, and LMI = 0.80) compared to others. The results showed that the SVM and MT models coupling EEMD generally provide better accuracy and are superior to the SVM and MT models alone in forecasting monthly PE at both Siirt and Diyarbakir stations. The proposed models can be applied to different climate conditions and EEMD may be compared with other pre-processing methods such as CEEMDAN and improved CEEMDAN in future studies.