Autoregressive time series forecasting is common in many areas of water resources, including hydrology, ecology, and the environment. Simple forecasting models such as linear regression have the advantage of fast runtime, which is attractive for real-time forecasting. However, their forecasting performance may be unacceptable when a non-linear relationship exists between model inputs and outputs, which necessitates the use of more sophisticated forecasting models such as artificial neural networks. This study investigates the performance and potential of a hybrid pre-processing technique to enhance the forecasting accuracy of two commonly used neural network models (feed-forward and layered recurrent neural networks) and a multiple linear regression model. The hybrid technique combines significant input variable selection (using partial linear correlation) to reduce the dimensionality of the input data with input data transformation using the discrete wavelet transform to decompose each input time series into low- and high-frequency components. Two case study forecasting applications, namely monthly inflow forecasting for a lake in Victoria (Australia) and weekly algal bloom prediction at a bay in Hong Kong, were used to assess the forecasting ability of the models when used in conjunction with the hybrid technique. Results demonstrated that the hybrid technique can significantly improve the forecasting performance of all the models considered.
INTRODUCTION
Autoregressive time series forecasting is common in many disciplines, including the financial sector and water resources. Its defining feature is the use of a model to forecast future values of a time series variable of interest in a process or system from that variable's known past values. The goal of autoregressive time series forecasting is to provide accurate forecasts of the variable of interest, such as foreign exchange rates (Vojinovic et al. 2001), streamflow (Wei et al. 2012), salinity (Maier et al. 2010), and chlorophyll (Cheng & Wei 2009). Such forecast information can be utilized in future operational planning of the system at hand.
The general difficulty associated with autoregressive time series forecasting arises from the non-linear and non-stationary characteristics often encountered in water-related time series data (Coulibaly et al. 2001). The non-linear characteristic refers to the dynamic relationship between past values (i.e., model inputs) and future values (i.e., model outputs), in which a small change in model inputs can affect model outputs significantly. An exponential increase in the magnitude of model outputs caused by a small change in model inputs is one example of non-linear behavior; in linear behavior, such an effect is negligible (Makridakis et al. 2003). The non-linear characteristic is commonly modeled as a non-linear combination of model inputs used to predict the model output.
A time series is considered non-stationary when it does not have a constant mean and/or constant variance over time (Tsay 2005). This characteristic can be observed through seasonality and/or trend in the time series data. However, the key consideration is not the time variation of the data but whether the underlying process that generates the data is itself evolving (Cannas et al. 2006). This consideration is commonly accounted for by the use of a 'random walk' model, which assumes that the future value equals the previous value plus random noise, in order to capture the non-stationary characteristic (Tsay 2005).
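The random-walk idea can be illustrated with a short simulation (a minimal sketch, not tied to either case study dataset): because each value is the previous value plus noise, the variance of the series grows with time, so the series has no constant variance and is therefore non-stationary.

```python
import random

def random_walk(n, seed=0):
    """Generate a random-walk series: y[t] = y[t-1] + Gaussian noise."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(y[-1] + rng.gauss(0.0, 1.0))
    return y

# Simulate many independent walks and compare the spread across walks
# at an early and a late time step: the spread keeps widening over time.
walks = [random_walk(200, seed=s) for s in range(500)]

def variance_at(t):
    vals = [w[t] for w in walks]
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

early, late = variance_at(10), variance_at(190)
```

For a unit-variance random walk, the variance at step t is approximately t, so `late` is roughly 19 times larger than `early`, in contrast to a stationary series whose spread would be constant.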
Many forecasting models have been developed in the past. The multiple linear regression (MLR) model is among the simplest, which performs forecasts by fitting a linear relationship between inputs and outputs. Although the MLR model can be very attractive for real-time forecasting due to its simplicity, it is considered unsuitable for non-linear data, which necessitates the use of more sophisticated models (Elshorbagy et al. 2010; Jung et al. 2010). To overcome this limitation of linear models, various data-driven modeling techniques such as artificial neural networks (ANN), support vector machines (SVM), and genetic programming (GP) have been proposed (Minns & Hall 1996; Babovic & Keijzer 2002; Liong et al. 2002; Yu et al. 2004). Among them, ANN models have been widely used for autoregressive time series forecasting over the last 15 years (Islam 2010; Maier et al. 2010; Arunkumar & Jothiprakash 2013). Several distinguishing features of ANN models make them valuable and attractive for forecasting tasks (Samarasinghe 2006; Maier et al. 2010). First, ANN models require few a priori assumptions; they learn from examples and capture the functional relationship in the data even if the underlying relationships are too complex to specify. Second, ANN models can generalize after learning from the sample data presented to them. Third, ANN models are universal function approximators, capable of approximating any continuous function to the desired accuracy.
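As an illustration of the MLR approach, a lag-based linear model can be calibrated by ordinary least squares. The sketch below is a pure-Python illustration on a hypothetical toy series (the study's own MLR models were implemented in Matlab): it fits y_t = b0 + b1·y_{t-1} + b2·y_{t-2} via the normal equations.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_mlr(X, y):
    """Ordinary least squares via the normal equations (with intercept)."""
    Xb = [[1.0] + row for row in X]
    p = len(Xb[0])
    XtX = [[sum(r[i] * r[j] for r in Xb) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(Xb, y)) for i in range(p)]
    return solve(XtX, Xty)

# Hypothetical two-lag autoregressive MLR: y_t = b0 + b1*y_{t-1} + b2*y_{t-2}
series = [1.0, 2.0, 2.5, 3.5, 4.0, 5.2, 5.9, 7.1, 7.8, 9.0]
X = [[series[t - 1], series[t - 2]] for t in range(2, len(series))]
y = [series[t] for t in range(2, len(series))]
beta = fit_mlr(X, y)
predict = lambda y1, y2: beta[0] + beta[1] * y1 + beta[2] * y2
```

The fast runtime mentioned above comes from this closed-form calibration: no iterative training is required, only the solution of one small linear system.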
There are many different types of ANN models; among them, feed-forward neural networks (FFNN) and layered recurrent neural networks (LRNN) have gained attention in the literature for autoregressive time series forecasting (Güldal & Tongal 2010; Wei et al. 2012). FFNN are static and are the most commonly used architecture due to their simple framework (Güldal & Tongal 2010). LRNN contain internal feedback loops that store information for later use, which enhances the efficiency of learning; they are therefore better suited to mapping problems that involve auto-correlated samples (Parlos et al. 2000). However, the forecasting accuracy of ANN models has not always been high. For example, in a recent study by Wang et al. (2009) involving ANN models for forecasting monthly streamflow time series at two case study locations, one location demonstrated a high Nash–Sutcliffe coefficient (E) value of 0.87 while the other had an E value of only 0.61 between observed and forecasted streamflow for the validation dataset. Furthermore, there is limited information on the comparative performance of feed-forward and recurrent neural networks in autoregressive time series forecasting.
Recently, input selection techniques have been used to reduce the number of model inputs, which can improve forecasting performance in some cases (May et al. 2008; Tran et al. 2015). Input selection techniques can be broadly divided into two approaches, namely, model-free and model-based approaches (Fernando et al. 2009). Model-free approaches use statistical measures of dependence to determine the strength of the relationship between candidate model inputs and the model output. The partial linear correlation (PLC) technique is the simplest model-free technique, which can select inputs that are linearly correlated with outputs. The PLC technique has been used in several studies (Tiwari et al. 2013; Valipour et al. 2013; Liu et al. 2014). The partial mutual information (PMI) technique (Fernando et al. 2009) is another model-free technique, which can select inputs that are non-linearly related to outputs. Model-based approaches, on the other hand, use the performance of calibrated models with different inputs as the basis for choosing the most appropriate inputs; ANN- and GP-based techniques are examples of this approach. Muttil & Chau (2007) compared two model-based approaches (ANN and GP) for selecting ecologically significant inputs for algal bloom prediction and showed that both methods produced similar results. Tran et al. (2015) conducted a comparative study of the PLC, PMI, and GP techniques on four hypothetical and two real datasets. They found differences in the significant inputs selected by these techniques; however, in terms of forecasting performance, inputs selected by the PLC technique performed similarly to inputs selected by the PMI and GP techniques when tested using ANN models.
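A simplified sketch of model-free input selection in the spirit of the PLC technique is given below. It is illustrative only: it conditions on a single previously selected input, whereas a full PLC implementation conditions on all selected inputs and uses a formal significance test. The `plc_select` helper, its threshold, and the toy lag data are assumptions for illustration.

```python
def pearson(a, b):
    """Pearson linear correlation coefficient (0.0 if either series is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / (va * vb) ** 0.5

def residual(y, x):
    """Residual of y after removing the linear effect of x (simple regression)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def plc_select(candidates, output, threshold=0.3):
    """Greedy sketch: pick the most correlated lag, then keep lags whose partial
    correlation with the output (conditioning on that first lag) exceeds threshold."""
    first = max(candidates, key=lambda k: abs(pearson(candidates[k], output)))
    selected = [first]
    for name, x in candidates.items():
        if name == first:
            continue
        r = pearson(residual(x, candidates[first]), residual(output, candidates[first]))
        if abs(r) > threshold:
            selected.append(name)
    return selected

# Toy example: the output closely tracks lag1.
candidates = {"lag1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
              "lag2": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]}
output = [1.1, 2.0, 3.1, 4.0, 5.1, 6.0]
selected = plc_select(candidates, output)
```

Conditioning on already-selected inputs (via the residuals) is what distinguishes partial correlation from a simple correlation ranking: a lag that merely echoes an input already chosen contributes little once that input's linear effect is removed.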
Another approach that has recently been used to improve the forecasting performance of ANN models is input preprocessing using wavelet transformation (Adamowski & Sun 2010; Wei et al. 2012; Tiwari et al. 2013). The wavelet transformation decomposes an input variable into a set of linear components with high and low frequency. Such decomposition can be considered to separate the non-stationary and stationary parts, which is particularly useful for autoregressive time series forecasting. However, in these studies, the effect of different wavelet parameters, including the wavelet function and the decomposition level, was not investigated.
In this study, the overall objective was to improve the performance of autoregressive time series forecasting by using a hybrid combination of significant input selection and input preprocessing using wavelet transformation. Several studies have used the hybridization of various techniques to enhance forecasting accuracy (Kisi 2010; Cao et al. 2013; Sun et al. 2014). However, as far as the authors are aware, the hybridization of the PLC technique for input selection (used in this study because of its simplicity) with wavelet decomposition to improve the forecasting performance of MLR and ANN models has not been undertaken in the past. The major contributions of this paper are as follows:
(i) A hybrid technique that combines input selection using the PLC technique with wavelet decomposition for MLR and ANN models is investigated for enhancing autoregressive time series forecasting.
(ii) Wavelet transformation has been demonstrated to be useful for improving time series forecasting. This study analyzes the sensitivity of different decomposition levels and different wavelet functions on the forecasting performance.
(iii) The improvement in performance of two popularly used ANN models (FFNN and LRNN) when used in conjunction with input selection and wavelet transformation is analyzed.
The remainder of this paper is structured as follows. Brief descriptions of feed-forward and layered recurrent ANN models, and then PLC and wavelet decomposition techniques are presented in the next section. A description of the performance indicators used in this study to evaluate the performance of the models is also included in the next section. This is followed by a description of the two case study datasets. Results and discussion are then presented, which is followed by the conclusions drawn from this study.
METHODS AND TECHNIQUES
ANN models
The RNN, on the other hand, has feedback loops from the output layer or hidden layer back to the input layer, as shown in Figure 2(b). These loops allow signals to propagate in both forward and backward directions, which gives the network dynamic memory (Samarasinghe 2006). In other words, the feedback loops enable an RNN to delay and store information from the previous time step, a capacity that is absent from the architecture of feed-forward networks. The presence of these feedback loops has a profound impact on the learning capability and performance of the network, allowing it to capture the hidden dynamic memory, or autoregressive components, of nonlinear time series systems (Carcano et al. 2008). The RNN has proved to be a powerful method for handling complex systems such as nonlinear time-varying systems. An LRNN, which is a form of RNN, is investigated and compared with the FFNN in this study.
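The feedback mechanism can be sketched as a single recurrent (Elman-style) hidden-layer update; the weights below are arbitrary illustrative values, not taken from either case study model.

```python
import math

def elman_step(x_t, h_prev, W_in, W_rec, b):
    """One step of a simple recurrent (Elman-style) hidden layer:
    h_t = tanh(W_in @ x_t + W_rec @ h_prev + b).
    The W_rec @ h_prev term is the feedback loop that carries past state."""
    return [
        math.tanh(sum(W_in[i][j] * x_t[j] for j in range(len(x_t)))
                  + sum(W_rec[i][k] * h_prev[k] for k in range(len(h_prev)))
                  + b[i])
        for i in range(len(b))
    ]

# Hypothetical 1-input, 2-hidden-unit cell run over a short input sequence.
W_in = [[0.5], [-0.3]]
W_rec = [[0.1, 0.2], [0.0, 0.4]]
b = [0.0, 0.1]
h = [0.0, 0.0]
for x in [1.0, 0.5, -0.2]:
    h = elman_step([x], h, W_in, W_rec, b)
```

Because `h_prev` feeds back into the update, the same input produces different hidden states depending on the history, which is exactly the memory a static feed-forward layer lacks.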
PLC
DWD
Preprocessing of input data has recently been used to enhance the performance of ANN models (Wu et al. 2009). The goal of data preprocessing is to identify important characteristics such as trend, oscillation, and noise in order to attain better predictability, because time series data contain time-varying signals with static and dynamic components. Among the available data preprocessing techniques, wavelet decomposition has demonstrated good performance in several forecasting studies (Wu et al. 2009; Kisi & Cimen 2011; Wei et al. 2012). The underlying principle of this data preprocessing technique is to decompose the time series data of interest into several time series components, which are then used as model inputs to forecast the future value of the time series data.
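The idea can be sketched with the simplest wavelet, the Haar wavelet. The study itself used Matlab's Wavelet Toolbox; this pure-Python version is an illustration and assumes a series length that is a power of two. Each analysis step splits the series into a low-frequency approximation (pairwise averages) and a high-frequency detail (pairwise differences).

```python
def haar_step(x):
    """One Haar analysis step: pairwise averages (approximation) and
    pairwise differences (detail), each scaled by 1/sqrt(2)."""
    s = 2 ** 0.5
    a = [(x[2 * i] + x[2 * i + 1]) / s for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) / s for i in range(len(x) // 2)]
    return a, d

def haar_decompose(x, levels):
    """Multi-level Haar DWT: returns [D1, D2, ..., Dn, An], details first."""
    out, a = [], list(x)
    for _ in range(levels):
        a, d = haar_step(a)
        out.append(d)
    out.append(a)
    return out

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
d1, d2, d3, a3 = haar_decompose(signal, 3)
```

With this orthonormal scaling the decomposition preserves the signal's energy, and the final approximation A3 is proportional to the overall mean, so the low-frequency component carries the slowly varying (non-stationary) part while the details carry the variation.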
Performance indicators
CASE STUDIES
RESULTS
Input preprocessing
Three types of model inputs (as summarized in Table 1) were used in this study for the two case studies to investigate the effectiveness of input selection and wavelet transformation on autoregressive forecasting. The first type of input is called 'Original inputs'; these are created by selecting all probable inputs for forecasting and are not subjected to either of the two input preprocessing techniques. As shown in Table 1, for Data #1, a set of 15 time-lagged inputs (i.e., Yt−1 to Yt−15) is used to forecast the current value, Yt. For Data #2, nine variables are used as input, which include the eight variables presented earlier along with chlorophyll itself. For each of these nine variables, a time lag of 7–13 days is used (giving a total of 63 inputs) for the 7-day-ahead forecasting of chlorophyll concentrations.
| Type of model inputs | Description | Data #1 | Data #2 |
|---|---|---|---|
| Original inputs | All probable inputs to forecast the current value Yt | Yt−1, Yt−2, …, Yt−15 (15 inputs) | Yt−7, Yt−8, …, Yt−13; X(1)t−7, X(1)t−8, …, X(1)t−13; X(2)t−7, X(2)t−8, …, X(2)t−13; …; X(8)t−7, X(8)t−8, …, X(8)t−13 (63 inputs) |
| PLC-selected inputs | Inputs selected by the PLC technique from the Original inputs | Yt−1, Yt−11, Yt−13, Yt−15 (4 inputs) | Yt−7, Yt−8, Yt−13 (3 inputs) |
| PLC-Wavelet inputs | Inputs created by applying wavelet transformation to the PLC-selected inputs | D1t−1, D1t−11, D1t−13, D1t−15; D2t−1, D2t−11, D2t−13, D2t−15; D3t−1, D3t−11, D3t−13, D3t−15; A3t−1, A3t−11, A3t−13, A3t−15 (16 wavelet inputs) | D1t−7, D1t−8, D1t−13; D2t−7, D2t−8, D2t−13; D3t−7, D3t−8, D3t−13; A3t−7, A3t−8, A3t−13 (12 wavelet inputs) |
The second type of input is called the PLC-selected inputs, which are identified by applying the PLC technique to the Original inputs. As can be seen from Table 1, the PLC technique selected 4 inputs out of the 15 probable inputs for Data #1, while only 3 inputs were selected out of the 63 for Data #2.
The third type of inputs, called the PLC-Wavelet inputs, are created by applying the DWD technique to the PLC-selected inputs. For each input variable (e.g., Yt−1), four components are created (e.g., D1t−1, D2t−1, D3t−1, A3t−1) for three decomposition levels as per Equation (8). As a result, a total of 16 wavelet inputs are created for Data #1 and 12 for Data #2, as shown in Table 1. In this study, the PLC technique was implemented using the SPSS software package and the DWD technique was implemented using the Wavelet Toolbox of the Matlab software package.
Forecasting performance
The FFNN and LRNN were implemented using the Neural Network Toolbox of Matlab. Both FFNN and LRNN models used only one hidden layer for ease of comparison. The MLR models were also implemented using Matlab.
All datasets were divided into training (60%), validation (20%), and testing (20%) subsets. The testing dataset was the last 20% of the time series. The training and validation datasets were selected randomly to improve the chance of representative learning data for the FFNN and LRNN models. As indicated in the flow chart of the methodology (presented in Figure 1), the Original inputs (drawn from the training dataset) are used as input to the PLC to select the significant inputs, called the PLC-selected inputs. The PLC-selected inputs are then decomposed by the DWD to generate the PLC-Wavelet inputs. For the purpose of comparison, the Original inputs (containing all probable inputs), the PLC-selected inputs, and the PLC-Wavelet inputs are each used as inputs to all three forecasting models. The training and validation datasets were used in the training process of the FFNN and LRNN models, in which the training dataset provided the 'learning knowledge' while the validation dataset was used to avoid over-fitting. The MLR model was calibrated on the initial 80% of the data without a separate validation split. The testing dataset was used only for assessing the forecasting performance of the MLR, FFNN, and LRNN models. Both datasets were scaled to the range [−1, 1] before training and testing.
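The data partitioning and scaling described above can be sketched as follows (a minimal illustration; the example series and `seed` are assumptions):

```python
import random

def scale_to_range(x, lo=-1.0, hi=1.0):
    """Linearly rescale a series to the interval [lo, hi]."""
    xmin, xmax = min(x), max(x)
    return [lo + (hi - lo) * (v - xmin) / (xmax - xmin) for v in x]

def split_series(x, test_frac=0.2, valid_frac=0.2, seed=0):
    """Hold out the last test_frac of the series as the test set; randomly
    split the remainder into training and validation, as described above."""
    n_test = int(len(x) * test_frac)
    remain, test = x[:-n_test], x[-n_test:]
    idx = list(range(len(remain)))
    random.Random(seed).shuffle(idx)
    n_valid = int(len(x) * valid_frac)
    valid = [remain[i] for i in idx[:n_valid]]
    train = [remain[i] for i in idx[n_valid:]]
    return train, valid, test

data = scale_to_range([float(i) for i in range(100)])
train, valid, test = split_series(data)
```

Keeping the final block of the series for testing preserves the forecasting setting (the model never sees the future), while randomizing only the train/validation split within the earlier portion.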
The training process for the FFNN and LRNN models was carried out using the Levenberg–Marquardt algorithm (Samarasinghe 2006). The least squares method (Tsay 2005) was used for the calibration of the MLR model. The single objective of minimizing the MSE was used in the training process of all models. The remaining two performance indicators (i.e., E and MAPE) were then computed from the training (i.e., combined training and validation) and testing outcomes. A suitable number of hidden neurons for the FFNN and LRNN models was selected by trial and error. Only the testing performances are presented in Table 2.
| Models | Inputs | Data #1: hn | MSE | MAPE | E | Data #2: hn | MSE | MAPE | E |
|---|---|---|---|---|---|---|---|---|---|
| MLR | Original inputs | – | 0.052 | 119.3 | 0.567 | – | 0.030 | 26.2 | 0.769 |
| | PLC-selected inputs | – | 0.047 | 106.7 | 0.570 | – | 0.024 | 25.5 | 0.770 |
| | PLC-Wavelet inputs | – | 0.026 | 85.1 | 0.702 | – | 0.011 | 16.7 | 0.914 |
| FFNN | Original inputs | 12 | 0.034 | 86.3 | 0.589 | 2 | 0.020 | 22.3 | 0.873 |
| | PLC-selected inputs | 10 | 0.030 | 79.0 | 0.632 | 6 | 0.018 | 20.8 | 0.884 |
| | PLC-Wavelet inputs | 16 | 0.022 | 77.8 | 0.732 | 6 | 0.009 | 16.8 | 0.944 |
| LRNN | Original inputs | 12 | 0.035 | 104.2 | 0.580 | 2 | 0.021 | 24.1 | 0.867 |
| | PLC-selected inputs | 12 | 0.033 | 65.1 | 0.604 | 2 | 0.018 | 21.9 | 0.882 |
| | PLC-Wavelet inputs | 16 | 0.018 | 71.6 | 0.787 | 6 | 0.009 | 16.7 | 0.940 |
hn, number of hidden neurons.
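The three performance indicators reported above (MSE, MAPE, and the Nash–Sutcliffe coefficient E) can be computed as in the following sketch (MAPE assumes no zero observed values; the toy `obs`/`sim` values are illustrative):

```python
def mse(obs, sim):
    """Mean squared error."""
    return sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs)

def mape(obs, sim):
    """Mean absolute percentage error (%); assumes no zero observations."""
    return 100.0 * sum(abs((o - s) / o) for o, s in zip(obs, sim)) / len(obs)

def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency E: 1 is a perfect fit, 0 means the model
    is no better than predicting the observed mean."""
    m = sum(obs) / len(obs)
    return 1.0 - sum((o - s) ** 2 for o, s in zip(obs, sim)) / sum((o - m) ** 2 for o in obs)

obs = [2.0, 4.0, 6.0, 8.0]
sim = [2.5, 3.5, 6.5, 7.5]
```

Because E normalizes the squared error by the variance of the observations, it allows comparison across datasets with different magnitudes, which MSE alone does not.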
With regard to the effect of input selection, the forecasting performances of the MLR, FFNN, and LRNN models showed slight improvement when the PLC-selected inputs were used, as can be seen from Table 2 for all cases. Although input selection provided only a marginal improvement in forecasting performance, a secondary benefit is the reduction in computing time when only significant inputs are used.
As far as the comparison between the FFNN and LRNN models is concerned, the LRNN model produced slightly better forecasts than the FFNN model for Data #1, while both produced similar forecasting performance for Data #2, as demonstrated by the three performance indicators in Table 2. On the other hand, both FFNN and LRNN outperformed the MLR model for all three types of inputs. It is interesting to note, however, that the performance gap between the MLR and ANN models was substantially reduced when the PLC-Wavelet inputs were used.
Sensitivity of wavelet parameters
A sensitivity analysis was conducted to understand the effect of the different decomposition levels and wavelet functions used to produce the PLC-Wavelet inputs. Three decomposition levels and then four wavelet functions were investigated on Data #1 (i.e., monthly inflow values to Lake Eildon) using the FFNN model.
As mentioned earlier, for decomposition level 3 of an input variable, there are four probable component inputs (i.e., D1, D2, D3, and A3), which were used as model inputs in this study (as shown in Table 1). The probable inputs for decomposition levels 1 and 2 of an input variable are A1 and D1 (for level 1) and D1, D2, and A2 (for level 2). For Data #1, the PLC technique selected four inputs and thus there will be a total of 8, 12, and 16 model inputs for the three different levels of decomposition, respectively. The sensitivity analyses for the three levels of decomposition for Data #1 are shown in Table 3. As can be seen from this table, level 2 of decomposition produced the best forecasting performance based on the MSE and E values, whereas level 3 is slightly better than level 2 decomposition based on the MAPE value.
Performance indicators for Data #1:

| PLC-Wavelet inputs | hn | MSE | MAPE | E |
|---|---|---|---|---|
| Level 1 (8 inputs) | 12 | 0.021 | 81.4 | 0.743 |
| Level 2 (12 inputs) | 20 | 0.018 | 78.9 | 0.786 |
| Level 3 (16 inputs) | 16 | 0.022 | 77.8 | 0.732 |
hn, number of hidden neurons.
Among the discrete wavelet functions available within the Matlab toolbox, four commonly used wavelet functions, namely the Haar wavelet, the Daubechies wavelet of order 3, the Biorthogonal wavelet, and the Symlets wavelet of order 4, were used for the sensitivity analysis in this study. The FFNN model with level 2 wavelet decomposition was used to compare the forecasting performance of the four wavelet functions. The results presented in Table 4 show that the Daubechies wavelet of order 3 produced the best forecasting performance based on the MSE and E values, whereas the Symlets wavelet of order 4 produced the best forecasting performance based on the MAPE value.
Performance indicators for Data #1:

| Wavelet function | hn | MSE | MAPE | E |
|---|---|---|---|---|
| Haar wavelet | 20 | 0.018 | 78.9 | 0.786 |
| Daubechies wavelet of order 3 | 4 | 0.016 | 62.4 | 0.802 |
| Biorthogonal wavelet | 16 | 0.018 | 51.1 | 0.782 |
| Symlets wavelet of order 4 | 12 | 0.020 | 38.4 | 0.755 |
hn, number of hidden neurons.
DISCUSSION
As expected, the MLR model is not suitable for non-linear datasets, as demonstrated by its low performance on both case study datasets. However, when the hybrid technique is used to preprocess the inputs, the forecasting performance of the MLR model improves significantly, which keeps it useful for on-line forecasting. Therefore, sophisticated models such as ANN models should be used to improve forecasting performance only when simple models such as MLR produce poor performances. This observation further supports one of the common findings of the M-competitions, presented in Crone et al. (2011), that 'sophisticated methods are not better than simpler methods'.
The LRNN model did not significantly outperform the FFNN model, while requiring more computational time. This is understandable, since the feedback loops of the LRNN model demand additional computing resources.
It is observed that the use of PLC for input selection leads to a marginal improvement in the forecasting accuracy of both datasets and for all three models. On the other hand, the use of the DWD technique along with PLC inputs significantly improves the forecasting performance of all three models for both datasets. However, it still involves trial and error to identify the optimal DWD parameters. The sensitivity analysis of the different DWD parameters indicated that there are differences in forecasting performance for different levels of decomposition and with different wavelet functions. The level 2 decomposition and the Symlets wavelet might be the initial choice for these two DWD parameters.
The DWD technique has the ability to decompose the time series signal into low- and high-frequency components. The low-frequency component is often viewed as representing the mean behavior, while the high-frequency component represents the variation (Kisi 2010). Such decomposition is similar to de-trending (or separating out the non-stationary signal), which makes it easier for ANN models, or even MLR models, to simulate time series data with improved forecasting performance. By contrast, conventional de-trending (e.g., differencing) is often carried out for traditional statistical models (e.g., ARMA models), which require stationary signals to accurately forecast time series data. However, a negative effect of conventional de-trending is the loss of variation and peak values, which are vital for hydroinformatics applications such as streamflow and rainfall forecasting. Therefore, conventional de-trending is often not used in forecasting studies using data-driven models (e.g., Wang et al. 2009; Kisi 2010).
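For contrast, conventional de-trending by first differencing, as used with ARMA-type models, can be sketched as follows (a minimal illustration on a hypothetical linear-trend series):

```python
def difference(x, lag=1):
    """First-order differencing: z[t] = x[t] - x[t-lag], a conventional de-trend."""
    return [x[i] - x[i - lag] for i in range(lag, len(x))]

def undifference(z, x0, lag=1):
    """Invert differencing given the initial value(s) x0 (a list of length lag)."""
    out = list(x0)
    for v in z:
        out.append(out[-lag] + v)
    return out

trend = [0.5 * t + 3.0 for t in range(10)]  # series with a linear trend
diffed = difference(trend)                   # constant series: trend removed
```

Differencing is exactly invertible given the initial value, but a model trained on the differenced series works only with increments, which is one way the variation and peak structure emphasized above can be attenuated in the forecasts.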
Although the hybrid technique used in this study can improve the forecasting performance of both datasets, the forecasting accuracy is still not as high as expected. This indicates that the underlying mechanism of time series data is not yet fully captured and further development in the understanding of the underlying mechanism and mathematical modeling is necessary.
CONCLUSION
This study investigated a hybrid technique using PLC (for input selection) and DWD (for separating the non-stationary part) for improving the forecasting performance of regression-based models such as ANN models and MLR models. Based on the forecasting results of two real-world case study datasets, this study concluded that the hybrid technique can significantly enhance the forecasting performance of both the simple MLR model and ANN models. Furthermore, the LRNN model produced similar forecasting performance to that of the FFNN model. As far as input selection is concerned, the use of the PLC technique can significantly reduce the number of model inputs while producing a minor impact on the forecasting performance of ANN models. The input transformation using the DWD technique leads to significant improvement in the performance of the ANN models as well as the MLR model. This was demonstrated not only through improved performance indicators, but also through better capturing of peak values. Thus, this study provided useful understanding regarding the improvement in performance of ANN models when used in conjunction with significant input selection and wavelet-based data transformation, which could be useful for researchers and practitioners involved in autoregressive time series forecasting of real-world applications.