In this paper, an advanced stream flow forecasting model is developed by applying data-preprocessing techniques to an adaptive neuro-fuzzy inference system (ANFIS). Wavelet multi-resolution analysis is coupled with an ANFIS model to develop a hybrid wavelet neuro-fuzzy (WNF) model. Models with different input selections and structures are developed for daily, weekly and monthly stream flow forecasting at the Railway Parade station on the Ellen Brook River, Western Australia. The stream flow time series is decomposed into multi-frequency time series by discrete wavelet transform using the Haar, Coiflet and Daubechies mother wavelets. The wavelet coefficients are then used as input data to the neuro-fuzzy model. Models are developed based on the Takagi-Sugeno-Kang fuzzy inference system, with the grid partitioning approach for initializing the fuzzy rule-based structure. Root mean-square error and the Nash-Sutcliffe coefficient are chosen as the performance criteria. The results show that selecting inputs with a high autocorrelation function improves the accuracy of forecasting. Comparing the performance of the hybrid WNF models with that of the original ANFIS models indicates that the hybrid WNF models produce significantly better results, especially in longer-term forecasting.
Modeling and predicting the future conditions of surface water resources is an essential part of sustainable water resources management and natural disaster mitigation. Reliable stream flow forecasting has an essential role in flood mitigation, reservoir operation optimization, irrigation scheduling, water supply and catchment management. Forecasting hydrological responses always contains uncertainty. Several studies have been carried out to improve the reliability and accuracy of hydrological forecasting (Sivakumar & Berndtsson 2010). There are different types of forecasting methods, such as physics-based, conceptual and data-driven. Physics-based methods have the potential advantage of providing physical insight into the hydrological system, but require a large amount of data and parameters. Conceptual models, on the other hand, are relatively simple to formulate and are usually (but not necessarily) based on the linear assumption. Recently, data-driven approaches have become more popular due to their minimal data requirements and expandability. Data-driven models are able to simulate nonlinear and non-stationary characteristics of hydrological processes with a minimum of observed data.
The application of different data-driven methods has been studied in hydrology, but the accuracy of stream flow forecasting still needs to be enhanced. Among data-driven methods, computational intelligence (CI) approaches are more capable of simulating the non-linear relationships of hydrological processes. The adaptive neuro-fuzzy inference system (ANFIS) is one of the CI techniques based on fuzzy logic. The concept of fuzzy logic was originally proposed by Zadeh (1965). In the last decade, fuzzy logic modelling has been applied to various fields of engineering, including some research in hydrology (Sen & Altunkaynak 2006; Firat et al. 2009; Jayawardena et al. 2014). ANFIS was first introduced by Jang (1993) and has the advantages of both neural networks and fuzzy reasoning techniques. The application of ANFIS has been investigated in several hydrological studies (Vernieuwe 2005; Aqil et al. 2007; Firat et al. 2009), including river flow forecasting (Nayak et al. 2004; Keskin et al. 2006; Talei et al. 2010; Sanikhani & Kisi 2012). The outcomes of these studies indicate that ANFIS is a promising approach for reaching accurate and quick forecasts. However, due to the restricted structure of fuzzy inference systems (FISs), ANFIS has some limitations. It is not suitable for training oversized input–output systems and may not be reliable for forecasting extreme conditions (Badrzadeh et al. 2013).
River flow time series are very complex and contain a wide range of frequency components. One of the recent developments for improving the accuracy of forecasting is applying wavelet multi-resolution analysis to the stream flow time series. In the last decade, some researchers have developed hybrid models by combining wavelet analysis with a forecasting model. The most popular hybrid wavelet model for stream flow forecasting is the wavelet neural network method (Kim & Valdés 2003; Cannas et al. 2006; Adamowski & Sun 2010; Badrzadeh et al. 2014). The combination of wavelet analysis and neuro-fuzzy techniques for hydrological forecasting has been investigated in very few studies. Partal & Kisi (2007) introduced a wavelet-neuro-fuzzy conjunction model, using the db2 wavelet function, to forecast the daily precipitation of three stations in Turkey. Their proposed hybrid wavelet fuzzy model involves some degree of uncertainty. Nourani et al. (2011) preprocessed ANFIS rainfall-runoff models with the discrete wavelet transform (DWT). They applied this model on both daily and monthly time scales and concluded that combining DWT with an ANFIS model in runoff forecasting leads to promising results, especially for monthly forecasting. Ren et al. (2013) also established an ANFIS model based on wavelet analysis for monthly runoff forecasting. Comparing the observed and predicted values, they concluded that the results needed further improvement.
A review of available literature indicates the research gap in selecting the suitable wavelet-neuro fuzzy structure. Various mother wavelets are available with different members. The most popular mother wavelet is Daubechies which is fast and memory efficient. The order of Daubechies wavelet (db) ranges from one to twenty (number of vanishing moments). Most of the previous studies on application of wavelet analysis on hydrological forecasting models have used only the db2 mother wavelet. Moreover, very few studies have been carried out to investigate the application of hybrid models on stream flow forecasting with lead times of more than one day but less than one month. In addition, the application of a wavelet neuro-fuzzy (WNF) model for stream flow forecasting needs to be investigated in different areas with different characteristics.
Taking these considerations into account, this study aims at improving short-, medium- and long-term river flow forecasting by various classical and hybrid fuzzy-based approaches with different structures and input selections. The application of three different mother wavelets (db order five, Coiflet order one and Haar) to different input vectors is investigated in this study.
Considering the seasonality of the Ellen Brook River and the growing water demand in the study area, this study provides more accurate tools to assist decision makers in sustainable water resources planning, flood protection, mitigation of contamination or licensing of exploitations.
Wavelet analysis and decomposition
FUZZY NEURAL NETWORKS
Fuzzy inference system
The most important modelling tool based on fuzzy set theory is the FIS, which maps the input data onto corresponding output data. It combines the membership functions (MFs) and fuzzy logic operators and evaluates the rule outcome. The basic structure of a FIS consists of three conceptual steps. In the first step, crisp data are converted into fuzzy sets using MFs; then the MFs are connected with the fuzzy rules to derive the fuzzy output; and finally the associated crisp outputs are computed by a process of de-fuzzification. Depending on the problem, different types of fuzzy MFs can be used. The most common types of MFs are triangular, trapezoidal, Gaussian, generalised bell-shaped and sigmoid functions. In this study the generalised bell-shaped MF is used. There are two main types of FISs, Mamdani and Takagi-Sugeno-Kang (TSK) (Takagi & Sugeno 1985; Jang & Sun 1997). With the Mamdani FIS, de-fuzzification of the output is necessary, whereas with the TSK FIS (Sugeno & Kang 1988) there is no need for de-fuzzification, since the output is expressed as a mathematical function of the inputs. The TSK fuzzy model is more efficient in optimization and adaptive techniques, and its output MFs can be either linear (first-order) or constant (zero-order) with respect to the inputs.
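As a minimal illustration of the fuzzification step, the generalised bell-shaped MF can be written as 1 / (1 + |(x - c) / a|^(2b)), where a controls the width, b the slope and c the centre. The sketch below evaluates it for two linguistic labels; the label names and parameter values are hypothetical and are not taken from the models in this study.

```python
def gbell(x, a, b, c):
    """Generalised bell-shaped membership grade of x.

    a: half-width at grade 0.5, b: slope of the shoulders, c: centre.
    """
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


# Hypothetical parameters for two linguistic labels of a flow variable.
low = dict(a=2.0, b=2.0, c=0.0)
high = dict(a=2.0, b=2.0, c=4.0)

for flow in (0.0, 2.0, 4.0):
    # Each crisp value receives a grade in (0, 1] for every label.
    print(flow, round(gbell(flow, **low), 3), round(gbell(flow, **high), 3))
```

The grade is exactly 1 at the centre c and 0.5 at a distance a from it, which is why a is usually read as the half-width of the label.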
Adaptive neuro-fuzzy inference system
A neuro-fuzzy system integrates FISs and neural networks and has the advantages of both approaches. A neuro-fuzzy system uses the natural language description of fuzzy sets and the learning capability of neural networks. Compared to neural networks, a neuro-fuzzy system needs less time for training, but its learning is highly constrained and typically more complex. One of the most popular integrated systems is ANFIS, which has shown significantly superior results in modeling nonlinear time series. ANFIS was first introduced by Jang (1993). In ANFIS, a feed-forward network finds fuzzy if-then rules to reach an optimal model. The functioning of ANFIS is equivalent to the TSK first-order fuzzy model (Jang & Sun 1997). The ANFIS structure for the TSK first-order fuzzy model consists of five layers. Figure 2 presents the Sugeno fuzzy reasoning mechanism for two inputs (x and y) with two linguistic labels (A and B) and the common rule set for a first-order TSK fuzzy model. Figure 3 illustrates the ANFIS structure for this TSK fuzzy model when one output is considered.
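The five layers can be sketched as a single forward pass. The example below assumes the usual two-rule set for this configuration (rule i: if x is A_i and y is B_i, then f_i = p_i x + q_i y + r_i), generalised bell MFs and a product operator for the firing strength. The function names and all parameter values are illustrative assumptions, not the fitted models of this study.

```python
def gbell(x, a, b, c):
    """Generalised bell-shaped membership function."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


def anfis_forward(x, y, premise, consequent):
    """Forward pass of a two-input, first-order TSK ANFIS.

    premise: {"A": [(a, b, c), ...], "B": [(a, b, c), ...]}, one tuple per rule.
    consequent: [(p, q, r), ...], one tuple per rule.
    """
    # Layer 1: membership grades of each input for each label.
    mu_a = [gbell(x, *params) for params in premise["A"]]
    mu_b = [gbell(y, *params) for params in premise["B"]]
    # Layer 2: firing strength of each rule (product of grades).
    w = [ma * mb for ma, mb in zip(mu_a, mu_b)]
    # Layer 3: normalised firing strengths.
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: first-order (linear) consequent of each rule.
    f = [p * x + q * y + r for p, q, r in consequent]
    # Layer 5: overall output as the weighted sum of rule outputs.
    return sum(wb * fi for wb, fi in zip(w_bar, f))


# Hypothetical premise and consequent parameters.
premise = {"A": [(1.0, 2.0, 0.0), (1.0, 2.0, 2.0)],
           "B": [(1.0, 2.0, 0.0), (1.0, 2.0, 2.0)]}
consequent = [(1.0, 0.0, 0.0), (0.0, 0.0, 3.0)]
print(anfis_forward(1.0, 1.0, premise, consequent))
```

Because the output is a weighted average of linear functions of the inputs, no separate de-fuzzification step is needed.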
The most substantial components of ANFIS are the rules, which are defined by premise parameters and consequent parameters. The best values of the parameters, which provide rules that would ideally model the target system, are reached by a learning algorithm. With given input-output data, ANFIS employs the back-propagation gradient descent method and the least-squares estimate as a hybrid learning algorithm to optimize these parameters.
In this study, the grid partitioning approach is applied for initializing the design of a FIS. Grid partitioning divides the data space into rectangular sub-spaces using axis-parallel partitions. This method generates rules for all possible combinations of MFs of all inputs. Therefore the number of rules is equal to $m^k$, where k is the number of inputs and m is the number of MFs per input. This approach is suitable for a small number of inputs (Jang & Sun 1997). With a relatively large number of inputs, the method encounters problems, as the number of fuzzy rules increases exponentially with the number of input variables. Figure 4 shows a grid partitioning system of a model with two inputs and three MFs.
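The growth of the rule base under grid partitioning can be checked by direct enumeration. The sketch below lists every combination of MF indices for a given number of inputs; the function name is illustrative.

```python
from itertools import product


def grid_rules(n_inputs, n_mfs):
    """Enumerate grid-partition rules as tuples of MF indices, one per input."""
    return list(product(range(n_mfs), repeat=n_inputs))


# Two inputs with three MFs each, as in the grid partitioning example.
rules = grid_rules(2, 3)
print(len(rules), rules[:3])

# Rule count for the input sizes used with 3, 4 and 5 decomposition levels
# (number of inputs = level + 1), for two and three MFs per input.
for n_inputs in (4, 5, 6):
    print(n_inputs, len(grid_rules(n_inputs, 2)), len(grid_rules(n_inputs, 3)))
```

With six inputs, three MFs per input already give 729 rules, which illustrates why the number of MFs has to be reduced as the number of inputs grows.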
Hybrid WNF model
The proposed wavelet neuro-fuzzy with grid partitioning (WNFG) model is an integrated model in which the input is pre-processed by the DWT approach in order to increase the forecasting accuracy. First, the original time series are decomposed into their low- and high-frequency components (Approximation and Details). Data normalization is applied to the wavelet components as an important part of fuzzy modeling. Then the decomposed sub-series are used as inputs to the ANFIS. Each sub-series component plays a different role in the time series, and it is important to keep all of them as ANFIS inputs (Wang & Ding 2003). The model output is the un-decomposed stream flow time series one time step ahead. Figure 5 illustrates the main framework of the hybrid model.
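The decomposition step can be illustrated with the Haar wavelet, the simplest of the three mother wavelets considered. The following is a minimal pure-Python sketch of a multi-level Haar DWT that returns one approximation and N detail series. The function names and sample values are hypothetical, and the sketch does not cover the longer db5 and Coif1 filters or the way the sub-series are aligned with the forecast target, which is not reproduced here.

```python
import math


def haar_step(signal):
    """One Haar DWT level: pairwise scaled sums (approximation) and differences (detail)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_dwt(signal, levels):
    """Return [approximation_N, detail_N, ..., detail_1] for an N-level Haar DWT."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]


# Hypothetical short flow record (m3/s); three levels give four sub-series.
flow = [0.0, 0.0, 1.0, 3.0, 8.0, 4.0, 2.0, 1.0]
coeffs = haar_dwt(flow, 3)
print(len(coeffs), [len(c) for c in coeffs])
```

The transform is orthonormal, so the sum of squared coefficients equals the sum of squared flows; no information is lost by the decomposition itself.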
As mentioned before, three different mother wavelets, Haar, Coiflet order one (Coif1) and Daubechies order five (db5), are used to decompose the stream flow time series in this study. There is no theory to determine the best level of decomposition for a time series. Therefore, three (2-4-8), four (2-4-8-16) and five (2-4-8-16-32) levels of decomposition are applied to the data sets. The number of MFs is determined by the number of inputs (level of decomposition + 1): in order to keep training feasible, the number of MFs is decreased as the number of inputs increases. Root mean square error (RMSE) and the Nash-Sutcliffe coefficient of efficiency (NSE) are considered as the main performance criteria for both the ANFIS and WNFG models, as they are very sensitive to peak flows. With these criteria, both the correlation and the error between observed and modelled variables are clearly measurable.
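Both criteria follow their standard definitions: RMSE is the square root of the mean squared difference between observed and simulated flows, and NSE is one minus the ratio of the squared-error sum to the variance sum of the observations about their mean. A minimal sketch, with illustrative function names and made-up sample values:

```python
import math


def rmse(observed, simulated):
    """Root mean square error between observed and simulated series."""
    squared = [(o - s) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(squared) / len(squared))


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - error / spread


# Made-up values for illustration only.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [2.0, 3.0, 4.0, 5.0]
print(rmse(obs, sim), nse(obs, sim))
```

Since both measures square the residuals, a few poorly simulated peaks dominate the score, which is the sensitivity to peak flows noted above.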
CASE STUDY AND DATA USED
In this study, the stream flow data of the Railway Parade station on Ellen Brook River are used to illustrate the proposed approach. The Ellen Brook catchment is located about 20 km North-East of Perth city, Western Australia (Figure 6). The Ellen Brook surface water catchment area is approximately 715 km2 and three local governments administer the catchment. The climate of the catchment is a warm temperate Mediterranean type. Climate change predictions for the Ellen Brook catchment include decreased rainfall and runoff, increased temperatures and evaporation and increased seasonal variation and storm intensity, which all will have a significant effect on water resources in this area (Wilke 2006).
For this study, mean daily stream flow discharge for 34 years, with an observation period from 1977 to 2010, was collected from the Department of Water. The first 23 years of data (around 70% of the whole data set) are used for training and the remaining 11 years (around 30% of the whole data set) are used for validation. It was checked that the extreme values lie in the training set rather than the validation set. The average daily stream flow at the Railway Parade station is 0.88 m3/s, with a maximum flow of 41.28 m3/s in July 1987 and a minimum flow of zero, as it is an intermittent river. For mid-term and long-term forecasting, average weekly and monthly river flow time series are also prepared. Statistics of the Railway Parade station daily, weekly and monthly stream flow data sets are given in Table 1, which contains the mean, minimum, maximum and standard deviation values.
For feasible modeling, the zero values in the time series are replaced with the small value of 0.001. Higher flow rates are placed in the training set rather than the validation set, as models perform better when forecasting within the data range used during the training phase. Allocating the extreme values of the historical time series to the training set therefore improves model accuracy for extreme flow forecasting. To improve the model efficiency, the inputs for ANFIS forecasting were chosen by forward stepwise selection, considering the time series with a high autocorrelation function (ACF) value (Figure 7). The input combinations and the structure of the ANFIS models for daily flow forecasting are given in Table 2. This study tried as far as possible to select time series with high ACF. Adding too many inputs does not necessarily lead to better results, due to the constrained structure of fuzzy modelling. Therefore, the first four time-lagged series are selected as the optimum number of inputs for daily and weekly forecasting, to reach a feasible result and avoid overfitting.
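The pre-processing and input selection described above can be sketched as follows: zeros are replaced by 0.001, the sample ACF is computed for candidate lags, and a lagged input matrix is built. The sketch assumes that the inputs are the previous flows Q(t-1), ..., Q(t-k) used to forecast Q(t); the function names and the sample series are hypothetical.

```python
def replace_zeros(flows, small=0.001):
    """Replace zero flows of an intermittent river with a small positive value."""
    return [v if v > 0 else small for v in flows]


def acf(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((v - mean) ** 2 for v in series)
    covariance = sum((series[t] - mean) * (series[t - lag] - mean)
                     for t in range(lag, n))
    return covariance / variance


def lagged_inputs(series, n_lags):
    """Build rows [Q(t-1), ..., Q(t-n_lags)] with target Q(t)."""
    rows = [[series[t - k] for k in range(1, n_lags + 1)]
            for t in range(n_lags, len(series))]
    targets = [series[t] for t in range(n_lags, len(series))]
    return rows, targets


# Made-up daily flows (m3/s), for illustration only.
flows = replace_zeros([0.0, 0.0, 0.4, 1.2, 3.5, 2.1, 0.9, 0.3, 0.0, 0.0])
for lag in range(1, 5):
    print(lag, round(acf(flows, lag), 3))
inputs, targets = lagged_inputs(flows, 4)
print(len(inputs), len(targets))
```

Lags whose ACF remains high are natural candidates in a forward stepwise search; each added lag also multiplies the size of the grid-partitioned rule base, which limits how many can be used.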
As already discussed, three different mother wavelets with different levels of decomposition are chosen for developing the hybrid WNF model in order to compare their application. There is no theory to determine the best level of decomposition for hydrological time series. Some studies suggest int[log(n)] levels of decomposition, where n is the length of the time series (Wang & Ding 2003). Taking into account the lengths of the daily, weekly and monthly time series (12410, 1773 and 408), the suggested level of decomposition for each time series would be 4, 3 and 3, respectively. Considering the suggested level as an initial indicator, various levels of decomposition, up to the maximum feasible level (due to the restricted structure of fuzzy modelling), have been examined in this study.
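The suggested levels can be reproduced numerically. The sketch below assumes a base-10 logarithm; under that assumption, rounding to the nearest integer gives the quoted 4, 3 and 3, whereas strict truncation would give 2 for the monthly series (log10 of 408 is about 2.61). The function name is illustrative.

```python
import math


def suggested_level(n, rounding=True):
    """Decomposition level from int[log10(n)], by rounding or by truncation."""
    value = math.log10(n)
    return round(value) if rounding else int(value)


for name, n in (("daily", 12410), ("weekly", 1773), ("monthly", 408)):
    print(name, round(math.log10(n), 2),
          suggested_level(n), suggested_level(n, rounding=False))
```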
The inputs of the WNFG models are the wavelet coefficients, which are the outputs of the wavelet decomposition. In the first step, the stream flow time series is decomposed with the selected mother wavelet to a given level of decomposition; these sub-series are then used as the ANFIS model inputs (Figure 5). The number of input time series is N + 1 for N levels of decomposition: as explained in Equation (3), the wavelet coefficients comprise one approximation and N details. It is essential to keep all of the wavelet components as model inputs, as each of them has a very important role in forecasting, especially in peak flow forecasting.
After developing model frameworks and determining the input selection, they were applied to forecast Ellen Brook stream flow. The hybrid model results were compared with original ANFIS results for short-term, mid-term and long-term forecasting.
A lower value of root mean-square error and a higher value of NSE (up to 1) indicate better performance. The NSE values of the models are illustrated in Figure 8.
It can be observed that the efficiency of the ANFIS model decreases dramatically as the forecasting interval lengthens. Considering the validation set as the most reliable set for evaluating performance, the ANFIS models almost failed in forecasting the weekly and monthly stream flow. However, combining the DWT with the ANFIS models significantly improved model performance (Figure 8). Overall, the NSE of the ANFIS model increased from 0.69 to 0.82 and from 0.61 to 0.81 for weekly and monthly forecasting, respectively. The results show that applying different time-lagged stream flow time series with high ACF as model inputs improves the accuracy of forecasting in the study area. The best-fitted neuro-fuzzy model is the model with up to 4 time lags of the series as inputs for daily forecasting and up to 3 for weekly and monthly forecasting.
Results also indicate that the type of mother wavelet and the level of decomposition can have a significant impact on weekly and monthly model efficiency. Applying the db5 DWT with 3 or 4 levels of decomposition leads to the best forecasting of Ellen Brook stream flow, whereas decomposing the monthly stream flow time series with the Coiflet1 wavelet leads to a very poor simulation. Therefore, Daubechies order five can be selected as the most efficient mother wavelet in the hybrid neuro-fuzzy model for the study area.
Tables 4–6 show the structure and forecasting performance of the developed ANFIS and WNFG models for daily, weekly and monthly data, respectively. It can be seen that, due to the restricted structure of fuzzy modelling, some models failed to simulate weekly and monthly river flow. The best-fitted models with the highest accuracy are highlighted in the tables.
Figure 9 illustrates the wavelet coefficients of the Ellen Brook daily time series decomposed with the db5 wavelet to 4 levels of decomposition, which are the inputs of the best-fitted daily (WNFG-D3) and weekly (WNFG-W3) models.
Figure 10 shows the scatter plots of observed and forecasted stream flow with the best-fit ANFIS and WNFG models for different lead times. These scatter plots clearly illustrate the performance of different models. It can be seen that unlike the WNFG models, the accuracy of ANFIS models decreases in the weekly and monthly forecasting. This figure also demonstrates that in spite of relatively high correlation between observed and ANFIS modelled stream flow, these models frequently fail to simulate the extreme events.
Figure 11 also compares the hydrographs of observed monthly stream flow with the best-fit ANFIS and hybrid WNFG outputs. It can be seen that the hybrid models provide a better match with the observed time series.
To investigate the ability of the models to forecast extreme values, the twenty highest observed stream flows in the 34 years of observation are compared with their simulated values. Flows greater than approximately 0.50 of the maximum Q of each time series are considered in this evaluation.
Table 7 shows the relative error between the observed daily, weekly and monthly stream flow and their best-fit simulated values, which illustrates that hybrid models have relatively fewer errors.
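The peak-flow comparison can be sketched as below. The example assumes that the relative error is the signed difference (simulated minus observed) divided by the observed flow, which is one common definition and may differ from the exact form used in Table 7. The function name, threshold handling and sample values are illustrative.

```python
def peak_relative_errors(observed, simulated, fraction=0.5, top_n=20):
    """Relative errors for the highest observed flows above fraction * max(observed)."""
    threshold = fraction * max(observed)
    pairs = [(o, s) for o, s in zip(observed, simulated) if o > threshold]
    # Rank by observed flow, largest first, and keep at most top_n events.
    pairs.sort(key=lambda pair: pair[0], reverse=True)
    return [(o, s, (s - o) / o) for o, s in pairs[:top_n]]


# Made-up observed and simulated flows (m3/s), for illustration only.
observed = [1.0, 10.0, 6.0, 4.0, 8.0]
simulated = [1.0, 8.0, 6.6, 4.0, 6.0]
for o, s, err in peak_relative_errors(observed, simulated):
    print(o, s, round(err, 3))
```

A negative value indicates that the model underestimates the observed peak.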
This study proposed a hybrid WNF method for improving stream flow simulation and forecasting. The effect of multi-resolution analysis of the input data on the performance of neuro-fuzzy models for forecasting the daily, weekly and monthly stream flow one step ahead has been investigated. Haar, Daubechies order five and Coiflet order one wavelets were applied to the Ellen Brook stream flow time series to decompose it at different levels of resolution. The resulting wavelet coefficients were supplied to the ANFIS models as inputs. The overall results show that pre-processing the raw data with wavelets significantly improved the accuracy of forecasting, especially for the peak values.
Although using the right selection of the different time series with different time-lag and high ACF improves the ANFIS model efficiency, the improvement is considerably less than pre-processing the data with DWT.
Computational approaches usually fail to simulate sudden extreme conditions, as they use only the current and a few previous data values as inputs. Considering the transient nature of hydrological signals, applying DWT to the input data and extracting different frequencies from the historical data helps to predict extreme values more accurately. This is well observed in this research, where the WNFG forecasted time series closely match the observed time series at extreme values. Since the ANFIS forecasted time series fail to simulate the peak conditions most of the time, ANFIS alone is not recommended for flood and drought studies.
Furthermore, the results verified that altering the mother wavelet or the level of decomposition can have a considerable impact on model performance. Applying the Daubechies order five DWT leads to the best forecasting of Ellen Brook stream flow. It can be concluded that, for a stream flow with a highly noisy time series or an intermittent trend, decomposing the time series with db5 before simulating with the ANFIS model leads to highly accurate short-term, long-term and peak flow forecasts, whereas decomposing the time series with the Coiflet1 wavelet might not improve the forecasting accuracy as much. Although these results are based on the unique characteristics of the Ellen Brook stream flow time series, the same method is expected to give the best prediction results elsewhere in the region, considering the similar characteristics of Western Australian rivers with their strong seasonal trend. The outcome of this study will be useful for hydrologists, hydrological designers and decision makers in forecasting stream flows and developing sustainable water distribution plans.