## Abstract

Accurate forecasting of daily streamflow is essential for water resource planning and management. As a typical non-stationary time series, it is difficult to avoid the effects of noise in the hydrological data. In this study, the wavelet threshold de-noising method was applied to pre-process daily flow data from a small forested basin. The key factors influencing the de-noising results, such as the mother wavelet type, decomposition level, and threshold functions, were examined and determined according to the signal to noise ratio and mean square error. Then, three mathematical techniques, including an optimized back-propagation neural network (BPNN), optimized support vector regression (SVR), and adaptive neuro-fuzzy inference system (ANFIS), were used to predict the daily streamflow based on raw data and wavelet de-noising data. The performance of the three models indicated that a wavelet de-noised time series could improve the forecasting accuracy. The SVR showed a better overall performance than BPNN and ANFIS during both the training and validating periods. However, the estimation of low flow and peak flow indicated that ANFIS performed best in the prediction of low flow and that SVR was slightly superior to the others for forecasting peak flow.

## INTRODUCTION

Accurate and immediate streamflow forecasting in rivers is essential for disaster (e.g., flood and drought) prediction and water resource management (Suliman *et al.* 2015). Various conceptual and data-based forecasting models have been developed in recent decades and applied around the world (Noori & Kalin 2016). The conceptual models focus on the study of rainfall–runoff processes and underlying physical laws, which involve multiple factors, such as climatic conditions, land use, and evapotranspiration, and have been subjected to the requirements of excessive data (Karran *et al.* 2014; Samadi 2017). Artificial intelligence (AI) techniques have been popular data-based methods in recent years and have been successfully developed to model nonlinear and non-stationary hydrological time series (Anvari *et al.* 2014; Singh *et al.* 2015; Bui *et al.* 2016). These methods have the ability to extract the relationship between inputs and outputs without the need for detailed physical process understanding. Among AI methods, artificial neural networks (ANN), support vector machine (SVM), and adaptive neuro-fuzzy inference systems (ANFIS) have been proven to be effective tools for modeling complex hydrologic systems (Babovic *et al.* 2001; Vojinovic *et al.* 2003; Taormina & Chau 2015).

The abilities of these machine learning methods to predict non-stationary time series are always limited due to the direct utilization of real data, which cannot avoid the effects of noise (Lotric & Dobnikar 2005). To overcome this limitation, noise reduction has become an important issue and is treated as a pre-processing method (Lang *et al.* 1996). For non-stationary hydrological time series data, conventional de-noising schemes, such as Wiener filter and Kalman filter, are excessively dependent on the establishment of appropriate state space functions (Chou 2014). Wavelet-based noise reduction techniques are able to decompose a signal into several scales that represent both temporal and frequency domains and have had broad applications in various data analyses and signal processing in recent years (Patel *et al.* 2014). Discrete wavelet transform (DWT) as a data-based mathematical technique is particularly suited to time series de-noising and data compression (Evrendilek & Karakaya 2014). However, there are multiple factors influencing the efficiency of DWT, such as the choice of mother wavelet, decomposition level, threshold estimation method, and so on (Sang *et al.* 2014). Extraction of useful information from raw hydrological time series data based on most optimal factors was beneficial for the improvement of the daily streamflow forecasting accuracy of the machine learning methods.

The objective of this study was to compare the performances of ANN, SVM, and ANFIS models coupled with the DWT de-noising method for daily streamflow forecasting. A small forested basin located in Eastern China with large interannual variability in streamflow was chosen as a case study. The sensitivity of de-noising processing to the wavelet type, decomposition level, and threshold functions were examined. The key parameters of three data-based models were optimized by different optimization algorithms to obtain the best model structures. Raw and de-noised daily averaged flow time series data were applied to compare the effectiveness of regular and hybrid models.

## STUDY AREA AND DATA

In this study, historical daily flow data of the Yuetan Basin were collected from the Bureau of Hydrology, Ministry of Water Resources of PR China to evaluate the performance of different methods. The Yuetan Basin (117°38′–118°10′ E, 29°33′–29°50′N) is the source of the Xin'anjiang River and is located in Huangshan City, Eastern China, with a drainage area of 950 km^{2} (Figure 1). It is a forest-dominated catchment, with sparse farming and residential land. The percent of forest cover is 95.44%, while the farm land and residential land accounts for 3.26% and 0.01%, respectively. Mountainous regions and hills are the major landforms in the basin. Located in the subtropical monsoon climate zone, the mean annual temperature is approximately 16 °C. Precipitation is predominated by storms, which are significantly influenced by the terrain. The average precipitation is 1,760 mm, with the majority occurring during May to August. Without the effects of human activity, the streamflow varies widely in different seasons. Most maximum values occur in June and July, while the minimal streamflow is centralized from November to January. The location of the basin along with the hydrologic station is shown in Figure 1. Daily averaged flow data for a period of 14 years from 2000 to 2013 were collected from Yuetan Station. The total number of available data was 5,114 days, which included various hydrological conditions during different seasons of the year. For the purposes of model induction (Babovic & Keijzer 2000), the whole dataset was split into two sub-sets, of which the first 11 years (2000–2010) data were used for training purposes, and the remaining 3 years (2011–2013) of data were used for validation. The main statistics of the training data and validating data are given in Table 1. The skewness values of both the training and validating data series are quite high, which may decrease the estimation accuracy of the models.

Descriptive statistics | Unit | Training data | Testing data |
---|---|---|---|

Min. | m^{3}/s | 0.11 | 0.19 |

Mean | m^{3}/s | 31.39 | 38.07 |

Median | m^{3}/s | 12.5 | 14.65 |

Max. | m^{3}/s | 1,500 | 1,740 |

Standard deviation | m^{3}/s | 71.63 | 103.63 |

Skewness | Dimensionless | 8.75 | 9.02 |

Descriptive statistics | Unit | Training data | Testing data |
---|---|---|---|

Min. | m^{3}/s | 0.11 | 0.19 |

Mean | m^{3}/s | 31.39 | 38.07 |

Median | m^{3}/s | 12.5 | 14.65 |

Max. | m^{3}/s | 1,500 | 1,740 |

Standard deviation | m^{3}/s | 71.63 | 103.63 |

Skewness | Dimensionless | 8.75 | 9.02 |

Streamflows in natural canals were always thought to have nonlinear and dynamic structure (Babovic & Keijzer 2000; Kişi 2007). In this study, the Lyapunov exponent was calculated first to determine whether the daily averaged flow data of Yuetan station exhibited chaotic behavior (Rosenstein *et al.* 1993). The Lyapunov exponent of this time series was positive (0.43), which indicated that the time series exhibited chaos features (Wolf *et al.* 1985). During the process of chaos theory, the first step was reconstructing the dynamics in phase space (Ghorbani *et al.* 2010). The delay-time method was the most frequently used reconstruction method for a univariate time series, namely, the input variables of the models presented the previously observed daily flow values, while the output variable corresponded to the one-day-ahead flow value (t + 1). Numerous studies have demonstrated that autocorrelation function (ACF) and partial autocorrelation function (PACF) are useful to determine the number of daily flow values at different time lags that can best represent the time series (Lohani *et al.* 2012; He *et al.* 2014). Therefore, ACF and PACF were applied first to determine the lag time of inputs. According to the results of ACF and PACF, as shown in Figure 2, the maximum four-lag time was determined (t, t−1, t−2, t−3). Several combinations were considered as inputs of the models: (I) *Q _{t}, Q_{t-1}, Q_{t-2}, Q_{t-3}*, (II)

*Q*, (III)

_{t}, Q_{t-1}, Q_{t-2}*Q*, (IV)

_{t}, Q_{t-1}*Q*.

_{t}*x*generated from a nonlinear dynamic system was able to reconstruct a phase space, as follows (Sun

_{i}*et al.*2010): where is an appropriate delay time and

*m*is the embedding dimension. The choice of the delay time is a very important step in phase space reconstruction. The average mutual information (AMI) analysis suggested by Fraser and others (Fraser & Swinney 1986; Sun

*et al.*2010) is commonly used to determine a proper delay time and is also applied in this study. We calculated AMI using time lags of 1–20 days and the first minimum AMI value occurred at

*τ*= 3, shown in Figure 3. Based on the optimal delay time 3, the embedding dimension

*m*was calculated using the false nearest neighbors method (Kennel

*et al.*1992) and the result indicated that the optimal value of

*m*was 8. Therefore, a multi-day-ahead input was tried in this study to predict

*Q*

_{t + 1}: (V)

*Q*,

_{t}*Q*

_{t−3},

*Q*

_{t−6},

*Q*

_{t−9},

*Q*

_{t−12},

*Q*

_{t−15},

*Q*

_{t−18},

*Q*

_{t−21}. The original input and output data were conveniently normalized to values between 0.1 and 0.9 before the training process to improve the efficiency of the models.

## OPTIMIZED SUPPORT VECTOR MACHINE FOR REGRESSION

*et al.*2014). The general principles of support vector machine for regression (SVR) have been described in many studies (Kavousi-Fard

*et al.*2014; Li

*et al.*2016), so it is not necessary to list them in detail. The performance of SVR mainly depends on a set of parameters: the penalty parameter C, kernel function type and corresponding kernel parameters. In this study, a radius basis function (RBF) with high effectiveness and great speed was selected as the kernel function during the SVR training process. The penalty parameter

*C*and kernel function's parameter

*c*were determined by the particle swarm optimization algorithm (PSO), which was motivated by the social behavior of bird flocking (Kennedy 2010). Each particle in the PSO was given a random initial velocity and position, and then, during the process of particle movement, the best position of each particle was evaluated by a fitness function (Harish

*et al.*2015), which was expressed as the output error of the

*i*th particle in this study: where

*f*is the fitness value,

*S*is the number of training samples,

*t*is the observed value,

_{k}*p*is the predicted value based on

_{k}*x*, and

_{i}*x*denotes the penalty parameter

_{i}*C*and kernel function's parameter

*c*. The 10-fold cross-validation method was applied to the optimized procedure, and the velocity and position of each particle were updated until the termination condition was satisfied. The optimum parameters obtained by PSO were tested with test data to check the performance of the SVR model.

## OPTIMIZED ARTIFICIAL NEURAL NETWORKS

*et al.*2017). BPNN is typically composed of an input layer, hidden layer, and output layer. BPNN has many advantages, such as its simple architecture, ease of construction, and rapid calculation speed (Liu

*et al.*2013). However, it has some issues in application, such as the choice of the neuron number in the hidden (NNH) layer and initialization of the network. A too small or too large NNH may lead to bad generalization capabilities and an over-fitting problem (Li

*et al.*2016). In this study, we calculated the range of NNH according to the following empirical formula: where

*N*is NNH,

*M*is the number of neurons of the input layer (1–4 and 8),

*J*is the number of neurons of the output layer (1), and

*a*is a constant from 1 to 10. The value ranges of NNH in this study were defined as 3–10 for inputs I–IV, and 4–13 for input V. The initial values of the network connection weights between the input layer and hidden layer as well as the hidden layer and output layer were important factors influencing the convergent behavior of BPNN. In this study, PSO was combined into BPNN and used to choose the appropriate connection weight matrixes. Furthermore, the 10-fold cross-validation method was applied to limit the over-fitting problem and improve the generalization ability of BPNN.

## ADAPTIVE NEURO-FUZZY INFERENCE SYSTEM

ANFIS is a fuzzy Sugeno model that maps inputs through input membership functions (MFs) and associated parameters to outputs through output MFs (Jang *et al.* 1997). More details of the mathematical formulations of ANFIS have been described in previous studies (Subasi 2007). The Gaussian and bell-shaped MFs have been increasingly popular for specifying fuzzy sets because of the smoothness and concise notation (Chang & Chang 2006). In this study, we chose the bell-shaped MFs as they had one additional parameter, which indicated that a non-fuzzy set can be more easily approached than Gaussian MFs. The hybrid learning algorithm, combining gradient descent and the least square techniques proposed by Jang *et al.* (Jang 1993), was applied to optimize the network parameters. During the model structure identification process, determination of the optimum number and the form of fuzzy rules is a crucial step. A fuzzy c-means clustering algorithm and subtractive clustering algorithm are applied to the dataset. After developing various models and repeatedly comparing their results, the fuzzy c-means clustering algorithm was chosen to determine the appropriate number of clusters for each input.

## WAVELET DE-NOISING METHOD

Hydrological time series often have complicated nonlinear and non-stationary characteristics, so that de-noising is a necessary task to reduce the noise interference (Babovic 2005; Sang 2013). Among the de-noising methods, wavelet threshold de-noising method was proposed by Donoho & Johnstone (1994) and proven as a simple and effective method by many studies (Wang *et al.* 2014). A large extent of noise could be suppressed, while singular information about the original signal was preserved (Lu *et al.* 2016). The wavelet threshold de-noising method was performed on the hydrological time series in this study, which consisted of three steps: (1) choosing the most appropriate wavelet function and number of resolution levels, and then, decomposing the original data with DWT; (2) applying several types of threshold functions to address the detailed coefficients of each level and obtaining the adjusted decomposed coefficients; (3) reconstructing the time series with adjusted coefficients and choosing the proper de-noise time series.

The description of DWT has occurred in many previous studies (Nalley *et al.* 2012). Daubechies wavelets were the most commonly used mother wavelets and had been proven to be effective in hydrologic time series modeling (Wei *et al.* 2012). The Daubechies wavelet family is often denoted by ‘*dbN*’, where *db* is the ‘surname’ and *N* is the wavelet order. Haar (*db1*), *db2*–*db10* are common members of the Daubechies wavelet family. Each of them was applied to the whole hydrological dataset, and the one with the best result was chosen as the most appropriate mother wavelet. The number of decomposition levels (*L*) was another important factor that needed to be carefully selected during the decomposing process. Training with a small *L* is simple but may lead to inaccurate predictions without sufficient past information, while a high *L* causes hard training and slowly decreases the approximation error (Soltani 2002). Therefore, an *L* of 3–10 was processed for each mother wavelet, with the aim of determining the optimal *L* for the appropriate mother wavelet.

At each decomposition level, a suitable threshold (*λ*) needs to be selected, and the detailed coefficients are processed in the form of the threshold function (Zhang *et al.* 2015). We used two strategies to estimate the threshold in this study: (1) a universal threshold proposed by Donoho and Johnstone (Donoho 1995), , where *N* is the time series length and *σ* is the standard deviation of the noise; (2) a level-dependent threshold proposed by Birge & Massart (Misiti *et al.* 1996). This strategy kept all of the approximation coefficients for the level *J* + 1, and for level *i* from 1 to *j*, the *n _{i}* largest coefficients were kept according to the formula , where

*α*and

*M*are real numbers greater than 1. Generally,

*α*= 3 for de-noising purposes, and

*M*is between

*L*(1) and 2 *

*L*(1).

However, the hard threshold function may lead to a larger variance in the reconstruction of the original series, while the soft threshold function may create an unnecessary bias when the true coefficients are large (Antoniadis 2007). To overcome the drawbacks of the hard and soft threshold functions, three modified threshold functions were applied in this study to process the coefficients.

- Generalized threshold function: where is a secondary threshold defined as in the semi-soft threshold function. In the generalized threshold function,
*m*= 1, 2, …, ∞; when*m*= 1, it is a soft threshold function and when*m**=*∞, it is a hard threshold function. We tried different threshold functions on the time series, and then, we used the modified coefficients to compute the inverse transform and reconstruct the de-noising time series. The accuracy of de-noising series were assessed by two indicators: the signal to noise ratio and mean square error , in which*I*and_{i}*J*are the real series and de-noised series, respectively. The_{i}*SNR*was a measure of the desired signal strength relative to the background noise, and the decrease in the*SNR*meant more noise was included in the reconstructed time series. The*MSE*measured the difference between the raw series and de-noised series, and a higher value indicated a higher error. The reconstructed series with minimum*MSE*and maximum*SNR*were analyzed by PSO-BPNN, PSO-SVR, and ANFIS in the same way as the original series. All of the algorithms were performed by programming codes in MATLAB R2013b, and a Library for Support Vector Machines (LIBSVM) developed by Lin*et al.*(Chang & Lin 2011) was used to design the PSO-SVM model.

## PERFORMANCE EVALUATION

*CE*), root mean square error (

*RMSE*), and mean absolute error (

*MAE*). The

*CE*is also known as the Nash–Sutcliffe coefficient and was defined as follows: The

*RMSE*was calculated as: The

*MAE*was defined as the average of absolute errors: where

*n*is the number of input samples,

*Q*and

_{io}*Q*are observed and predicted value of sample

_{ip}*i*, respectively. is the mean value of observed data. The values of

*CE*= 1,

*RMSE*= 0, and

*MAE*= 0 would be obtained if the predicted values were exactly equal to the observed values.

## RESULTS AND DISCUSSION

The de-noising effectiveness of the whole hydrological time series (2000–2013) at the Yuetan site was evaluated by the values of *SNR* and *MSE* mentioned above. The table including *SNR* and *MSE* values is supplied as Supplementary material (available with the online version of this paper). To confirm the most appropriate resolution levels for decomposition, the results were compared among 3–10 resolution levels. Generally, the smoothing of the detailed signals improved with the increase of the resolution level number, while a greater resolution level led to error propagation (Chou 2014). In this study, with the increase of the resolution levels, the *SNR*s were decreasing and *MSE*s were increasing for all cases. Therefore, the number of resolution levels used in the wavelet decomposition was determined to be three. The trends of *SNR* and *MSE* towards resolution levels are shown in Figure 4, taking the mother wavelet *db3* and semi-soft threshold function as examples.

Under resolution level three, the mother wavelet *db3* offered better performance than the other wavelet functions, and it was selected as the objective wavelet. The values of *SNR* and *MSE* changed along with the wavelet functions, and the trends were not stable, as shown in Figure 5. Based on the determined resolution level and mother wavelet, a comparison can be performed to rank each threshold function's performance. Among seven threshold functions, the semi-soft threshold function (Ssoft) showed the best *SNR* and *MSE*, while the soft threshold function with the Birge and Massart strategy (Bsoft) gave the worst result (Figure 6). The ranks of the other functions were: a hard threshold function with the Donoho and Johnstone strategy (Dhard) > (better than) Garrote threshold function (GaT) > generalized threshold function (GeT) > soft threshold function with Donoho and Johnstone strategy (Dsoft) > hard threshold function with the Birge and Massart strategy (Bhard). By means of *SNR* and *MSE*, we compared the stability of the de-noising performance of various wavelets and finally chose the wavelet with a scale of three and *db3* mother wavelet to de-noise the original hydrological time series based on Ssoft.

The wavelet de-noised dataset was separated into a training sub-set (2000–2010) and a validating sub-set (2011–2013) similar to the original one. A comparison between the results obtained from the raw time series (RS) and wavelet de-noised series (WS) during both training and validating periods was presented to evaluate the performances of PSO-SVR, PSO-BPNN, and ANFIS. To obtain the most effective performance of PSO-BPNN, the NNHs 3–10 were evaluated for inputs I–IV and 4–13 for input V during the training periods for both RS and WS. The NNH with the lowest *RMSE* and *MAE* of the same input may not be the same for RS and WS, which is shown in Table 2.

Input | RS | WS | |
---|---|---|---|

I | Q(t),Q(t-1), Q(t-2), Q(t-3) | 6 | 4 |

II | Q(t),Q(t-1), Q(t-2) | 7 | 3 |

III | Q(t),Q(t-1) | 4 | 6 |

IV | Q(t) | 3 | 7 |

V | Q(t),Q(t-3),Q(t-6),…,Q(t-21) | 5 | 10 |

Input | RS | WS | |
---|---|---|---|

I | Q(t),Q(t-1), Q(t-2), Q(t-3) | 6 | 4 |

II | Q(t),Q(t-1), Q(t-2) | 7 | 3 |

III | Q(t),Q(t-1) | 4 | 6 |

IV | Q(t) | 3 | 7 |

V | Q(t),Q(t-3),Q(t-6),…,Q(t-21) | 5 | 10 |

The results of PSO-BPNN, PSO-SVR, and ANFIS with the best fit structures for RS and WS during the training and validating periods are summarized in Table 3. The performances during the two periods were similar for all models, which indicated good generalization capacities of these nonlinear systems. In order to find the differences of predictive capacities of RS and WS, the *RMSE* and *MAE* values were compared between the two series. The results of most cases showed that the *RMSE* and *MAE* values of WS were lower than RS during both periods (Figure 7), indicating a better predictability of WS and the necessity of data de-noising.

CE | RMSE | MAE | |||||
---|---|---|---|---|---|---|---|

Model | Input | Training | Validating | Training | Validating | Training | Validating |

PSO-SVR_WS | I | 0.83 | 0.82 | 31.87 | 46.95 | 9.94 | 11.21 |

II | 0.80 | 0.81 | 34.88 | 47.96 | 11.82 | 14.79 | |

III | 0.70 | 0.62 | 43.13 | 68.17 | 11.10 | 13.97 | |

IV | 0.62 | 0.45 | 47.91 | 82.30 | 12.99 | 17.97 | |

V | 0.79 | 0.76 | 35.89 | 54.73 | 11.97 | 14.20 | |

PSO-SVR_RS | I | 0.80 | 0.81 | 35.23 | 48.00 | 14.36 | 14.96 |

II | 0.76 | 0.66 | 38.04 | 64.75 | 13.33 | 15.39 | |

III | 0.69 | 0.65 | 43.49 | 65.52 | 11.78 | 14.09 | |

IV | 0.61 | 0.46 | 48.70 | 81.51 | 15.26 | 20.00 | |

V | 0.76 | 0.76 | 38.19 | 55.06 | 13.22 | 15.57 | |

PSO-BPNN_WS | I | 0.71 | 0.64 | 42.28 | 66.39 | 15.49 | 19.56 |

II | 0.69 | 0.58 | 43.34 | 71.97 | 13.98 | 17.72 | |

III | 0.70 | 0.52 | 43.03 | 76.42 | 13.48 | 18.09 | |

IV | 0.61 | 0.49 | 48.63 | 78.92 | 15.48 | 20.08 | |

V | 0.68 | 0.71 | 44.08 | 60.46 | 14.84 | 17.72 | |

PSO-BPNN_RS | I | 0.67 | 0.66 | 45.06 | 64.15 | 16.88 | 21.04 |

II | 0.67 | 0.48 | 44.84 | 80.03 | 13.63 | 19.22 | |

III | 0.69 | 0.54 | 43.36 | 74.75 | 14.11 | 18.46 | |

IV | 0.62 | 0.49 | 48.17 | 78.94 | 14.69 | 19.56 | |

V | 0.68 | 0.63 | 44.29 | 67.89 | 15.14 | 19.10 | |

ANFIS_WS | I | 0.78 | 0.81 | 36.94 | 47.98 | 10.69 | 12.88 |

II | 0.77 | 0.80 | 37.10 | 49.49 | 11.27 | 13.38 | |

III | 0.68 | 0.69 | 44.17 | 61.31 | 11.99 | 14.24 | |

IV | 0.64 | 0.60 | 47.03 | 69.92 | 12.97 | 16.26 | |

V | 0.68 | 0.71 | 43.90 | 60.22 | 13.35 | 16.24 | |

ANFIS_RS | I | 0.74 | 0.74 | 40.08 | 56.83 | 12.21 | 14.57 |

II | 0.71 | 0.74 | 42.26 | 56.18 | 11.97 | 13.56 | |

III | 0.69 | 0.67 | 43.30 | 63.20 | 12.47 | 15.35 | |

IV | 0.61 | 0.61 | 48.72 | 69.32 | 13.52 | 16.25 | |

V | 0.63 | 0.68 | 47.49 | 62.95 | 15.00 | 18.08 |

CE | RMSE | MAE | |||||
---|---|---|---|---|---|---|---|

Model | Input | Training | Validating | Training | Validating | Training | Validating |

PSO-SVR_WS | I | 0.83 | 0.82 | 31.87 | 46.95 | 9.94 | 11.21 |

II | 0.80 | 0.81 | 34.88 | 47.96 | 11.82 | 14.79 | |

III | 0.70 | 0.62 | 43.13 | 68.17 | 11.10 | 13.97 | |

IV | 0.62 | 0.45 | 47.91 | 82.30 | 12.99 | 17.97 | |

V | 0.79 | 0.76 | 35.89 | 54.73 | 11.97 | 14.20 | |

PSO-SVR_RS | I | 0.80 | 0.81 | 35.23 | 48.00 | 14.36 | 14.96 |

II | 0.76 | 0.66 | 38.04 | 64.75 | 13.33 | 15.39 | |

III | 0.69 | 0.65 | 43.49 | 65.52 | 11.78 | 14.09 | |

IV | 0.61 | 0.46 | 48.70 | 81.51 | 15.26 | 20.00 | |

V | 0.76 | 0.76 | 38.19 | 55.06 | 13.22 | 15.57 | |

PSO-BPNN_WS | I | 0.71 | 0.64 | 42.28 | 66.39 | 15.49 | 19.56 |

II | 0.69 | 0.58 | 43.34 | 71.97 | 13.98 | 17.72 | |

III | 0.70 | 0.52 | 43.03 | 76.42 | 13.48 | 18.09 | |

IV | 0.61 | 0.49 | 48.63 | 78.92 | 15.48 | 20.08 | |

V | 0.68 | 0.71 | 44.08 | 60.46 | 14.84 | 17.72 | |

PSO-BPNN_RS | I | 0.67 | 0.66 | 45.06 | 64.15 | 16.88 | 21.04 |

II | 0.67 | 0.48 | 44.84 | 80.03 | 13.63 | 19.22 | |

III | 0.69 | 0.54 | 43.36 | 74.75 | 14.11 | 18.46 | |

IV | 0.62 | 0.49 | 48.17 | 78.94 | 14.69 | 19.56 | |

V | 0.68 | 0.63 | 44.29 | 67.89 | 15.14 | 19.10 | |

ANFIS_WS | I | 0.78 | 0.81 | 36.94 | 47.98 | 10.69 | 12.88 |

II | 0.77 | 0.80 | 37.10 | 49.49 | 11.27 | 13.38 | |

III | 0.68 | 0.69 | 44.17 | 61.31 | 11.99 | 14.24 | |

IV | 0.64 | 0.60 | 47.03 | 69.92 | 12.97 | 16.26 | |

V | 0.68 | 0.71 | 43.90 | 60.22 | 13.35 | 16.24 | |

ANFIS_RS | I | 0.74 | 0.74 | 40.08 | 56.83 | 12.21 | 14.57 |

II | 0.71 | 0.74 | 42.26 | 56.18 | 11.97 | 13.56 | |

III | 0.69 | 0.67 | 43.30 | 63.20 | 12.47 | 15.35 | |

IV | 0.61 | 0.61 | 48.72 | 69.32 | 13.52 | 16.25 | |

V | 0.63 | 0.68 | 47.49 | 62.95 | 15.00 | 18.08 |

The comparison of results obtained from inputs I–V for the three WS models showed that the input consisting of four antecedent daily flow data (I) had a better fit than the others (Figure 8). With the lead time decreasing (II, III and IV), the results tended to deteriorate. The multi-day-ahead input (V) showed a moderate predictive capacity compared with the one-day-ahead inputs. It is worth mentioning that the multi-day-ahead input had much better generalization capacities than inputs II–IV for the PSO-BPNN, which was not very satisfactory during the validating periods of both RS and WS.

Among the three models, PSO-SVR obtained the smallest values of *RMSE* and *MAE* as well as the highest value of *CE* for input I of WS, and so it was selected as the best-fit model for daily streamflow prediction in this study. ANFIS was slightly worse than PSO-SVR, but was apparently better than PSO-BPNN. The performances of PSO-SVR, PSO-BPNN, and ANFIS suggested that the three models showed good prediction accuracies for intermediate and moderate streamflow. However, it was an important measure to evaluate the predictive ability of a model on the estimations of the high and low streamflow. The *MAE*s of the three models for the 50 lowest values of the training data and validating data were compared and are shown in Figure 9. The differences between the simulated results and observed data of PSO-SVR and ANFIS were apparently smaller than for PSO-BPNN during the two periods. The change trends of *MAE* were similar between PSO-SVR and ANFIS, but the prediction ability of ANFIS was superior to that of PSO-SVR with minimum data. The performances of the three models on the 50 highest values of the training data and validating data were also compared (Figure 10), and none appeared to be superior to the others. The average relative errors of PSO-SVR, PSO-BPNN, and ANFIS were 34.01%, 41.01%, and 31.86% for the highest training data and 28.83%, 56.70%, and 44.08% for the highest validating data, respectively. PSO-SVR showed a smoother trend of errors and better generalization ability than the other two models for maximal data. Overall, PSO-SVR provided accurate and reliable daily flow predictions for moderate and high streamflow, and ANFIS was good at handling low values.

## CONCLUSION

In this study, the performances of PSO-BPNN, PSO-SVR, and ANFIS for daily streamflow prediction in a forested basin were compared using one-day-ahead and multi-day-ahead inputs, respectively. The hydrological time series was first decomposed and de-noised by the wavelet modeling framework to enhance the forecasting performance. By discussing several key issues regarding the wavelet de-noising method, seven threshold functions were applied to identify and separate the deterministic components and noise of a hydrologic series. The wavelet with three levels and the db3 mother wavelet were chosen to de-noise the raw series using the Ssoft method. Both the RS and WS were used for daily flow prediction, and the predictive capacities of PSO-BPNN, PSO-SVR, and ANFIS were evaluated. The WS models presented better results than the RS models for all situations. The further comparison of WS models indicated that the PSO-SVR model was superior to the PSO-BPNN and ANFIS models, and the 4-day ahead SVR_WS yielded the best performance among all of the models in terms of various model efficiencies. Due to the effects of climate and topography, extreme values were very common in the daily flow data of the Yuetan Basin. A comparison of the estimation of low flow and peak flow showed that the PSO-SVR model was slightly superior to the others for peak flow prediction, while the ANFIS model performed best for low flow prediction.

As the de-noising stage enhanced the forecasting performance and the proposed model could be applied for short-term forecasting of a hydrologic time series, it was difficult to identify a method that could maintain its prediction accuracy for extreme values. The results obtained here were limited to a single application, and further studies are needed to assess more cases and improve the predictive capacity by introducing a subsection function model or other new methods.

## ACKNOWLEDGEMENTS

This work is supported by the Technology Research and Development Program of the Higher Education Institutions of Tianjin (043135202JW1716), the innovation team training plan of the Tianjin Education Committee (TD12-5037) and National Natural Science Foundation of China (No. 41372373). Comments and suggestions from two anonymous reviewers and the editor are greatly appreciated.

## REFERENCES

*.*

*.*