## Abstract

In this paper, the performance of the extreme learning machine (ELM) training method for a radial basis function artificial neural network (RBF-ANN) is evaluated using monthly hydrological data from the Ajichai Basin. ELM is a recently introduced fast training method, and here we present a novel application of it to monthly streamflow forecasting. ELM may not work well with a large number of input variables; therefore, input selection is applied to overcome this problem. The coefficient of determination (R^{2}) of the ANN trained by backpropagation (BP) and by the ELM algorithm using initial input selection was found to be 0.66 and 0.72, respectively, for the test period. However, when wavelet transform and then genetic algorithm (GA)-based input selection are applied, the test R^{2} increases to 0.76 and 0.86, respectively, for ANN-BP and ANN-ELM. Similarly, using singular spectral analysis (SSA) instead, the coefficients are found to be 0.88 and 0.90, respectively, for the test period. These results show the importance of input selection and the superiority of ELM and SSA over BP and wavelet transform. Finally, a proposed multistep method shows an outstanding Nash–Sutcliffe efficiency (NSE) value of 0.97, which is near perfect and well above the performance of the previous methods.

## INTRODUCTION

Streamflow forecasting is a critical problem in hydrology and water management because streamflow varies widely in space and time. Many water resources system rule curves require monthly streamflow forecasts for their operation. So far, various methods have been proposed that can forecast streamflows with different levels of accuracy under different conditions. Over the last half century, forecasting methods have evolved considerably from simple linear equations to very complicated methods. Thus, a wide range of models, including stochastic (e.g., autoregressive (AR) and autoregressive moving average (ARMA)), conceptual (e.g., HEC-HMS and HBV), and physically based (e.g., SWAT) models, have been considered by researchers for streamflow forecasting. For instance, Adamowski *et al.* (2012) indicated that, due to the complex relationship between rainfall–runoff variables and the lack of sufficient hydrological data in mountainous watersheds, data-driven models, such as artificial neural networks (ANNs), are more suitable than process-based models for forecasting streamflows. In fact, there is no single method that can be used for all types of systems and basins. During this period, hydrologists have exploited new techniques as they became available in order to build more reliable forecasting methods. Artificial intelligence methods have attracted great attention due to their ability to address the nonlinear and complex relationships between input and output variables, and ANNs have been widely applied by various researchers for streamflow forecasting (e.g., Zealand *et al.* 1999; Wang *et al.* 2006; Islam 2010; Wei *et al.* 2012; Sudheer *et al.* 2013; Zounemat-Kermani 2014).

Like any model, ANNs must be trained using local data. Backpropagation (BP) is a standard method for training neural networks (Rumelhart *et al.* 1986). In BP, the least mean square error between the target and the network output is propagated backward to train the network. This is done by adjusting the network weights and biases, which together form the model parameters, with the objective of minimizing the total error. Although BP works well on simple training problems, its performance degrades as the problem complexity grows (Dariane & Karami 2014). Evolutionary methods have been suggested as an alternative to overcome some of these deficiencies (Montana & Davis 1989). However, evolutionary methods are generally time-consuming and may not be suitable where computation time is important (e.g., large problems). The extreme learning machine (ELM) was introduced by Huang *et al.* (2006) as a fast and simple way of training ANN models.

In the last couple of years, some researchers have reported the successful application of ELM in water resources problems. For example, Li & Cheng (2014) applied a combination of wavelet transform and neural network trained by extreme learning machine (WANN-ELM) to forecast one month ahead discharge. According to their results, ANN trained by ELM showed slightly better performance than the support vector machine (SVM), while WANN-ELM demonstrated the most accurate performance among the three models. Deo & Shahin (2015) compared the application of ANN trained by BP and ELM algorithms in predicting a monthly effective drought index. According to their results, the learning speed of ELM is 32 times faster than the BP algorithm. Wang *et al.* (2015) developed a hybrid model using ELM and seasonal auto-regression integrated moving average (SARIMA) to forecast wind speed. They applied ANN trained by ELM algorithm to simulate the wind speed. According to their results, the hybrid model proposed overtakes the single ones such as the BP neural network and the classic models (e.g., SARIMA) in accuracy of the predictions. Also, an online sequential ELM model was applied by Yadav *et al.* (2016) in a flood forecasting problem in Germany. According to their findings, for the considered case, the ELM-based model was more accurate than SVM, genetic programming (GP), and ANN trained by traditional methods.

Owing to the simple computational nature of ELM and its need for more hidden nodes than gradient-based algorithms (e.g., BP), it would be inappropriate to use a large number of inputs. Our review of the streamflow forecasting literature found no application of ANN-ELM that uses an evolutionary algorithm to select effective features (i.e., input variables). However, input selection methods have been used in combination with other methods. For example, Asadi *et al.* (2013) combined data preprocessing and ANNs to predict basin runoff and indicated the effectiveness of the input selection approach in improving the results. Bowden *et al.* (2005) used two input determination approaches: partial mutual information, and a hybrid of the genetic algorithm (GA) and general regression neural network. They concluded that both input selection approaches provide more accurate results for river salinity forecasting. Moreover, Dariane & Azimi (2016) developed a combination of wavelet neuro-fuzzy models based on a genetic input selection algorithm to forecast streamflow. They demonstrated that input selection approaches are more useful for large basins, where determining the effective input variables manually is a time-consuming task. Their results indicated that considerable improvements can be achieved by using the genetic input selection algorithm together with the wavelet transform.

Suitable data preprocessing can help a predictor model to forecast the main features of a time series (main oscillations) more accurately (Tiwari & Chatterjee 2010; Wei *et al.* 2012). Singular spectral analysis (SSA) and wavelet transform are two types of preprocessing techniques. SSA is based on the decomposition of the original series into a sum of series including the trend, oscillation, and noise components. This is provided by the reconstruction of the original series (Golyandina *et al.* 2001). SSA has been employed in some hydrological problems (e.g., Sivapragasam *et al.* 2001; Wu & Chau 2011). Sivapragasam *et al.* (2001) applied SVM to model the rainfall–runoff process using the SSA preprocessing technique. They demonstrated that their combined model performs better than the single SVM without any data preprocessing. Wu & Chau (2011) used ANNs in conjunction with SSA and indicated that using SSA provided a significant improvement in model accuracy. Wavelet transform is used more widely than SSA to develop hydrologic forecasting models (e.g., Kim & Valdes 2008; Rajaee *et al.* 2011; Wei *et al.* 2013; Santos & Silva 2014). Similar to SSA, wavelets help models to perform more accurately. For instance, Santos & Silva (2014) compared the performance of single ANN and wavelet-ANN for 1, 3, 5, and 7 day ahead streamflow forecasting. The wavelet-ANN models performed better than ANN. This superior performance became more evident for longer lead times (i.e., 5 and 7 day ahead forecasts). In addition, they employed a trial and error process for selecting appropriate input variables. Their results showed that using the approximations of the first five decomposition levels as inputs produced the best results and that incorporating detail signals did not directly play an important role in improving the results. Also, Tiwari & Chatterjee (2010) applied a wavelet-ANN hybrid model to forecast daily streamflow. They used a correlation analysis to select and then recombine significant wavelet components. These components were applied as the new input variables to the ANN model. Their proposed hybrid model was able to catch the peak flows more accurately than the simple ANN model.

In fact, streamflow forecasting helps decision-makers to adjust their actions according to the state of the forecasted streamflow. With one month lead time, reservoir operators are able to better decide about their releases and storages. Otherwise, one must use the long-term mean streamflow in the absence of any forecast that would yield considerable errors. On the other hand, using more lead time (e.g., two or more months) provides extra flexibility and more time for system operation adjustment but also would have higher risk. Overall, a monthly horizon with zero lead time (forecasting the flow of next month) is common in most reservoir operation systems. However, both lead time and forecast horizon may vary from one system to another. Herein, to forecast monthly streamflow, a proposed method is investigated where after the initial variable selection using GA, the variables are decomposed using both the wavelet transform and the SSA. Then, GA input selection is used once again to pick out the best sub-signals from the pool of all decomposed signals. In the next step, these sub-signals are set as the input variables of RBF-ANN trained by ELM (ANN-ELM) and ANN trained by BP (ANN-BP). Finally, a hybrid model is defined where the outputs of the aforementioned ANN-ELM and ANN-BP models are used as the inputs of an adaptive neuro-fuzzy inference system (ANFIS) model.

## METHODOLOGY

Following the authors’ previous study (Dariane & Azimi 2016), this paper examines the capabilities of ANNs in forecasting Ajichai streamflow. Owing to some challenges in the application of ANNs (discussed later), a hybrid model is proposed that benefits from two signal decomposition techniques, two network training methods, and two applications of an input selection algorithm. A brief presentation of the applied methods, including the ELM algorithm, SSA method, wavelet transform, and genetic input selection, is given here, followed by a short description of the proposed method. Readers can obtain detailed information about neuro-fuzzy and gradient-based training algorithms from Jang (1993) and Hagan *et al.* (1996).

### ELM training algorithm

ELM, introduced by Huang *et al.* (2006), is a novel and fast converging algorithm for training a single hidden-layer, feed forward neural network. For $n$ input variables, $N$ hidden nodes, and $M$ training cases, the model is presented as follows:

$$\sum_{i=1}^{N} \beta_i \, g(w_i \cdot x_j + b_i) = o_j, \qquad j = 1, \ldots, M$$

where $x_j$, $o_j$, $w_i$, $b_i$, and $\beta_i$ are training input datasets, training output datasets, random input weights, random input biases, and weights of output layer, respectively. $g(x)$ is an activation function. Here, the Gauss RBF kernel function (Equation (2)) is applied as the activation function of the ANN-ELM model, where $x_i$ and $\sigma$ are constant parameters with values found to be 0.05 and 1.5 using a sensitivity analysis.

When $N = M$, the model can estimate the training target datasets with zero error:

$$\sum_{i=1}^{N} \beta_i \, g(w_i \cdot x_j + b_i) = y_j, \qquad j = 1, \ldots, M$$

where $y_j$ is training target dataset. Also, these functions can be written as (Huang *et al.* 2006):

$$H\beta = Y$$

where $H$ is the output matrix of the hidden layer, with elements $H_{ji} = g(w_i \cdot x_j + b_i)$.

The solution of Equation (6) is $\beta = H^{*}Y$, where $H^{*}$ is the Moore–Penrose generalized inverse of the hidden layer matrix $H$ and $Y$ is training target dataset.

The ELM algorithm can be summarized as follows:

1. Determine input weights and biases randomly.
2. Calculate the hidden layer output matrix $H$ (from Equation (4)).
3. Calculate the output weight $\beta = H^{*}Y$.
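The three ELM steps can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the exact Gaussian form in `gaussian_rbf`, the uniform range of the random weights, the synthetic data, and all function names are hypothetical. Only the centre (0.05), the width (1.5), and the use of the Moore–Penrose inverse come from the text.

```python
import numpy as np

def gaussian_rbf(z, center=0.05, sigma=1.5):
    # Assumed Gaussian form; only the values 0.05 and 1.5 are taken from the text.
    return np.exp(-((z - center) ** 2) / (2.0 * sigma ** 2))

def elm_train(X, Y, n_hidden, rng):
    # Step 1: draw input weights and biases at random (they are never updated).
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    # Step 2: hidden layer output matrix H, one row per training case.
    H = gaussian_rbf(X @ W + b)
    # Step 3: output weights from the Moore-Penrose generalized inverse of H.
    beta = np.linalg.pinv(H) @ Y
    return W, b, beta

def elm_predict(X, W, b, beta):
    return gaussian_rbf(X @ W + b) @ beta

# Synthetic stand-in data: 40 training cases, 5 inputs, 20 hidden nodes.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(40, 5))
Y = np.sin(X.sum(axis=1))
W, b, beta = elm_train(X, Y, n_hidden=20, rng=rng)
residual = np.linalg.norm(elm_predict(X, W, b, beta) - Y)
```

Because the only fitted quantity is `beta`, training reduces to a single linear least-squares solve, which is why ELM is fast compared with iterative gradient-based training.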

### Singular spectral analysis

SSA is described here following Golyandina *et al.* (2001). Data preprocessing consists of decomposition and reconstruction stages. In the first decomposition step, the embedding procedure transfers the time series $x = (x_0, \ldots, x_{N-1})$ of length $N$ into a sequence of $L$-dimensional vectors $x_i = (x_{i-1}, \ldots, x_{i+L-2})^{T}$ $(i = 1, \ldots, K = N - L + 1)$:

$$X = [x_1, x_2, \ldots, x_K] \tag{7}$$

In the next step of the decomposition stage, $XX^{T}$ is calculated and its eigen triple $(s_i, u_i, v_i)$ is determined by the singular value decomposition (SVD), where $s_i$ is the $i$th singular value of $X$, and $u_i$ and $v_i$ are the $i$th left and right eigen vectors of $XX^{T}$, respectively. The trajectory matrix $X$ can be written as follows:

$$X = X_1 + X_2 + \cdots + X_L, \qquad X_i = s_i u_i v_i^{T} \tag{8}$$

The next two steps (grouping and diagonal averaging) provide the reconstruction stage. In the grouping procedure, the indices $j = 1, \ldots, L$ are partitioned into $M$ disjoint groups $I_1, I_2, \ldots, I_M$, so that the elementary matrices, $X_i$, in Equation (8) are split into $M$ groups. Each group consists of indices as $I = \{i_1, \ldots, i_p\}$. The resultant matrix $X_I$ is defined as $X_{I_n} = X_{i_1} + \cdots + X_{i_p}$, and so the matrix $X$ is presented as the sum of $M$ resultant matrices:

$$X = X_{I_1} + X_{I_2} + \cdots + X_{I_M} \tag{9}$$

In the last step of SSA, for diagonal averaging, each resultant matrix transforms into a new time series of length $N$. Let $X$ be an $(L \times K)$ matrix with elements $x_{ij}$. Make $L^{*} = \min(L, K)$, $K^{*} = \max(L, K)$. Let us define $x^{*}_{ij} = x_{ij}$ if $L < K$; otherwise $x^{*}_{ij} = x_{ji}$. Diagonal averaging transfers matrix $X$ to a series $g_0, \ldots, g_{N-1}$ using the following formula:

$$g_k = \begin{cases} \dfrac{1}{k+1} \displaystyle\sum_{m=1}^{k+1} x^{*}_{m,\,k-m+2}, & 0 \le k < L^{*} - 1 \\[2ex] \dfrac{1}{L^{*}} \displaystyle\sum_{m=1}^{L^{*}} x^{*}_{m,\,k-m+2}, & L^{*} - 1 \le k < K^{*} \\[2ex] \dfrac{1}{N-k} \displaystyle\sum_{m=k-K^{*}+2}^{N-K^{*}+1} x^{*}_{m,\,k-m+2}, & K^{*} \le k < N \end{cases} \tag{10}$$

Equation (10) corresponds to averaging the elements along diagonals $i + j = k + 2$. This diagonal averaging is applied to a resultant matrix $X_{I_n}$, producing a series $g^{(n)} = (g^{(n)}_0, \ldots, g^{(n)}_{N-1})$. Thus, the original time series is decomposed into the sum of $M$ series, and can be derived as follows:

$$x_t = \sum_{n=1}^{M} g^{(n)}_t, \qquad t = 0, \ldots, N-1 \tag{11}$$
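The four SSA steps (embedding, SVD, grouping, diagonal averaging) can be sketched with NumPy as follows. This is an illustrative sketch only: the function name `ssa_decompose`, the window length, the grouping, and the synthetic series are assumptions chosen for the example, and indices are zero-based as is usual in Python.

```python
import numpy as np

def ssa_decompose(x, L, groups):
    x = np.asarray(x, dtype=float)
    K = x.size - L + 1
    # Embedding: L x K trajectory matrix whose columns are lagged windows.
    X = np.column_stack([x[i:i + L] for i in range(K)])
    # SVD: X is the sum of elementary matrices s_i * u_i * v_i^T.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    series = []
    for idx in groups:
        # Grouping: add up the elementary matrices belonging to this group.
        X_I = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in idx)
        # Diagonal averaging: mean of each anti-diagonal gives one series value.
        flipped = X_I[::-1]
        g = np.array([flipped.diagonal(k).mean() for k in range(-(L - 1), K)])
        series.append(g)
    return series

# Synthetic monthly-like series: linear trend plus a 12-step oscillation.
t = np.arange(48)
x = 0.1 * t + np.sin(2 * np.pi * t / 12)
# Hypothetical grouping into three components covering all 12 indices.
parts = ssa_decompose(x, L=12, groups=[[0], [1, 2], list(range(3, 12))])
```

When the groups cover every index exactly once, the component series add back up to the original series, which is the property expressed by the final decomposition formula.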

### Wavelet transform

Wavelet transform, which is an important technique in the field of signal processing, has attracted a great deal of attention since its introduction in the early 1980s. There are two types of wavelet transform: continuous and discrete. The first deals with continuous functions and the second is applied to discrete functions or time series. Most hydrological time series are measured in discrete time steps. Therefore, the discrete wavelet transform (DWT) is a suitable method for decomposition and reconstruction of these series. In this way, the signal is divided into two parts, ‘approximation’ and ‘details’, and the original signal is broken down into lower resolution components. These components describe the behavior of the process better and reveal more information about it than the original time series. Therefore, they can help the forecasting models to predict with a higher accuracy (Remesan *et al.* 2009; Adamowski & Chan 2011).

The continuous wavelet transform of a signal $x(t)$ is defined as (Mallat 1998):

$$W(\tau, a) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{+\infty} x(t)\, \psi^{*}\!\left(\frac{t - \tau}{a}\right) dt$$

where $*$ denotes conjugate complex function, $\psi(t)$ is wavelet function or mother wavelet, $a$ is scale or frequency factor, also called dilation factor, and $\tau$ is the time factor. The term ‘scale’ refers to extending or compressing the wavelet. Using small scale causes the wavelet to be compressed and in the case of large scale the wavelet is extended. Large scale values are not able to show the details whereas small scales are applied to reveal more details.

The discrete wavelet has the form:

$$\psi_{m,n}(t) = a_0^{-m/2}\, \psi\!\left(\frac{t - n \tau_0 a_0^{m}}{a_0^{m}}\right)$$

where $m$ and $n$ are integers that control, respectively, the wavelet dilation and translation; $a_0$ is a specified fixed dilation step with a value greater than 1; and $\tau_0$ is the location parameter which must be greater than zero. Scales and positions are usually based on powers of two (dyadic scales and positions), making it more efficient for practical cases (Mallat 1998). The most common (and simplest) choice for the parameters $a_0$ and $\tau_0$ are 2 and 1, respectively. In this way, for a discrete time series $x_i$, which occurs at discrete time $i$, the DWT is defined as (Mallat 1998):

$$W_{m,n} = 2^{-m/2} \sum_{i=0}^{N-1} x_i\, \psi^{*}\!\left(2^{-m} i - n\right)$$

where $W_{m,n}$ is the wavelet coefficient for the discrete wavelet of scale $a = 2^{m}$ and location $\tau = 2^{m} n$.

### GA for input selection

Input variable selection approaches are generally partitioned into three main groups of wrapper, embedded, and filter methods. The GA input selection algorithm which is applied here can be considered as a wrapper input selection method. In order to obtain more comprehensive information about these classes of input selection methods, readers are referred to Kohavi & John (1997), Blum & Langley (1997) and Guyon & Elisseeff (2003).

Herein, the whole dataset was divided into training (70%), validation (15%), and test (15%) subsets. For input selection, the objective function of the GA was defined as 0.75 × MSE_{TRAIN} + 0.25 × MSE_{VALIDATION}, which uses both the training and validation periods to evaluate each candidate input set. Using a weighted combination in the fitness function helps in selecting better input features. The weights of 0.75 and 0.25 for the training and validation MSE, respectively, were determined through a trial and error process. It should be mentioned that after selecting the input variables of the ANN model, the training and validation data were used as usual to find the weights and biases of the ANN model, after which the model was evaluated with the test period data.
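As a small worked illustration of the weighted objective, the sketch below evaluates 0.75 × MSE_{TRAIN} + 0.25 × MSE_{VALIDATION} for one candidate input set. The binary-mask encoding in `decode_mask`, the variable names, and the numbers are hypothetical; the text specifies only the two weights, not how chromosomes are encoded.

```python
def mse(observed, predicted):
    # Mean squared error between two equally long sequences.
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

def decode_mask(mask, candidates):
    # Assumed encoding: a 1 in the chromosome keeps the corresponding input.
    return [name for bit, name in zip(mask, candidates) if bit == 1]

def ga_fitness(train_obs, train_pred, val_obs, val_pred, w_train=0.75, w_val=0.25):
    # Weighted objective from the text; lower values indicate a better input set.
    return w_train * mse(train_obs, train_pred) + w_val * mse(val_obs, val_pred)

# Hypothetical candidate inputs and chromosome.
selected = decode_mask([1, 0, 1], ["P_Sarab", "P_Nahand", "T_Mirkuh"])

# Made-up predictions of a model trained on the selected inputs.
fitness = ga_fitness(
    train_obs=[1.0, 2.0, 3.0], train_pred=[1.0, 2.0, 5.0],
    val_obs=[2.0, 4.0], val_pred=[1.0, 4.0],
)
```

Here the training MSE is 4/3 and the validation MSE is 1/2, so the fitness is 0.75 × 4/3 + 0.25 × 1/2 = 1.125. In a wrapper method, each fitness evaluation requires training the forecasting model on the candidate inputs first.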

The main GA operators are applied and the new generation is produced. Then the stopping conditions are checked and the decision is made to continue the loop or to end the process. The details of the input selection algorithm are available in our previous study (Dariane & Azimi 2016).

### The proposed model

In this study, a hybrid forecasting model has been developed based on data preprocessing, input selection, and data-driven methods. In this model, a GA input selection method is applied in the initial stage to choose the most proper variables from the available ones. Precipitation at Sarab, Esbaghran and Gushchi stations, along with temperature at Mirkuh plus Vanyar streamflow (all lagged by one time period) are selected as the final inputs. Then, the selected input variables are decomposed using wavelet transform and SSA independently. Next, the decomposed series are combined together to form a selection pool from which a second GA input selection algorithm is applied to choose the best decomposed sets. These sets are then used as the optimum input variables for ANN-BP and ANN-ELM, independently. Finally, a hybrid model is used, where the outputs of ANN-BP and ANN-ELM are set as the input for an ANFIS model, to generate the forecasted flow (Figure 1).

## CASE STUDY

The data used for developing the models belong to the Ajichai basin. Ajichai is a sub-basin in the larger Urmia Lake Basin. Urmia Lake Basin which is mainly located within two Azerbaijan provinces in northwest Iran has an area of 51,876 km^{2} (Figure 2). A small portion of the basin is located in Kurdistan province. Ajichai, with an area of 7,675 km^{2} above Vanyar station, is considered a relatively large basin in the northeastern part of the lake. A monthly time step with zero lead time was used in developing monthly streamflow forecasting models in this research. Nine precipitation stations considered for this basin include Bostanabad, Ghurigul, Ghushchi, Nahand, Saeedabad, Sarab, Esbaghran, Sohzab, and Vanyar (Figure 2). In addition, temperature data from Sohzab, Mirkuh, and Ghurigul stations are used for developing the monthly streamflow forecasting models. The statistical properties of all available time series are shown in Table 1. Figure 3 presents the hydrograph of Ajichai River at Vanyar station. As can be seen from Figure 3, there are high fluctuations in streamflow data. In some periods, mainly in summer months, the river runs dry.

Table 1. Statistical properties of the available time series

| Data | Station | Max. | Min. | Mean | St. dev. |
|---|---|---|---|---|---|
| Rainfall (mm) | Bostanabad | 124.9 | 0 | 21.4 | 23.7 |
| | Ghurigul | 146.5 | 0 | 25.2 | 25.6 |
| | Ghushchi | 133.5 | 0 | 20.8 | 23.9 |
| | Nahand | 126.5 | 0 | 22.3 | 22.2 |
| | Saeedabad | 294 | 0 | 33.1 | 39.6 |
| | Sarab | 139.4 | 0 | 20.5 | 20.2 |
| | Esbaghran | 153 | 0 | 23.9 | 24.7 |
| | Sohzab | 166 | 0 | 25.4 | 25.7 |
| | Vanyar | 136.6 | 0 | 17.5 | 21.2 |
| | Mirkuh | – | – | – | – |
| Temperature (°C) | Ghurigul | 22.8 | −16.8 | 6.9 | 9.6 |
| | Sohzab | 22.5 | −17 | 8.1 | 8.9 |
| | Mirkuh | 24.9 | −12.8 | 8.9 | 8.7 |
| Runoff (m^{3}/s) | Vanyar | 178.3 | 0 | 12.6 | 20.7 |


## EXPERIMENTAL SETUP

In this section, the experimental setup is presented. As was mentioned in the section ‘Case study’, there are 13 potential input variables of different kinds in this basin. In order to decrease the number of input variables, an initial GA-based input selection is applied. Thus, four out of thirteen input variables, including precipitation at Sarab, Esbaghran, and Gushchi stations, along with temperature at Mirkuh (all lagged by one time period), are selected. Also, the lagged streamflow of Vanyar was directly added to the selected input variables. In total, 41 years (1966–2006, inclusive) of monthly data are available, of which 70% (344 datasets) are used for training the applied data-driven models and the remaining 30% (147 datasets) are used for validating and testing them. It is worthwhile to mention that the test and validation periods were selected from the middle parts of the data duration based on some initial assessments. Root mean square error (RMSE), coefficient of determination (R^{2}), and Nash–Sutcliffe efficiency (NSE) index were selected as three performance measures to evaluate the models’ performance. These measures are commonly used for evaluating the performance of data-driven models (Yadav *et al.* 2016).
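For reference, the three performance measures can be computed as in the sketch below. The definitions are the commonly used ones and are an assumption here, since the text does not spell them out; in particular, R^{2} is taken as the squared Pearson correlation between observed and simulated flows. The data are made up.

```python
import math

def rmse(obs, sim):
    # Root mean square error.
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def nse(obs, sim):
    # Nash-Sutcliffe efficiency: 1 minus error variance over observed variance.
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def r_squared(obs, sim):
    # Assumed definition: squared Pearson correlation coefficient.
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov ** 2 / (var_obs * var_sim)

# A forecast with a constant bias of +1 relative to the observations.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]
scores = (rmse(observed, simulated), r_squared(observed, simulated), nse(observed, simulated))
```

With these definitions, the biased forecast gives RMSE = 1, R^{2} = 1, and NSE = 0.2. R^{2} is insensitive to a systematic bias while NSE penalizes it, so the two measures need not agree for the same model.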

## RESULTS AND DISCUSSION

As was mentioned earlier, ANN-ELM suffers from weak performance when there are a large number of input variables. Using wavelet transform and SSA to decompose the input data into further input series would generate more input variables which could aggravate the aforementioned problem. In other words, by application of these transforms the final performance would become worse than they were before the decomposition, thus undermining the decomposition process. This problem could be overcome by using an input selection method where only suitable input variables are chosen from among many data series. In the following, the performance of decomposition methods as well as the ELM training method along with the input selection algorithm is evaluated. Finally, the results of the proposed method are presented and discussed.

### ELM versus BP

In this part of the article, the performance of ANN trained by ELM is evaluated through comparison with the one trained by BP algorithm. For this purpose, a neural network model is defined for forecasting the monthly river discharges at Vanyar station, Ajichai Basin. There are potentially 13 input variables of different kinds in this basin. These include monthly precipitation, temperature, and streamflow, all lagged by one time interval.

In order to reduce the number of variables, a GA-based input selection algorithm is applied. Table 2 shows configuration parameters of the GA selection method. All these parameters have been obtained through a trial and error process. It should be noted that the model training, verification, and testing is carried out using 70, 15, and 15% of available data, respectively. During the input selection process, 70% of monthly data were used for training the model and selecting suitable input variables. Then, these selected variables are verified using another 15% of available data which do not participate in the ANN training process.

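Table 2. Configuration parameters of the GA selection method

| Parameter | Value |
|---|---|
| Number of generations | 500 |
| Size of population | 20 |
| Rate of crossover | 0.75 |
| Rate of mutation | 0.08 |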


The input selection method would probably eliminate some of the appropriate inputs in favor of the lagged streamflow variable, because of the high cross correlation between the streamflow and most of those variables. Therefore, to avoid this problem, the lagged streamflow was excluded from the input selection and was added directly to the collection of selected variables. The application of the GA input selection algorithm resulted in choosing four out of thirteen input variables, including precipitation at Sarab, Esbaghran, and Gushchi stations, along with temperature at Mirkuh, all lagged by one time period.

In the next step, these variables plus the lagged streamflow at Vanyar were applied as the inputs to the ANN-ELM and ANN-BP models. Figure 4 shows a comparison of the results during the test period. As can be seen from Table 3, the test NSE indices as well as other criteria indicate the superiority of ANN-ELM over ANN-BP. Regardless of the fact that the ANN-ELM uses many more neurons in the hidden layer (20 versus 5 used by ANN-BP), it trains the network ten times faster than the ANN-BP model. Thus, the speed is a great advantage of the ELM over BP, bearing in mind that the performance of ANN-ELM is also much better than the ANN-BP.

Table 3. Performance of ANN-BP and ANN-ELM using the initial input selection

| Model | Train R^{2} | Train NSE | Train RMSE | Test R^{2} | Test NSE | Test RMSE |
|---|---|---|---|---|---|---|
| ANN-BP | 0.67 | 0.67 | 0.067 | 0.66 | 0.60 | 0.067 |
| ANN-ELM | 0.73 | 0.73 | 0.051 | 0.72 | 0.71 | 0.053 |


As can be seen from Figure 4, although the ELM method has been able to improve the performance of the ANN model, there are still instances (mainly in peak discharges) where more accuracy is needed. For instance, none of the methods were able to catch the main peak flow in the third year while errors in some other peaks are also significant. In the next step, data preprocessing methods, including wavelet transform and SSA methods are investigated for further enhancement.

### Data preprocessing approaches

It is clear that redundant and irrelevant variables lead to a poor generalization performance, add error and noise to the model, and prevent correct learning process (May *et al.* 2011). Thus, one of the main problems which might arise during the application of a data-driven model is to detect the appropriate input variables. In general, according to the law of parsimony, the number of model inputs should be as few as possible. This is more emphasized when ELM is applied which is highly sensitive to the number of input variables. In comparison to ANN-BP, the ANN-ELM method suffers from weak performance when there is a large number of input variables. ELM-based ANN requires more hidden neurons than the BP-based ANN to train the network. This leads to poor performance of the method in large networks as compared to the BP-based ANN. Therefore, after decomposing the initially selected input variables by the wavelet and SSA methods, a GA input selection algorithm is applied to extract more important sub-series and limit their numbers and thus the corresponding hidden neurons in order to avoid large networks. More details are presented in the following sections.

#### Using wavelet transform

Wavelet transforms are used to achieve more reliable and accurate outputs. In general, a suitable level of decomposition is selected with respect to the nature of time series. Usually, in order to select the best mother wavelet, the apparent similarity between mother wavelet and the time series should be considered; but some researchers use their own experience (e.g., Wei *et al.* 2013) and some others apply sensitivity analysis to choose a suitable mother wavelet (Nourani *et al.* 2011). In our application, we also used sensitivity analysis where db4 mother wavelet with two decomposition levels was determined suitable for the Ajichai time series. Therefore, one approximate sub-signal (a2) of the original signal and two detailed sub-signals (d1, d2) are generated for further application.
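To make the a2, d2, and d1 sub-signals concrete, the sketch below performs a two-level additive decomposition in plain Python. It deliberately swaps in the Haar wavelet for the db4 wavelet used in the study, because the Haar components reduce to simple block averages; the function names and the sample flows are hypothetical, and the series length is assumed to be a multiple of four.

```python
def block_mean(x, width):
    # Replace each block of `width` samples by its mean (Haar approximation).
    out = []
    for start in range(0, len(x), width):
        block = x[start:start + width]
        out.extend([sum(block) / len(block)] * len(block))
    return out

def haar_subsignals(x):
    a1 = block_mean(x, 2)                        # level-1 approximation
    a2 = block_mean(x, 4)                        # level-2 approximation
    d1 = [xi - ai for xi, ai in zip(x, a1)]      # level-1 detail
    d2 = [p - q for p, q in zip(a1, a2)]         # level-2 detail
    return {"a2": a2, "d2": d2, "d1": d1}

# Made-up flow values, length 8.
flow = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 2.0, 0.0]
parts = haar_subsignals(flow)
rebuilt = [a + b + c for a, b, c in zip(parts["a2"], parts["d2"], parts["d1"])]

# Five input variables with three sub-signals each form the selection pool.
pool_size = 5 * len(parts)
```

Each sub-signal has the same length as the original series and the three add back up to it. Decomposing the five selected variables in this way yields 5 × 3 = 15 candidate sub-signals for the second input selection.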

In the next step, the GA input selection is applied and nine appropriate inputs are selected from the 15 generated sub-signals (Table 4). These inputs are then used by ANN-BP (using six neurons) and ANN-ELM (using 60 neurons) to forecast monthly streamflow at Vanyar station. As can be seen from Table 5, by applying wavelet transform all evaluation parameters are improved, especially during the test period. Meanwhile, the network trained by ELM outperforms the one trained by BP, which was also observed earlier when no data preprocessing method was applied (see Table 3). According to Tables 3 and 5, the result of the network trained by ELM does not improve considerably when only the wavelet transform is used (test NSE of 0.71 versus 0.72). This unexpected result is caused by the large number of input variables of WANN-ELM before applying input selection. However, by using GA input selection, the test NSE of WANN-ELM increases substantially from 0.72 to 0.85. This shows the importance of using an input selection method in ELM-based networks.

Table 4. Wavelet components selected by the GA input selection

| Variables | Selected components |
|---|---|
| P^{Gushchi}_{t−1} | d2 |
| P^{Sarab}_{t−1} | d2 |
| P^{Esbaghran}_{t−1} | d1 |
| T^{Mirkuh}_{t−1} | a2, d2, d1 |
| Q^{Vanyar}_{t−1} | a2, d2, d1 |


Table 5. Performance of WANN-BP and WANN-ELM with and without GA input selection

| Model | Number of inputs | Train R^{2} | Train NSE | Train RMSE | Test R^{2} | Test NSE | Test RMSE |
|---|---|---|---|---|---|---|---|
| WANN-BP (no Inp. Sel.) | 15 | 0.82 | 0.82 | 0.049 | 0.73 | 0.71 | 0.052 |
| WANN-BP (with Inp. Sel.) | 9 | 0.85 | 0.85 | 0.045 | 0.76 | 0.75 | 0.047 |
| WANN-ELM (no Inp. Sel.) | 15 | 0.93 | 0.93 | 0.031 | 0.73 | 0.72 | 0.051 |
| WANN-ELM (with Inp. Sel.) | 9 | 0.93 | 0.93 | 0.030 | 0.86 | 0.85 | 0.038 |


As can be seen from Figure 5, although there is some improvement when compared to the previous results in Figure 4, there is still a need for further improvement, especially for peak flow forecasts. Also, it shows that WANN-ELM using GA input selection has more accurate forecasts than the WANN-BP-GA. Clearly, WANN-ELM-GA is more successful in peak flows’ prediction as compared to WANN-BP-GA. Consequently, GA input selection provides more improvement for the ELM-based network than the one trained by the BP.

#### Using SSA

A similar approach is used to develop the SSA-based models. Signals are decomposed into three levels and suitable input variables are selected by the GA input selection model. The results are presented in Table 6. As before, ELM outperforms BP once input selection is applied (test NSE of 0.90 versus 0.87). In addition, applying the GA input selection model raises the test NSE of the ELM- and BP-based networks by 0.17 and 0.09 (17 and 9 percentage points), respectively. A similar trend can be observed in the R^{2} and RMSE indices. Therefore, GA input selection has a more positive impact on the ELM-based network than on the one trained by BP. By trial and error, the optimum number of neurons for SANN-ELM and SANN-BP with input selection is found to be 65 and 6, respectively. Moreover, a comparison of Tables 5 and 6 reveals that, with input selection, SSA decomposition improves the performance of both ELM- and BP-based models relative to the wavelet transform. In other words, SSA extracts more appropriate details from the Ajichai time series than the wavelet transform does.
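
To illustrate what an SSA split into a trend and a noise sub-series involves, the sketch below performs the four basic SSA steps (embedding, decomposition, grouping, diagonal averaging) on a single level. It is only a simplified illustration under several assumptions: the trend is taken as the leading component alone, that component is found by power iteration rather than a full singular value decomposition, and the function name `ssa_trend_noise`, the window length, and the synthetic `flow` series are hypothetical. The multi-level decomposition used in this study is not reproduced.

```python
def ssa_trend_noise(series, window, iterations=200):
    """Split a series into a rank-1 SSA 'trend' and the residual 'noise'."""
    n = len(series)
    k = n - window + 1
    # 1) Embedding: trajectory (Hankel) matrix with X[a][j] = series[a + j].
    X = [[series[a + j] for j in range(k)] for a in range(window)]
    # 2) Decomposition: leading eigenvector of S = X X^T by power iteration.
    #    Assumes the start vector is not orthogonal to that eigenvector.
    S = [[sum(X[a][j] * X[b][j] for j in range(k)) for b in range(window)]
         for a in range(window)]
    u = [1.0 + a for a in range(window)]
    norm = sum(x * x for x in u) ** 0.5
    u = [x / norm for x in u]
    for _ in range(iterations):
        v = [sum(S[a][b] * u[b] for b in range(window))
             for a in range(window)]
        norm = sum(x * x for x in v) ** 0.5
        if norm == 0.0:  # e.g. an all-zero series: nothing to extract
            break
        u = [x / norm for x in v]
    # 3) Grouping: keep only the leading component, X1 = u (u^T X).
    w = [sum(u[a] * X[a][j] for a in range(window)) for j in range(k)]
    # 4) Diagonal averaging: average X1 over anti-diagonals a + j = t.
    sums = [0.0] * n
    counts = [0] * n
    for a in range(window):
        for j in range(k):
            sums[a + j] += u[a] * w[j]
            counts[a + j] += 1
    trend = [s / c for s, c in zip(sums, counts)]
    noise = [x - t for x, t in zip(series, trend)]
    return trend, noise


# Synthetic series: a linear rise plus a small alternating disturbance.
flow = [10.0 + 0.5 * t + (0.3 if t % 2 else -0.3) for t in range(24)]
trend, noise = ssa_trend_noise(flow, window=6)
```

By construction the two sub-series add back to the original series, so they can be offered to the input selection step as separate candidate inputs.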

Model | Number of input var. | Train R^{2} | Train NSE | Train RMSE | Test R^{2} | Test NSE | Test RMSE |
---|---|---|---|---|---|---|---|
SANN-BP (no Inp. Sel.) | 15 | 0.88 | 0.88 | 0.039 | 0.77 | 0.78 | 0.044 |
SANN-BP (with Inp. Sel.) | 10 | 0.89 | 0.89 | 0.032 | 0.88 | 0.87 | 0.031 |
SANN-ELM (no Inp. Sel.) | 15 | 0.94 | 0.94 | 0.027 | 0.74 | 0.73 | 0.051 |
SANN-ELM (with Inp. Sel.) | 10 | 0.95 | 0.95 | 0.022 | 0.90 | 0.90 | 0.031 |


These findings are supported by Figure 6. As can be seen from this figure, data preprocessing with SSA leads to more accurate predictions than the wavelet transform. In particular, the SSA-based models forecast the main peak with much higher accuracy. Building on these experiences, we propose a multistep model that is able to forecast the streamflow with higher accuracy, as described in the following section.

We can conclude from the aforementioned experiments that: (a) both preprocessing methods, i.e., wavelet and SSA, have a positive impact on model results; (b) ELM is a better and faster method for training neural networks than the commonly used BP method; and (c) GA-based input selection helps to improve model performance, especially for networks trained by ELM. There is also a great amount of research supporting the idea of a hybrid approach that combines data-driven models. In addition, the literature shows that in many cases ANFIS performs better than simple ANN models. Putting these together, we propose a multistep model to further improve the accuracy of streamflow forecasts. In other words, the proposed model benefits from preprocessing methods, an input selection procedure, and the use of both ANN and ANFIS models in a hybrid configuration, as illustrated by Figure 1.

As was mentioned in the section ‘The proposed model’, the two-step GA input selection process (in the proposed model) was applied to select the final appropriate input variables (Table 7).

Variables | Selected wavelet components | Selected SSA components |
---|---|---|
P^{Gushchi}_{t−1} | a2 | – |
P^{sarab}_{t−1} | – | – |
P^{Esbaghran}_{t−1} | – | – |
T^{Mirkuh}_{t−1} | d2, a2 | trend (L2), noise (L2) |
Q^{vanyar}_{t−1} | d2, a2 | trend (L2), noise (L2) |


According to Table 7, the trend sub-signal at decomposition level 2 (i.e., L2) using SSA is analogous to the approximation sub-signal at level 2 (a2) using the wavelet transform. Similarly, the noise sub-series at level 2 using SSA is analogous to the detail sub-signal at level 2 (d2) using the wavelet transform.
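
The a2, d2, and d1 labels can be made concrete with a two-level additive decomposition, in which the level-2 approximation and the two detail sub-signals sum back to the original series. The sketch below uses the Haar à trous scheme purely as an assumption, since the mother wavelet and the transform variant are not restated in this section; the function name `haar_a_trous` and the `flow` values are hypothetical.

```python
def haar_a_trous(series, levels=2):
    """Additive Haar à trous decomposition.

    Returns a dict with the final approximation ('a<levels>') and one
    detail sub-signal per level ('d1', ..., 'd<levels>').
    """
    approx = list(series)
    parts = {}
    for level in range(1, levels + 1):
        lag = 2 ** (level - 1)
        # Causal smoothing: average each value with the one `lag` steps
        # earlier; the first value is reused where no earlier value exists.
        smooth = [0.5 * (approx[t] + approx[max(t - lag, 0)])
                  for t in range(len(approx))]
        # The detail is whatever the smoothing removed at this level.
        parts[f"d{level}"] = [a - s for a, s in zip(approx, smooth)]
        approx = smooth
    parts[f"a{levels}"] = approx
    return parts


flow = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
parts = haar_a_trous(flow, levels=2)
rebuilt = [a + d2 + d1
           for a, d2, d1 in zip(parts["a2"], parts["d2"], parts["d1"])]
```

Each variable decomposed this way yields three sub-signals, so five input variables give the 15 candidate inputs from which the GA selects.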

Table 8 shows the results of the proposed model. As can be seen from this table and from Figure 7, the results obtained with the proposed method are near perfect during the test period, and the model appears robust and reliable. A coefficient of determination and NSE of 0.97 and an RMSE of 0.021, averaged over ten independent runs, show how well the model performs during the independent test period. In addition, Figure 7 reveals that the proposed method captures almost all variations of the streamflow data with a one-month lead time. These results are all the more valuable given that the measured data in this part of the world are, to a large extent, inaccurate, as is the case in many other countries. Therefore, the proposed model could be used as a framework for other parts of the country as well as for other similar regions.
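
For reference, the three evaluation measures reported in the tables can be computed as in the short sketch below. It assumes that R^{2} is the squared Pearson correlation between observed and simulated flows, which is a common convention but is not confirmed in this section; the function names and the sample values are illustrative only.

```python
from math import sqrt


def rmse(obs, sim):
    """Root mean square error between observed and simulated values."""
    return sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / spread


def r2(obs, sim):
    """Squared Pearson correlation (assumed definition of R^2)."""
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov * cov / (var_obs * var_sim)


observed = [1.0, 2.0, 3.0, 4.0]
biased = [2.0, 3.0, 4.0, 5.0]  # right shape, constant offset of +1
scores = (r2(observed, biased), nse(observed, biased), rmse(observed, biased))
```

In this example the forecast has the right shape but a constant bias, so R^{2} is 1 while NSE drops to about 0.2 and RMSE is 1. This is why NSE and RMSE are reported alongside R^{2}: the correlation-based measure alone does not penalize systematic over- or underestimation.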

Model | Number of input var. | Train R^{2} | Train NSE | Train RMSE | Test R^{2} | Test NSE | Test RMSE |
---|---|---|---|---|---|---|---|
Proposed model | 9 | 0.97 | 0.97 | 0.020 | 0.97 | 0.97 | 0.021 |


## CONCLUSIONS

Monthly streamflow forecasting helps reservoir operators make better decisions about releases and storage. Using data from the Ajichai Basin above the Vanyar discharge station, the impacts of data preprocessing methods, an input selection algorithm, and hybridization of data-driven models were evaluated. Of the two preprocessing methods, SSA was shown, in general, to outperform the more commonly used wavelet transform. In our application, a comparison of the commonly used BP training method with the recently introduced ELM method indicated the superiority of ELM in both accuracy and speed. It was next shown that the deficiency of ELM with regard to a large number of input variables can be overcome by applying a GA-based input selection model. Finally, a multistep data-driven model was proposed that combines wavelet and SSA preprocessing, the BP and ELM training algorithms, GA input selection, and the capabilities of the ANFIS model in a hybrid framework. It yields near-perfect outputs (test NSE of 0.97), a substantial improvement over the previous methods. These results indicate that reliable streamflow forecasts, with NSE values near one, can improve system performance by giving water resources system operators sufficient time to make proper decisions.
