Runoff prediction, as a nonlinear and complex process, is essential for designing canals, water management and planning, flood control and predicting soil erosion. There are a number of techniques for runoff prediction based on the hydro-meteorological and geomorphological variables. In recent years, several soft computing techniques have been developed to predict runoff. There are some challenging issues in runoff modeling including the selection of appropriate inputs and determination of the optimum length of training and testing data sets. In this study, the gamma test (GT), forward selection and factor analysis were used to determine the best input combination. In addition, GT was applied to determine the optimum length of training and testing data sets. Results showed the input combination based on the GT method with five variables has better performance than other combinations. For modeling, among four techniques: artificial neural networks, local linear regression, an adaptive neural-based fuzzy inference system and support vector machine (SVM), results indicated the performance of the SVM model is better than other techniques for runoff prediction in the Amameh watershed.

## INTRODUCTION

Rainfall-runoff modeling has a significant role in operational flood management procedures such as design of hydraulic systems and flood prediction. On the other hand, most of the hydrological processes are nonlinear, time varying and spatially distributed. The rainfall-runoff process in a watershed is a nonlinear process that is affected by many factors. Therefore, runoff prediction as a nonlinear and complex process is essential for effective water resources management. So far, many studies have performed runoff modeling with different methods. In recent years, data mining techniques and mathematical methods such as artificial neural networks (ANNs) (Dawson & Wilby 2001; Nayak *et al.* 2005, 2007; Han *et al.* 2007a, 2007b; Aksoy & Dahamsheh 2009), adaptive neural-based fuzzy inference system (ANFIS) (Firat & Güngör 2009; Moghaddamnia *et al.* 2009a, 2009b; Petković *et al.* 2015) and support vector machine (SVM) (Li *et al.* 2013; Wang *et al.* 2014) have been widely used in hydrological modeling. Most of the researchers used effective factors, especially precipitation, on the rainfall-runoff process with a lag time for modeling (Tayfur & Guldal 2006; Remesan *et al.* 2009). But there are still many unsolved issues in hydrological modeling using data driven methods; for example, determination of the best input data for the model and determination of the optimum length of data in the training section.

There are 2^{n} – 1 meaningful combinations of *n* inputs. Identifying the best input combination can greatly reduce the trial and error steps in the modeling process. For this purpose, various techniques have been proposed and used by researchers; for instance, principal component analysis (Zhang *et al.* 2006; Zhang 2007; Noori *et al.* 2010b; Niu 2013), forward selection (FS) (Chen *et al.* 1989; Wang *et al.* 2006; Noori *et al.* 2010a; Dehghani *et al.* 2014), procrustes analysis (Dinpashoh *et al.* 2004) and gamma test (GT) (Moghaddamnia *et al.* 2008, 2009a, 2009b; Ahmadi *et al.* 2009; Wan Jaafar *et al.* 2011; Kakaei Lafdani *et al.* 2013; Chang *et al.* 2014).

As mentioned above, there are a number of unsolved issues in rainfall-runoff modeling. The main purpose of this study is to investigate and find efficient solutions to solve the two mentioned issues (i.e. to determine the best input combination and to determine the length of data in the training section). Therefore, the following steps were taken in this study. First, the GT, FS and factor analysis (FA) were used to determine the best input combination for the runoff model. Then, the GT was applied in order to determine the appropriate amount of data that was required in the training step.

Thus, four data mining methods: ANNs, ANFIS, SVM and local linear regression (LLR), were selected for estimating the runoff in the Amameh watershed and finally the results of these methods were compared. These methods have been widely applied in rainfall-runoff modeling, more than other data driven techniques in recent years.

## METHODOLOGY

### Study area and data set

^{2}(Figure 1). This watershed is mainly covered by mountainous rangelands, comprising about 80% of the area. The mean annual precipitation of the Amameh watershed is about 840 mm and the mean annual temperature is about 8.6 °C. There are two hydrometric stations in the Amameh watershed, which are located at the outlet (Kamarkhani) and the middle (Amameh) of the watershed on the main stream. Also, there is one climatic station (Amameh) in the middle of the watershed. The wettest and driest months of the watershed are April and September, respectively.

In this study, 9 years (2001–2009) of daily rainfall (from Amameh station) and runoff (from Kamarkhani station) data have been used in order to develop the rainfall-runoff model. The basic statistics of the rainfall (P(t)) and runoff (Q(t)) data set are listed in Table 1. There were no missing data in the data set. Also, the quality of data was checked before the analysis. As a result, we did not find any outlier data.

Parameters | Location | Unit | X_{mean} | S_{x} | Cv | X_{max} | X_{min} |
---|---|---|---|---|---|---|---|
Rainfall (P) | Amameh S. | mm | 1.66 | 5.90 | 3.55 | 90.0 | 0.00 |
Runoff (Q) | Kamarkhani S. | m^{3}/s | 0.63 | 0.86 | 1.36 | 10.8 | 0.01 |


Nine input variables were produced from the data: lag-1 to lag-4 daily streamflow (Q(t–1), Q(t–2), Q(t–3) and Q(t–4)), daily rainfall (P(t)), and lag-1 to lag-4 daily rainfall (P(t–1), P(t–2), P(t–3) and P(t–4)).

All the data were normalized prior to the analysis, by mapping the mean to zero and the standard deviation to 0.5. The training and validation of the data sets were selected by randomizing the input data (Moghaddamnia *et al.* 2009b).
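The construction of the nine lagged inputs and the normalization step can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the function names `make_lagged_inputs` and `normalize` are hypothetical, the short series are made-up values rather than Amameh data, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev

def make_lagged_inputs(rain, flow, max_lag=4):
    """Build rows [Q(t-1)..Q(t-4), P(t), P(t-1)..P(t-4)] and targets Q(t)."""
    rows, targets = [], []
    for t in range(max_lag, len(flow)):
        q_lags = [flow[t - k] for k in range(1, max_lag + 1)]
        p_lags = [rain[t - k] for k in range(0, max_lag + 1)]
        rows.append(q_lags + p_lags)
        targets.append(flow[t])
    return rows, targets

def normalize(values, target_sd=0.5):
    """Shift to zero mean and rescale so the standard deviation equals target_sd."""
    mu, sd = mean(values), pstdev(values)  # population SD: an assumption
    return [(v - mu) / sd * target_sd for v in values]

# Illustrative (made-up) daily series, not the Amameh record.
rain = [0.0, 2.0, 5.0, 0.0, 0.0, 1.0, 8.0, 0.0]
flow = [0.5, 0.6, 0.9, 0.8, 0.7, 0.6, 1.2, 1.0]

X, y = make_lagged_inputs(rain, flow)
y_norm = normalize(y)
# Normalize each input column separately, then restore the row layout.
X_norm = [list(row) for row in zip(*[normalize(list(col)) for col in zip(*X)])]
```

The first four days are dropped because their lag-4 values are undefined, so a record of length L yields L – 4 usable rows.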

### Gamma test

The GT was first reported by Agalbjörn *et al.* (1997), Stefansson *et al.* (1997) and Konćar (1997) and later enhanced and discussed in detail by Durrant (2001), Evans & Jones (2002) and Evans (2002).

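The GT is based on $N[i,k]$, the indices of the *k*th ($1 \le k \le p$) nearest neighbours $x_{N[i,k]}$ for each vector $x_i$ ($1 \le i \le M$, where *M* is the number of data points). Specifically, the GT is derived from the Delta function of the input vectors:

$$\delta_M(k) = \frac{1}{M}\sum_{i=1}^{M}\left|x_{N[i,k]} - x_i\right|^{2}, \quad 1 \le k \le p$$

where |…| denotes Euclidean distance, and the corresponding Gamma function of the output values is:

$$\gamma_M(k) = \frac{1}{2M}\sum_{i=1}^{M}\left|y_{N[i,k]} - y_i\right|^{2}, \quad 1 \le k \le p$$

where $y_{N[i,k]}$ is the corresponding *y*-value for the *k*th nearest neighbour of $x_i$ in the Delta function. In order to compute Γ, a least squares fitted regression line is constructed from the *p* points $(\delta_M(k), \gamma_M(k))$:

$$\gamma = A\delta + \Gamma$$

The intercept on the vertical axis ($\delta = 0$) is the Γ value, as can be shown as

$$\gamma_M(k) \rightarrow \mathrm{Var}(r) \ \text{in probability as} \ \delta_M(k) \rightarrow 0$$

where $\mathrm{Var}(r)$ is the variance of the noise in the output. The graphical output of this regression line can provide very useful information for hydrological modelers. First, the vertical intercept Γ on the *y* axis offers an estimate of the best MSE achievable utilizing a modeling technique for unknown smooth functions of continuous variables (Evans & Jones 2002). Second, the gradient *A* offers an indication of the model's complexity (a steeper gradient indicates a model of greater complexity).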

We can also determine the reliability of the Gamma statistics by running a series of the GT for increasing M, to establish the size of data set required to produce a stable asymptote. This is known as the M-test. The M-test helps us to decide how much data are required to build a model with a mean squared error that approximates the estimated noise variance. In practice, the GT can be achieved through winGamma™ software implementation (Tsui *et al.* 2002). A formal proof for the GT can be found in Durrant (2001), and Evans (2002).
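The Gamma statistic and the M-test described above can be sketched in a few lines. This is a minimal brute-force illustration under stated assumptions, not the winGamma™ implementation: the function names `gamma_test` and `m_test`, the choice *p* = 10 and the synthetic data (a sine curve with noise variance 0.01) are all hypothetical, and the nearest-neighbour search is a simple O(M²) scan.

```python
import math
import random

def gamma_test(X, y, p=10):
    """Return (Gamma, A): intercept and slope of the line fitted to (delta_M(k), gamma_M(k))."""
    M = len(X)
    delta = [0.0] * p
    gamma = [0.0] * p
    for i in range(M):
        # Squared Euclidean distances from x_i to every other point (brute force).
        d = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
            for j in range(M) if j != i
        )
        for k in range(p):
            dist2, j = d[k]
            delta[k] += dist2 / M
            gamma[k] += (y[j] - y[i]) ** 2 / (2 * M)
    # Least-squares line gamma = A * delta + Gamma through the p points.
    d_bar, g_bar = sum(delta) / p, sum(gamma) / p
    sxx = sum((dk - d_bar) ** 2 for dk in delta)
    sxy = sum((dk - d_bar) * (gk - g_bar) for dk, gk in zip(delta, gamma))
    A = sxy / sxx
    return g_bar - A * d_bar, A

def m_test(X, y, sizes, p=10):
    """Gamma statistic computed on the first M points, for each M in sizes."""
    return [(M, gamma_test(X[:M], y[:M], p)[0]) for M in sizes]

# Synthetic check: a smooth function plus noise with variance 0.1**2 = 0.01.
random.seed(0)
X = [[random.uniform(0, 2 * math.pi)] for _ in range(600)]
y = [math.sin(x[0]) + random.gauss(0, 0.1) for x in X]

Gamma, A = gamma_test(X, y)
curve = m_test(X, y, sizes=[100, 200, 400, 600])
```

On such data the intercept should settle close to the noise variance as M grows; plotting `curve` gives the M-test graph from which a stable training length can be read.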

### Forward selection

FS is a data driven model building approach. FS has been widely used for different subjects by many researchers in order to determine the best input combinations and build prediction models (Chen *et al.* 2004; Eksioglu *et al.* 2005; Wang *et al.* 2006; Khan *et al.* 2007; Noori *et al.* 2010a, 2010b). FS is based on a linear regression model.

FS starts with an empty subset. In the first step, variables are ordered according to their correlation with the dependent variable, from the most to the least correlated. The explanatory variable that is best correlated with the dependent variable is then selected as the first input.

After that, at each step, each variable that is not already in the model is tested for inclusion. The variable that most significantly increases the coefficient of determination (R^{2}) is added to the model as the next input. Finally, among the N obtained subsets, the subset with the optimum R^{2} is selected as the model input subset. The optimum R^{2} corresponds to the set of variables beyond which adding a new variable does not significantly increase the R^{2} value (Noori *et al.* 2010a). In this study, the SPSS software package was used for selecting the best input combination with FS.
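The greedy procedure can be illustrated with a short sketch. It is not the SPSS routine used in the study: the function `forward_selection`, the stopping threshold `min_gain` (a fixed minimum R^{2} increase instead of a significance test) and the synthetic data are assumptions made for illustration. The R^{2} gain of each candidate is obtained by projecting the current residual onto the part of the candidate that is orthogonal to the variables already selected, which is equivalent to refitting the linear regression at every step.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def forward_selection(columns, y, min_gain=0.01):
    """Greedy forward selection; returns [(column index, cumulative R^2), ...]."""
    resid = center(y)                 # part of y not yet explained
    sst = dot(resid, resid)           # total sum of squares
    cand = {j: center(c) for j, c in enumerate(columns)}
    chosen, r2 = [], 0.0
    while cand:
        # R^2 gain of a candidate = squared projection of the residual onto it.
        gains = {j: dot(v, resid) ** 2 / (dot(v, v) * sst)
                 for j, v in cand.items() if dot(v, v) > 1e-12}
        if not gains:
            break
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:
            break                     # no remaining variable helps enough
        v = cand.pop(best)
        r2 += gains[best]
        chosen.append((best, r2))
        vv = dot(v, v)
        coef = dot(v, resid) / vv
        resid = [r - coef * a for r, a in zip(resid, v)]
        # Remove the selected direction from the remaining candidates.
        for j in list(cand):
            c = dot(cand[j], v) / vv
            cand[j] = [a - c * b for a, b in zip(cand[j], v)]
    return chosen

# Synthetic example: y depends on x0 and x2 but not on x1.
random.seed(1)
n = 300
x0 = [random.gauss(0, 1) for _ in range(n)]
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
y = [2 * a + 0.5 * c + random.gauss(0, 0.1) for a, c in zip(x0, x2)]

selected = forward_selection([x0, x1, x2], y)
```

Each entry of `selected` pairs a column index with the cumulative R^{2} after that variable enters, which mirrors the layout of a stepwise results table.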

### Factor analysis

FA is a statistical method that was used in this study to find the best combination of inputs. This method has been frequently applied in different studies (Dinpashoh *et al.* 2004; Malekinezhad *et al.* 2011; Um *et al.* 2011). For more details about FA please refer to Harman (1976), Basilevsky (1994) and Rencher (1995). In this study, the method of principal components with varimax rotation, one of the most widely accepted types of rotation, was used to extract the factor loading matrix (White *et al.* 1991; Um *et al.* 2011).

### Artificial neural networks

The first ANN dates back to the 1940s, when McCulloch and Pitts introduced it as a mathematical model to create a nonlinear relationship between the input and output of a complex system using historical data (McCulloch & Pitts 1943). After that, Rosenblatt (1962) developed the idea of the perceptron. The important phase of a neural network application is the training phase. There are many different learning algorithms for training. Among these algorithms, Levenberg–Marquardt (LM), conjugate gradient and quasi-Newton are faster than other algorithms (Lahmiri 2011).

One of the training algorithms based on the quasi-Newton method, which was introduced in 1987 by Fletcher, is BFGS (Fletcher 1987). The BFGS algorithm is performed iteratively using successively improved approximations to the inverse Hessian, instead of the true inverse. The improved approximations are obtained from information generated during the gradient descent process (Jones 2004). ANNs have been widely used for simulating many hydrological processes such as rainfall-runoff simulations (Han *et al.* 2007a). There are a large number of articles and books available on ANN models (Jones 2004; Nayak *et al.* 2005, 2007; Han *et al.* 2007a, 2007b), so no further details are described here. In this study, we used the BFGS algorithm for runoff prediction. WinGamma™ software version 1.97 was used for this purpose.

### Local linear regression

The LLR is a nonparametric regression method. This technique has been successfully used in many low-dimensional forecasting and smoothing problems. The LLR performs linear regression through the *p*_{max} nearest points to a query point to produce a linear model in the locality of that query point (Durrant 2001). Deciding the size of *p*_{max} (the number of near neighbours to be included in the LLR modeling) is the tricky part in LLR modeling (Remesan *et al.* 2009). For more information and detail about LLR please refer to Durrant (2001) and Remesan *et al.* (2009).
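A minimal sketch of an LLR prediction is given below, assuming an ordinary least-squares fit through the *p*_{max} nearest neighbours of the query point. The helper names `solve` and `llr_predict`, the tiny `ridge` term (added only to keep the normal equations solvable when neighbours are collinear) and the grid data are hypothetical and are not taken from the winGamma™ implementation.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def llr_predict(X, y, query, p_max, ridge=1e-9):
    """Fit a linear model to the p_max nearest neighbours of query and evaluate it there."""
    order = sorted(range(len(X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], query)))
    near = order[:p_max]
    rows = [[1.0] + list(X[i]) for i in near]   # intercept column + inputs
    m = len(rows[0])
    # Normal equations (Z^T Z + ridge * I) beta = Z^T y for the local neighbourhood.
    ZtZ = [[sum(r[a] * r[b] for r in rows) + (ridge if a == b else 0.0)
            for b in range(m)] for a in range(m)]
    Zty = [sum(r[a] * y[i] for r, i in zip(rows, near)) for a in range(m)]
    beta = solve(ZtZ, Zty)
    return beta[0] + sum(bk * q for bk, q in zip(beta[1:], query))

# Illustrative data lying exactly on the plane y = 1 + 2a - 3b.
X = [[i / 10, j / 10] for i in range(11) for j in range(11)]
y = [1 + 2 * a - 3 * b for a, b in X]
pred = llr_predict(X, y, [0.35, 0.6], p_max=6)
```

A small *p*_{max} follows local structure closely but is sensitive to noise, while a large one approaches a global linear regression, which is why its choice is the delicate part of the method.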

### Adaptive neuro-fuzzy inference system

ANFIS was first introduced by Jang (1993). ANFIS is a network structure consisting of a number of nodes connected through directional links. Each node is characterized by a node function with fixed or adjustable parameters. The learning or training phase of a neural network is a process to determine parameter values to sufficiently fit the training data. The basic learning rule is the well-known back propagation method, which seeks to minimize some measure of error, usually the sum of the squared differences between network outputs and the desired outputs. It can be used as a basis for constructing a set of fuzzy ‘If–Then’ rules with appropriate membership functions in order to generate the preliminary stipulated input–output pairs. The outline of a typical ANFIS is as follows:

Layer 1: Every node in this layer is an adaptive node with a node function that may be a generalized bell-shaped membership function or a Gaussian membership function.

Layer 2: Every node in this layer is a fixed node labeled Π, representing the firing strength of each rule, and is calculated by the fuzzy AND connective of the ‘product’ of the incoming signals.

Layer 3: Every node in this layer is a fixed node labeled N, representing the normalized firing strength of each rule. The *i*^{th} node calculates the ratio of the *i*^{th} rule's firing strength to the sum of two rules' firing strengths.

Layer 4: Every node in this layer is an adaptive node with a node function indicating the contribution of *i*^{th} rule toward the overall output.

Layer 5: The single node in this layer is a fixed node labelled R, indicating the overall output as the summation of all incoming signals.

For details about ANFIS and the learning algorithm please refer to Moghaddamnia *et al.* (2009a) and Remesan *et al.* (2009).
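The five layers can be traced in a short forward-pass sketch. It assumes a two-input, two-rule system with Gaussian membership functions and first-order Sugeno consequents of the form *f*_i = *p*_i·x1 + *q*_i·x2 + *r*_i, which is the common textbook configuration but is not stated in this paper. All parameter values and the names `gaussian_mf` and `anfis_forward` are hypothetical, and training is not shown.

```python
import math

def gaussian_mf(x, c, sigma):
    """Layer 1: Gaussian membership grade of x (centre c, width sigma)."""
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)

def anfis_forward(x1, x2, premise, consequent):
    """Forward pass of a two-input, two-rule first-order Sugeno ANFIS."""
    # Layer 1: membership grades of both inputs for each rule.
    mu = [(gaussian_mf(x1, *premise[i][0]), gaussian_mf(x2, *premise[i][1]))
          for i in range(2)]
    # Layer 2: firing strength = product of the incoming grades.
    w = [m1 * m2 for m1, m2 in mu]
    # Layer 3: normalized firing strengths.
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: contribution of each rule, w_bar_i * (p_i*x1 + q_i*x2 + r_i).
    contrib = [wb * (p * x1 + q * x2 + r)
               for wb, (p, q, r) in zip(w_bar, consequent)]
    # Layer 5: overall output as the sum of all incoming signals.
    return sum(contrib), w_bar

# Hypothetical parameters: (centre, width) per input for each rule.
premise = [((0.0, 1.0), (0.0, 1.0)),   # rule 1
           ((1.0, 1.0), (1.0, 1.0))]   # rule 2
consequent = [(1.0, 1.0, 0.0),         # rule 1: f1 = x1 + x2
              (2.0, 0.0, 1.0)]         # rule 2: f2 = 2*x1 + 1

out, w_bar = anfis_forward(0.5, 0.5, premise, consequent)
```

At the midpoint (0.5, 0.5) both rules fire equally, so the output is the plain average of the two rule consequents.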

### Support vector machines

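In SVM regression, the output is estimated by a function of the form

$$f(X) = \sum_{i=1}^{N} \beta_i K(X_i, X) + b$$

where $\beta_i$ and *b* are parameters of the model, *N* is the number of training data, $X_i$ are vectors for the training process and *X* is the independent vector. The Kernel function $K$ simplifies the learning process by changing the representation of the data in the input space to a linear representation in a higher dimensional space called the feature space (Remesan & Mathew 2014). The parameters of the model are derived by maximizing an objective function.

SVM models use specific Kernel functions (often standard Kernels) to transform the input vector. The standard Kernel functions applied in SVM are linear, polynomial, radial and sigmoidal (Remesan & Mathew 2014).

For a linear regression model, the function is

$$f(x) = \langle w, x \rangle + b$$

where *w* and *b* are the parameters of the model. The goal of this linear regression model is to find the linear function that best interpolates the training points. According to the technique, *w* and *b* are determined by minimizing the error on the training data. For *w*, it is required to minimize the Euclidean norm, i.e. ||*w*||^{2}. It can be written as an optimization problem, as below:

$$\min_{w,\,b,\,\xi,\,\xi^*} \ \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{N}(\xi_i + \xi_i^*) \quad \text{subject to} \quad \begin{cases} y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i \\ \langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^* \\ \xi_i,\ \xi_i^* \ge 0 \end{cases}$$

where $\varepsilon$ is the tolerated deviation, $\xi_i$ and $\xi_i^*$ are slack variables and $C$ is a penalty constant. This problem can be solved in its dual formulation using Lagrange multipliers. The obtained Lagrangian equation is as below:

$$L = \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{N}(\xi_i + \xi_i^*) - \sum_{i=1}^{N}(\eta_i \xi_i + \eta_i^* \xi_i^*) - \sum_{i=1}^{N}\alpha_i\left(\varepsilon + \xi_i - y_i + \langle w, x_i \rangle + b\right) - \sum_{i=1}^{N}\alpha_i^*\left(\varepsilon + \xi_i^* + y_i - \langle w, x_i \rangle - b\right)$$

where $\eta_i, \eta_i^*, \alpha_i, \alpha_i^* \ge 0$ are the parameters of the equation. The partial derivatives of the Lagrangian equation with respect to *w*, *b* and $\xi_i^{(*)}$ are as follows:

$$\frac{\partial L}{\partial b} = \sum_{i=1}^{N}(\alpha_i^* - \alpha_i) = 0$$

$$\frac{\partial L}{\partial w} = w - \sum_{i=1}^{N}(\alpha_i - \alpha_i^*)\,x_i = 0$$

$$\frac{\partial L}{\partial \xi_i^{(*)}} = C - \alpha_i^{(*)} - \eta_i^{(*)} = 0$$

where $\alpha_i^{(*)}$ corresponds with $\alpha_i$ and $\alpha_i^*$ (and likewise for $\xi_i^{(*)}$ and $\eta_i^{(*)}$). By replacing the above equations into the Lagrangian equation we have:

$$\max_{\alpha,\,\alpha^*} \ -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\langle x_i, x_j \rangle - \varepsilon\sum_{i=1}^{N}(\alpha_i + \alpha_i^*) + \sum_{i=1}^{N} y_i(\alpha_i - \alpha_i^*) \quad \text{subject to} \quad \sum_{i=1}^{N}(\alpha_i - \alpha_i^*) = 0, \quad \alpha_i,\ \alpha_i^* \in [0, C]$$

The condition on the derivative with respect to *w* can be rewritten as follows:

$$w = \sum_{i=1}^{N}(\alpha_i - \alpha_i^*)\,x_i$$

and therefore,

$$f(x) = \sum_{i=1}^{N}(\alpha_i - \alpha_i^*)\langle x_i, x \rangle + b$$

This support vector expansion describes a linear model, whereas many hydrological relationships are non-linear, so linear regression in the input space is not suitable for many hydrological analyses. It is therefore appropriate to use a Kernel function to map the data into a space with more dimensions and then apply linear regression there. The Kernel function $K(x_i, x) = \langle \varphi(x_i), \varphi(x) \rangle$, where $\varphi$ is the mapping to the feature space, replaces the inner product, which gives the first equation of this section with $\beta_i = \alpha_i - \alpha_i^*$. In this study, the Radial Basis Kernel Function (RBF) was used. For more detail please refer to Vapnik (1995).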
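How a trained SVM regression model produces a prediction with the RBF kernel can be sketched as below. The sketch assumes the common form K(u, v) = exp(−γ‖u − v‖²) and a prediction given by a weighted sum of kernel values plus an offset *b*. The support vectors, coefficients, offset and the parameter name `gamma_rbf` are made-up illustrative values, not the fitted Amameh model, and the training (solving the dual problem) is not shown.

```python
import math

def rbf_kernel(u, v, gamma_rbf=1.0):
    """Radial basis kernel K(u, v) = exp(-gamma_rbf * ||u - v||^2)."""
    return math.exp(-gamma_rbf * sum((a - b) ** 2 for a, b in zip(u, v)))

def svr_predict(x, support_vectors, coefs, b, gamma_rbf=1.0):
    """Evaluate f(x) = sum_i coef_i * K(x_i, x) + b for given coefficients."""
    return sum(c * rbf_kernel(sv, x, gamma_rbf)
               for sv, c in zip(support_vectors, coefs)) + b

# Hypothetical trained model with two support vectors.
support_vectors = [[0.0], [10.0]]
coefs = [0.5, -0.2]     # play the role of the weights on each support vector
b = 0.1

at_first_sv = svr_predict([0.0], support_vectors, coefs, b)
far_away = svr_predict([100.0], support_vectors, coefs, b)
```

Because the RBF kernel decays with distance, a query close to one support vector is dominated by that vector's coefficient, and a query far from all of them falls back to the offset *b*.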

### Statistical criteria for performance evaluation of models

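Three statistical criteria were used to evaluate the performance of the models: root mean square error (RMSE), coefficient of determination (R^{2}) and Nash Sutcliffe (NS). These statistical terms can be defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(Q_i^{o} - Q_i^{p}\right)^{2}}$$

$$R^{2} = \left[\frac{\sum_{i=1}^{N}\left(Q_i^{o} - \bar{Q}^{o}\right)\left(Q_i^{p} - \bar{Q}^{p}\right)}{\sqrt{\sum_{i=1}^{N}\left(Q_i^{o} - \bar{Q}^{o}\right)^{2}\sum_{i=1}^{N}\left(Q_i^{p} - \bar{Q}^{p}\right)^{2}}}\right]^{2}$$

$$\mathrm{NS} = 1 - \frac{\sum_{i=1}^{N}\left(Q_i^{o} - Q_i^{p}\right)^{2}}{\sum_{i=1}^{N}\left(Q_i^{o} - \bar{Q}^{o}\right)^{2}}$$

where $Q_i^{o}$ and $Q_i^{p}$ denote the observed and predicted runoff by the model, respectively, $\bar{Q}^{o}$ and $\bar{Q}^{p}$ are the averages of the observed and predicted runoff, respectively, and *N* is the number of data points.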
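The three criteria can be computed with a few lines of code. This sketch assumes the usual definitions: RMSE as the root of the mean squared error, R^{2} as the squared Pearson correlation between observed and predicted runoff, and NS as one minus the ratio of the squared error to the variance of the observations about their mean. The function names and the sample values are illustrative only.

```python
import math

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def r_squared(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    o_bar, p_bar = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(obs, pred))
    var_o = sum((o - o_bar) ** 2 for o in obs)
    var_p = sum((p - p_bar) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)

def nash_sutcliffe(obs, pred):
    """Nash-Sutcliffe efficiency: 1 for a perfect model, 0 for the mean of the observations."""
    o_bar = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - o_bar) ** 2 for o in obs)
    return 1.0 - num / den

# Made-up observed and predicted runoff values.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
scores = (rmse(obs, pred), r_squared(obs, pred), nash_sutcliffe(obs, pred))
```

R^{2} is insensitive to a constant bias or scaling in the predictions, whereas NS penalizes both, so the two can differ noticeably for the same model.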

In this study, the GT, FA and FS were used to determine the best input combination of the runoff model. Also, the GT was used for determining the amount of data that were required in the training step. In addition, the ANNs, LLR, ANFIS and SVM methods were used for estimating the runoff of the Amameh watershed (in Kamarkhani station).

## RESULTS AND DISCUSSION

First, in this section, we describe the results obtained from the FS, GT and FA to identify the best input combination and length of data for training. Afterwards, the results of modeling using ANNs, LLR, ANFIS and SVM are compared in order to determine the best model for runoff modeling in the Amameh watershed.

### Results of model input selection

#### Forward selection

In the FS method, first, Q(t–1), the candidate with the highest correlation with the output (R^{2} = 0.809), is selected as the first and the most important input. Second, the remaining candidates are evaluated and entered into the model one by one based on their correlation coefficient rank. For evaluation of modeling goodness, the coefficient of determination (R^{2}) and Standard Error (SE) were used. This step is repeated until the new input variable added to the model does not significantly improve the model performance. Finally, the input variables with the most significant effect on the output are selected and other variables are removed. The result of the FS method is shown in Table 2. From Tables 2 and 7, candidates were selected as input variables according to their importance: Q(t–1), P(t), Q(t–4), Q(t–3), P(t–3), P(t–2) and P(t–4). Also, according to FS, the function between the input and output data is as below:

Input subset | R^{2} | SE |
---|---|---|
Q(t–1) | 0.809 | 0.03510 |
Q(t–1), P(t) | 0.833 | 0.03284 |
Q(t–1), P(t), Q(t–4) | 0.847 | 0.03141 |
Q(t–1), P(t), Q(t–4), Q(t–3) | 0.848 | 0.03132 |
Q(t–1), P(t), Q(t–4), Q(t–3), P(t–3) | 0.849 | 0.03210 |
Q(t–1), P(t), Q(t–4), Q(t–3), P(t–3), P(t–2) | 0.850 | 0.03119 |
Q(t–1), P(t), Q(t–4), Q(t–3), P(t–3), P(t–2), P(t–4) | 0.850 | 0.03117 |


#### Factor analysis

FA, as another method for determination of the best input combination, was also applied in this study. The first six factors, accounting for 96.1% of the total variance, were selected and subjected to Varimax Normalized Rotation. Table 3 shows the factor loadings of the input variables. In each factor, the variable with the largest loading (shown in bold in Table 3) was selected as an important variable. Therefore, Q(t–3), P(t), P(t–1), P(t–2), P(t–3) and P(t–4) were determined as important variables for modeling.

Variable | Factor 1 | Factor 2 | Factor 3 | Factor 4 | Factor 5 | Factor 6 |
---|---|---|---|---|---|---|
Q(t–4) | 0.935 | −0.033 | 0.008 | −0.011 | 0.155 | 0.040 |
Q(t–3) | **0.949** | 0.008 | −0.007 | 0.144 | 0.104 | 0.041 |
Q(t–2) | 0.940 | 0.173 | 0.038 | 0.108 | 0.063 | 0.025 |
Q(t–1) | 0.911 | 0.141 | 0.201 | 0.082 | 0.012 | 0.059 |
P(t–4) | 0.168 | 0.049 | 0.036 | 0.088 | **0.978** | 0.020 |
P(t–3) | 0.155 | 0.088 | 0.050 | **0.977** | 0.088 | 0.038 |
P(t–2) | 0.129 | **0.978** | 0.091 | 0.087 | 0.049 | 0.054 |
P(t–1) | 0.102 | 0.089 | **0.982** | 0.048 | 0.036 | 0.097 |
P(t) | 0.075 | 0.052 | 0.095 | 0.037 | 0.019 | **0.990** |
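The selection rule applied to the loading matrix (pick, for each factor, the variable with the largest loading) can be checked directly against the values in Table 3. The snippet below is only an illustration of that rule, with variable names written using ASCII hyphens. It does not perform the factor extraction or the varimax rotation.

```python
# Rotated factor loadings copied from Table 3 (rows: variables, columns: factors 1-6).
loadings = {
    "Q(t-4)": [0.935, -0.033, 0.008, -0.011, 0.155, 0.040],
    "Q(t-3)": [0.949, 0.008, -0.007, 0.144, 0.104, 0.041],
    "Q(t-2)": [0.940, 0.173, 0.038, 0.108, 0.063, 0.025],
    "Q(t-1)": [0.911, 0.141, 0.201, 0.082, 0.012, 0.059],
    "P(t-4)": [0.168, 0.049, 0.036, 0.088, 0.978, 0.020],
    "P(t-3)": [0.155, 0.088, 0.050, 0.977, 0.088, 0.038],
    "P(t-2)": [0.129, 0.978, 0.091, 0.087, 0.049, 0.054],
    "P(t-1)": [0.102, 0.089, 0.982, 0.048, 0.036, 0.097],
    "P(t)":   [0.075, 0.052, 0.095, 0.037, 0.019, 0.990],
}

# For each factor, keep the variable with the largest absolute loading.
selected_by_factor = [
    max(loadings, key=lambda name: abs(loadings[name][f])) for f in range(6)
]
```

Note that all four streamflow lags load on the first factor with similar weights (0.911 to 0.949), so this rule keeps only one of them, which is why the FA combination is dominated by rainfall terms.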


#### Gamma test

To determine the effective variables for modeling, the Gamma value was first calculated for the combination of all nine input candidates. In the next step, one variable was omitted and the Gamma value was calculated for the remaining eight variables. The omitted variable was then returned, another variable was omitted from the original combination, and the Gamma value was calculated for the new eight-variable combination. This process was repeated for each variable in turn. Finally, the variables whose removal increased the Gamma value compared with the original nine-variable combination were identified as important. The results of GT are shown in Table 4. According to Table 4, P(t) is the most important variable because its omission produces the largest Gamma value. The other important variables are Q(t–1), P(t–1), Q(t–2) and P(t–3), in that order. As a result, these five variables were selected.

Input variables | Mask | Gamma (Γ) | Gradient (A) | SE | V_{ratio} |
---|---|---|---|---|---|
All inputs | 111111111 | 0.0007951 | 0.0249267 | 0.0000685 | 0.123154 |
All inputs – Q(t–4) | 011111111 | 0.0007573 | 0.0358710 | 0.0000842 | 0.117299 |
All inputs – Q(t–3) | 101111111 | 0.0007503 | 0.0394821 | 0.0000759 | 0.116216 |
All inputs – Q(t–2) | 110111111 | 0.0009127 | 0.0129956 | 0.0000917 | 0.141363 |
All inputs – Q(t–1) | 111011111 | 0.0010774 | 0.0230902 | 0.0000796 | 0.166873 |
All inputs – P(t–4) | 111101111 | 0.0007300 | 0.0384730 | 0.0000504 | 0.113061 |
All inputs – P(t–3) | 111110111 | 0.0008647 | 0.0175060 | 0.0001163 | 0.133924 |
All inputs – P(t–2) | 111111011 | 0.0007224 | 0.0378475 | 0.0000771 | 0.111893 |
All inputs – P(t–1) | 111111101 | 0.0009514 | 0.0038840 | 0.0000943 | 0.147354 |
All inputs – P(t) | 111111110 | 0.0012084 | −0.0118756 | 0.0000582 | 0.187159 |
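The leave-one-out reading of Table 4 can be reproduced with a few lines: a variable is kept when omitting it raises the Gamma value above that of the full nine-input set, and the kept variables are ranked by that Gamma value. The snippet uses only the numbers in Table 4. The variable names use ASCII hyphens, and the mask string follows the column order of the table.

```python
# Column order of the mask in Table 4.
order = ["Q(t-4)", "Q(t-3)", "Q(t-2)", "Q(t-1)",
         "P(t-4)", "P(t-3)", "P(t-2)", "P(t-1)", "P(t)"]

gamma_all = 0.0007951          # Gamma with all nine inputs
gamma_without = {              # Gamma after omitting one input
    "Q(t-4)": 0.0007573, "Q(t-3)": 0.0007503, "Q(t-2)": 0.0009127,
    "Q(t-1)": 0.0010774, "P(t-4)": 0.0007300, "P(t-3)": 0.0008647,
    "P(t-2)": 0.0007224, "P(t-1)": 0.0009514, "P(t)": 0.0012084,
}

# Keep variables whose omission increases Gamma, most damaging omission first.
important = sorted((v for v in order if gamma_without[v] > gamma_all),
                   key=gamma_without.get, reverse=True)

# Mask of the selected five-input combination, in the table's column order.
mask = "".join("1" if v in important else "0" for v in order)
```

The resulting ranking matches the order given in the text, and the mask identifies the five-input GT combination used in the later modeling.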


The comparison among the three combinations selected by the GT, FS and FA methods indicates two differences among them: first, the number of selected variables and second, the kind of selected variables. For identifying the best input data combination, LLR and ANNs models were used as test models. The results of training and testing of LLR and ANNs models with four different input combinations are given in Table 5. For comparison of modeling results, R^{2} and RMSE were used. According to Table 5, although the accuracy of the LLR model with nine input variables is better than other LLR models in the training section, the LLR-GT model has better accuracy in the testing section. In addition, the accuracy of the ANNs-GT model is better than that of the other ANNs models in both sections. Finally, among these eight models, the ANNs-GT model was selected as the best model because it achieved the best testing accuracy with the lowest number of inputs. Therefore, the combination which was determined by the GT method was selected as the best input data combination for runoff modeling.

Model | Number of input variables | Training R^{2} | Training RMSE | Testing R^{2} | Testing RMSE |
---|---|---|---|---|---|
LLR | 9 | 0.97 | 0.015 | 0.06 | 0.280 |
LLR-GT | 5 | 0.97 | 0.021 | 0.89 | 0.033 |
LLR-FS | 7 | 0.97 | 0.018 | 0.31 | 0.099 |
LLR-FA | 6 | 0.96 | 0.020 | 0.49 | 0.100 |
ANNs | 9 | 0.90 | 0.030 | 0.85 | 0.036 |
ANNs-GT | 5 | 0.94 | 0.029 | 0.92 | 0.028 |
ANNs-FS | 7 | 0.88 | 0.032 | 0.86 | 0.035 |
ANNs-FA | 6 | 0.91 | 0.040 | 0.85 | 0.030 |


### Results of the training and testing data sets length determination

… R^{2} values during testing steps using the LLR model. In the training section, the results indicate that after the seventh scenario the values of R^{2} and NS are approximately constant. The best values of RMSE, NS and R^{2} in the testing section were obtained for scenario 17, shown in bold in Table 6, with values of 0.03, 0.76 and 0.89, respectively, where 2,400 data points were used in the training section. Therefore, about 2,400 data points should be selected for the training section. Finally, since the lowest SE and Gamma values occurred at point 2,383, we used 2,383 data points for training, and the remaining data out of 3,284 data points were used for testing the model.

Scenarios | Training data length | Testing RMSE | Testing NS | Testing R^{2} |
---|---|---|---|---|
Case 1 | 500 | 0.04 | 0.65 | 0.84 |
Case 2 | 750 | 0.05 | 0.59 | 0.81 |
Case 3 | 1,000 | 0.05 | 0.58 | 0.80 |
Case 4 | 1,100 | 0.04 | 0.75 | 0.88 |
Case 5 | 1,200 | 0.05 | 0.66 | 0.84 |
Case 6 | 1,300 | 0.05 | 0.65 | 0.83 |
Case 7 | 1,400 | 0.05 | 0.62 | 0.82 |
Case 8 | 1,500 | 0.05 | 0.65 | 0.83 |
Case 9 | 1,600 | 0.04 | 0.71 | 0.86 |
Case 10 | 1,700 | 0.04 | 0.70 | 0.85 |
Case 11 | 1,800 | 0.05 | 0.62 | 0.81 |
Case 12 | 1,900 | 0.05 | 0.61 | 0.81 |
Case 13 | 2,000 | 0.04 | 0.69 | 0.85 |
Case 14 | 2,100 | 0.04 | 0.70 | 0.86 |
Case 15 | 2,200 | 0.04 | 0.69 | 0.85 |
Case 16 | 2,300 | 0.04 | 0.69 | 0.85 |
**Case 17** | **2,400** | **0.03** | **0.76** | **0.89** |
Case 18 | 2,500 | 0.03 | 0.74 | 0.88 |
Case 19 | 2,600 | 0.04 | 0.69 | 0.85 |
Case 20 | 2,700 | 0.03 | 0.70 | 0.86 |
Case 21 | 2,800 | 0.03 | 0.75 | 0.87 |


### Results of the ANNs, LLR, ANFIS and SVM techniques

The performance of the ANNs, LLR, ANFIS and SVM models was evaluated by three error criteria, namely RMSE, NS and R^{2}. The results of the training and testing for the models are given in Table 7.

Models | Training RMSE | Training NS | Training R^{2} | Testing RMSE | Testing NS | Testing R^{2} |
---|---|---|---|---|---|---|
ANNs | 0.03 | 0.88 | 0.94 | 0.03 | 0.85 | 0.92 |
LLR | 0.02 | 0.94 | 0.97 | 0.03 | 0.79 | 0.89 |
ANFIS | 0.02 | 0.94 | 0.97 | 0.04 | 0.73 | 0.88 |
SVM | 0.02 | 0.93 | 0.98 | 0.02 | 0.92 | 0.97 |


In the testing section, the ANNs model, with an R^{2} value of 0.92 and an NS value of 0.85, performs better than the LLR model, with R^{2} = 0.89 and NS = 0.79.

The SVM model achieved RMSE, R^{2} and NS values equal to 0.02, 0.97 and 0.92, respectively, in the testing section. Figure 7 shows the curves of observed and predicted runoff by different models in the testing section, and Figure 8 shows the performance of the SVM model at large scale. As can be seen in Figure 8, the SVM model can predict runoff better than other models, especially at high flows. In addition, the comparison of the scatter plots in the testing section shows that the dispersion of points around the bisector line is smaller for the SVM model than for the other models. Therefore, the SVM model has the best performance in estimating runoff in the Amameh watershed.

## CONCLUSIONS

In this study, the ANNs, LLR, ANFIS and SVM models were used for daily runoff prediction in the Amameh watershed. The daily rainfall-runoff data for the period of 2001–2009 were used for developing the models. To determine the best input combination, GT, FS and FA were applied for runoff modeling. The results showed that the GT method had the best performance in determining the best input data combination for modeling compared with the other methods. Based on the GT method, the optimum size of the training data set was determined to be 2,383 data points, and the remaining data were used for testing the models. The results of the modeling showed that the accuracy of the SVM model is better than that of the other models in the two sections. Therefore, the SVM model is introduced as the best model for runoff estimation in the Amameh watershed.

Determining the best input combination using GT, as the main method in this study, is a less time-consuming procedure than the trial and error method. Moreover, this technique is easy for selection of relevant variables in the construction of nonlinear models for runoff prediction. In addition, GT is quite general, and could be applied to other nonlinear hydrological systems modeling (such as evaporation) and other models because GT is not linked to any specific model.

Generally speaking, increasing the length of data and adding other variables such as temperature, soil humidity, etc. can change the results. Unfortunately, in the Amameh watershed, other variables affecting the rainfall-runoff process were not measured in the period from 2001 to 2009. Moreover, daily rainfall and runoff data were not available after 2009.

In the modeling section, four common data driven methods were applied. In recent years, these methods have been widely combined with other methods, and new methods have been developed such as NNARX, which is a combination of ANNs and ARX.

In recent years, these methods have been used in hydrological modeling more than in the past because they are easy to use and accessible. However, there is no specified relation between input and output, and the results can vary with the length of data and the input parameters. As a result, these models are not applied operationally in the way numerical methods are. In addition, the unclear effect of the physical factors of the hydrological processes in the modeling is another of their problems. Nevertheless, the application of these methods has expanded, and the development of new methods is a sign of this progress. Therefore, in order to complete the current study, it is suggested that the result of GT be compared with other input selection techniques, and that the results of the modeling be compared with other methods such as Neuro-wavelet. Finally, we hope this study will persuade more researchers to use and evaluate GT in different catchments.

## CONFLICT OF INTEREST

The authors declare no conflict of interest.