## Abstract

The estimation of the suspended sediment load in rivers is one of the main issues in hydraulic engineering. Traditional methods such as the sediment rating curve (SRC) can be used to estimate the suspended sediment load of rivers, but their main drawbacks are low accuracy and high uncertainty. In this study, the ability of three intelligence models, namely gene expression programming (GEP), artificial neural networks (ANN) and the adaptive neuro fuzzy inference system (ANFIS), was compared with that of the SRC method. The daily flow discharge and sediment discharge at two hydrometric stations of the Kasilian and Telar rivers in the period of 1964–2014 were used to develop the intelligence models. The results indicated that all intelligence models gave reliable estimates of the suspended sediment load and performed better than the SRC method. Moreover, the GEP model, with a high coefficient of determination (*R*^{2}) and a low mean absolute error (*MAE*), was better than both the ANN and ANFIS models for the estimation of the daily suspended sediment load of the two sub-basins of the Kasilian and Telar rivers.

## INTRODUCTION

Sedimentation is one of the main problems for hydraulic structures, dam reservoirs and hydroelectric power plants due to its effects on the operations of these structures. The correct estimation of the sediment load carried by a river is a crucial issue in water engineering. Thus, many studies have been carried out by various researchers to estimate the amount of transported sediment by relating this parameter to hydraulic parameters (discharge, flow velocity, water depth), geometric parameters of a river (slope, cross section area) and sediment properties (mean diameter of sediment, density, kind of sediment). However, these equations have not been used widely because of substantial inaccuracies. Previous research implied that these approaches disrupt projects due to overestimating or underestimating the volume of sediment (Aytek & Kişi 2008). An example of these regression models is presented by Zhang *et al.* (2012). In this research, the suspended sediment load is related to the flow flux as a dependent variable. Zhang *et al.* (2012) used sediment rating curves (SRCs) to investigate the variations in relationships between water discharge (*Q*) and suspended sediment concentration (*Q*_{s}) of three major rivers of the Pearl River Delta. Results indicate that the parameters of SRC vary with time. Harrington & Harrington (2013) assessed the ability of the SRC method for estimation of the suspended sediment load of the Rivers Bandon and Owenabue in Ireland. They found that the rating curves could provide acceptable estimates of suspended sediment load in both the Rivers Bandon and Owenabue.

In recent decades, many research studies have been carried out on black box models aimed at solving nonlinear problems. For example, the adaptive neuro fuzzy inference system (ANFIS), artificial neural networks (ANN) (such as radial basis function (RBF), multi-layer perceptron (MLP), etc.), genetic programming (GP), support vector machine (SVM) and gene expression programming (GEP) have been introduced as an alternative application in water resource problems. The research implies that these approaches present reasonable results using artificial intelligence methodology and are becoming effective tools for solving nonlinear problems of hydraulic engineering and water resource management. These innovative methods have been used widely in diverse areas of water engineering (Azamathulla & Ahmad 2012; Bahramifara *et al.* 2013; Haddadchi *et al.* 2013; Lafdani *et al.* 2013; Kashi *et al.* 2014; Ghorbani *et al.* 2015). For instance, Kisi & Shiri (2012) applied the GEP, ANFIS, ANN and SVM models to estimate the daily suspended sediment load. A comparison of the findings illustrated that the GEP was superior to the ANFIS, ANN and SVM techniques. Shamaei & Kaedi (2016) used GP and neuro fuzzy systems to predict suspended sediment concentration. Haddadchi *et al.* (2013) compared the ANN model and suspended load formulae to estimate the suspended load transport rate of gravel bed rivers and sandy bed rivers. They concluded that the performance of the ANN model was significantly better than traditional suspended load formulae. In this study, the suspended sediment load at two hydrometric stations: Kasilian (on the Kasilian River) and Telar (on the Telar River), was estimated by intelligence models. These stations were chosen as the two sub-basins of Kasilian and Telar have different catchment areas (342.89 km^{2} and 1,768.6 km^{2}), and the discharge and sediment discharge in these two basins are different in terms of quantity. 
The first objective of this study was to develop and test the GEP, ANN and ANFIS models to estimate the suspended sediment load of these rivers under different conditions in terms of discharge, sediment discharge and physiographic characteristics of the catchment area. The secondary goal was to compare both the performance of these models with each other and with the SRC.

## DATA, METHODS AND MODELS

### Study area

The Telar basin, with a catchment area of 2,900 km^{2}, is located in the north of Iran. Kasilian and Telar are the two sub-basins of this basin (Figure 1). The Great or main Telar River originates from the mountainous areas of Savadkuh and the central and eastern Alborz mountains and passes through the cities of Savadkuh and Ghaemshahr in Mazandaran Province. The Telar River, with a length of 152 km, is one of the most important rivers in Mazandaran Province and collects the runoff of extensive areas and conveys it to the Caspian Sea (Figure 1). Mazandaran has a variety of climates, including the mild and humid climate of the Caspian shoreline, a moderate climate and the cold climate of the mountainous regions. The study area is located within the cold semi-arid climatic regions. The prevailing climate in the study area is known as a local steppe climate and, based on the Köppen-Geiger climate classification, is classified as BSk. The main Telar River is fed by two main tributaries, namely the Kasilian and the Telar. These two rivers meet at Ravat Sar near the city of Shirgah and form the main Telar River.

The catchment area of the Kasilian River sub-basin is 342.89 km^{2}. The geographical location of this basin is between 35° 58′ N to 36° 18′ N and 52° 56′ E to 53° 42′ E. The altitude of this basin is between 240 m and 3,440 m, with an average altitude of 890 m above sea level. The river of this basin with a length of 50 km flows from the south to the northwest. The Kasilian basin is situated in mountainous and forested areas. Its bed slope is relatively high (see Table 1). The average annual precipitation in this basin during the period of 1964–2014 was 784 mm. The maximum annual precipitation in this basin was 1404.3 mm. In addition, the maximum 24-hour rainfall of this basin was 58 mm, which occurred on May 13, 1990.

| Characteristics | Unit | Kasilian | Telar |
|---|---|---|---|
| Basin Area | km^{2} | 342.4 | 1768.6 |
| Basin Slope | m/m | 0.2896 | 0.3692 |
| Average Overland Flow | m | 503.64 | 547.33 |
| Basin Length | m | 47267.82 | 68448.26 |
| Perimeter | m | 160617.96 | 340052.61 |
| Shape Factor | km^{2}/km^{2} | 10.51 | 4.26 |
| Mean Basin Elevation | m | 993.27 | 1980.84 |
| Max Flow Distance | m | 60114.69 | 104003.66 |
| Max Flow Slope | m/m | 0.0488 | 0.0280 |
| Max Stream Length | m | 58921.18 | 103020.48 |
| Max Stream Slope | m/m | 0.0363 | 0.0259 |


The Telar sub-basin catchment occupies the north-west part of the Telar basin and is situated between 35° 43.0′ N to 36° 18.8′ N and 52° 36.7′ E to 53° 24.2′ E (Figure 1). It has a catchment area of 1768.6 km^{2}, which is nearly 5.2 times larger than the Kasilian sub-basin. Similar to the Kasilian sub-basin, this basin receives precipitation in the form of both rain and snow. The mean annual precipitation of this sub-basin is 577.1 mm. The highest rainfall of 752.5 mm was recorded during the year 2012 and the lowest of 343.4 mm during the year 2010. The area receives the highest rainfall in the month of November (69.7 mm) and the lowest in the month of July (32.1 mm).

The two basins have different soil types including loamy, loamy clay, clay and silty clay loamy. The geological formations of the largest areas of the basins are related to the Mesozoic geological period and formed from thick layers of limestone, sand, and tuff with the Paleozoic core and the effects of the Precambrian geological period.

Since the two sub-basins have different catchment areas (342.89 km^{2} and 1768.6 km^{2}), the discharge and sediment discharge in these two basins differ in quantity, and so the data of these two rivers (Kasilian and Telar) were used to investigate the ability of the models to estimate the sediment discharge. This study uses the recorded daily flow discharge and sediment discharge at two hydrometric stations (Kasilian and Telar) in the period of 1964–2014 to develop GEP, ANN and ANFIS models. The daily flow discharge (*Q*_{d}) and the daily sediment discharge (*Q*_{s}) of the two hydrometric stations (Kasilian and Telar) were collected from the regional water authority of Mazandaran. The minimum, mean, maximum and standard deviation of the collected data are presented in Table 2. The whole data set covers 50 years (1964–2014) and was divided into two parts: the training set of 35 years (1964–1999), and the testing set of 15 years (1999–2014).

| Station | *Q*_{d} min (m^{3}/s) | *Q*_{d} mean (m^{3}/s) | *Q*_{d} max (m^{3}/s) | *Q*_{d} std. deviation (m^{3}/s) | *Q*_{s} min (ton/day) | *Q*_{s} mean (ton/day) | *Q*_{s} max (ton/day) | *Q*_{s} std. deviation (ton/day) |
|---|---|---|---|---|---|---|---|---|
| Kasilian | 0.05 | 4.82 | 816.50 | 34.14 | 0.21 | 262.17 | 51421.12 | 2772.73 |
| Telar | 0.76 | 8.45 | 143.57 | 10.39 | 2.60 | 3422.32 | 632308.53 | 30313.71 |
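The chronological division of the record into training and testing sets can be sketched as follows. This is a minimal illustration only: the record layout `(day, q_d, q_s)` and the function name are hypothetical, and the exact cut-off day within 1999 is not stated in the text, so the date used in the usage note is a placeholder.

```python
from datetime import date
from typing import List, Tuple

# One daily record: (day, flow discharge Q_d in m^3/s, sediment discharge Q_s in ton/day).
Record = Tuple[date, float, float]


def chronological_split(records: List[Record], split_date: date) -> Tuple[List[Record], List[Record]]:
    """Return (training, testing): records before split_date and records on or after it."""
    training = [r for r in records if r[0] < split_date]
    testing = [r for r in records if r[0] >= split_date]
    return training, testing
```

For example, `chronological_split(records, date(1999, 1, 1))` would keep everything before 1999 for training, where the first of January is an assumed boundary rather than the one used in the study. Splitting by date rather than at random keeps lagged inputs in the testing set from overlapping the training period.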


### Intelligence models

We used three intelligence models, namely the GEP, ANN and ANFIS, to estimate the suspended load. The GEP was introduced and developed by Ferreira (2001). The model is an extension of two previous evolutionary algorithms, the genetic algorithm (GA) and GP (Ferreira 2001). In this model, individuals of the population are selected according to their fitness, and genetic variation is introduced using one or more genetic operators (Ferreira 2001, 2006). The three algorithms differ fundamentally in the nature of their individuals. In the GA, individuals are chromosomes coded as linear strings of fixed length, while in the GP, individuals are nonlinear entities of different sizes and shapes that are expressed as parse trees. In the GEP, individuals are encoded as linear strings of fixed length (the chromosome or genome) which are then expressed as nonlinear entities of different sizes and shapes (expression trees) (Ferreira 2001, 2006). As in other evolutionary algorithms, the first step of the GEP is to create a random population of initial chromosomes. These initial chromosomes are evaluated with a fitness function; the mean square error (MSE), relative square error (RSE), root relative square error (RRSE) and root mean square error (RMSE) can all be used in the GEP model. The RRSE fitness function was selected in the present study, following previous research (Kisi & Shiri 2012; Emamgolizadeh *et al.* 2015; Emamgholizadeh *et al.* 2017).
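The way a fixed-length linear string is expressed as a tree can be illustrated with a short sketch. It assumes the breadth-first (Karva) reading order described by Ferreira, single-character symbols, and a small illustrative function set in which `Q` stands for the square root. The function names and the example gene are hypothetical and are not taken from the models developed in this study.

```python
from math import sqrt
from typing import Dict, List

# Arity of each function symbol; any other symbol is treated as a terminal.
# "Q" denotes the square root (an illustrative naming convention).
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "Q": 1}


def evaluate_gene(k_expression: str, terminals: Dict[str, float]) -> float:
    """Decode a linear gene breadth-first into an expression tree and evaluate it."""
    children: List[List[int]] = []
    next_free = 1  # index of the next symbol not yet attached to a parent
    position = 0
    while position < next_free:
        if position >= len(k_expression):
            raise ValueError("gene is too short to complete the expression tree")
        n_args = ARITY.get(k_expression[position], 0)
        children.append(list(range(next_free, next_free + n_args)))
        next_free += n_args
        position += 1
    # Symbols beyond next_free are non-coding and are simply ignored.

    def value(index: int) -> float:
        symbol = k_expression[index]
        args = [value(child) for child in children[index]]
        if symbol == "+":
            return args[0] + args[1]
        if symbol == "-":
            return args[0] - args[1]
        if symbol == "*":
            return args[0] * args[1]
        if symbol == "/":
            return args[0] / args[1]
        if symbol == "Q":
            return sqrt(args[0])
        return terminals[symbol]

    return value(0)


def evaluate_chromosome(genes: List[str], terminals: Dict[str, float]) -> float:
    """Combine the sub-expression trees of all genes with an addition linking function."""
    return sum(evaluate_gene(gene, terminals) for gene in genes)
```

With terminals `a = 3`, `b = 1`, `c = 6` and `d = 2`, the gene `Q*+-abcd` decodes to the square root of (a + b) × (c − d). Because the tree is built level by level, genes of identical length can encode trees of different sizes and shapes, which is the property that distinguishes the GEP from the GA and GP.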

ANNs were proposed by McCulloch & Pitts (1943), inspired by the human brain. Since then, several types of ANN have been developed (e.g. the multilayer perceptron (MLP), radial basis function neural network (RBF), self-organizing network, and fuzzy neural network). The most significant advantages of these networks are their ability to generalize, their ability to learn, their modest information requirements, and their short and simple operation (Chang & Chen 2003). The ANN model is composed of simple units called neurons connected to each other by unidirectional links which carry distinct information (Nagy *et al.* 2002). Neurons are expressed mathematically and filter the signals passing through the network (Emamgholizadeh 2012). The MLP is the most common neural network and has been applied successfully to nonlinear problems (Emamgholizadeh *et al.* 2014). Most neural networks are trained with the back propagation (BP) algorithm, which was proposed by Rumelhart *et al.* (1988). In the BP algorithm, the neural networks process the information in processor elements (e.g., neurons, units or nodes). The MLP/BP structure was used in the present study to train the ANN.
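A forward pass through an MLP with a single hidden layer can be sketched as below. This is only an illustration of how neurons combine weighted inputs: the sigmoid hidden layer and linear output are assumptions, the function names are hypothetical, and the weights would normally be obtained by BP training, which is not shown.

```python
from math import exp
from typing import List


def sigmoid(z: float) -> float:
    """Logistic sigmoid transfer function."""
    return 1.0 / (1.0 + exp(-z))


def mlp_forward(
    inputs: List[float],
    hidden_weights: List[List[float]],  # one row of input weights per hidden neuron
    hidden_biases: List[float],
    output_weights: List[float],  # one weight per hidden neuron
    output_bias: float,
) -> float:
    """Return the network output for one input vector (sigmoid hidden layer, linear output)."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + bias)
        for row, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias
```

In a setting like the one studied here, `inputs` would hold the lagged discharge and sediment values and the output would be the estimated sediment discharge.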

The ANFIS was introduced by Jang (1993). This approach is a combination of the ANN and fuzzy logic, in which the learning capability of neural networks is integrated into a fuzzy inference system (FIS) (Emamgholizadeh *et al.* 2014). Three types of FIS are common, distinguished by the form of their if-then inference rules: Tsukamoto's system, Mamdani's system and Sugeno's system (Kişi 2007). A first-order Sugeno system was applied in the present study.
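The inference step of a first-order Sugeno system can be sketched for a single input as follows. The generalized bell and triangular membership functions are shown as two common choices; the rule parameters and function names are hypothetical, and the training that an ANFIS uses to tune these parameters is not shown.

```python
from typing import Callable, List, Tuple


def gbell(x: float, a: float, b: float, c: float) -> float:
    """Generalized bell membership: 1 / (1 + |(x - c) / a|^(2b))."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


def trimf(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# One rule: (membership function of the input, slope p, intercept r).
# The rule reads "if x is A then y = p * x + r".
Rule = Tuple[Callable[[float], float], float, float]


def sugeno_first_order(x: float, rules: List[Rule]) -> float:
    """Weighted average of the linear rule outputs, weighted by firing strength."""
    strengths = [membership(x) for membership, _, _ in rules]
    total = sum(strengths)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    weighted = sum(w * (p * x + r) for w, (_, p, r) in zip(strengths, rules))
    return weighted / total
```

Each rule contributes a linear function of the input, and the membership values decide how strongly each rule counts, which is how the system blends several local linear models into one nonlinear mapping.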

### Model developments (selection of input vectors)

One of the most important issues in model development is to find possible input variables for the modeling. Different methods such as the trial and error method and correlation analysis can be used for this purpose. The first method is time-consuming and the second does not give the exact lag values. Therefore, in this study, statistical parameters such as the auto-correlation function (ACF), partial autocorrelation function (PACF) and cross-correlation function (CCF) were used to find the significant lag values of the input variables. The mathematical relation of these parameters and their details can be found in Salas *et al.* (1980) and Senthil Kumar *et al.* (2011). Possible scenarios for input combinations are presented in Table 3. To find the best combination of input vectors, a collection of time lagged *Q*_{d} (*Q*_{d-1}, *Q*_{d-2}, … , *Q*_{d-n}) and *Q*_{s} (*Q*_{s-1}, *Q*_{s-2}, … , *Q*_{s-n}) was considered.

| Scenario | Input parameter(s) | Output |
|---|---|---|
| 1 | *Q*_{d} | *Q*_{s} |
| 2 | *Q*_{s-1} | *Q*_{s} |
| 3 | *Q*_{d}, *Q*_{d-1} | *Q*_{s} |
| 4 | *Q*_{d}, *Q*_{s-1} | *Q*_{s} |
| 5 | *Q*_{s-1}, *Q*_{s-2} | *Q*_{s} |
| 6 | *Q*_{d}, *Q*_{s-1}, *Q*_{s-2} | *Q*_{s} |
| 7 | *Q*_{d}, *Q*_{d-1}, *Q*_{d-2} | *Q*_{s} |
| 8 | *Q*_{d}, *Q*_{d-1}, *Q*_{d-2}, *Q*_{s-1} | *Q*_{s} |
| 9 | *Q*_{d}, *Q*_{d-1}, *Q*_{d-2}, *Q*_{s-1}, *Q*_{s-2} | *Q*_{s} |
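The nine input scenarios can be turned into model-ready samples with a short routine such as the sketch below. The scenario definitions follow the table of input combinations; the function name, the plain-list data layout and the choice to drop the first days that lack a full lag history are illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple

# Scenario number -> list of (series, lag) pairs.
# "d" is the daily flow discharge Q_d and "s" the daily sediment discharge Q_s.
SCENARIOS: Dict[int, List[Tuple[str, int]]] = {
    1: [("d", 0)],
    2: [("s", 1)],
    3: [("d", 0), ("d", 1)],
    4: [("d", 0), ("s", 1)],
    5: [("s", 1), ("s", 2)],
    6: [("d", 0), ("s", 1), ("s", 2)],
    7: [("d", 0), ("d", 1), ("d", 2)],
    8: [("d", 0), ("d", 1), ("d", 2), ("s", 1)],
    9: [("d", 0), ("d", 1), ("d", 2), ("s", 1), ("s", 2)],
}


def build_samples(
    q_d: Sequence[float], q_s: Sequence[float], scenario: int
) -> Tuple[List[List[float]], List[float]]:
    """Return (inputs, targets) for one scenario; days without a full lag history are skipped."""
    if len(q_d) != len(q_s):
        raise ValueError("q_d and q_s must have the same length")
    series = {"d": q_d, "s": q_s}
    terms = SCENARIOS[scenario]
    max_lag = max(lag for _, lag in terms)
    inputs: List[List[float]] = []
    targets: List[float] = []
    for t in range(max_lag, len(q_s)):
        inputs.append([series[name][t - lag] for name, lag in terms])
        targets.append(q_s[t])
    return inputs, targets
```

For scenario 6, each input row holds the same-day discharge and the sediment discharge of the two preceding days, and the target is the same-day sediment discharge.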


### Modeling performance criteria

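Two statistical criteria, the coefficient of determination (*R*^{2}) and the mean absolute error (*MAE*), were used to evaluate model performance. In their definitions, *N* is the number of data, *O*_{i} is the observed data, *P*_{i} is the predicted data and the bar denotes the mean of the variables.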
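The two criteria can be computed as in the following sketch. The *MAE* is the average of |*O*_{i} − *P*_{i}| over the *N* data. For *R*^{2}, the sketch assumes the squared Pearson correlation between observed and predicted values, which is one common definition and matches the use of means of both variables; other definitions exist and may give different values. The function names are illustrative.

```python
from typing import Sequence


def mean_absolute_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between observed and predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def coefficient_of_determination(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Squared Pearson correlation between observed and predicted values (assumed form)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    n = len(observed)
    o_bar = sum(observed) / n
    p_bar = sum(predicted) / n
    covariance = sum((o - o_bar) * (p - p_bar) for o, p in zip(observed, predicted))
    spread_o = sum((o - o_bar) ** 2 for o in observed)
    spread_p = sum((p - p_bar) ** 2 for p in predicted)
    return covariance ** 2 / (spread_o * spread_p)
```

Under this definition *R*^{2} measures only how linearly the predictions follow the observations, so a model can score a high *R*^{2} while still being biased. Reporting the *MAE* alongside it covers that gap.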

## RESULTS AND DISCUSSION

### Input vector selection

This paper uses the daily discharge and daily sediment discharge at two hydrometric stations on the Kasilian and Telar rivers. The whole data set covers 50 years (1964–2014), and was divided into two parts: the training set of 35 years (1964–1999), and the testing set of 15 years (1999–2014). Figure 2 shows the scatter plot between the daily flow discharge (*Q*_{d}) and daily suspended sediment discharge (*Q*_{s}) for these two stations.

The ACF and PACF of the daily suspended sediment discharge (*Q*_{s}) for the Kasilian and Telar rivers are presented in Figure 3(a), 3(b), 3(d) and 3(e). The CCF between the daily suspended sediment discharge (*Q*_{s}) and the daily flow discharge (*Q*_{d}) is given in Figure 3(c) and 3(f). For the Kasilian river, the auto-correlation and partial auto-correlation coefficients of the suspended sediment discharge were less than 0.135 for all lag values except lag 1. Also, for this river the cross-correlation coefficient of the suspended sediment discharge with the flow discharge for lag 0 was 0.693, which was higher than all other lagged cross-correlation coefficient values. For the Telar river, the ACF and PACF of the daily suspended sediment discharge (*Q*_{s}) for lag 0 was 0.208 and for other lags they were below the confidence limits. The cross-correlation coefficients of the suspended sediment discharge with the flow discharge for lags 0, 1 and 2 were 0.598, 0.270 and 0.182, respectively. Overall, for training the intelligence models, based on the calculated values of the PACF and CCF of the data series, the input vectors given in Equations (3) and (4) were selected for the Kasilian and Telar rivers, respectively.
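The screening of lags by correlation can be sketched as follows. The sketch computes the sample auto-correlation and cross-correlation at a given lag and flags lags whose coefficient exceeds the approximate 95% bound of ±1.96/√*N*. That bound is an assumption, because the text does not state which confidence limits were applied. The PACF is omitted for brevity, and the function names are illustrative.

```python
from math import sqrt
from typing import List, Sequence


def cross_correlation(x: Sequence[float], y: Sequence[float], lag: int) -> float:
    """Sample correlation between y(t) and x(t - lag) for lag >= 0."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    numerator = sum((x[t - lag] - x_bar) * (y[t] - y_bar) for t in range(lag, n))
    denominator = sqrt(sum((v - x_bar) ** 2 for v in x) * sum((v - y_bar) ** 2 for v in y))
    return numerator / denominator


def autocorrelation(x: Sequence[float], lag: int) -> float:
    """Sample auto-correlation of x at the given lag."""
    return cross_correlation(x, x, lag)


def significant_lags(x: Sequence[float], y: Sequence[float], max_lag: int) -> List[int]:
    """Lags 0..max_lag whose correlation exceeds the assumed 1.96 / sqrt(N) bound."""
    bound = 1.96 / sqrt(len(x))
    return [k for k in range(max_lag + 1) if abs(cross_correlation(x, y, k)) > bound]
```

Calling `significant_lags(q_d, q_s, 5)` would list the discharge lags worth offering to the models as inputs, and `significant_lags(q_s, q_s, 5)` would do the same for past sediment discharge.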

#### GEP development

Here, *RRSE* is the root relative square error, which was calculated using Equation (6). The fitness ranges from 0 to 1,000 (1,000 corresponds to a chromosome with ideal fitness). In Equation (6), *P*_{ij} is the value predicted by the individual chromosome *i* for fitness case *j* and *T*_{j} is the target value for fitness case *j*. The bar denotes average values (Ferreira 2006).
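An RRSE-based fitness can be computed as in the sketch below. It assumes the form commonly given for the GEP, in which the RRSE of chromosome *i* is the square root of Σ_{j}(*P*_{ij} − *T*_{j})^{2} divided by Σ_{j}(*T*_{j} − mean of *T*)^{2}, and the fitness is 1000/(1 + RRSE). This form agrees with the stated range of 0 to 1,000, but it is an assumption rather than a quotation of the equations used by the authors. The function names are illustrative.

```python
from math import sqrt
from typing import Sequence


def rrse(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Root relative square error of one chromosome's predictions."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    t_bar = sum(target) / len(target)
    squared_error = sum((p - t) ** 2 for p, t in zip(predicted, target))
    squared_spread = sum((t - t_bar) ** 2 for t in target)
    return sqrt(squared_error / squared_spread)


def fitness(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Fitness on a 0 to 1000 scale (assumed form: 1000 / (1 + RRSE))."""
    return 1000.0 / (1.0 + rrse(predicted, target))
```

A chromosome that always predicts the mean of the targets has an RRSE of 1 and therefore a fitness of 500, which gives a convenient reference point when reading fitness values.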

Then, a set of terminals (T) must be selected for generating genes. In the current study, the time lagged daily flow discharge (*Q*_{d}) and daily sediment discharge (*Q*_{s}) were chosen as terminal sets. Moreover, arithmetic and trigonometric functions like +, −, ×, ÷, sin, tan^{−1}, square root and log were used. Next, the number of genes and the length of the head of the gene were selected. The number of genes determines the number of sub-ETs. Using 1 to 3 genes is recommended to optimize the GEP model (Ferreira 2001). Moreover, the head length was selected by trial and error. The results indicated that the GEP performance did not improve significantly by increasing the head length to more than 8 for the Kasilian station and 7 for the Telar station. So, the head lengths were selected to be 8 and 7 for the Kasilian and Telar stations, respectively. Thirty chromosomes were used, as this number gave the best results. The next step was to select the genetic operators and their rates. These operators are presented in Table 4.

| Parameter | Description of parameter | Setting of parameter |
|---|---|---|
| P1 | Function set | +, −, ×, ÷, √x, sin x, cos x, tan^{−1} x, e^{x}, ln |
| P2 | Mutation rate | 0.044 |
| P3 | Inversion rate | 0.1 |
| P4 | IS rate | 0.1 |
| P5 | RIS rate | 0.1 |
| P6 | Gene transposition rate | 0.1 |
| P7 | One point recombination rate | 0.3 |
| P8 | Two point recombination rate | 0.3 |
| P9 | Gene recombination rate | 0.1 |


Finally, it was essential to select the linking function. In the present study, the addition (+) was selected as it was also used by previous researchers such as Hashmi & Shamseldin (2014), Azamathulla & Ahmad (2012), Kisi & Shiri (2012) and Emamgolizadeh *et al.* (2015). The number of generations selected was 50,000 because the variation of results was not significant after 50,000 generations. In other words, the fitness function converged to a certain value and after that no changes were seen.

The column diagrams of the GEP performance on the data from both stations are presented in Figure 4 to better compare each input combination performance.

Based on the coefficient of determination (*R*^{2}), the sixth input combination (*Q*_{d}, *Q*_{s-1}, *Q*_{s-2}) and the eighth input combination (*Q*_{d}, *Q*_{d-1}, *Q*_{d-2}, *Q*_{s-1}) for the Kasilian station and the Telar station, respectively, demonstrated more accurate results (see Figure 4). This finding is in agreement with the proposed combination of vectors based on PACF and CCF, and also with results of other studies such as Aytek & Kişi (2008) and Guven & Talu (2010). Another important finding was that the second input combination of both stations (*Q*_{s-1}) displayed the weakest results. This confirms the influence of the flow discharge on the estimation of the suspended sediment load as a dependent parameter. The results of the optimized input combinations of GEP are presented in Table 5.

| Station | Input combination | *R*^{2} (training) | *MAE* (ton/day) (training) | *R*^{2} (testing) | *MAE* (ton/day) (testing) | Fitness function (training) | Fitness function (testing) |
|---|---|---|---|---|---|---|---|
| Kasilian | *Q*_{d}, *Q*_{s-1}, *Q*_{s-2} | 0.992 | 322.76 | 0.942 | 876.3 | 925.25 | 783.2 |
| Telar | *Q*_{d}, *Q*_{d-1}, *Q*_{d-2}, *Q*_{s-1} | 0.967 | 1666.6 | 0.752 | 1269.7 | 847.48 | 478.1 |


The expression trees (ETs) of the GEP model for the Kasilian and the Telar stations are presented in Figure 5. By using the corresponding values, the explicit formulations of the GEP for the suspended sediment load (*Q*_{s}) as a function of flow discharge (*Q*_{d}) were obtained as shown in Equations (7) and (8):

- (a)
- (b)

#### Artificial neural network

The best ANN results for both stations were obtained by training and testing with one hidden layer. The column diagrams of the variation of the coefficient of determination with the transfer function and the results of the optimized input combinations of the ANN are presented in Figure 6 and Table 6, respectively. As seen in Figure 6, the best results of the ANN were obtained when the sigmoid transfer function was used. Moreover, Figure 6(a) indicates that the hyperbolic secant transfer function, similar to the other transfer functions, is not able to estimate the suspended sediment load of the Kasilian station. The results in Table 6 show that the sixth and ninth combinations of the data set were the best input combinations for the Kasilian and the Telar stations, respectively.

| Station | Input combination | *R*^{2} (training) | *MAE* (ton/day) (training) | *R*^{2} (testing) | *MAE* (ton/day) (testing) |
|---|---|---|---|---|---|
| Kasilian | *Q*_{d}, *Q*_{s-1}, *Q*_{s-2} | 0.971 | 678.4 | 0.926 | 1385.12 |
| Telar | *Q*_{d}, *Q*_{d-1}, *Q*_{d-2}, *Q*_{s-1}, *Q*_{s-2} | 0.828 | 2034.54 | 0.610 | 3023.45 |


#### Adaptive neuro fuzzy inference system

The ANFIS model was developed using 100 epochs and a linear function for the output layer. The line plots of the variation of the coefficient of determination with the transfer function and the results of the optimized ANFIS model are presented in Figure 7 and Table 7, respectively. The statistical results of the ANFIS model (*R*^{2} and *MAE*) for the training and testing sets showed that the gbellmf (generalized bell) and trimf (triangular) membership functions gave the best results for the Kasilian and Telar stations. Moreover, the best input combination for both stations was the ninth set of data, which consisted of *Q*_{d}, *Q*_{d-1}, *Q*_{d-2}, *Q*_{s-1} and *Q*_{s-2}. Overall, the ANFIS model learned to map the non-linear relationship between the input data and sediment discharge.

| Station | Input combination | *R*^{2} (training) | *MAE* (ton/day) (training) | *R*^{2} (testing) | *MAE* (ton/day) (testing) |
|---|---|---|---|---|---|
| Kasilian | *Q*_{d}, *Q*_{d-1}, *Q*_{d-2}, *Q*_{s-1}, *Q*_{s-2} | 0.912 | 1423.4 | 0.875 | 1875.62 |
| Telar | *Q*_{d}, *Q*_{d-1}, *Q*_{d-2}, *Q*_{s-1}, *Q*_{s-2} | 0.783 | 2145.34 | 0.510 | 3245.50 |


#### Comparison of SRC, GEP, ANN and ANFIS models

To assess the ability of the GEP, ANN and ANFIS models to estimate *Q*_{s}, their results were compared with those of the SRC model. The SRC empirically describes the relationship between the suspended sediment (*Q*_{s}) and water discharge (*Q*_{d}) for a certain location (Jansson 1996; Syvitski *et al.* 2000; Horowitz 2003; Morehead *et al.* 2003; Harrington & Harrington 2013). The most commonly used SRC is a power function and it can be expressed with the following relationship (Zhang *et al.* 2012):

$$Q_s = a\,Q_d^{b} \qquad (9)$$

where *Q*_{s} is the suspended sediment discharge and *Q*_{d} is the flow discharge. The constants *a* and *b* were calculated from the data via a linear regression between log *Q*_{s} and log *Q*_{d}. Equations (10) and (11) were obtained in this way for the Kasilian and the Telar stations, respectively. The statistical metrics for this method and for all intelligence models are given in Table 8 for the Kasilian and Telar stations. As seen in this table, the SRC method, with *R*^{2} of 0.52 and 0.39 and *MAE* of 4805.58 and 6732.80 ton/day (for the Kasilian and the Telar stations, respectively), is not an effective tool for estimating the suspended sediment load.
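Fitting a power-function rating curve by linear regression in log space can be sketched as follows. The sketch uses ordinary least squares on natural logarithms; the choice of logarithm base does not affect *a* or *b*. It assumes that all discharges are strictly positive. The function names and the synthetic data in the usage note are illustrative and are not the fitted curves of this study.

```python
from math import exp, log
from typing import List, Sequence, Tuple


def fit_src(q_d: Sequence[float], q_s: Sequence[float]) -> Tuple[float, float]:
    """Fit Q_s = a * Q_d**b by least squares on log Q_s versus log Q_d; returns (a, b)."""
    if len(q_d) != len(q_s):
        raise ValueError("q_d and q_s must have the same length")
    xs = [log(v) for v in q_d]
    ys = [log(v) for v in q_s]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return exp(intercept), slope


def predict_src(q_d: Sequence[float], a: float, b: float) -> List[float]:
    """Sediment discharge predicted by the rating curve for each flow discharge."""
    return [a * v ** b for v in q_d]
```

For data generated exactly from `q_s = 2.5 * q_d ** 1.8`, `fit_src` recovers `a = 2.5` and `b = 1.8`. With real data, the regression minimizes errors in log space, which tends to under-weight the rare high-flow days that carry most of the sediment. This is one reason why rating curves can show a large *MAE* in ton/day.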

| Model | Kasilian *R*^{2} | Kasilian *MAE* (ton/day) | Telar *R*^{2} | Telar *MAE* (ton/day) |
|---|---|---|---|---|
| GEP | 0.942 | 876.30 | 0.75 | 1269.7 |
| ANN | 0.926 | 1385.12 | 0.61 | 3023.45 |
| ANFIS | 0.875 | 1875.62 | 0.51 | 3245.50 |
| SRC | 0.520 | 4805.58 | 0.39 | 6732.80 |


For the Kasilian station, the results in Table 8 show that the GEP, ANN and ANFIS models all gave better results than the SRC approach. They also show that the GEP estimation is more accurate than both the ANN and ANFIS models. The *R*^{2} of the GEP increased by approximately 81.1%, 2.3% and 7.7% compared to the SRC equation, ANN and ANFIS models, respectively.

Similarly, for the Telar station, the findings in Table 8 illustrate that the GEP performance is much more accurate than the SRC equation, ANN and ANFIS models. According to these results, the *R*^{2} of the GEP performance increased by approximately 76.1%, 12.6% and 34.7% compared to the SRC, ANN and ANFIS models, respectively. The scatter plots of the GEP performance for both stations are presented in Figure 8. In addition, the measured daily data from 2014 to 2016 were used for the validation of the GEP model. The performance of the GEP model was assessed with two statistical parameters: the coefficient of determination and the *MAE*. The *MAE* was 954.6 ton/day for the Kasilian River whereas it was 1658.6 ton/day for the Telar River in the validation phase. Similarly, the *R*^{2} was 0.935 and 0.676 for the Kasilian and Telar rivers, respectively. The implementation of the model at the validation stage showed that, compared to the results at the testing stage, the accuracy of the model for the two rivers decreased. However, for water resources planning and management, the efficiency of the GEP model in estimating the suspended sediment discharge (*Q*_{s}) was fairly acceptable. Comparing the results of the models for the Kasilian and Telar rivers, which have catchment areas of different sizes, illustrates that the capability of all models for the Telar sub-basin was less than that for the Kasilian sub-basin. In other words, when the size of the catchment area increased, the discharge and sediment discharge of the river increased and the capability of all the models in the estimation of river sediment discharge decreased. However, the suspended sediment discharge of the rivers was extremely nonlinear and as a result models might not be able to capture this nonlinear functional relationship.

## SENSITIVITY ANALYSIS

Sensitivity tests were conducted to determine the relative significance of each input variable on the suspended sediment discharge (*Q*_{s}). The GEP model was chosen as the best model to estimate *Q*_{s}, and the importance of each input variable to this model was also investigated.

Table 9 shows the statistical indices of the GEP models without a specific input variable along with the best GEP model. As illustrated, for the Kasilian River the GEP model without *Q*_{d} has the highest *MAE* and lowest *R*^{2}. In other words, the ability of the GEP model to estimate the suspended sediment discharge (*Q*_{s}) was significantly degraded when the model was run without *Q*_{d}. This shows that *Q*_{d} has the most significant impact on the suspended sediment discharge (*Q*_{s}). Overall, the effect of input variables on the suspended sediment discharge (*Q*_{s}) for the Kasilian River can be ranked from higher to lower as *Q*_{d}, *Q*_{s-1} and *Q*_{s-2}. Similar to the Kasilian River, sensitivity tests were carried out for the Telar River. As the results in Table 9 show, the ability of the GEP model without *Q*_{d} was significantly decreased (*R*^{2} = 0.145, *MAE* = 15470.5 ton/day) in estimating the suspended sediment discharge. Compared to the best model, the *MAE* increased by almost 33.4% when the GEP was run without *Q*_{d}. In addition, the results demonstrated that the ability of the model was reduced by eliminating the two parameters *Q*_{d-1} and *Q*_{d-2} from the input variables of the GEP model. Overall, for the Kasilian and Telar Rivers, *Q*_{d} had a more significant impact on the performance of the GEP model in estimating the suspended sediment discharge (*Q*_{s}) than other variables such as *Q*_{d-1}, *Q*_{d-2} and *Q*_{s-1}.

| Method (Kasilian) | *MAE* (ton/day) | *R*^{2} | Method (Telar) | *MAE* (ton/day) | *R*^{2} |
|---|---|---|---|---|---|
| The best GEP | 876.3 | 0.942 | The best GEP | 1269.7 | 0.752 |
| GEP without *Q*_{d} | 12128.8 | 0.139 | GEP without *Q*_{d} | 15470.5 | 0.145 |
| GEP without *Q*_{s-1} | 9928.1 | 0.297 | GEP without *Q*_{d-1} | 12809.6 | 0.221 |
| GEP without *Q*_{s-2} | 2010.4 | 0.868 | GEP without *Q*_{d-2} | 2962.4 | 0.587 |
| – | – | – | GEP without *Q*_{s-1} | 2895.6 | 0.598 |


## CONCLUSION

In this paper, the GEP, ANN and ANFIS models were developed in order to estimate the suspended sediment load of the Telar and Kasilian Rivers located in the north of Iran. The results showed that the use of time lagged daily flow discharge (*Q*_{d}) and sediment discharge (*Q*_{s}) as input combinations increased the accuracy of the intelligence models. Furthermore, the results indicated that the GEP provided much more accurate results than the ANN and ANFIS models. For the Kasilian station, the suspended sediment load (*Q*_{s}) estimated by the GEP, ANN and ANFIS models had an *MAE* of 876.30 ton/day, 1385.12 ton/day and 1875.62 ton/day, respectively. Corresponding *R*^{2} values were 0.942, 0.926 and 0.875. Similarly, for the Telar station, the suspended sediment load (*Q*_{s}) estimated by the GEP, ANN and ANFIS models had an *MAE* of 1269.7 ton/day, 3023.45 ton/day and 3245.5 ton/day, respectively. Corresponding *R*^{2} values were 0.752, 0.610 and 0.510. Overall, the results indicated that intelligence models were effective and reliable methods for estimating the suspended sediment load. The results of the GEP, ANN and ANFIS models were compared with the results of the SRC equation. The findings showed that the GEP, ANN and ANFIS models were much more accurate than the SRC method in estimating the suspended sediment load of the Telar and Kasilian Rivers. When using the GEP model, the *MAE* decreased by approximately 36.7%, 53.3% and 81.8% for the Kasilian station and 58.0%, 60.9% and 81.1% for the Telar station compared to the ANN, ANFIS and SRC models, respectively. The most obvious finding to emerge from this study was that the GEP, ANN and ANFIS models were reliable approaches for estimating the suspended sediment load of rivers.
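The percentage reductions in *MAE* quoted above can be reproduced from the testing-stage values reported in Table 8 with the short calculation below. The function and variable names are illustrative.

```python
from typing import Dict


def percent_reduction(reference: float, improved: float) -> float:
    """Relative decrease of `improved` with respect to `reference`, in percent."""
    return 100.0 * (reference - improved) / reference


# Testing-stage MAE values (ton/day) as reported in Table 8.
MAE_TESTING: Dict[str, Dict[str, float]] = {
    "Kasilian": {"GEP": 876.30, "ANN": 1385.12, "ANFIS": 1875.62, "SRC": 4805.58},
    "Telar": {"GEP": 1269.70, "ANN": 3023.45, "ANFIS": 3245.50, "SRC": 6732.80},
}

for station, mae in MAE_TESTING.items():
    for model in ("ANN", "ANFIS", "SRC"):
        reduction = percent_reduction(mae[model], mae["GEP"])
        print(f"{station}: GEP vs {model}: {reduction:.1f}% lower MAE")
```

The reduction is measured relative to the competing model's *MAE*, so a value of 81.8% against the SRC at Kasilian means the GEP error is a little under one fifth of the SRC error.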