## Abstract

We applied the Box-Jenkins time series model and an artificial neural network (ANN) in the framework of a multilayer perceptron (MLP) to predict the total dissolved solids (TDS) in the Zāyandé-Rūd River, Esfahan province, Iran. The MLP inputs were total hardness (TH), bicarbonate (HCO_{3}^{−}), sulfate (SO_{4}^{2−}), chloride (Cl^{−}), sodium (Na^{+}), and calcium (Ca^{2+}), which were monitored over 9 years by the Esfahan Water Authority. The Autoregressive Integrated Moving Average (ARIMA) (2, 0, 3) (2, 0, 2) time series model with the lowest Akaike information criterion was selected. The coefficient of determination (*R*^{2}) and index of agreement (*IA*) between the measured data and the predictions of the ARIMA (2, 0, 3) (2, 0, 2) time series model were 0.78 and 0.73, respectively. An MLP neural network with 10 neurons in a hidden layer was developed using the Tansig transfer function and the Levenberg-Marquardt training algorithm; its *R*^{2} and *IA* between the measured and predicted data were 0.91 and 0.94, respectively. Consequently, the results of the MLP were more reliable than those of the Box-Jenkins time series for predicting TDS in the river.

## INTRODUCTION

Reliable forecasting of future conditions in water bodies is of high interest to a great variety of stakeholders, including government departments, agencies, policymakers, and health and medical bodies. Water quality is a basic subject in water resource management, and water salinity is one of its main parameters; predicting salinity is vital for water quality management for both drinking and irrigation uses. The salinity of surface waters, as an index of the dissolved constituents and ions in the water, can be described by total dissolved ions (TDI), conductivity and total dissolved solids (TDS) (McNeil & Cox 2000).

Water salinity depends on minerals in the water or the soil characteristics of the catchment area, of which some parts are particulate matter and other parts are in solution (Asadollahfardi *et al.* 2012a, 2012b). Salinity, which characterizes the majority of the dissolved constituents in water, can be determined by a number of methods. One technique is TDI, the total number of ions in solution, described as the sum of the main ions in water expressed in mg/l. These comprise the cations sodium (Na^{+}), potassium (K^{+}), calcium (Ca^{2+}) and magnesium (Mg^{2+}), and the anions chloride (Cl^{−}) and bicarbonate together with sulphate (SO_{4}^{2−}), in most surface water and groundwater. Conductivity is another parameter which can provide an easy measurement of salinity. TDS or ‘filterable residue,’ the concentration of dissolved substances in water, is another parameter which may be considered; this includes mineral and organic matter (Asadollahfardi *et al.* 2012a, 2012b).

Mathematical and statistical approaches have been employed frequently to predict water quality (Faruk 2010). Several methods exist for predicting TDS in river water, such as the Box-Jenkins time series, Bayesian time series and artificial neural network (ANN). Some of the applications of the first and third methods are as follows: Jayawardena & Lai (1989) analyzed water quality of the Pearl River using time series. Sun *et al.* (2001) applied Box-Jenkins time series, including Autoregressive Integrated Moving Average (ARIMA) models, to forecast the water quality of Apalachicola Bay. They recognized water level changes in tide situations, water quality of entering streams, regional rainfall, wind velocity, and wastewater discharge into the water body as the determinative elements in the water quality of the bay. Asadollahfardi (2002) deployed different Box-Jenkins time series to study surface water quality in Tehran, Iran. Kurunc *et al.* (2005) used ARIMA and Thomas-Fiering models to predict the water quality and flow rate of the Yeşilırmak River, Turkey; the root mean square error and mean absolute error indicated that the Thomas-Fiering model predictions were more reliable than those of ARIMA. Asadollahfardi *et al.* (2012a, 2012b) studied the water quality upstream and downstream of the Latian Dam (located in Iran) using the Box-Jenkins time series. Abudu *et al.* (2011) applied ARIMA, transfer function-noise (TFN), and ANN approaches to forecast the monthly TDS of the Rio Grande in El Paso, Texas. Ranjbar & Khaledian (2014) used the ARIMA method to predict water quality parameters of the Sefīd-Rūd River and reported satisfactory outcomes. Arya & Zhang (2015) applied ARIMA models to predict dissolved oxygen and temperature at four water quality monitoring stations on the Stillaguamish River, Washington, US, and indicated the aptness of the Box-Jenkins time series for the forecasting.
Salmani & Salmani Jajaei (2016) studied the TDS of Karoun River in the southeast of Iran using ARIMA; they also used a transfer function model to formulate TDS as a function of water flow volume.

Maier & Dandy (1996) applied an ANN to estimate the concentration of salinity in the River Murray, South Australia, and reported reasonable results. Zhang *et al.* (1997) used an ANN to forecast river raw water quality in northern Saskatchewan. Huang *et al.* (2002) assessed the application of an ANN in modeling salinity variation in the Apalachicola River, Florida, USA.

Misaghi & Mohammadi (2003) used an ANN to analyze the water quality of the Zāyandé-Rūd River. They employed a general regression neural network (GRNN) to model biochemical oxygen demand (BOD) and dissolved oxygen (DO). Kanani *et al.* (2008) reached agreeable results in applying the multilayer perceptron (MLP) neural network and input delay neural network (IDNN) to predict TDS in the Achechay River basin, Iran. Asadollahfardi *et al.* (2012a, 2012b) used the MLP and recurrent neural network (RNN) to predict TDS one month in advance in the Talkheh Rud River. Nemati *et al.* (2014) employed an ANN to predict TDS in the Simineh River, Iran. Babu & Reddy (2014) developed a hybrid ARIMA-ANN to forecast time series data and reported good accuracy of the model. Adhikari (2015) showed that combining time series forecasts from several models would result in better outcomes. Gairaa *et al.* (2016) applied Box-Jenkins and ANN models to estimate the daily global solar radiation; the acquired results were satisfactory. Taneja *et al.* (2016) studied time series analysis of aerosol optical depth over New Delhi using the Box-Jenkins ARIMA modeling approach, with data gathered from 2004 to 2014. The results showed that simple modeling produced reliable predictions of aerosol optical depth. Asadollahfardi *et al.* (2016) applied a time delay neural network (TDNN) and radial basis function (RBF) to forecast TDS in the Zāyandé-Rūd River. Their sensitivity analysis illustrated that the Ca^{2+} and SO_{4}^{2−} parameters had the highest effect on the TDS prediction.

Asadollahfardi *et al.* (2017) applied the MLP and RBF neural network for two stations to predict TDS in Amirkabir Dam in Karaj, Iran. They concluded that RBF performance was better than that of the MLP neural network.

Montaseri *et al.* (2018) applied different types of ANN, including two types of adaptive neuro-fuzzy inference system (ANFIS), ANFIS with grid partition and ANFIS with subtractive clustering, as well as gene expression programming (GEP), wavelet-ANN, wavelet-ANFIS and wavelet-GEP, to forecast TDS of four rivers in different areas of Iran: the Nazlu Chay River, Tajan River, Zāyandé-Rūd River and Helleh River. They concluded that the hybrid wavelet-AI (artificial intelligence) models performed better than the plain AI models in all the rivers.

### Study area

The Zāyandé-Rūd originates in the Zard-Kuh subrange of the Zagros Mountains and reaches the Gavkhouni Swamp. The geographic coordinates are N 31 30 30.32 and E 49 30 52.49 (Figure 1, the study area). Average annual rainfall varies between 1,600 mm at the height of Zardkooh Mountain and less than 40 mm in the plain. The average annual temperature is 3.5 °C in the northwestern highlands and 21.5 °C in the eastern areas of the central region of Iran. Sedimentary and metamorphic rocks of the Jurassic and new Quaternary alluvium are the most abundant rocks of the riverbed and are the major sources of fine-grained particles along the river. In general, fine-grained particles are considered a cause of the absorption and accumulation of toxic elements in stream sediments because of their high adsorption capacity (Hossieni Abri 2000; Moienian 2008).

This paper employed Box-Jenkins time series models and an MLP neural network to predict TDS in the Zāyandé-Rūd River at the Mosian monitoring station. The data were collected by the Isfahan Province Regional Water Authority from 2001 to 2010.

## MATERIALS AND METHODS

### Box-Jenkins methodology

Box & Jenkins (1976) developed a methodology which decomposes stationary time series data into their components. In a stationary time series, the statistical characteristics of the series (such as the mean and variance) remain constant over time. The Box-Jenkins method is based on the following steps (Sun & Koch 2001):

1. Check the data for normality.
2. Identification, including plots of the transformed series: the autocorrelation function (ACF) and partial autocorrelation function (PACF).
3. Estimation, which consists of maximum likelihood estimates (MLE) for the model parameters (Ansley algorithm).
4. Diagnostic checks, which consist of overfitting and a check of the residuals (modified Portmanteau test).
5. Model structure selection criteria, which include the Akaike information criterion (AIC) and Bayesian information criterion (BIC).
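As a sketch of the last step, the two criteria can be computed from a fitted model's log-likelihood using their standard definitions (the log-likelihood values below are hypothetical placeholders for illustration, not fitted values from this study):

```python
import math

def aic(log_likelihood: float, k: int) -> float:
    """Akaike information criterion for a model with k estimated parameters."""
    return -2.0 * log_likelihood + 2.0 * k

def bic(log_likelihood: float, k: int, n: int) -> float:
    """Bayesian information criterion; n is the number of observations."""
    return -2.0 * log_likelihood + k * math.log(n)

# Hypothetical (log-likelihood, parameter-count) pairs for two candidates.
candidates = {
    "ARIMA(2,0,3)(1,0,2)": (-600.0, 9),
    "ARIMA(2,0,3)(2,0,2)": (-597.0, 10),
}
# The candidate with the smallest criterion value is preferred.
best = min(candidates, key=lambda m: aic(*candidates[m]))
```

The same `min` selection applies to BIC; the two criteria differ only in how strongly extra parameters are penalized.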

If the series *Z*_{t}, *Z*_{t−1}, …, *Z*_{t−n} is non-stationary (the mean and variance of the series vary with time), Equation (4) indicates the relation between *φ*_{p}(B) and *θ*_{q}(B). In other words, Equation (4) presents the autoregressive integrated moving average *ARIMA (p, d, q)* model, with the integers *p*, *d*, *q* indicating the order of the model:

*φ*_{p}(B)(1 − B)^{d}*Z*_{t} = *θ*_{q}(B)*a*_{t} (4)

where B is the backshift operator, *φ*_{p}(B) and *θ*_{q}(B) are the autoregressive and moving average polynomials of orders *p* and *q*, *d* is the order of differencing, and *a*_{t} is a white noise term.

A complete description of the Box-Jenkins method can be found in Box & Jenkins (1976).
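For intuition, the difference-equation form of a non-seasonal ARMA(p, q) one-step-ahead forecast can be sketched as follows (an illustrative Python sketch with hypothetical coefficients, not the fitted model of this study; the additive sign convention for the MA terms is assumed):

```python
def arma_forecast(z, residuals, phi, theta, mean=0.0):
    """One-step-ahead ARMA(p, q) forecast.

    z         : past observations as deviations from the mean, newest last
    residuals : past one-step forecast errors a_t, newest last
    phi       : AR coefficients [phi_1, ..., phi_p]
    theta     : MA coefficients [theta_1, ..., theta_q] (additive convention)
    """
    ar = sum(p * z[-i] for i, p in enumerate(phi, start=1))
    ma = sum(t * residuals[-j] for j, t in enumerate(theta, start=1))
    return mean + ar + ma

# Hypothetical ARMA(2, 3) coefficients, for illustration only.
z_hat = arma_forecast(
    z=[0.4, -0.1, 0.3], residuals=[0.05, -0.02, 0.01],
    phi=[0.6, 0.2], theta=[0.3, 0.1, -0.05],
)
```

For d > 0, the same recursion is applied to the d-times differenced series, and the forecast is integrated back.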

The ACF at lag *k* and the PACF were computed as:

*ρ(k)* = Σ_{i=1}^{n−k} (*z(i)* − *z̄*)(*z(i + k)* − *z̄*) / Σ_{i=1}^{n} (*z(i)* − *z̄*)^{2}

where *ρ(k)* is the ACF value with *k* time lag; *z(i)* and *z(i + k)* are the values of the time series data at the *i* step and the *i + k* step with *k* time lag; *z̄* is the average value of the variables; and *φ(k)* is the PACF value with *k* time lag.
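As an illustrative Python sketch (the study's computations were performed in statistical software), the sample ACF defined above and the PACF via the Durbin-Levinson recursion can be written as:

```python
def acf(z, k):
    """Sample autocorrelation rho(k) of series z at time lag k."""
    n = len(z)
    z_bar = sum(z) / n
    num = sum((z[i] - z_bar) * (z[i + k] - z_bar) for i in range(n - k))
    den = sum((v - z_bar) ** 2 for v in z)
    return num / den

def pacf(z, k):
    """Partial autocorrelation phi(k), k >= 1, via Durbin-Levinson."""
    acfs = [acf(z, j) for j in range(k + 1)]
    phi_prev = []  # phi_{m-1, 1..m-1} from the previous recursion level
    for m in range(1, k + 1):
        num = acfs[m] - sum(phi_prev[j] * acfs[m - 1 - j] for j in range(m - 1))
        den = 1.0 - sum(phi_prev[j] * acfs[j + 1] for j in range(m - 1))
        phi_mm = num / den
        phi_prev = [phi_prev[j] - phi_mm * phi_prev[m - 2 - j]
                    for j in range(m - 1)] + [phi_mm]
    return phi_prev[-1]
```

At lag 1 the PACF equals the ACF, which gives a quick sanity check of the recursion.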

The *k*-th output of a single hidden layer MLP can be written as follows (Menhaj 1998):

*y*_{k} = *g*_{out}(Σ_{j} *w*_{kj} *g*_{in}(Σ_{i} *w*_{ji}*x*_{i} + *b*_{j}) + *b*_{k})

where *y*_{k} represents the *k*-th output, *w* is the weights and *b* the biases, *g*_{out} is the output layer transfer function, *g*_{in} is the hidden layer transfer function, and *x*_{i} is the *i*-th input.

### Determination of the network architecture

An appropriate number of hidden layers in an ANN is essential for the best performance of the network. With too few hidden layers, the mapping might not be assessed correctly; excessive hidden layers raise the network complexity and do not inevitably lead to a rise in the precision of the network. Studies have shown that a feed-forward neural network with a sigmoid-tangent hidden layer and a linear output layer is able to approximate any unknown function (Figure 4; Cybenko 1989; Hornik 1991, 1993; Leshno *et al.* 1993). Consequently, the network would be simpler (Hornik *et al.* 1989). The transfer function of the hidden layer in the MLP was a sigmoid tangent, and the function of the output layer was linear. The number of neurons in the hidden layer was obtained by applying a trial and error method.
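The architecture described above, six inputs, one tan-sigmoid (Tansig) hidden layer and a linear (Purelin) output, can be sketched as a forward pass (an illustrative Python sketch; the study used MATLAB, and the weights below are random placeholders, not the fitted values):

```python
import math
import random

random.seed(0)
N_IN, N_HIDDEN = 6, 10  # TH, HCO3-, SO4^2-, Cl-, Na+, Ca2+ -> 10 hidden neurons

# Placeholder weights/biases; in the study these are fitted by Levenberg-Marquardt.
w_hidden = [[random.uniform(-1, 1) for _ in range(N_IN)] for _ in range(N_HIDDEN)]
b_hidden = [random.uniform(-1, 1) for _ in range(N_HIDDEN)]
w_out = [random.uniform(-1, 1) for _ in range(N_HIDDEN)]
b_out = random.uniform(-1, 1)

def mlp(x):
    """Forward pass: tanh hidden layer (Tansig), linear output (Purelin)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out
```

Because tanh saturates in (−1, 1), the output is always bounded by the output-layer weights regardless of the input magnitude, which is one reason inputs are also rescaled to a bipolar range before training.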

The inputs were scaled with:

*A*_{s} = 2(*A*_{i} − *A*)/(*B* − *A*) − 1

where *A*_{s} and *A*_{i} are the scaled and measured (observed) values of the input at time t, respectively, and *A* and *B* are the lowest and highest values of the series of inputs. MATLAB software (R2012b) was applied to run the model. In the MLP neural network, we normalized the measured input data and changed them to the bipolar range (−1 to 1); therefore, the results were also bipolar, and the inverse of the scaling relation was employed to change the output back to its original form (Razavi 2006).
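The bipolar scaling and its inverse can be sketched as follows (a minimal Python illustration of the min-max transform; the TDS bounds used in the test come from Table 1):

```python
def to_bipolar(x, lo, hi):
    """Scale a measured value x from [lo, hi] to the bipolar range [-1, 1]."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0

def from_bipolar(s, lo, hi):
    """Inverse transform: map a bipolar network output back to original units."""
    return (s + 1.0) * (hi - lo) / 2.0 + lo
```

The forward transform is applied to each input series with its own minimum and maximum; the inverse is applied to the network output to recover TDS in mg/l.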

To obtain robust results from the ANN, inputs with low mutual correlation are preferred (Kuncheva 2004). According to the protocol for developing ANN models stated by Wu *et al.* (2014), we can use only significant input parameters or can also consider input independence; input parameters which were significant were chosen. In our previous study, Asadollahfardi *et al.* (2016) performed a factor analysis to find suitable water quality input parameters for the RBF neural network. The factor analysis identified four water quality parameters, total hardness (TH), HCO_{3}^{−}, Ca^{2+} and Cl^{−}, which were selected in this study; two other parameters, Na^{+} and SO_{4}^{2−}, were added to them because they had a significant role in the prediction of TDS in the river (input significance).

### Learning rate

Prior to training the MLP neural network, it is necessary to initialize the weights and biases; their values are chosen randomly between 0 and 1. After that, the neural network is ready for training. The training process needs a set of training data. Throughout training, the weights and biases of the neural network are iteratively adjusted to minimize the mean squared error (MSE) between the predicted and measured data. The performance of the algorithm is very sensitive to the proper setting of the learning rate. If the learning rate is too high, the algorithm may oscillate and become unstable; if it is too small, the algorithm takes too long to converge. We selected the learning rate with an iterative procedure. The performance of the steepest descent algorithm improves if the learning rate is permitted to change during the training process (Amini 2008).

A back-propagation learning algorithm with fast convergence of the network parameters can be applied to determine the *α* coefficient (learning rate). In each time step, small values of *α* reduce variations of the parameters and slow the convergence process, while high values of *α* speed up convergence but can cause instability in the network parameters. Obtaining a suitable *α* is one of the most sensitive steps in using the back-propagation algorithm. An adaptive *α* aims to make the learning step as big as possible while keeping the learning stable (Asadollahfardi *et al.* 2016). The approach is based on the steepest descent technique and aims to quickly minimize the sum squared error of the outputs (Menhaj 1998). The learning rate is a user-set hyper-parameter which controls how much the weights of the MLP neural network are adjusted with respect to the loss gradient.
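The adaptive-*α* idea can be sketched on a toy one-dimensional problem: grow *α* while the error keeps falling, and shrink it (rejecting the step) when the error rises. The growth/shrink factors below are illustrative choices, not values from the study:

```python
def adaptive_gd(grad, loss, x0, alpha=0.01, grow=1.05, shrink=0.5, steps=200):
    """Steepest descent with an adaptive learning rate alpha."""
    x, current = x0, loss(x0)
    for _ in range(steps):
        candidate = x - alpha * grad(x)
        new = loss(candidate)
        if new < current:   # successful step: accept it and grow alpha
            x, current = candidate, new
            alpha *= grow
        else:               # error increased: reject the step, shrink alpha
            alpha *= shrink
    return x

# Toy example: minimize (x - 3)^2 starting from x = 0.
x_min = adaptive_gd(grad=lambda x: 2 * (x - 3), loss=lambda x: (x - 3) ** 2, x0=0.0)
```

Rejected steps never move the parameters, so the loss is monotonically non-increasing, which is the stability property the adaptive rule is designed to preserve.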

The steepest descent method is likewise named the error back propagation (EBP) algorithm. For the EBP algorithm with a constant *α*, the larger the learning rate, the faster but less stable the training process will be. The Levenberg-Marquardt algorithm is much faster than the EBP algorithm and more stable than the Gauss-Newton algorithm (Yu & Wilamowski 2010). The Levenberg-Marquardt algorithm remedies the flaws existing in both the gradient technique and the Gauss-Newton algorithm for artificial neural networks by combining the two. The Levenberg-Marquardt algorithm also had a few disadvantages, but most of them have been solved by researchers such as Wilamowski & Yu (2010) and Transtrum & Sethna (2012). Wilamowski and Yu computed the quasi-Hessian matrix and gradient vector directly, without Jacobian matrix multiplications and storage; therefore, the memory limitation of the Levenberg-Marquardt algorithm was solved. Transtrum & Sethna (2012) introduced some improvements to the Levenberg-Marquardt algorithm to improve both its convergence speed and its robustness to the initial guess. They obtained a geodesic acceleration correction to the Levenberg-Marquardt step by using a second-order correction in the Taylor approximation of the residuals and ensuring that the extrinsic curvature of the model graph is small. They stated that the correction can be calculated at each step with minimal additional computation. Therefore, we used the Levenberg-Marquardt algorithm.
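The Levenberg-Marquardt blend can be sketched in its simplest scalar form: the damping term λ interpolates between gradient descent (large λ) and Gauss-Newton (small λ). The toy problem below, fitting y = exp(θx) with residuals r_i = y_i − exp(θx_i), is an illustration only, not the network training of this study:

```python
import math

def levenberg_marquardt(xs, ys, theta=0.0, lam=1e-3, iters=50):
    """Scalar LM for y = exp(theta * x); Jacobian J_i = -x_i * exp(theta * x_i)."""
    def sse(t):
        return sum((y - math.exp(t * x)) ** 2 for x, y in zip(xs, ys))
    for _ in range(iters):
        r = [y - math.exp(theta * x) for x, y in zip(xs, ys)]
        J = [-x * math.exp(theta * x) for x in xs]
        jtj = sum(j * j for j in J)
        jtr = sum(j * ri for j, ri in zip(J, r))
        step = -jtr / (jtj + lam)           # (J^T J + lambda I)^-1 J^T r, negated
        if sse(theta + step) < sse(theta):  # good step: accept, trust Gauss-Newton more
            theta += step
            lam *= 0.5
        else:                               # bad step: increase damping
            lam *= 2.0
    return theta

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.exp(0.7 * x) for x in xs]  # synthetic data with true theta = 0.7
theta_hat = levenberg_marquardt(xs, ys)
```

The accept/reject rule on λ is the same trust-adjustment mechanism used when LM trains network weights, just with a scalar parameter instead of a weight vector.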

where *A*_{t} is the measured data at time *t* (1 ≤ *t* ≤ *n*); *n* is the number of data; *F*_{t} is the predicted data at time *t*; *F̄* is the mean predicted data; and *Ā* is the mean measured data.

We used TH, bicarbonate (HCO_{3}^{−}), calcium ion (Ca^{2+}), sodium ion (Na^{+}), chloride ion (Cl^{−}) and sulfate ion (SO_{4}^{2−}) as the input data and predicted TDS at the Mosian monitoring station. One hundred and twenty monthly data points were available from 2001 to 2010. Ninety-six sets of data were assigned for training, 12 sets were used to validate the model and 12 sets were considered for model testing (Table 1). These data were available to the authors, and the model can be updated whenever new data become available.
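The chronological 96/12/12 partition of the 120 monthly records can be sketched as (a trivial Python illustration; the record values here are month indices, not the actual measurements):

```python
def chronological_split(records, n_train=96, n_val=12, n_test=12):
    """Split time-ordered records into training, validation and testing sets."""
    assert len(records) >= n_train + n_val + n_test
    train = records[:n_train]
    val = records[n_train:n_train + n_val]
    test = records[n_train + n_val:n_train + n_val + n_test]
    return train, val, test

months = list(range(1, 121))  # 120 monthly records, 2001-2010
train, val, test = chronological_split(months)
```

Keeping the split chronological (rather than random) preserves the temporal structure, so the test set genuinely represents unseen future months.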

| Parameters | Ca^{2+} (mg/l) | Na^{+} (mg/l) | Cl^{−} (mg/l) | SO_{4}^{2−} (mg/l) | HCO_{3}^{−} (mg/l) | TH (mg/l) | TDS (mg/l) |
|---|---|---|---|---|---|---|---|
| Number (months) | 120 | 120 | 120 | 120 | 120 | 120 | 120 |
| Average | 3.40 | 2.20 | 1.83 | 2.14 | 2.98 | 2.35 | 432.98 |
| Minimum | 1.70 | 0.10 | 0.20 | 0.10 | 1.10 | 1.40 | 232.05 |
| Maximum | 7.80 | 8.30 | 9.60 | 9.51 | 5.30 | 5.20 | 1220.70 |
| Standard deviation | 0.96 | 1.78 | 1.48 | 1.53 | 0.67 | 0.65 | 187.51 |
| Mode | 2.9 | 1 | 0.9 | 1.4 | 2.9 | 2.1 | 320.45 |

First, the input data were normalized. Then, the required numbers of layers and neurons were determined using a trial and error approach.

### Model efficiency

To evaluate model efficiency, the coefficient of determination (*R*^{2}), index of agreement (*IA*) and Nash-Sutcliffe efficiency (*E*) were applied (Equations (13) to (15); Heckman 1979; Krause *et al.* 2005; Willmott *et al.* 2012). We selected three indicators of model performance because testing model efficiency with three indicators is more reliable than using only one indicator (Moriasi *et al.* 2007).
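The three indicators can be sketched in Python using their standard definitions (assuming *R*^{2} as the squared Pearson correlation, Willmott's *IA*, and the Nash-Sutcliffe *E*; the study computed these in its own software):

```python
def nash_sutcliffe(measured, predicted):
    """Nash-Sutcliffe efficiency E = 1 - sum((A-F)^2) / sum((A-Abar)^2)."""
    a_bar = sum(measured) / len(measured)
    num = sum((a - f) ** 2 for a, f in zip(measured, predicted))
    den = sum((a - a_bar) ** 2 for a in measured)
    return 1.0 - num / den

def index_of_agreement(measured, predicted):
    """Willmott's IA = 1 - sum((A-F)^2) / sum((|F-Abar| + |A-Abar|)^2)."""
    a_bar = sum(measured) / len(measured)
    num = sum((a - f) ** 2 for a, f in zip(measured, predicted))
    den = sum((abs(f - a_bar) + abs(a - a_bar)) ** 2
              for a, f in zip(measured, predicted))
    return 1.0 - num / den

def r_squared(measured, predicted):
    """Coefficient of determination as the squared Pearson correlation."""
    n = len(measured)
    a_bar, f_bar = sum(measured) / n, sum(predicted) / n
    cov = sum((a - a_bar) * (f - f_bar) for a, f in zip(measured, predicted))
    var_a = sum((a - a_bar) ** 2 for a in measured)
    var_f = sum((f - f_bar) ** 2 for f in predicted)
    return cov ** 2 / (var_a * var_f)
```

The three behave differently under bias: a prediction that is perfectly correlated but systematically scaled still scores *R*^{2} = 1, while *E* and *IA* penalize it, which is why reporting all three is more informative than any one alone.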

## RESULTS AND DISCUSSION

### Box-Jenkins results

Because model development requires normalized data, a normality test was conducted on the TDS data before and after the Box-Cox transformation (Figure 5).

To estimate the orders of the AR (autoregressive) and MA (moving average) terms, the ACF and PACF of the TDS data were computed (Figure 6). Three models, ARIMA (2,0,3) (1,0,2), ARIMA (2,0,3) (2,0,2) and ARIMA (1,0,3) (1,0,1), were examined. Afterward, the AIC for each model was determined; the values were 1226.9, 1223.5 and 1229.5, respectively. The model with the smallest AIC is preferable, which was ARIMA (2, 0, 3) (2, 0, 2).

To investigate the assumption of independence of residuals, the ACF and PACF of the residuals were considered. Figure 7 indicates the residual plots for the TDS data using ARIMA (2, 0, 3) (2, 0, 2). The normality of the residuals obtained by fitting the ARIMA (2,0,3) (2,0,2) model was checked with a histogram, and the residual plot was not funnel-shaped.

Figure 8 illustrates the ACF and PACF of the TDS residuals. The functions varied between −0.2 and 0.2, which indicated the independence of the residuals and the suitability of the selected model.

To investigate the robustness of the model, the *IA* and *R*^{2} of the measured and predicted TDS were calculated (Table 2). They were 0.73 and 0.78, respectively, which could be considered acceptable.

| Months | Predicted data (mg/l) | Measured data (mg/l) | Lower limit | Upper limit |
|---|---|---|---|---|
| 1 | 450.367 | 575.25 | 142.593 | 758.14 |
| 2 | 622.195 | 872.30 | 266.849 | 977.54 |
| 3 | 466.367 | 560.95 | 101.649 | 831.09 |
| 4 | 714.697 | 991.25 | 348.489 | 1080.90 |
| 5 | 424.513 | 531.70 | 57.951 | 91.077 |
| 6 | 533.371 | 449.15 | 164.431 | 902.31 |
| 7 | 407.223 | 344.50 | 36.187 | 778.26 |
| 8 | 398.286 | 539.50 | 26.763 | 769.81 |
| 9 | 542.163 | 661.05 | 170.620 | 913.71 |
| 10 | 493.017 | 555.10 | 121.057 | 864.98 |
| 11 | 588.222 | 747.50 | 215.801 | 960.64 |
| 12 | 441.634 | 534.95 | 69.068 | 814.20 |
| Error criteria | IA = 0.73 | R^{2} = 0.78 | | |

### MLP neural network results

We selected an MLP with one hidden layer and different numbers of neurons, from 2 to 11, using the Tansig, Purelin and Logsig transfer functions with the Levenberg-Marquardt (trainlm) training method to find the best model. To obtain the optimum performance of the model, we evaluated *R*^{2}, *IA*, *RMSE*, *E* and *MBE* with one hidden layer. The model was run with the different possible transfer functions and numbers of neurons in the hidden layer (Table 3). The results indicated that an MLP with one hidden layer, 10 neurons in the hidden layer and the Tansig transfer function gave a good performance.

| No | Number of neurons | Training method | Transfer function | R^{2} | RMSE | IA | E | MBE |
|---|---|---|---|---|---|---|---|---|
| 1 | 2 | Levenberg-Marquardt (trainlm) | Tansig^{a} | 0.75 | 3.58 | 0.78 | 0.73 | 0.127 |
| 2 | | | Purelin^{b} | 0.56 | 7.42 | 0.57 | 0.54 | 0.319 |
| 3 | | | Logsig^{c} | 0.73 | 4.19 | 0.74 | 0.71 | 0.154 |
| 4 | 4 | | Tansig | 0.81 | 2.14 | 0.84 | 0.80 | 0.094 |
| 5 | | | Purelin | 0.59 | 5.87 | 0.61 | 0.58 | 0.257 |
| 6 | | | Logsig | 0.80 | 1.98 | 0.82 | 0.80 | 0.085 |
| 7 | 6 | | Tansig | 0.90 | 0.749 | 0.91 | 0.89 | 0.051 |
| 8 | | | Purelin | 0.63 | 5.23 | 0.66 | 0.61 | 0.228 |
| 9 | | | Logsig | 0.84 | 0.926 | 0.86 | 0.83 | 0.062 |
| 10 | 8 | | Tansig | 0.89 | 0.841 | 0.92 | 0.89 | 0.045 |
| 11 | | | Purelin | 0.64 | 4.84 | 0.65 | 0.64 | 0.211 |
| 12 | | | Logsig | 0.86 | 0.957 | 0.88 | 0.84 | 0.052 |
| 13 | 10 | | Tansig | 0.91 | 0.547 | 0.94 | 0.90 | 0.023 |
| 14 | | | Purelin | 0.68 | 3.21 | 0.70 | 0.67 | 0.157 |
| 15 | | | Logsig | 0.89 | 0.613 | 0.91 | 0.88 | 0.037 |
| 16 | 12 | | Tansig | 0.91 | 0.677 | 0.92 | 0.89 | 0.031 |
| 17 | | | Purelin | 0.66 | 3.79 | 0.69 | 0.64 | 0.182 |
| 18 | | | Logsig | 0.87 | 0.655 | 0.89 | 0.86 | 0.041 |

^{a}Tansig: tan-sigmoid transfer function.

^{b}Purelin: linear transfer function.

^{c}Logsig: log-sigmoid transfer function.

With higher numbers of epochs, the training, validation and testing errors of the MLP decreased, becoming insignificant at an epoch number of 1,000 (Figure 9).

The acquired results of the MLP indicated a good agreement between the measured and the predicted amounts of TDS: *R*^{2}, *IA* and *E* were 0.91, 0.94 and 0.90, respectively (Figure 10).

Figure 11 indicates the comparison between measured data, time series Box-Jenkins prediction and MLP neural network prediction during testing. As indicated in Figure 11, the MLP method was more reliable than the Box-Jenkins time series in the prediction of TDS concentrations.

The difference between the present study and our previous work (Asadollahfardi *et al.* 2016) is that here we applied a univariate Box-Jenkins time series and the MLP neural network, whereas previously we applied only an RBF neural network. We conducted this study to compare the reliability of the Box-Jenkins time series and the MLP neural network with the RBF neural network of our previous study, since accurate and reliable TDS prediction is very significant in water quality management.

The difference between our study and the Montaseri *et al.* (2018) investigation is that they applied and compared different types of ANN to forecast TDS of four different rivers, including the Zāyandé-Rūd River, whereas we compared the MLP neural network and the Box-Jenkins time series to predict TDS of the Zāyandé-Rūd River.

## CONCLUSIONS

Considering the results and discussions of the application of MLP and Box-Jenkins time series in the prediction of TDS concentration at the Zāyandé-Rūd River, the following conclusions can be drawn:

- We developed an ARIMA (2, 0, 3) (2, 0, 2) Box-Jenkins time series model for the prediction of TDS in the river. The coefficient of determination (*R*^{2}) and index of agreement (*IA*) between the measured and predicted TDS were equal to 0.78 and 0.73, respectively.
- Applying the MLP with 10 neurons in a hidden layer and the Tansig transfer function, a suitable TDS prediction was achieved. Values of *R*^{2} and *IA* between the measured and predicted TDS concentrations were 0.91 and 0.94, respectively.
- The model efficiency of the MLP neural network to predict TDS in the Zāyandé-Rūd River was better than that of the Box-Jenkins time series.