## Abstract

In this study, the performance of SWAT hydrological model and three computational intelligence methods used to simulate river flow are investigated. After collecting the data required for all models used, the calibration and validation stages were performed. Using the SWAT model and three methods of the Extreme Machine Learning (EML), the Support Vector Regression (SVR), and the Least Squares Support Vector Regression (LSSVR), Maharlu Lake Basin stream flow was simulated and the results obtained at Shiraz station were used for this study. A noise reduction filter was employed to improve the results from the computational intelligence methods, and SUFI-2 algorithm was used to analyze the uncertainty of the SWAT model. Finally, in order to evaluate the models developed and the SWAT model, three statistics (RMSE), (R²), and (NS) coefficient were used. The results indicated that the SWAT model and the machine learning models were generally appropriate tools for daily flow modeling, but the LSSVR model showed less errors in both learning and testing, with the coefficients NS = 0.997 and R² = 0.997 in the calibration stage and NS = 0.994 and R² = 0.994 in the validation stage, which prove their better performance compared to the other methods and the SWAT model.

## HIGHLIGHTS

In this study, we investigated the performance of SWAT hydrological model and three computational intelligence methods for simulating a river flow.

The calibration and validation stages were performed after collecting the data required for all models.

The study used the SWAT model and three methods of Extreme Learning Machine (ELM), Support Vector Regression (SVR) and Least Squares Support Vector Regression (LSSVR).

## INTRODUCTION

Stream flow is one of the most important variables in the hydrological cycle (Jimeno-Sáez *et al.* 2018). The estimation of accurate stream flow is required for the estimation of floods, development of agricultural strategies, and planning hydraulic structures (Vilaysane *et al.* 2015; Jimeno-Sáez *et al.* 2018).

There are several models for hydrological modeling of catchments and the appropriate model is selected considering the capabilities of such models and the topographic and climatic conditions of the area concerned. A sophisticated mathematical model is the Soil and Water Assessment Tool (SWAT) (Arnold *et al.* 1998; Jimeno-Sáez *et al.* 2018). The SWAT model is a semi-distributed, physically-based river basin model that is computationally efficient and operates on a continuous time-step that simulates the processes (water movement, sediment, plant growth, food cycle, etc.) rather than using the regression equations. In this regard, the model will require specific climatic information (soil properties, topography, and type of land vegetation). To facilitate this process, the SWAT-CUP (Calibration and Uncertainty Procedures) model has been developed. The performance of the SWAT Hydrological model was tested on Sher River at Belkheri in Narsimhpur District of Madhya Pradesh, India; the results showed that the performance of the SWAT model in simulating streamflow at Belkheri gauging site can be rated as very good and the calibrated model can be used for the runoff simulation of this agriculture-dominated watershed.

On the other hand, the nonlinear property, the lack of inherent certainty of the process, the need for extensive information, and the complexity of these models have led researchers to use common models such as artificial neural networks and new methods such as machine learning methods.

Over the past two decades, the machine learning methods have emerged as a powerful computing system for highly complex and nonlinear systems. These methods provide a relatively flexible and fast technique for modeling and can, thanks to their parallel architecture, make the nonlinear behavior of the system understandable and then estimate the values of the objective function accurately.

The Extreme Machine Learning (EML) Model is a high rate generalization performance learning technique that basically employs the Single hidden-Layer Feedforward Neural (SLFN) network (Huang *et al.* 2004, 2006). The Extreme Machine Learning model is easy to use and requires no parameter other than a predefined network architecture, so this model has implications such as learning rate, learning period, and local minimum where the slope-based algorithms encountered are far away (Deo *et al.* 2016).

The EML model not only learns much faster than traditional gradient-based learning algorithms but also avoids many complexities of the algorithms. Therefore, due to the its significant advantages such as simplicity, generalization performance and high implementation rate, many researchers in various research fields have focused on this model (Rezaeenour *et al.* 2019). Support vector regression (SVR), which is based on the VC-theory (Vapnik 1998), has recently emerged as an effective method for prediction problems**.** The SVR is a mathematical framework that estimates dependency by learning from finite samples. The SVR minimizes the estimation errors and the complexity of the model, simultaneously. This framework provides a good generalization ability and leads the model to be less prone to over-fitting. Considering these good properties, the SVR has attracted more and more attention in machine condition prognostics in recent years (Qu & Zuo 2012). The type of SVR depends on the loss function used. The standard SVR employs the *ε*-insensitive loss function which constructs the resultant regression model with selected samples known as Support Vectors. Another SVR called the Least Square SVR (LSSVR) employs the squared loss function to construct the resultant regression model, but it uses the entire samples (Qu & Zuo 2012). The Least Squares Support Vector Regression (LSSVR) is a machine learning technology based on a structure risk minimization (Suykens & Vandewalle 1999; Zhang *et al.* 2018). The LSSVR transforms the localization problem into the constraint optimization problem of the quadratic programming, which can greatly reduce the computational complexity and improve the computational efficiency. Meanwhile, the LSSVR uses all learning samples as support vectors, which can greatly improve the regression modeling ability of the learning samples (Zhang *et al.* 2018).

Huang *et al.* (2014) predicted the monthly flow of the river at Huaxi station using the support vector machine, and their results showed that the support vector machine model has high accuracy in predicting the monthly flow of the river. Ghorbani *et al.* (2016) used the support vector machine models and the artificial neural network to predict the daily flow of Cypress River in Texas and found that the support vector machine model had a better performance than the neural network model did in terms of the river flow prediction and it also had good accuracy. By examining some of the norm computing methods based on machine learning including the Least Squares Regression Support Vector Machine (LSSVR), the artificial neural network (RBFNN), the Multi-Linear regression (MLR), and the decision tree models such as CART and M5 for predicting Kopili river suspended load in India, Kumar *et al.* (2016) introduced the LSSVR model as the most accurate method for estimating the above values.

Few studies have been conducted aimed at comparing the SWAT and the data-driven models in the past decade. Jajarmizadeh *et al.* (2015) performed monthly stream flow estimations for southern Iran using the SWAT and the SVM models. Although high accuracy levels were achieved with both models, the SVM model was found to be more successful in the estimation study. Noori & Kalin (2016) employed the SWAT and ANN models in order to perform stream flow estimations at 29 watersheds located around Atlanta during hot and cold seasons. At the end of the study, higher accuracy was attained in hot seasons than in cold seasons (Koycegiz & Buyukyildiz 2019). Afkhamifar & Sarraf (2020) investigated a hybrid wavelet-transform-based extreme learning machine (ELM) model predicting GWL. Also, two other popular models – a wavelet-transform based artificial neural network and a wavelet-transform-based adaptive neuro-fuzzy interference system – were used to evaluate the model. The wavelet-transform-based ELM model outperformed the other two models with a correlation coefficient of 0.983 during a 1-month period. The model was also superior to the others in terms of training and testing speeds. Donyaii & Sarraf (2020) evolved the Multi-Objective Coronavirus Optimization Algorithm (MOCVOA) to obtain the operational optimum rules of Voshmgir Dam reservoir under climate change conditions. The climatic variables downscaled and predicted by the Bias Correction Spatial Disaggregation (BCSD) method of the MIROC-ESM model were introduced into the Extreme Learning Machine (ELM) model to evaluate the future runoff flowing into the reservoir. Donyaii *et al.* (2020) re-evaluated the management crisis of Iran's groundwater resources from the perspective of the administrative, social, and legal system with the GRP challenging approach. Then, the best-worst fuzzy multi-criteria decision model in Lingo9 software was developed by collecting the opinions of 23 GRP experts, based on three general criteria of change in administrative structure (C1), policy structure (C2) and modification of GRP legal structure (C3), which includes 18 sub-criteria. Donyaii *et al.* (2021) evaluated and compared the performance of the farmland fertility algorithm (FFA) and that of the Harris hawks optimization (HHO) algorithm with each other to solve the two-objective problem of optimal operation of the Golestan dam reservoir under baseline (April 2006–October 2018) and climate change conditions (April 2021–October 2033) in Golestan Province, Iran and the results showed that when considering a constant value of the reliability index (i.e. 80%), the rate of vulnerability changes in the multi-objective FFA (Moffa) is lower than that of the multi-objective HHO (Mohho) algorithm. Ghanbari *et al.* (2021) obtained the important socioeconomic parameters of the city of Tehran, Iran in the period of 1991–2013. Important and optimum variables were analyzed and selected using the Pearson correlation analysis, and four variables including income, pop, GDP, and month were selected. In order to find a proper model, three common machine learning (ML) techniques including artificial neural network (ANN), random forest (RF), and multivariate adaptive regression splines (MARS) were used and the results revealed that the MARS model outperformed all the other models. In addition, in the last step, the crow search algorithm (CSA) was applied to the MARS model to increase the accuracy of the selected model. Therefore, the SWAT and the computational intelligence methods have been widely used for stream flow estimation. However, few studies have compared both models for daily stream flow estimation. (Koycegiz & Buyukyildiz 2019) investigated the performance of two SWAT and Data-Driven Models (the RBNN and the SVR models) for simulating river runoff in Turkey. The results showed that the data-driven models were more successful than the SWAT model, although the SWAT model is a comprehensive physical model and has a significant problem-solving capability (Eini *et al.* 2020b).

Therefore, the aim of the present study was to investigate the limitations and strengths of machine learning based models compared the SWAT hydrological model for monthly streamflow modeling in Maharlu Lake Basin. Also, the review of previous studies showed that no study has been conducted in this field so far. So, this study attempted to evaluate the effectiveness of the SWAT hydrological model and three computational intelligence methods.

## MATERIALS AND METHODS

### Methods

Firstly, the SWAT model was developed regarding the required inputs. The facts that a section of the basin is of karstic type and it has different hydrological characteristics should be considered to increase the accuracy of the basin simulations. It should be noted that for the calibration and the validation of simulation, we used the SUFI2 provided in the software SWAT-CUP (Eini *et al.* 2020a).

Regarding the widespread use of the machine learning methods, the flexibility as well as the high power of these methods for estimating nonlinear functions, in the next section, the three methods of the Extreme Learning Machine (EML), the Support Vector Regression (SVR); and the Least Squares Support Vector Regression (LSSVR) are used to estimate the streamflow. In this study, the mean-weighted noise reduction filter was used to eliminate the out-of-range points. Eighty percent of the data was used to train the model and the remaining 20% was used to evaluate the accuracy of the models. Finally, the accuracies of the developed models and the SWAT model for the estimation of the streamflow were evaluated and compared using various statistical criteria such as the Root Mean Square Error (RMSE), the Determination Coefficient (R²), and the NSE. A graphical error analysis was also performed by the use of the regression and comparison graphs between the actual values and the estimated values of the models in order to provide a better analysis of the error of the developed models and the semi-distributed SWAT model.

### The study area and the input data

To compare the accuracy of the SWAT model and the methods of extreme machine learning, the watershed in Iran was selected as the case study in this work. Figure 1 shows the location map of the watersheds. It is located in the catchment area of the central plateau of Iran and in the basin of Bakhtegan and Maharlu lakes with an area of about 4,270 km^{2} and lies between 29°1′‒30°6′ N and 52°12′‒53°28′ E. The average annual precipitation and temperature in the region are 368 mm and 17.4 °C, respectively. The precipitation in wet seasons (occurring from November to March) accounts for more than 90% of the annual precipitation (Eini *et al.* 2020a).

### Soil and water assessment tool (SWAT)

*et al.*2018). For the SWAT modeling purposes, a basin may be partitioned into several sub-basins and then further sub-classiﬁed as the Hydrologic Response Units (HRU) with a unique grouping of land use, soil, and slope (Flügel 1995). The calculation of the runoff yield after the basin subdivision can improve the accuracy of the runoff simulation (Liang

*et al.*2018). The hydrological simulation at each HRU is performed using the water balance equation (Arnold

*et al.*1998):where,

*SW*: Final soil water content (mm)._{t}*SW*_{0}: Initial soil water content (mm).*R*: The amount of precipitation on day_{day}*i*(mm).*Q*: The amount of surface runoff on day_{surf}*i*(mm).*E*: The amount of evapotranspiration on day_{a}*i*(mm).*W*: The amount of water entering the vale zone from the soil proﬁle on day_{seep}*i*(mm).*Q*: The amount of return flow on day_{gw}*i*(mm).

### The SWAT model setup and data sets

The model data mainly include two parts: the model input and the calibration verification data. In the SWAT model, it is necessary to build a spatial database of the elevation, the land use, the soil type, and the meteorological data. The elevation data used a Digital Elevation Model (DEM) with a spatial resolution of 15 m (https://gdex.cr.usgs.gov/gdex/) (Figure 2(a)). The DEM and the land use data are from the United States Geographical Survey (USGS). According to the standard, the land use data are divided into five categories (Table 1 and Figure 2(b)). Shrub lands are dominant in this basin, covering 76.60% of the area. The soil data were obtained by downloading the attributes from the FAO global soil data (1:50000) (https://www.fao.org) and include 3 soil categories and 5 sub-categories (Figure 2(c) and Table 2). Also, the dominant soil in this basin is loamy (59.5%).

Land use type . | SWAT code . | Area (%) . |
---|---|---|

Shrub land | SHRB | 76.60 |

Crop land | CRWO | 8.90 |

Crop land | CRDY | 8.74 |

Crop land | CRGR | 0.51 |

Grass land | GRAS | 3.08 |

Crop land | CRIR | 0.62 |

Urban construction land | URMD | 1.45 |

Water area | WATR | 0.04 |

Land use type . | SWAT code . | Area (%) . |
---|---|---|

Shrub land | SHRB | 76.60 |

Crop land | CRWO | 8.90 |

Crop land | CRDY | 8.74 |

Crop land | CRGR | 0.51 |

Grass land | GRAS | 3.08 |

Crop land | CRIR | 0.62 |

Urban construction land | URMD | 1.45 |

Water area | WATR | 0.04 |

^{a}GLCC organization (The Global Land Cover Characterization (GLCC) is a series of the global land cover classification datasets that are primarily based on the unsupervised classification of 1-km AVHRR (Advanced Very High Resolution Radiometer) 10-day NDVI (Normalized Difference Vegetation Index) composites. The AVHRR source imagery dates from April 1992 through March 1993. The Ancillary data sources included the digital elevation data, the eco-regions interpretation, the country- or regional-level vegetation, and the land cover maps. https://lta.cr.usgs.gov/GLCC

FAO soil layer^{a} (1 km^{2} cells). | Area (%) . | ||
---|---|---|---|

Name . | Soil type (FAO classification) . | Partial . | Aggregated . |

I-Rc-Yk-c-3508 | LOAM | 29.67 | 59.56 |

Rc37-3c-3552 | LOAM | 29.89 | |

Xk32-3a-3573 | SANDY-CLAY | 21.37 | 21.37 |

Xk33-3a-3574 | CLAY-LOAM | 18.89 | 19.05 |

Xk5-2-3a-3578 | CLAY-LOAM | 0.16 |

FAO soil layer^{a} (1 km^{2} cells). | Area (%) . | ||
---|---|---|---|

Name . | Soil type (FAO classification) . | Partial . | Aggregated . |

I-Rc-Yk-c-3508 | LOAM | 29.67 | 59.56 |

Rc37-3c-3552 | LOAM | 29.89 | |

Xk32-3a-3573 | SANDY-CLAY | 21.37 | 21.37 |

Xk33-3a-3574 | CLAY-LOAM | 18.89 | 19.05 |

Xk5-2-3a-3578 | CLAY-LOAM | 0.16 |

^{a}The Food and Agriculture Organization (FAO) is a specialized agency of the United Nations that leads international efforts to defeat hunger. The goal is to achieve food security for all and make sure that people have regular access to enough high-quality food to lead active, healthy lives. With over 194 member states, FAO works in over 130 countries worldwide. They believe that everyone can play a part in ending hunger. http://www.fao.org/soils-portal/soil-survey.

The historical meteorological period was selected from January 1980 to December 2013 and included the daily precipitation and the maximum and minimum temperatures. The meteorological and hydrological data were extracted from one Synoptic station in Shiraz, two rain gauge stations, and three hydrological stations for model calibration and assessment including Bagh Safa, Chenar Sokhte Azam, and Chenar Sokhte Khoshk stations. For precise simulation, the major crops in the region were modeled according to the comprehensive reports provided by the Iranian Ministry of Energy. These crops include almonds, apples, peaches, tomatoes, olives, winter wheat, and grapes. The agricultural area is 25,840 ha (17% of the total area).

### Sensitivity analysis: calibration and validation

The accuracy of the SWAT simulations is highly dependent on the calibration and the validation processes (Zhang *et al.* 2019). The observed stream flow was obtained from three hydrologic stations. The observed data were divided into two parts before running the SWAT simulations: one was used for the model calibration and the other for the model validation.

The SWAT-CUP app is available to calibrate and analyze the uncertainty of the SWAT model. This application is capable of performing calibration and uncertainty analyses using Sequential Uncertainty Fitting (SUFI2), Particle Swarm Optimization (PSO), Generalized Likelihood Uncertainty Estimation (GLUE), the Parameter Solution (ParaSol) and Markov Chain Monte Carlo (MCMC) algorithms (Abbaspour 2007). In this study, the SUFI2 algorithm, which is widely used to calibrate the SWAT model, was used for the calibration and uncertainty analyses. The SUFI-2 algorithm is known as an inverse optimization method based on the Bayesian framework, in which all sources of uncertainty, including the uncertainties of the parameters, the input variables like precipitation, the model conceptualization, and the measured data in the domain introduced for each parameter are considered. The initial range of the parameters after each iteration is substituted using a new range of parameters obtained from the model, and a limited uncertainty domain of the parameters is calculated. This procedure continues until the two quantifying uncertainty indices and the R-Factor-P-Factor reach a proper limit (Abbaspour 2007). In the SUFI-2, there were 11 statistical indices, including the product of the RMSE, the sum of the RMSE, the interpreting coefficient, the Nash-Sutcliffe coefficient, and the PBIAS method.

### Machine learning models

#### Extreme machine learning (EML)

Extreme Machine Learning (EML) has three main layers consisting of the input layer, the hidden layer, and the output layer. As with any general neural network, the input data is first presented to the network and then each of these input data enters the neurons in the hidden layer. In the hidden layer, the characteristics are selected by mapping the input data to the N-dimensional space, where the most important and relevant characteristics are extracted effectively (Roul *et al.* 2015). The output of the hidden layer is then processed by the output layer neurons for classification or regression. The distinctive characteristic of the EML compared to the conventional neural network methods is that the network training process is performed only by adjusting the output layer values, while the weights of the hidden layer neurons are randomly initialized and remain unchanged in the training process. This is a unique advantage of fast network training, which reduces the computational volume.

*i*in

*M*.

*M*is the total number of the units in the output layer with activation functions for . Also,

*N*denotes the number of hidden units with active functions for .

*H*are expressed by the training vector

*T*and as Equation (3). The inverse of matrix can be obtained using the orthogonal projection method where . As a result, the output weight matrix can be obtained from Equation (4) (Zyarah & Kudithipudi 2017).

#### Support vector regression (SVR)

*y*is a function of

*n*input variables

*x*, we assume that a training dataset of length

*N*is , where and . In fact, s are dimensional vectors that take the values of each input in step

*k*and s are the output variable values in step

*k*. The basic SVR formula for a linear model is as follows:where,

*y*is the estimated output of the model,

*w*is the weight vector and/or an element of the feature space of the problem, and

*b*is the bias value. However, most real-world problems cannot be modeled using linear shapes. For nonlinear modeling, the SVR method uses nonlinear functions called Kernel functions. These nonlinear functions are responsible for mapping data from the input space to the feature space. So, the previous equation can be rewritten as follows:

*W*and

*b*, so that the value of the cost function is minimized (Ma

*et al.*2019).where slack variables and ,( are adopted to measure the deviations outside the -insensitive zone, and parameter C is a positive constant called the regularization parameter, which determines a compromise between the model complexity and the amount up to which deviations larger than should be tolerated (Qu & Zuo 2012). This optimization problem is often solved in its dual form, in which the problem constraints are applied using Lagrange multipliers, and Lagrange () is minimized as follows:where, , , and are Lagrange multipliers. The above statement, together with the Kuhn-Tucker complementary terms, produces a quadratic programming problem as follows:wherewhile is a Kernel function, which handles the data transfer from the input space to the feature space. The solution of the above Quadratic Programming (QP) problem is obtained from the optimization of the parameters and for such that for each training dataset, there is a pair of and . Achieving a global optimum is guaranteed by the use of a QP problem. Finally, the model form in the dual space can be written as follows:

This function has a single parameter that calculates the width of the function (Kavaklioglu 2011).

#### Least square SVR (LSSVR)

*et al.*2002; Qu & Zuo 2012). The LSSVR model is formulated as:where a mapping is directly adopted to allow the LSSVR to work in a linear space, as for the regular SVR. Parameter C is the regularization parameter, which needs to be pre-specified. Equation (14) has equality constraints resulting from the squared loss function. Using the Lagrange multiplier method for Equation (14) yields an un-constrained optimization problem (Qu & Zuo 2012):where s are the Lagrange coefficients. The optimization conditions are as follows:

*et al.*2019):

#### Method of modeling machine learning models

In order to develop the models studied in this paper, one part of the collected data should be randomly separated for the model training and another for the model performance evaluation. For this purpose, after improving the data quality through the pre-processing process, the data from 1980 to 1996 were used to train the models and the data from 1997 to 1999 were used for testing and validation. Also, in order to remove the effect of the variable interval on the modeling performance, all variables were normalized in a range between zero and one. The modeling process by the methods studied in this paper is illustrated as a flowchart in Figure 4, through which the generalizations of these methods can be well understood.

One of the most important components affecting the SVR and LSSVR models is the type of Kernel function used. Three Gaussian, linear, and polynomial functions are the functions commonly used as the Kernel functions in support vector machine models. In this study, the Gaussian Kernel function was used to develop the two models SVR and LSSVR. The training process of these two models is similar to that of the EML method, the training model is first produced by the training dataset and is finally evaluated and validated by the test dataset.

#### Evaluation criteria for model comparison

In the present paper, the performance of the models used for estimating the streamflows were evaluated by the use of the RMSE, the Determination Coefficient (R^{2}), and the Nash-Sutcliffe efficiency (NSE), which are most widely used in hydrology studies. These statistics are defined in Table 3.

Equation . | Rang . |
---|---|

) | [0,1] |

Equation . | Rang . |
---|---|

) | [0,1] |

*Yi ^{obs}* is the

*i*

^{th}observation for the constituent being evaluated,

*Yi*is the i

^{sim}^{th}simulated value for the constituent being evaluated,

*Y*is the mean of the observed data for the constituent being evaluated, and n is the total number of the observations.

^{mean}R^{2} describes the degree of the collinearity between the simulated and the measured data and the proportion of the variance in the measured data explained by the model (Santhi *et al.* 2001; Moriasi *et al.* 2007). The Nash-Sutcliffe efficiency (NSE) is a normalized statistic that determines the relative magnitude of the residual variance compared to the measured data variance (Nash & Sutcliffe 1970). The NSE indicates how well the plot of the observed versus the simulated data fits the 1:1 line.

The Root-Mean-Square Deviation Error is the difference between the value predicted by the model and the actual value. The closer it is to zero, the less it is.

## RESULTS AND DISCUSSION

### Results of the SWAT model

The first step in the calibration and validation of the model is to identify the parameters that affect the variables. The effects of the parameters on the variables varies depending on the hydrological conditions in different basins. Each parameter in the SWAT model has a specific impact on the streamflow or on the other components of the water balance. Since each of these parameters can have different effects alone or together, the sensitivity analysis of the input parameters to the model was performed using the SWAT-CUP program. The sensitivity analysis was performed using two t-stat and *p*-value parameters that determine the effect of each parameter on the variables. The higher the absolute value of t-stat is [±∞] and the closer the *p*-value is to zero, the greater the effect of the parameter changes on the variable is (Abbaspour *et al.* 2015).

In the SWAT Model, crack flow is defined as a process during which water directly infiltrates into the aquifer via cracks and gaps. To control crack flow, it is essential to calibrate the parameters linked to the cracks. The CN2^{1} parameters showed the highest impacts on the streamflow (Eini *et al.* 2018). The basic flow parameters associated with the groundwater also exhibited a high sensitivity to the model. The soil parameters such as Sol_BD^{2} and Sol_AWC^{3} along with the parameters such as the surface water time delay, the sub-basin average slope, and the hydraulic conductivity of waterways, were also sensitive. Table 4 presents 16 parameters with higher sensitivity, which were selected to calibrate and verify the model. The initial values and the range of the parameters can refer to the existing research, which can save time for parameter adjustment and improve efﬁciency. Also, the performance criteria for the SWAT-Crack flow model are shown in Table 5.

Rank . | Parameter name . | Parameter name . | Initial values . | Final values . | t-stat . | p-value
. | |
---|---|---|---|---|---|---|---|

max . | min . | ||||||

1 | ALPHA_BF.gw(v) | Base flow alpha factor (days) | 1 | 0 | 0.0037 | −10.35 | 0 |

2 | CH_K2.rte(v) | Effective hydraulic conductivity in main channel alluvium. | 500 | −0.01 | 352.12 | 4.15 | 0 |

3 | OV-N.hru(v) | Manning's ‘n’ value for overland flow. | 30 | 0.01 | 12.41 | 2.41 | 0.016 |

4 | SOL_AWC.sol(r) | Soil available water storage capacity | 0 | 1 | 0.214 | 1.83 | 0.067 |

5 | GW_REVAP.gw(v) | Groundwater ‘revap’ coefficient. | 0.2 | 0.02 | 0.095 | −1.73 | 0.085 |

6 | CANMX.hru(v) | Maximum canopy storage. | 100 | 0 | 53.74 | −1.63 | 0.104 |

7 | CH_S2.rte(v) | Average slope of main channel. | 10 | −0.001 | 6.7 | 1.59 | 0.112 |

8 | GW_DELAY.gw(v) | Groundwater delay (days). | 500 | 0 | 108.7 | −1.48 | 0.14 |

9 | REVAPMN.gw(v) | Threshold depth of water in the shallow aquifer for ‘revap’ to occur (mm). | 500 | 0 | 407.1 | −1.38 | 0.168 |

10 | GWQMN.gw(v) | Threshold depth of water in the shallow aquifer required for return flow to occur (mm). | 5,000 | 0 | 2,275.8 | −1 | 0.317 |

11 | SLSUBBSN.hru(v) | Average slope length. | 150 | 10 | 72.98 | 0.79 | 0.425 |

12 | CH_N2.rte(v) | Manning's ‘n’ value for the main channel. | 0.3 | −0.01 | 0.1 | −0.68 | 0.498 |

12 | CN2.mgt(r) | SCS runoff curve number | 0.2 | −0.2 | −0.013 | −0.67 | 0.505 |

13 | SOL_BD.sol(r) | Moist bulk density. | 0.6 | −0.5 | 0.302 | −0.66 | 0.507 |

14 | ESCO.hru(v) | Soil evaporation compensation factor. | 1 | 0 | 0.59 | 0.6 | 0.544 |

15 | SOL_ALB.sol(r) | Moist soil albedo. | 0.5 | −0.5 | −0.036 | −0.55 | 0.579 |

16 | EPCO.hru(v) | Plant uptake compensation factor. | 1 | 0 | 0.56 | −0.14 | 0.888 |

Rank . | Parameter name . | Parameter name . | Initial values . | Final values . | t-stat . | p-value
. | |
---|---|---|---|---|---|---|---|

max . | min . | ||||||

1 | ALPHA_BF.gw(v) | Base flow alpha factor (days) | 1 | 0 | 0.0037 | −10.35 | 0 |

2 | CH_K2.rte(v) | Effective hydraulic conductivity in main channel alluvium. | 500 | −0.01 | 352.12 | 4.15 | 0 |

3 | OV-N.hru(v) | Manning's ‘n’ value for overland flow. | 30 | 0.01 | 12.41 | 2.41 | 0.016 |

4 | SOL_AWC.sol(r) | Soil available water storage capacity | 0 | 1 | 0.214 | 1.83 | 0.067 |

5 | GW_REVAP.gw(v) | Groundwater ‘revap’ coefficient. | 0.2 | 0.02 | 0.095 | −1.73 | 0.085 |

6 | CANMX.hru(v) | Maximum canopy storage. | 100 | 0 | 53.74 | −1.63 | 0.104 |

7 | CH_S2.rte(v) | Average slope of main channel. | 10 | −0.001 | 6.7 | 1.59 | 0.112 |

8 | GW_DELAY.gw(v) | Groundwater delay (days). | 500 | 0 | 108.7 | −1.48 | 0.14 |

9 | REVAPMN.gw(v) | Threshold depth of water in the shallow aquifer for ‘revap’ to occur (mm). | 500 | 0 | 407.1 | −1.38 | 0.168 |

10 | GWQMN.gw(v) | Threshold depth of water in the shallow aquifer required for return flow to occur (mm). | 5,000 | 0 | 2,275.8 | −1 | 0.317 |

11 | SLSUBBSN.hru(v) | Average slope length. | 150 | 10 | 72.98 | 0.79 | 0.425 |

12 | CH_N2.rte(v) | Manning's ‘n’ value for the main channel. | 0.3 | −0.01 | 0.1 | −0.68 | 0.498 |

12 | CN2.mgt(r) | SCS runoff curve number | 0.2 | −0.2 | −0.013 | −0.67 | 0.505 |

13 | SOL_BD.sol(r) | Moist bulk density. | 0.6 | −0.5 | 0.302 | −0.66 | 0.507 |

14 | ESCO.hru(v) | Soil evaporation compensation factor. | 1 | 0 | 0.59 | 0.6 | 0.544 |

15 | SOL_ALB.sol(r) | Moist soil albedo. | 0.5 | −0.5 | −0.036 | −0.55 | 0.579 |

16 | EPCO.hru(v) | Plant uptake compensation factor. | 1 | 0 | 0.56 | −0.14 | 0.888 |

Method . | Set . | RMSE . | . | NS . |
---|---|---|---|---|

SWAT-CRK | Training | 1.07 | 0.75 | 0.70 |

Testing | 1.02 | 0.88 | 0.86 |

Method . | Set . | RMSE . | . | NS . |
---|---|---|---|---|

SWAT-CRK | Training | 1.07 | 0.75 | 0.70 |

Testing | 1.02 | 0.88 | 0.86 |

The period 1980–1982 was used as the warm up period of the model, 1982–1997 was used as the calibration period of the model, and 1997–1999 was used as the validation period of the model. The observed and the simulated streamflow time series in the baseline period (1980–1999) is shown in Figure 5.

### The results of machine learning models

The daily meteorological data of Shiraz Synoptic Station were used for the data collection of the machine learning models, covering the period between 1980 and 1999. The parameters used include 20 independent variables (input variables) and one dependent variable (target variable), whose statistical information are presented in Table 6. The initial data could not be used for the modeling because of the large number of out-of-range values and had to be pre-processed before modeling. Finally, after removing incomplete and out-of-range samples, 2689 samples remained, which were used for modeling the discharge value.

No. . | Parameter . | Min . | Max . |
---|---|---|---|

1 | Year | 1980 | 1999 |

2 | Month | 1 | 12 |

3 | Day | 1 | 31 |

4 | Maximum wind speed | 0 | 14 |

5 | Maximum wind direction | 0 | 360 |

6 | Mean wind speed | 0 | 5.375 |

7 | Maximum temperature | 1.5 | 43.2 |

8 | Minimum temperature | −9.6 | 28.6 |

9 | 24-h rainfall | 0 | 99 |

10 | Mean wind speed | 0 | 5.375 |

11 | Mean temperature | −3.5 | 32.2 |

12 | Maximum daily cloudiness | 19 | 100 |

13 | Minimum relative moisture | 2 | 59 |

14 | Mean relative moisture | 10.75 | 82.62 |

15 | Mean soil temperature | −15 | 24 |

16 | Minimum sea level pressure | 994.81 | 1,027.20 |

17 | Maximum sea level pressure | 998.15 | 1,033.92 |

18 | Mean sea level pressure | 996.4 | 1,030.23 |

19 | Maximum station level pressure | 846.30 | 863 |

20 | Minimum station level pressure | 846.30 | 859.5 |

21 | Mean station level pressure | 845.1 | 861.07 |

22 | Discharge | 0 | 10 |

No. . | Parameter . | Min . | Max . |
---|---|---|---|

1 | Year | 1980 | 1999 |

2 | Month | 1 | 12 |

3 | Day | 1 | 31 |

4 | Maximum wind speed | 0 | 14 |

5 | Maximum wind direction | 0 | 360 |

6 | Mean wind speed | 0 | 5.375 |

7 | Maximum temperature | 1.5 | 43.2 |

8 | Minimum temperature | −9.6 | 28.6 |

9 | 24-h rainfall | 0 | 99 |

10 | Mean wind speed | 0 | 5.375 |

11 | Mean temperature | −3.5 | 32.2 |

12 | Maximum daily cloudiness | 19 | 100 |

13 | Minimum relative moisture | 2 | 59 |

14 | Mean relative moisture | 10.75 | 82.62 |

15 | Mean soil temperature | −15 | 24 |

16 | Minimum sea level pressure | 994.81 | 1,027.20 |

17 | Maximum sea level pressure | 998.15 | 1,033.92 |

18 | Mean sea level pressure | 996.4 | 1,030.23 |

19 | Maximum station level pressure | 846.30 | 863 |

20 | Minimum station level pressure | 846.30 | 859.5 |

21 | Mean station level pressure | 845.1 | 861.07 |

22 | Discharge | 0 | 10 |

According to Table 7, it seems that the input variables including maximum temperature, minimum temperature, rainfall, and mean soil temperature have higher values of linear correlation with the target variable (discharge). Also, the other variables presented in the above table have lower values of linear correlation due to the non-linear relationship with the discharge variable.

No. . | Parameter . | Min . | Max . | Correlation with the target variable (R) . |
---|---|---|---|---|

1 | Year | 1980 | 1999 | 0.002 |

2 | Month | 1 | 12 | −0.193 |

3 | Day | 1 | 31 | −0.011 |

4 | Maximum wind speed | 0 | 14 | −0.076 |

5 | Maximum wind direction | 0 | 360 | 0.053 |

6 | Mean wind speed | 0 | 5.375 | 0.091 |

7 | Maximum temperature | 1.5 | 43.2 | −0.83 |

8 | Minimum temperature | −9.6 | 28.6 | 0.42 |

9 | 24-h rainfall | 0 | 99 | 0.67 |

10 | Mean wind speed | 0 | 5.375 | 0.27 |

11 | Mean temperature | −3.5 | 32.2 | 0.201 |

12 | Maximum daily cloudiness | 19 | 100 | 0.247 |

13 | Minimum relative moisture | 2 | 59 | −0.26 |

14 | Mean relative moisture | 10.75 | 82.62 | 0.193 |

15 | Mean soil temperature | −15 | 24 | −0.45 |

16 | Minimum sea level pressure | 994.81 | 1,027.20 | 0.193 |

17 | Maximum sea level pressure | 998.15 | 1,033.92 | 0.167 |

18 | Mean sea level pressure | 996.4 | 1,030.23 | 0.185 |

19 | Maximum station level pressure | 846.30 | 863 | −0.024 |

20 | Minimum station level pressure | 846.30 | 859.5 | −0.009 |

21 | Mean station level pressure | 845.1 | 861.07 | −0.012 |

22 | Discharge | 0 | 10 | 1 |

No. . | Parameter . | Min . | Max . | Correlation with the target variable (R) . |
---|---|---|---|---|

1 | Year | 1980 | 1999 | 0.002 |

2 | Month | 1 | 12 | −0.193 |

3 | Day | 1 | 31 | −0.011 |

4 | Maximum wind speed | 0 | 14 | −0.076 |

5 | Maximum wind direction | 0 | 360 | 0.053 |

6 | Mean wind speed | 0 | 5.375 | 0.091 |

7 | Maximum temperature | 1.5 | 43.2 | −0.83 |

8 | Minimum temperature | −9.6 | 28.6 | 0.42 |

9 | 24-h rainfall | 0 | 99 | 0.67 |

10 | Mean wind speed | 0 | 5.375 | 0.27 |

11 | Mean temperature | −3.5 | 32.2 | 0.201 |

12 | Maximum daily cloudiness | 19 | 100 | 0.247 |

13 | Minimum relative moisture | 2 | 59 | −0.26 |

14 | Mean relative moisture | 10.75 | 82.62 | 0.193 |

15 | Mean soil temperature | −15 | 24 | −0.45 |

16 | Minimum sea level pressure | 994.81 | 1,027.20 | 0.193 |

17 | Maximum sea level pressure | 998.15 | 1,033.92 | 0.167 |

18 | Mean sea level pressure | 996.4 | 1,030.23 | 0.185 |

19 | Maximum station level pressure | 846.30 | 863 | −0.024 |

20 | Minimum station level pressure | 846.30 | 859.5 | −0.009 |

21 | Mean station level pressure | 845.1 | 861.07 | −0.012 |

22 | Discharge | 0 | 10 | 1 |

The noise in the data is an alternative problem in the hydrological time series that affects the model performance adversely. Also, the obtained flow measurements are often subject to many errors. The presence of noise in the hydrological data is so serious that the noise level in these data has been proven to be between 5 and 15%, which is sufficient to reduce the model performance and produce unreliable results (Sivakumar *et al.* 1999). One of the fundamental problems of the data mining is the data noise reduction. The noise in the data is an inevitable problem that affects the performance of the model adversely. In this study, after removing the out-of-range points, a noise reduction filter was used to remove the data noise. The filter is based on creating a window of a constant length and then calculating the mean value per window, thereby reducing the data noise. In Figure 6, the flow rates of the two measured and the noise-reduced states are compared.

In order to develop the EML model, selecting the number of neurons in the hidden layer is one of the basic steps before starting the modeling process by this method. As stated in the theory section of this method, the EML model removed the computation of the number of hidden layers, making it much more efficient than the MLP neural networks. Also, in this method, adjusting the weights between the input layer and the hidden layer is not involved in the training process and the main focus is on neuronal computation. In this study, the trial-and-error approach with the Mean Squared Error (MSE) was used to select the number of the neurons and the type of the activation functions. Different number of neurons and activation functions were studied to obtain the Mean Squared Error. Finally, it was found that the EML model with a structure consisting of a hidden layer with 400 neurons containing the sigmoid activator function provided the best performance in the discharge approximation. After training the EML model by the training dataset (the data from 1980 to 1996), the model was evaluated by the test dataset (the data between 1997 and 1999).

Figures 7–9 show the results of the point diagrams and the fit lines on these points, which can be used to analyze the model error in both training and testing sections as well as the correlation between the actual and the estimated values. In these diagrams, the closer those points are to the first-quarter bisector, the better the prediction will be, and vice versa, the farther away the points are, the worse the data match is. In Figures 10 and 11, the actual and the estimated values of the three EML, SVR, and LSSVR models are compared for the two training and test sections, respectively. Also in Figure 12, the RMSE values of the three developed models are compared. According to Figure 12 it can be said that the results of the LSSVR model have the least error for estimating the streamflow rate and are more accurate than those of the other models.

### Comparison of the SWAT and the computational intelligence methods

In order to evaluate the error of the developed models and the SWAT model, different statistical parameters were used. As shown in Table 8, the LSSVR model with the least error shows the best performance compared to the different models as well as the SWAT model. The results from evaluating the efficiency of the computational intelligence methods show that the models had less error at the validation stage than at the calibration stage because the model learning was performed by the training data. The EML model takes less computation than the MLP neural network method does. The reduced computation in this method has caused the overfitting problem not to occur during the model training process and the model evaluation results to be accurate in spite of the large increase in the hidden layer units. Also, given that one of the important cases for developing the SVR and the LSSVR models is to determine the Kernel function, the results showed that the Gaussian Kernel (radial basis) provides a very good accuracy for estimating the streamflow. However, among the models studied in this paper, the best results were obtained from the LSSVR method. The RMSE values obtained from the LSSVR method for the training and test datasets were 0.038 and 0.057, respectively, which presented less error than the two EML and SVR methods do.

Method . | Set . | RMSE . | . | NS . |
---|---|---|---|---|

ELM | Training | 0.172 | 0.954 | 0.954 |

Testing | 0.299 | 0.854 | 0.85 | |

LSSVRL | Training | 0.038 | 0.997 | 0.997 |

Testing | 0.057 | 0.994 | 0.994 | |

SVR | Training | 0.161 | 0.960 | 0.9601 |

Testing | 0.199 | 0.935 | 0.9355 | |

SWAT-CRK | Training | 1.07 | 0.75 | 0.70 |

Testing | 1.02 | 0.88 | 0.86 |

Method . | Set . | RMSE . | . | NS . |
---|---|---|---|---|

ELM | Training | 0.172 | 0.954 | 0.954 |

Testing | 0.299 | 0.854 | 0.85 | |

LSSVRL | Training | 0.038 | 0.997 | 0.997 |

Testing | 0.057 | 0.994 | 0.994 | |

SVR | Training | 0.161 | 0.960 | 0.9601 |

Testing | 0.199 | 0.935 | 0.9355 | |

SWAT-CRK | Training | 1.07 | 0.75 | 0.70 |

Testing | 1.02 | 0.88 | 0.86 |

## SUMMARY AND CONCLUSION

Determining the flow of rivers is one of the most important and effective components in the management of water resources in watersheds. In this study, the performances of the semi-distributed SWAT model and three computational intelligence methods were evaluated for monthly simulation of Maharlu Lake Basin stream flow. The methods used for modeling and estimating the flow include three methods of the Extreme Machine Learning (EML), the Least Squares Support Vector Regression (LSSVR), and the Support Vector Regression (SVR) and the process of developing these models is performed by the software MATLAB 2018. In order to enhance the quality level of the data, the blank lines of the data and the out-of-range data were first removed. Then, the data noise was removed by a mean weight-based noise reduction filter. The study of the diagram comparing the measured values and the noise reduction values indicated by the filter showed that this filter can very well remove the noise data by preserving the original signal form. According to the results of the correlation between the input data and the stream flow variable, it was proved that the four variables of maximum temperature, minimum temperature, 24-h rainfall, and mean soil temperature have a good correlation with the objective variable, so can be effective variables for estimating the watershed flow.

In addition, after preparing the required data, such as DEM, land use, and soil maps, the SWAT model was calibrated using the SUFI-2 algorithm as a semi-auto calibration procedure.

Studies show that the models perform well in simulating the watershed monthly flows, but the support vector machine least squares methodology had higher accuracy in estimating monthly flows. The structure of this method is based on the solution of the linear optimization problem of the support vector and has equivalents whose opposite Lagrangian coefficients are nonzero. Here, the data are defined as the support vectors.

The study results show that the support vector machine method is used in cases where complex systems cannot be defined by mathematical equations or problems of data disturbance or model recognition as well as in areas with defects of statistics. These models with the minimum input parameters show acceptable results and given that these models are based on the principle of the structural error minimization, so for simulation, using the learning method that supervises the radial basis functions makes parameter estimations have higher rate and lower error than other Kernels do, and this is one of the distinctive characteristics of radial basis functions. On the one hand, the SWAT model is a semi-distributed-conceptual model with a very high accuracy in solving problems. The reason for using this model is that it simulates all hydrological phenomena in watersheds and is not limited to large volumes of information. This model is especially useful when different areas of the watershed have different soil or land use so that the heterogeneity and difference can affect the hydrology of the watershed. Given that the implementation of this model requires a large amount of catchment information and usually due to the complexity and large volume of the data, it requires computers with high processing ability and more time to analyze the uncertainty. Therefore, in studies where we are faced with the lack of information and time, it is justifiable to use models such as vector machine that can be implemented with the least available data and provide acceptable results. In addition, in studies in which simulation is used for a comprehensive investigation of hydrological events and their effects on each other, using hydrological models such as SWAT is recommended.

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.

The number of curves in the average moisture conditions of the basin.

Moist bulk density.

Available water capacity of the soil layer.