Modelling runoff in an arid watershed through integrated support vector machine

Runoff modelling is a significant practice in water resources engineering, and consistent, advanced methods for predicting runoff are crucial for understanding hydrologic processes. Here, a novel integrated intelligence model coupled with phase space reconstruction (PSR) is proposed to estimate runoff for five watersheds of Balangir, Odisha, India. Monthly monsoon precipitation, temperature and humidity data for the five watersheds over 28 years (1990–2017) are used for model development and validation. The proposed model integrates a support vector machine (SVM) with the firefly algorithm (FFA) and PSR. Model performance is assessed with the Nash–Sutcliffe efficiency (NSE), root mean square error (RMSE) and Willmott's index (WI). The developed PSR-SVM-FFA model attains the highest WI values, ranging from 0.97 to 0.98, while the SVM and SVM-FFA models attain 0.92 to 0.93 and 0.94 to 0.95, respectively. The estimates from the proposed model are also plotted and validated. The proposed PSR-SVM-FFA model gives the most accurate results, with errors limited to 2–3%.


INTRODUCTION
Improvement in water resources is essential for balancing water demand around the globe. Owing to the complex nature of, and uncertainty in, water parameters, traditional methods such as statistical methods, the Dickens formula (1865), the Ryves formula (1884) and the Khosla formula (1960) are insufficient to describe the rainfall-runoff cycle (Das 2009). Innovative computational algorithms and their application to water resources engineering are necessary to understand the behaviour of the water cycle in the context of climate change. Modern computational techniques such as FFA, SVM and PSR are important for engineering applications, and attempts have been made to apply them to water resources engineering (Tayfur 2014). Tabari et al. (2012) employed SVM, ANFIS (adaptive neuro-fuzzy inference system) and MLR (multiple linear regression) methods with six climate variables as inputs to estimate evapotranspiration at Nozheh station, Iran. Their results suggested that the SVM and ANFIS models were more precise than the regression and climate-based models. Baydaroglu & Kocak (2014) applied support vector regression (SVR) to predict evaporative water loss using solar radiation, relative humidity, wind speed, evaporation and temperature as inputs. SVR predicted evaporation effectively, owing to its good generalization ability, with determination coefficients of 83% and 97% for univariate and multivariate time series embedding, respectively. Raghavendra & Deka (2014) found in their review that SVM is more suitable for prediction than other techniques used in hydrology. Cha et al. (2014) proposed an SVM-FFA model for more precise forecasting of malaria incidence in the Jodhpur and Bikaner areas. The performance of the hybrid model was compared with that of ANN, autoregressive moving average and SVM models, and the results indicated that the integrated model provides more accurate forecasts than the traditional methods.
The FNN (feed forward neural network) technique was used by Vignesh et al. (2015) for monthly stream flow forecasting over a period of 53 years in the United States. The outcomes are valuable for identifying suitable model complexity at individual stations, detecting patterns across regions and sub-regions, interpolating and extrapolating data, and classifying catchments. Gocic et al. (2016) used SVM-FFA and SVM-wavelet to evaluate precipitation trends at 29 meteorological stations in Serbia from 1946 to 2012. The estimation and prediction results of the hybrid models were compared, and the SVM-wavelet approach gave better prediction accuracy and generalization ability. Moghaddam et al. (2016) implemented SVM-FFA to predict the fatigue life of polyethylene terephthalate (PET) modified asphalt mixture, taking PET percentages, stress levels and environmental temperatures as inputs. The SVM-FFA predictions were compared with those of ANN and genetic programming (GP), and the hybrid model was found to perform better. Al-Shammari et al. (2016) proposed a hybrid method integrating SVM with FFA to predict daily dew point temperature ($T_{dew}$) for an Iranian city and compared its performance with that of SVM, ANN and GP. They observed that SVM-FFA predicts $T_{dew}$ efficiently, with better accuracy and consistency. Delafrouz et al. (2017) used integrated PSR-ANN and ANN techniques to predict daily river flow at river gauging stations in the USA. The outcomes of the PSR-ANN, ANN and gene expression programming models were compared, and the proposed hybrid model gave the best prediction accuracy for daily river flow. Mehr et al. (2018) integrated FFA and SVM for monthly rainfall forecasting at the Tabriz and Urmia stations, Iran.
The effectiveness of the hybrid model was cross-checked against SVR and GP-based forecasting models, and the SVM-FFA model was found to forecast rainfall with promising accuracy. Ghose & Samantaray (2019) employed LRNN (layer recurrent NN), RBFN (radial basis function NN) and FFBPN (feed forward back propagation NN) to evaluate runoff as a function of evapotranspiration, temperature and precipitation. Their results revealed that the LRNN model performs better than FFBPN and RBFN in predicting runoff in the watershed, which helps in planning, designing and managing hydraulic structures in the neighbourhood of the watershed. Zaini et al. (2018) used an SVM model and a hybridized SVM-PSO (particle swarm optimization) model to forecast daily stream flow at the Upper Bertam watershed, Malaysia. The hybridized models SVM-PSO1 and SVM-PSO2 performed better than SVM1 and SVM2 at forecasting river flow 1-7 days ahead. Tao et al. (2018) used SVM and SVM-FFA to forecast rainfall at Chhattisgarh. The proposed hybrid model significantly improved forecasting accuracy and can also be used for monthly rainfall forecasting in provincial areas of India. Gandomi et al. (2011) used FFA to solve mixed variable structural optimization problems and thoroughly analysed its implications for future research in comparison with PSO, GA and simulated annealing. Ji & Sun (2013) presented a multitask, multiclass SVM method based on minimizing regularization functions and found it to be the best among the multitask learning methods considered. Shamshirband et al. (2015) proposed a hybrid SVM-FFA model for estimating monthly mean horizontal global solar radiation (HGSR). The performance of the hybrid model was evaluated against ANN, GP and ARMA (autoregressive moving average) models, and the results revealed that the hybrid model performs best.
Several researchers have applied individual algorithms to predict runoff, and some have developed optimization methods, but few have used integrated methods. An attempt is therefore made here to develop a hybrid model that integrates several algorithms. The objective of this research is to compare the results of the SVM-FFA and PSR-SVM-FFA models with those of the SVM and empirical models in estimating the magnitude of runoff for flood control in the region. The approach is applicable to the efficient planning and management of water resources, such as flood control and watershed management.

STUDY AREA
Balangir district is located in the western part of Odisha, India, and has an area of 5,165 km², as shown in Figure 1. It is located at latitude 20°59′00″ and longitude 83°32′22″ and has an average elevation of 115 metres (377 feet). The watershed lies towards the mid-north edge of Balangir district. Five gauging stations, Loisinga, Balangir, Tushura, Saintala and Patnagarh, are considered in the present research. Physical and statistical characteristics of the gauging stations are described in Table 1. Runoff prediction is assessed using the dataset from 1990 to 2017. Mean monthly precipitation and temperature data for the monsoon months (May to October) over these 28 years (1990-2017) are obtained from the IMD (India Meteorological Department), Bhubaneswar. Runoff data are collected from the soil conservation office, Balangir. Runoff is also computed using Khosla's empirical equation to check the agreement between the empirical estimates and the observed data collected from that office.

METHODOLOGY

Khosla formula
Khosla considered precipitation, discharge and temperature data for different catchments in India and the USA to arrive at the empirical relationship given in Equation (1). It is indirectly based on the water-balance concept and uses mean monthly temperature to represent losses due to evapotranspiration:

$$R_m = P_m - L_m, \qquad L_m = 0.48\,T_m \quad \text{for } T_m > 4.5\,^{\circ}\mathrm{C} \tag{1}$$

where $R_m$ is monthly runoff (cm, with $R_m \ge 0$), $P_m$ is monthly rainfall (cm), $L_m$ is monthly losses (cm) and $T_m$ is mean monthly temperature of the catchment (°C).
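As an illustration of Equation (1), the following minimal Python sketch computes monthly runoff from monthly rainfall and mean monthly temperature. It assumes the commonly quoted form of Khosla's formula, $R_m = P_m - L_m$ with $L_m = 0.48\,T_m$ for $T_m > 4.5$ °C and depths in cm. The function name and the input values are hypothetical and are not taken from the study data.

```python
def khosla_monthly_runoff(p_cm, t_c):
    """Monthly runoff (cm) from monthly rainfall (cm) and mean monthly temperature (deg C).

    Assumes L_m = 0.48 * T_m, which is the form quoted for T_m > 4.5 deg C only.
    """
    if t_c <= 4.5:
        raise ValueError("this sketch only covers T_m > 4.5 deg C")
    loss_cm = 0.48 * t_c
    # Runoff cannot be negative: losses larger than rainfall give zero runoff.
    return max(p_cm - loss_cm, 0.0)


# Hypothetical monsoon-month inputs (not from the study data)
rainfall_cm = [3.0, 20.0, 35.0]
temperature_c = [33.0, 30.0, 28.0]
runoff_cm = [khosla_monthly_runoff(p, t) for p, t in zip(rainfall_cm, temperature_c)]
```

In the first hypothetical month the losses exceed the rainfall, so the computed runoff is zero.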

Support vector machine model
The support vector machine (SVM) is a technique grounded in statistical learning theory, with promising empirical results for pattern recognition in high-dimensional data. It represents the decision boundary using a subset of the training examples known as support vectors (Cortes & Vapnik 1995). A graphical description of the SVM model is shown in Figure 2. SVM rests on the statistical learning principle of structural risk minimization, which provides an upper bound on the generalization error of a classifier ($R$) in terms of its training error ($R_e$), the number of training examples ($N$) and the model complexity ($h$):

$$R \le R_e + \varphi\left(\frac{h}{N}, \frac{\log \eta}{N}\right) \tag{2}$$

where $\varphi$ is a monotonically increasing function of the model complexity $h$ and $1 - \eta$ is the confidence level of the bound. This is desirable when designing a linear classifier that maximizes the margin of its decision boundary in order to minimize error. In this regard, SVM is an effective linear classifier that searches for the hyperplane with the largest margin, known as the maximal margin classifier. The decision boundary of the linear classifier is:

$$\mathbf{w} \cdot \mathbf{x} + b = 0 \tag{3}$$

The training phase of SVM involves estimating the parameters, weight ($\mathbf{w}$) and bias ($b$), of the decision boundary from the training data set. For a training input $\mathbf{x}_i$ with class label $y_i \in \{-1, 1\}$, the parameters are chosen to satisfy the following conditions:

$$\mathbf{w} \cdot \mathbf{x}_i + b \ge 1 \ \text{if } y_i = 1, \qquad \mathbf{w} \cdot \mathbf{x}_i + b \le -1 \ \text{if } y_i = -1 \tag{4}$$

The objective function is written with Lagrange multipliers $\lambda_i$ as:

$$L_P = \frac{1}{2}\lVert \mathbf{w} \rVert^2 - \sum_{i=1}^{N} \lambda_i \left[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right] \tag{5}$$

The optimization is solved subject to the Karush-Kuhn-Tucker (KKT) conditions:

$$\lambda_i \ge 0, \qquad \lambda_i \left[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right] = 0 \tag{6}$$

Firefly algorithm
The FFA is a nature-inspired algorithm based on the flashing behaviour of fireflies (Yang 2009; Abd-Elazim & Ali 2018; Lieu et al. 2018; Wang et al. 2018b, 2018c). Since the complex behaviour of fireflies induces global communication among the swarming particles, the FFA is effective for multifaceted, multi-objective optimization. In this algorithm, fireflies use flashing light to search for mates, to attract potential prey and to protect themselves from predators. To reach an optimal solution efficiently, the intensity of the flashing light, which is associated with the objective function of the problem, draws fireflies towards more attractive positions.
There are some components in FFA, such as distance, attractiveness and movement.

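Euclidean distance
The distance between two fireflies $i$ and $j$ at locations $u_i$ and $u_j$ is defined as the Euclidean distance $r_{ij}$ in Equation (7):

$$r_{ij} = \lVert u_i - u_j \rVert = \sqrt{\sum_{k=1}^{d} \left(u_{i,k} - u_{j,k}\right)^2} \tag{7}$$

where $u_{i,k}$ is the $k$th component of the spatial coordinate $u_i$ of the $i$th firefly and $d$ is the number of dimensions.

Attractiveness
The attractiveness function of a firefly is given in Equation (8):

$$\beta(r) = \beta_0 \exp(-\gamma r^2) \tag{8}$$

The movement of firefly $i$ towards a more attractive firefly $j$ is given by Equation (9):

$$u_i^{t+1} = u_i^{t} + \beta_0 \exp(-\gamma r_{ij}^2)\,(u_j^{t} - u_i^{t}) + \alpha\left(\mathrm{rand} - \tfrac{1}{2}\right) \tag{9}$$

where $u_i^{t}$ is the current position of the firefly, the term $\beta_0 \exp(-\gamma r_{ij}^2)\,(u_j^{t} - u_i^{t})$ accounts for the firefly's attraction to light intensity and the term $\alpha\left(\mathrm{rand} - \tfrac{1}{2}\right)$ provides the random movement of the firefly.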
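The three FFA components of Equations (7) to (9) can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the random term is drawn independently for each coordinate, and the default values $\gamma = 1.0$, $\alpha = 0.3$ and $\beta_0 = 1.0$ follow the parameter settings reported for this study.

```python
import math
import random


def distance(u_i, u_j):
    """Euclidean distance r_ij between two firefly positions (Equation 7)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u_i, u_j)))


def attractiveness(r, beta0=1.0, gamma=1.0):
    """Attractiveness beta(r) = beta0 * exp(-gamma * r^2) (Equation 8)."""
    return beta0 * math.exp(-gamma * r ** 2)


def move_firefly(u_i, u_j, alpha=0.3, beta0=1.0, gamma=1.0, rng=random):
    """Move firefly i towards a brighter firefly j (Equation 9).

    Assumption: one uniform random number in [0, 1) is drawn per coordinate.
    """
    beta = attractiveness(distance(u_i, u_j), beta0, gamma)
    return [a + beta * (b - a) + alpha * (rng.random() - 0.5)
            for a, b in zip(u_i, u_j)]


# Hypothetical positions in a two-dimensional search space
new_position = move_firefly([0.0, 0.0], [1.0, 0.0])
```

In an SVM-FFA hybrid, each position would typically encode candidate SVM hyperparameters and brightness would be tied to prediction error, but that mapping is an assumption here and is not spelled out in the text.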
The firefly algorithm (FA) parameters are set as follows: $\gamma$ (light absorption coefficient) = 1.0, $\alpha$ (randomization parameter) = 0.3, $\beta_0$ (attractiveness value) = 1.0 and rand (random number generator uniformly distributed in [0, 1]) = 0.2. A flowchart of the SVM-FFA model is shown in Figure 3, and the hybrid intelligence model is illustrated in Figure 4. The models involve the following modelling process (Dutta et al. 2018; Fan et al. 2018; Hajiloo et al. 2018; Sun & Wang 2018):
(i) In the first phase, the optimal delay time $\tau$ and embedding dimension $m$ are established using the mutual information function (MIF) and false nearest neighbour (FNN) approaches for phase space reconstruction of the targeted index. In this manner, the phase space is unfolded from the input series and the critical dynamics of the chaotic time series are extracted.
(ii) In the second phase, the input matrix is assembled from the dimensions $X_t, X_{t-\tau}, X_{t-2\tau}, \ldots, X_{t-(m-1)\tau}$, using the optimal $\tau$ and $m$, for the target value $X_{t+1}$. The input matrix is constructed according to the PSR technique (Sivakumar et al. 2001).
(iii) In the third phase, the hybrid SVM-FFA model is fed with the matrix produced from the phase space signal.
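The second phase, assembling the delay-embedded input matrix, can be sketched as follows. This is a minimal Python illustration, not the authors' code: the function name `delay_embed` and the short example series are hypothetical, and $\tau$ and $m$ are assumed to have been chosen already in the first phase.

```python
def delay_embed(series, m, tau):
    """Build PSR inputs and targets from a scalar time series.

    Each input row is [X_t, X_(t-tau), ..., X_(t-(m-1)tau)] and the
    matching target is X_(t+1).
    """
    first_t = (m - 1) * tau           # earliest t with a full delay vector
    inputs, targets = [], []
    for t in range(first_t, len(series) - 1):
        inputs.append([series[t - k * tau] for k in range(m)])
        targets.append(series[t + 1])
    return inputs, targets


# Hypothetical runoff series (arbitrary values, for illustration only)
runoff = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X, y = delay_embed(runoff, m=3, tau=1)
# X holds three rows: [3, 2, 1], [4, 3, 2], [5, 4, 3]; y holds 4, 5, 6
```

The rows of `X` and the entries of `y` would then be passed to the SVM-FFA model in the third phase.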

Delay estimation
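The time delay for PSR is estimated using the average mutual information (AMI). The delay used for reconstruction is the lag at which the AMI reaches its first local minimum.
AMI is computed as:

$$I(\tau) = \sum_{t} P(X_t, X_{t+\tau}) \log\left[\frac{P(X_t, X_{t+\tau})}{P(X_t)\,P(X_{t+\tau})}\right]$$

where $P(X_t)$ and $P(X_{t+\tau})$ are the marginal probabilities of $X_t$ and $X_{t+\tau}$, and $P(X_t, X_{t+\tau})$ is their joint probability.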
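A histogram-based estimate of the AMI, together with the selection of its first local minimum, can be sketched as follows. This is an illustrative sketch only: the equal-width binning, the number of bins, the base-2 logarithm and the function names are assumptions, not details given in the text.

```python
import math
from collections import Counter


def average_mutual_information(series, tau, bins=8):
    """Histogram estimate of the AMI between X_t and X_(t+tau), in bits."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / bins or 1.0   # guard against a constant series

    def bin_of(x):
        return min(int((x - lo) / width), bins - 1)

    pairs = [(bin_of(series[t]), bin_of(series[t + tau]))
             for t in range(len(series) - tau)]
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    ami = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts
        ami += p_ab * math.log2(count * n / (left[a] * right[b]))
    return ami


def first_local_minimum_lag(ami_by_lag):
    """Return the lag of the first local minimum, or None if there is none.

    ami_by_lag[k] is assumed to hold the AMI for lag tau = k + 1.
    """
    for k in range(1, len(ami_by_lag) - 1):
        if ami_by_lag[k] < ami_by_lag[k - 1] and ami_by_lag[k] <= ami_by_lag[k + 1]:
            return k + 1
    return None
```

A typical use would be to evaluate `average_mutual_information` for lags 1 to some maximum lag and pass the resulting list to `first_local_minimum_lag`. With short monthly records the estimate is sensitive to the number of bins.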

Embedding dimension (ED) estimation
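The ED for PSR is estimated using the false nearest neighbour (FNN) algorithm. For the $i$th point at dimension $d$, the point $X_i^{r}$ and its nearest neighbour $X_i^{r*}$ in the reconstructed phase space are treated as false neighbours when the relative increase in their distance on going from dimension $d$ to $d + 1$ exceeds the distance threshold $R_{tol}$:

$$\sqrt{\frac{R_i^2(d+1) - R_i^2(d)}{R_i^2(d)}} > R_{tol}$$

where $R_i^2(d) = \lVert X_i^{r} - X_i^{r*} \rVert^2$ is the distance metric. The ED is taken as the smallest value of $d$ at which the percentage of false neighbours, $P_{fnn}$, becomes negligible, where $P_{fnn}$ is the ratio of FNN points to the total number of points in the reconstructed phase space.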
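The FNN test can be sketched in Python as below. This is a brute-force illustration under stated assumptions, not the authors' procedure: it uses the usual relative-distance criterion for false neighbours, the threshold default of 15 is a commonly used value rather than one given in the text, and the function name is hypothetical.

```python
import math


def false_neighbour_fraction(series, d, tau, r_tol=15.0):
    """Fraction of points whose nearest neighbour in dimension d is false.

    A neighbour is counted as false when the extra coordinate added in
    dimension d + 1 increases the distance by more than r_tol times the
    distance in dimension d.
    """
    # Only times t for which the (d + 1)-dimensional vector also exists
    times = range(d * tau, len(series))
    vec = {t: [series[t - k * tau] for k in range(d)] for t in times}
    false_count = 0
    for t in times:
        # Nearest neighbour of point t in the d-dimensional reconstruction
        s = min((u for u in times if u != t),
                key=lambda u: math.dist(vec[t], vec[u]))
        r_d = math.dist(vec[t], vec[s])
        extra = abs(series[t - d * tau] - series[s - d * tau])
        if (r_d == 0 and extra > 0) or (r_d > 0 and extra / r_d > r_tol):
            false_count += 1
    return false_count / len(times)
```

In practice the fraction would be evaluated for d = 1, 2, 3 and so on, and the embedding dimension chosen where it first drops close to zero. The search is quadratic in the series length, which is acceptable for monthly records of the size used here.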
where $Q_{t-1}$ is the one-month lag runoff, $Q_{t-2}$ is the two-month lag runoff, $Q_{t-3}$ is the three-month lag runoff, $Q_{t-4}$ is the four-month lag runoff and $Q_{t-5}$ is the five-month lag runoff.

Model performance evaluation
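The indicators NSE, RMSE and WI are used to assess model performance. They are computed as:

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N} (O_i - P_i)^2}{\sum_{i=1}^{N} (O_i - \bar{O})^2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (P_i - O_i)^2}$$

$$\mathrm{WI} = 1 - \frac{\sum_{i=1}^{N} (O_i - P_i)^2}{\sum_{i=1}^{N} \left( \lvert P_i - \bar{O} \rvert + \lvert O_i - \bar{O} \rvert \right)^2}$$

where $O_i$ and $P_i$ are the observed and predicted $i$th runoff, $\bar{O}$ is the mean of observed runoff and $N$ is the number of observations.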
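The three indicators can be computed with a few lines of Python. The sketch below assumes the standard definitions of NSE, RMSE and Willmott's index of agreement. The function names and the example values are hypothetical and are not taken from the study.

```python
import math


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def rmse(obs, pred):
    """Root mean square error, in the units of the data."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def willmott_index(obs, pred):
    """Willmott's index of agreement, between 0 and 1."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
              for o, p in zip(obs, pred))
    return 1.0 - num / den


# Hypothetical observed and predicted runoff (arbitrary values)
observed = [1.0, 2.0, 3.0]
predicted = [2.0, 2.0, 2.0]   # always predicts the observed mean
```

For this example, where every prediction equals the observed mean, both NSE and WI evaluate to zero, which shows why values close to 1 indicate a useful model.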

RESULTS AND DISCUSSION
Runoff is estimated using the Khosla formula for five watersheds in the monsoon period (May-October) for 1990-2017. A graphical representation of mean monthly runoff is presented in Figure 5. The performance of the SVM model with different inputs for the five watersheds is presented in Table 3. Three evaluation parameters, NSE, RMSE and WI, are estimated for both the training and testing phases, as explained below. Among the five models, the SVM5 model, which takes $Q_{t-1}$, $Q_{t-2}$, $Q_{t-3}$, $Q_{t-4}$ and $Q_{t-5}$ as input parameters, shows the best result for the Balangir watershed, with WI values of 0.935 and 0.942 for the training and testing periods, respectively. For the remaining watersheds, model SVM5 also gives the best WI value in both phases out of the five simulations. For the Loisinga watershed, the best WI values for the training and testing periods are 0.919 and 0.925 for model SVM5. For the training phase at the Patnagarh, Saintala and Deogaon watersheds, the best WI values are 0.927, 0.938 and 0.945, respectively. The results for the SVM-FFA model based on NSE, RMSE and WI for the training and testing periods are given in Table 4.
Here, five different models are used to estimate the NSE, RMSE and WI values for the five watersheds. The best WI values are obtained when $Q_{t-1}$, $Q_{t-2}$, $Q_{t-3}$, $Q_{t-4}$ and $Q_{t-5}$ are used as the input scenario. For Balangir, the best WI values are 0.965 and 0.971 for the training and testing periods, respectively. For the training phase, the WI values for the Loisinga, Patnagarh, Saintala and Deogaon watersheds are 0.958, 0.953, 0.955 and 0.961, respectively. Similarly, the WI values for the testing phase are 0.968, 0.962, 0.966 and 0.974 for the Loisinga, Patnagarh, Saintala and Deogaon watersheds, respectively. Results of the PSR-SVM-FFA model for both the training and testing phases are presented in Table 5. The linear scale plot of actual versus predicted monthly runoff for the proposed model over the study area is shown in Figure 9. The results show that the estimated peak runoff is 141.208 mm, 142.313 mm and 149.272 mm for SVM, SVM-FFA and PSR-SVM-FFA, respectively, against an actual peak of 157.22 mm for the Balangir watershed. The approximated peak runoffs are 126.4107 mm, 147.091 mm and 154.655 mm for SVM,

Comparison of best results
The SVM, SVM-FFA and PSR-SVM-FFA models are compared in terms of the NSE, RMSE and WI indicators for the five gauged watersheds. The performance indicators are given in Table 6, which illustrates the efficiency of each model. Estimating runoff is important, so the methods used here are significant for providing runoff information, and the RMSE, WI and NSE values are essential for judging the runoff predictions. It is apparent that the PSR-SVM-FFA model performs better than the SVM-FFA and SVM models. The accuracy of each model is compared in the form of a bar chart in Figure 10.