## Abstract

Runoff modelling is an important practice in water resources engineering, and consistent, advanced methods for runoff prediction are therefore crucial for understanding hydrologic processes. Here, a novel integrated intelligence model coupled with phase space reconstruction (PSR) is proposed to estimate runoff for five watersheds of Balangir, Odisha, India. Monthly monsoon precipitation, temperature and humidity data for the five watersheds over 28 years (1990–2017) are employed for model development and validation. The proposed model integrates a support vector machine (SVM) with the firefly algorithm (FFA) and PSR. Model performance is assessed using the Nash–Sutcliffe efficiency (NSE), root mean square error (RMSE) and Willmott's index (WI). The developed PSR-SVM-FFA model achieves the highest WI values, ranging from 0.97 to 0.98, while the SVM and SVM-FFA models achieve 0.92 to 0.93 and 0.94 to 0.95, respectively. The data estimated by the proposed model are also plotted and validated. The proposed PSR-SVM-FFA model gives the most accurate results, with errors limited to 2–3%.

## Highlights

The most relevant techniques are used to model the output of the rainfall-runoff process at a single outlet. The advances gained from a modelling point of view are considered and discussed.

Models are developed in the study using three techniques: support vector machine (SVM), support vector machine with firefly algorithm (SVM-FFA), and support vector machine and firefly algorithm with phase space reconstruction (PSR-SVM-FFA).

The study is different from previous studies by including and comparing two new techniques, SVM-FFA and PSR-SVM-FFA, while past research has used hybridization of SVM for predicting runoff in a watershed.

Interaction of all techniques along with five different models is presented.

The proposed model predicts the runoff at a single outlet reasonably well and improves runoff estimation without requiring a detailed study of all sub-outlets of the watershed.

## INTRODUCTION

Improvement of water resources management is essential for balancing water demand around the globe. Owing to the complex nature of, and uncertainty in, water parameters, traditional approaches such as statistical methods, the Dickens formula (1865), the Ryves formula (1884) and the Khosla formula (1960) are insufficient to describe the rainfall-runoff cycle (Das 2009). Innovative computational algorithms, and their application to water resources engineering, are necessary to understand the behaviour of the water cycle in the context of climate change. Modern computational techniques such as FFA, SVM and PSR are important for engineering applications, and attempts have been made to apply them to water resources engineering (Tayfur 2014).

Tabari *et al.* (2012) employed SVM, ANFIS (adaptive neuro-fuzzy inference system) and MLR (multiple linear regression) methods with six climate variables as inputs to estimate evapotranspiration at Nozheh station, Iran. Their outcomes suggested that the SVM and ANFIS models perform with greater precision than regression and climate-based models. Baydaroglu & Kocak (2014) applied support vector regression (SVR) to predict water loss by evaporation using solar radiation, relative humidity, wind speed, evaporation and temperature as inputs. SVR predicted evaporation effectively owing to its good generalization ability, with determination coefficients of 83 and 97% for univariate and multivariate time series embedding, respectively. Raghavendra & Deka (2014) found in their review that SVM is more suitable for prediction than other techniques used in hydrology. Cha *et al.* (2014) proposed an SVM-FFA model for more precise forecasting of malaria incidence in the Jodhpur and Bikaner areas. The performance of the hybrid model was compared with ANN, autoregressive moving average and SVM models, and the results indicated that the integrated model provides more accurate forecasts than the traditional methods. The FNN (feed forward neural network) technique was used by Vignesh *et al.* (2015) for monthly stream flow forecasting over a period of 53 years in the United States. The outcomes are valuable for identifying suitable model complexity at individual stations, patterns across regions and sub-regions, interpolating and extrapolating data, and classifying catchments. Gocic *et al.* (2016) used SVM-FFA and SVM-wavelet to evaluate precipitation trends at 29 meteorological stations in Serbia from 1946 to 2012. The estimation and prediction outcomes of the hybrid models were compared, and the results showed that the SVM-wavelet approach gives better prediction accuracy and generalization ability.
Moghaddam *et al.* (2016) implemented SVM-FFA for predicting the fatigue lifetime of polyethylene terephthalate (PET) modified asphalt mixture, taking PET percentages, stress levels and environmental temperatures as inputs. The SVM-FFA predictions were compared against ANN and genetic programming (GP), and the hybrid model was found to perform better. Al-Shammari *et al.* (2016) proposed a hybrid method integrating SVM with FFA for predicting daily dew point temperature (T_{dew}) for an Iranian city, and compared the performance of the hybrid model with SVM, ANN and GP. They observed that SVM-FFA predicts T_{dew} efficiently, with better accuracy and consistency. Delafrouz *et al.* (2017) utilized integrated PSR–ANN and ANN techniques to predict daily river flow at river gauging stations in the USA. The outcomes of the PSR–ANN, ANN and gene expression programming models were compared, and the hybrid model gave the best prediction accuracy for daily river flow. Mehr *et al.* (2018) integrated FFA and SVM for monthly rainfall forecasting at the Tabriz and Urmia stations, Iran. The effectiveness of the hybrid model was cross-checked against SVR and GP-based forecasting models, and the SVM-FFA model was found to forecast rainfall with promising accuracy. Ghose & Samantaray (2019) employed LRNN (layer recurrent NN), RBFN (radial basis function NN) and FFBPN (feed forward back propagation NN) to evaluate runoff as a function of evapotranspiration loss, temperature and precipitation. The outcomes revealed that the LRNN model performs best compared with FFBPN and RBFN for predicting runoff in the watershed, which helps in planning, designing and managing hydraulic structures in the neighbourhood of the watershed.
Zaini *et al.* (2018) used an SVM model and a hybridized SVM-PSO (particle swarm optimization) model to forecast daily stream flow at the Upper Bertam watershed, Malaysia. The hybridized models SVM-PSO1 and SVM-PSO2 performed better than SVM1 and SVM2 at forecasting river flow 1–7 days ahead. Tao *et al.* (2018) used SVM and SVM-FFA to forecast rainfall at Chhattisgarh. The proposed hybrid model significantly improved forecasting accuracy and can also be used for monthly rainfall forecasting in provincial areas of India. Gandomi *et al.* (2011) utilized FFA to solve mixed variable structural optimization problems and thoroughly analysed the implications of FFA in comparison with PSO, GA and simulated annealing for future research. Ji & Sun (2013) presented a multitask multiclass SVM method based on minimizing regularization functions and found it to be the best among the multitask learning methods compared. Shamshirband *et al.* (2015) proposed a hybrid SVM-FFA model for estimating monthly mean horizontal global solar radiation (HGSR). The performance of the hybrid model was evaluated by comparison with ANN, GP and ARMA (autoregressive moving average) models, and the results revealed that the hybrid model performs best.

Several researchers have applied different individual algorithms for predicting runoff, and some have also developed optimization methods, but few have used integrated methods to predict runoff. Thus, an attempt has been made here to develop a hybrid model by integrating algorithms. The objective of this research is to compare the results of the SVM-FFA and PSR-SVM-FFA models with those of the SVM and empirical models in estimating the magnitude of runoff for flood control in the region. The approach is applicable to efficient planning and management of water resources, such as flood control and watershed management.

## STUDY AREA

Balangir district is located in the western part of Odisha, India, with an area of 5,165 km^{2}, as shown in Figure 1. It is centred at latitude 20°59′00″ N and longitude 83°32′22″ E, and has an average elevation of 115 metres (377 feet). The watershed is located towards the mid-north edge of Balangir district. Five gauging stations, Loisinga, Balangir, Tushura, Saintala and Patnagarh, are considered in the present research.

Physical and statistical characteristics of the gauging stations are described in Table 1. Runoff prediction is assessed using the dataset from 1990 to 2017. Mean monthly precipitation and temperature data for the monsoon months (May to October) over the 28-year period 1990–2017 are obtained from the India Meteorological Department (IMD), Bhubaneswar. Runoff data are collected from the soil conservation office, Bolangir. Runoff is also computed using Khosla's empirical equation to assess the agreement between the empirical estimates and the observed data collected from that office.

## METHODOLOGY

### Khosla formula
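Khosla's relation is commonly written in hydrology textbooks as R_m = P_m − L_m with L_m = 0.48 T_m for T_m > 4.5 °C, where R_m, P_m and L_m are the monthly runoff, precipitation and losses in cm and T_m is the mean monthly temperature in °C. The minimal sketch below assumes that textbook form; the function name, the clamping of negative runoff to zero and the example values are illustrative assumptions, not details taken from this study.

```python
def khosla_monthly_runoff_cm(precip_cm: float, mean_temp_c: float) -> float:
    """Monthly runoff (cm) from monthly precipitation (cm) and mean temperature (deg C).

    Assumes the textbook form R_m = P_m - L_m with L_m = 0.48 * T_m for T_m > 4.5 deg C.
    """
    if mean_temp_c <= 4.5:
        # Below 4.5 deg C the textbook relation uses tabulated losses, not covered here.
        raise ValueError("loss relation assumed valid only for mean temperature > 4.5 deg C")
    monthly_loss_cm = 0.48 * mean_temp_c
    # Assumption: a month whose losses exceed precipitation yields zero runoff.
    return max(precip_cm - monthly_loss_cm, 0.0)


# Hypothetical monsoon month: 30 cm of rain at a mean temperature of 28 deg C.
runoff = khosla_monthly_runoff_cm(30.0, 28.0)
print(f"runoff = {runoff:.2f} cm")
```

Multiplying the result by 10 converts it to millimetres, the unit in which runoff is reported in the results.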

### Support vector machine model

A support vector machine is a technique grounded in statistical learning theory, with promising empirical results for pattern recognition in high-dimensional data; it limits error by defining the decision boundary in terms of support vectors selected from the training data (Cortes & Vapnik 1995).

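The SVM relies on an upper bound on the generalization error (*R*) concerning the training error (*R*_{e}), model complexity (*h*) and number of training examples (*N*) to evaluate the probability of error, and is defined as (Chen *et al.* 2018; Thanh & Kappas 2018; Wang *et al.* 2018a):

$$R \le R_e + \varphi\left(\frac{h}{N}, \frac{\log \eta}{N}\right)$$

where $\varphi$ is a monotonic increasing function of the model complexity *h*, and the bound holds with probability 1 − *η*.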

### Integration of SVM with FFA model fitness

The FFA is a nature-inspired algorithm based on the flashing behaviour of fireflies (Yang 2009; Abd-Elazim & Ali 2018; Lieu *et al.* 2018; Wang *et al.* 2018b, 2018c). Because the complex behaviour of fireflies induces global communication among the swarming particles, FFA is suited to multifaceted, multi-objective optimization problems. In this algorithm, fireflies use flashing light to search for mates, to attract potential prey and to protect themselves from predators. To reach efficient optimal solutions, the flashing light intensity, which is associated with the objective function of the problem, draws fireflies towards more attractive positions.

There are some components in FFA, such as distance, attractiveness and movement.

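#### Euclidean distance

The distance between any two fireflies *i* and *j* at locations *u*_{i} and *u*_{j}, defined as the Euclidean distance (*r*_{ij}), is obtained by applying Equation (7), where *u*_{i,k} is the *k*^{th} component of the spatial coordinate *u*_{i} of the *i*^{th} firefly and *d* is the number of dimensions:

$$r_{ij} = \lVert u_i - u_j \rVert = \sqrt{\sum_{k=1}^{d} \left(u_{i,k} - u_{j,k}\right)^{2}} \quad (7)$$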

#### Attractiveness
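A form commonly used with FFA (Yang 2009), assumed here rather than confirmed by the text, lets the attractiveness decay with the distance r_{ij} between two fireflies:

```latex
\beta(r_{ij}) = \beta_0 \, e^{-\gamma r_{ij}^{2}}
```

Here β_0 is taken to be the attractiveness at r_{ij} = 0 and γ the light absorption coefficient; both symbol names follow the usual FFA convention and are assumptions.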

#### Movement

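The movement of firefly *i* towards a more attractive (brighter) firefly *j* is specified by Equation (9):

$$u_i^{t+1} = u_i^{t} + \beta_0 e^{-\gamma r_{ij}^{2}} \left(u_j^{t} - u_i^{t}\right) + \alpha \left(\mathrm{rand} - \tfrac{1}{2}\right) \quad (9)$$

where $u_i^{t}$ is the current position of the firefly, the term $\beta_0 e^{-\gamma r_{ij}^{2}} (u_j^{t} - u_i^{t})$ accounts for a firefly's attraction to light intensity and $\alpha (\mathrm{rand} - \tfrac{1}{2})$ gives the random movement of a firefly.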

The FFA parameters are set as follows: *γ* (light absorption coefficient) = 1.0, *α* (randomization parameter) = 0.3, *β*_{0} (attractiveness value) = 1.0, and the random number drawn from a generator uniformly distributed over [0, 1] = 0.2. A flowchart of the SVM-FFA model is shown in Figure 3.

#### Objective function

*F*(*Y*), *Y* = (*Y*_{1}, *Y*_{2}, *Y*_{3}, …, *Y*_{k})^{Z}

Initialize fireflies *Y*_{j} (*j* = 1, 2, 3, …, *m*)

Light intensity *L*_{j} at *Y*_{j} is determined by *f*(*Y*_{j})

Define light absorption coefficient *γ*

While *z* < maximum generation do

For *j* = 1:*m* (all *m* fireflies) do

For *i* = 1:*m* (all *m* fireflies) do

If *L*_{i} > *L*_{j} then

Move firefly *j* towards *i* in *k* dimensions

End if

Attractiveness varies with distance *r* through exp[−*γr*]

Evaluate new solutions and update light intensity

End for

End for

Rank the fireflies and find the current best value

End while

Post-process the results and visualize
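The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: it minimizes a cost (so a brighter firefly is one with a lower cost), it uses the squared-distance attractiveness exp(−γr²) common in the FFA literature, it clips positions to box bounds, and the function names, swarm size, generation count and `sphere` test objective are all hypothetical. In an SVM-FFA setting the objective would instead be a validation error as a function of the SVM hyperparameters.

```python
import math
import random


def firefly_minimize(objective, bounds, n_fireflies=15, n_generations=50,
                     beta0=1.0, gamma=1.0, alpha=0.3, seed=0):
    """Minimize `objective` inside box `bounds`; lower cost means a brighter firefly."""
    rng = random.Random(seed)
    # Random initial positions inside the bounds.
    swarm = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_fireflies)]
    cost = [objective(u) for u in swarm]
    for _ in range(n_generations):
        for j in range(n_fireflies):
            for i in range(n_fireflies):
                if cost[i] < cost[j]:  # firefly i is brighter, so j moves towards i
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness decays with distance
                    for k, (lo, hi) in enumerate(bounds):
                        step = (beta * (swarm[i][k] - swarm[j][k])
                                + alpha * (rng.random() - 0.5))  # attraction + random walk
                        swarm[j][k] = min(max(swarm[j][k] + step, lo), hi)
                    cost[j] = objective(swarm[j])  # update light intensity after the move
    best = min(range(n_fireflies), key=lambda idx: cost[idx])
    return swarm[best], cost[best]


def sphere(u):
    """Hypothetical test objective with its minimum at the origin."""
    return sum(x * x for x in u)


best_position, best_cost = firefly_minimize(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
print(best_position, best_cost)
```

Because the currently brightest firefly never moves, the best cost in the swarm cannot get worse from one generation to the next.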

### Integration of PSR coupled with SVM-FFA model

An illustration of the hybrid model is shown in Figure 4. The model involves the following modelling process (Dutta *et al.* 2018; Fan *et al.* 2018; Hajiloo *et al.* 2018; Sun & Wang 2018).

- (i) In the first phase, the optimal *τ* and *m* are established using the MIF and FNN approaches for phase space reconstruction of the targeted index. In this manner, the phase space is built from the input features and abstracts the critical dynamics of the chaotic time series.
- (ii) In the second phase, the input matrix is assembled from the several dimensions *X*_{t}, *X*_{t−τ}, *X*_{t−2τ}, …, *X*_{t−(m−1)τ}, using the optimal *τ* and *m*, for the targeted value *X*_{t+1}. The input matrix is constructed according to the PSR technique (Sivakumar *et al.* 2001).
- (iii) In the third phase, the hybrid SVM-FFA model is fed with the matrix produced from the phase space signal.
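The second phase can be illustrated with a short delay-embedding sketch. It builds one input row (X_t, X_{t−τ}, …, X_{t−(m−1)τ}) per usable time step together with the target X_{t+1}. The function name `delay_embed` and the toy series are assumptions made for illustration.

```python
def delay_embed(series, tau, m):
    """Return (inputs, targets) for one-step-ahead prediction from a delay embedding.

    Each input row is [X_t, X_{t-tau}, ..., X_{t-(m-1)tau}]; the target is X_{t+1}.
    """
    if tau < 1 or m < 1:
        raise ValueError("tau and m must be positive integers")
    start = (m - 1) * tau  # first index with a full history of m delayed values
    inputs, targets = [], []
    for t in range(start, len(series) - 1):
        inputs.append([series[t - k * tau] for k in range(m)])
        targets.append(series[t + 1])
    return inputs, targets


# Toy series 0..9 with delay 2 and embedding dimension 3.
X, y = delay_embed(list(range(10)), tau=2, m=3)
print(X[0], y[0])  # [4, 2, 0] 5
```

The resulting rows and targets are what a regression model such as the hybrid SVM-FFA would be trained on.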

#### Objective function

*s*: measurement number.

#### Delay estimation

For the delay, PSR uses the average mutual information (AMI), evaluated for lags *T* = 1 up to the maximum lag.
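A histogram-based estimate of the AMI between a series and its lagged copy can be sketched as follows. This is an illustrative estimator under stated assumptions: equal-width binning, a base-2 logarithm, and the names `average_mutual_information`, `ami_profile` and `n_bins` are not taken from the study. A common heuristic, also an assumption here, is to choose the delay at the first local minimum of the resulting profile.

```python
import math


def average_mutual_information(series, lag, n_bins=8):
    """Plug-in AMI (in bits) between x_t and x_{t+lag} using equal-width bins."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must satisfy 1 <= lag < len(series)")
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant series

    def bin_of(value):
        return min(int((value - lo) / width), n_bins - 1)

    pairs = [(bin_of(a), bin_of(b)) for a, b in zip(series[:-lag], series[lag:])]
    n = len(pairs)
    joint, p_x, p_y = {}, {}, {}
    for a, b in pairs:
        joint[(a, b)] = joint.get((a, b), 0) + 1
        p_x[a] = p_x.get(a, 0) + 1
        p_y[b] = p_y.get(b, 0) + 1
    # Sum p(a, b) * log2( p(a, b) / (p(a) p(b)) ) over occupied bins only.
    return sum((c / n) * math.log2((c / n) / ((p_x[a] / n) * (p_y[b] / n)))
               for (a, b), c in joint.items())


def ami_profile(series, max_lag, n_bins=8):
    """AMI for every lag T = 1..max_lag."""
    return [average_mutual_information(series, lag, n_bins) for lag in range(1, max_lag + 1)]


# A period-4 toy series is fully predictable from its value four steps earlier.
periodic = [0.0, 1.0, 2.0, 3.0] * 25
print(average_mutual_information(periodic, lag=4, n_bins=4))
```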

#### Embedding dimension (ED) estimation

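The ED for PSR is estimated using the false nearest neighbour (FNN) algorithm. For a point *i* at dimension *d*, the point X^{r}_{i} and its nearest point X^{r*}_{i} in the reconstructed phase space {X^{r}_{i}}, *i* = 1:*N*, are false neighbours if

$$\sqrt{\frac{R_i^{2}(d+1) - R_i^{2}(d)}{R_i^{2}(d)}} > \text{distance threshold}$$

where $R_i^{2}(d) = \lVert X^{r}_{i} - X^{r*}_{i} \rVert^{2}$ is the distance metric. The estimated ED *d* is the smallest value that satisfies $p_{fnn}$ < per cent false neighbours, where $p_{fnn}$ is the ratio of FNN points to the total number of points in the reconstructed phase space.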
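A brute-force sketch of the FNN test is given below. It is a simplified illustration, not the routine used in the study: the nearest neighbour is found by exhaustive search, points whose nearest neighbour is at zero distance are skipped, and the default distance threshold of 10 and the 10% limit on false neighbours are assumed values. It relies on the fact that moving from dimension d to d + 1 adds exactly one delayed coordinate, so the increase in squared distance is the squared difference of that coordinate.

```python
import math


def false_neighbour_fraction(series, tau, d, threshold=10.0):
    """Fraction of points whose nearest neighbour in dimension d is false in d + 1."""
    start = d * tau  # enough history for the extra (d + 1)-th coordinate
    times = range(start, len(series))
    vec = {t: [series[t - k * tau] for k in range(d)] for t in times}
    false_count = total = 0
    for t in times:
        best_s, best_r2 = None, math.inf
        for s in times:  # exhaustive nearest-neighbour search in dimension d
            if s == t:
                continue
            r2 = sum((a - b) ** 2 for a, b in zip(vec[t], vec[s]))
            if r2 < best_r2:
                best_s, best_r2 = s, r2
        if best_s is None or best_r2 == 0.0:
            continue  # assumption: skip isolated or duplicated points
        # R^2(d+1) - R^2(d) equals the squared difference of the added coordinate.
        extra = (series[t - d * tau] - series[best_s - d * tau]) ** 2
        total += 1
        if math.sqrt(extra / best_r2) > threshold:
            false_count += 1
    return false_count / total if total else 0.0


def estimate_embedding_dimension(series, tau, max_dim=8, threshold=10.0,
                                 max_false_fraction=0.1):
    """Smallest d whose false-neighbour fraction drops below max_false_fraction."""
    for d in range(1, max_dim + 1):
        if false_neighbour_fraction(series, tau, d, threshold) < max_false_fraction:
            return d
    return max_dim


# A linear ramp unfolds already in one dimension.
ramp = [float(v) for v in range(30)]
print(estimate_embedding_dimension(ramp, tau=1))
```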

### Processing and preparation of data

Humidity, temperature and monthly average rainfall data are collected from the India Meteorological Department for the monsoon months (May to October) of 1990–2017. Data from 1990 to 2009 are used for training and data from 2010 to 2017 for testing the model. Daily data are transformed into monthly data for training and testing the model. The following input arrangements are employed for runoff modelling (Table 2).
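The data preparation described above can be sketched as a monthly aggregation followed by a split by year. The record layout `(year, month, value)`, the use of a monthly mean, and the function names are assumptions for illustration; the actual file format of the source data is not stated.

```python
MONSOON_MONTHS = range(5, 11)    # May to October
TRAIN_YEARS = range(1990, 2010)  # 1990-2009
TEST_YEARS = range(2010, 2018)   # 2010-2017


def monthly_means(daily_records):
    """Aggregate (year, month, value) daily records into {(year, month): mean value}."""
    totals = {}
    for year, month, value in daily_records:
        running_sum, count = totals.get((year, month), (0.0, 0))
        totals[(year, month)] = (running_sum + value, count + 1)
    return {key: s / n for key, (s, n) in totals.items()}


def split_monsoon(monthly):
    """Keep monsoon months only and split into training and testing periods."""
    train = {k: v for k, v in monthly.items()
             if k[0] in TRAIN_YEARS and k[1] in MONSOON_MONTHS}
    test = {k: v for k, v in monthly.items()
            if k[0] in TEST_YEARS and k[1] in MONSOON_MONTHS}
    return train, test


# Synthetic records: two "daily" values per month whose value equals the month number.
daily = [(y, m, float(m)) for y in range(1990, 2018) for m in range(1, 13) for _ in range(2)]
monthly = monthly_means(daily)
train, test = split_monsoon(monthly)
print(len(train), len(test))  # 120 48
```

With six monsoon months per year, the split yields 120 training months and 48 testing months per station.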

### Model performance evaluation
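The three indices named in this study are commonly defined as NSE = 1 − Σ(O − P)² / Σ(O − Ō)², RMSE = √(Σ(O − P)² / n) and WI = 1 − Σ(O − P)² / Σ(|P − Ō| + |O − Ō|)², where O and P are observed and predicted values and Ō is the observed mean. The sketch below assumes these standard forms; the function names and the toy data are illustrative.

```python
import math


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den


def rmse(observed, predicted):
    """Root mean square error, in the units of the data."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def willmott_index(observed, predicted):
    """Willmott's index of agreement, between 0 and 1."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
              for o, p in zip(observed, predicted))
    return 1.0 - num / den


# Toy example: one of four predictions is off by one unit.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.0, 2.0, 3.0, 5.0]
print(nse(obs, pred), rmse(obs, pred), willmott_index(obs, pred))
```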

## RESULTS AND DISCUSSION

Runoff is estimated by using the Khosla formula for five watersheds in the monsoon period (May–October) for 1990–2017. A graphical representation of mean monthly runoff is presented in Figure 5.

The performances of the SVM model with different inputs for the five proposed watersheds are presented in Table 3. Three evaluating parameters, NSE, RMSE and WI, are estimated for both testing and training phases as explained below.

Among the five models, the SVM5 model shows the best result for the Balangir watershed, with WI values of 0.935 and 0.942 for the training and testing periods, respectively, for the input combination used in that model. For the remaining watersheds, model SVM5 also gives the best WI values for both phases out of the five simulations. In the case of the Loisinga watershed, the best WI values for the training and testing periods are 0.919 and 0.925 for model SVM5. For the training phase at the Patnagarh, Saintala and Deogaon watersheds, the best WI values are 0.927, 0.938 and 0.945, respectively. The results for the SVM-FFA model based on NSE, RMSE and WI for the testing and training periods are given in Table 4.

Here, five different models are used to estimate the NSE, RMSE and WI values for the five watersheds. The best WI values, obtained for the best-performing input scenario, are as follows. For Balangir, the best WI values are 0.965 and 0.971 for the training and testing periods, respectively. For the training phase, the WI values for the Loisinga, Patnagarh, Saintala and Deogaon watersheds are 0.958, 0.953, 0.955 and 0.961, respectively. Similarly, the WI values for the testing phase are 0.968, 0.962, 0.966 and 0.974 for the Loisinga, Patnagarh, Saintala and Deogaon watersheds, respectively. Results of the PSR-SVM-FFA model for both testing and training phases are presented in Table 5.

### Assessment of results for recommended model

A comparison between the PSR-SVM-FFA, SVM-FFA and SVM models for testing of all the watersheds is presented in Figures 6–8. The best WI values for the SVM, SVM-FFA and PSR-SVM-FFA models are 0.935, 0.965 and 0.984, respectively, for the Balangir watershed. Similarly, for Loisinga station, the best WI values are 0.919, 0.958 and 0.9702 for the SVM, SVM-FFA and PSR-SVM-FFA models, respectively. Patnagarh, Saintala and Deogaon show values for the SVM model of 0.9276, 0.9386 and 0.9459, respectively, during the testing phase. For the SVM-FFA model, the best WI values are 0.953, 0.955 and 0.961 for Patnagarh, Saintala and Deogaon, respectively. Similarly, the best values of the PSR-SVM-FFA model are 0.971, 0.972 and 0.982 for Patnagarh, Saintala and Deogaon, respectively.

The linear scale plot of actual versus predicted monthly runoff for the proposed models in the study area is shown in Figure 9. The results show that the estimated peak runoff is 141.208 mm, 142.313 mm and 149.272 mm for SVM, SVM-FFA and PSR-SVM-FFA, respectively, against an actual peak of 157.22 mm for the Balangir watershed. The estimated peak runoffs are 126.4107 mm, 147.091 mm and 154.655 mm for SVM, SVM-FFA and PSR-SVM-FFA, respectively, against the actual peak of 158.8 mm for the Loisinga division. For Patnagarh gauging station, the observed peak runoff is 150.43 mm against predicted values of 136.032 mm, 141.329 mm and 145.744 mm for SVM, SVM-FFA and PSR-SVM-FFA, respectively. For the Saintala watershed, the observed maximum runoff is 156.34 mm against estimated values of 151.728 mm, 147.522 mm and 144.911 mm for PSR-SVM-FFA, SVM-FFA and SVM, respectively. Similarly, the estimated peak runoffs are 149.322 mm, 141.591 mm and 140.376 mm for the PSR-SVM-FFA, SVM-FFA and SVM models, respectively, against the observed peak of 154.23 mm for the Deogaon watershed.
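The peak values quoted above can be turned into a relative underestimation of the observed peak, 100 × (observed − predicted) / observed. The short sketch below applies that arithmetic to the figures reported in the text; the function and variable names are illustrative, and the measure itself is a simple check rather than one of the indices used in the study.

```python
OBSERVED_PEAK_MM = {
    "Balangir": 157.22, "Loisinga": 158.8, "Patnagarh": 150.43,
    "Saintala": 156.34, "Deogaon": 154.23,
}
PREDICTED_PEAK_MM = {
    "Balangir": {"SVM": 141.208, "SVM-FFA": 142.313, "PSR-SVM-FFA": 149.272},
    "Loisinga": {"SVM": 126.4107, "SVM-FFA": 147.091, "PSR-SVM-FFA": 154.655},
    "Patnagarh": {"SVM": 136.032, "SVM-FFA": 141.329, "PSR-SVM-FFA": 145.744},
    "Saintala": {"SVM": 144.911, "SVM-FFA": 147.522, "PSR-SVM-FFA": 151.728},
    "Deogaon": {"SVM": 140.376, "SVM-FFA": 141.591, "PSR-SVM-FFA": 149.322},
}


def peak_underestimation_percent(observed, predicted):
    """Percentage by which the predicted peak falls short of the observed peak."""
    return 100.0 * (observed - predicted) / observed


errors = {
    watershed: {model: peak_underestimation_percent(OBSERVED_PEAK_MM[watershed], value)
                for model, value in models.items()}
    for watershed, models in PREDICTED_PEAK_MM.items()
}
for watershed, by_model in errors.items():
    print(watershed, {model: round(err, 2) for model, err in by_model.items()})
```

At every watershed the PSR-SVM-FFA peak lies closest to the observed peak, followed by SVM-FFA and then SVM.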

### Comparison of best results

The performance of the SVM, SVM-FFA and PSR-SVM-FFA models is evaluated using the NSE, RMSE and WI indicators for the five gauged watersheds. The performance indicators are given in Table 6, which illustrates the efficiency of each model. Because runoff estimation is important, calculating the RMSE, WI and NSE values is essential for judging how well each method predicts runoff. It is apparent that the PSR-SVM-FFA model performs better than SVM-FFA and SVM.

The assessment of each model is shown as a bar chart in Figure 10.

## CONCLUSIONS

The efficiency of the SVM, SVM-FFA and PSR-SVM-FFA approaches is explored for runoff prediction. Five gauging stations in Bolangir district, India, are used, with data covering a period of 28 years. The accuracy of the PSR-SVM-FFA model is compared with that of the SVM-FFA and SVM models. Assessment with the statistical indicators WI, NSE and RMSE shows that the proposed PSR-SVM-FFA model is superior to SVM-FFA and SVM, with a significant improvement in the accuracy of runoff prediction for each watershed. PSR coupled with the hybrid SVM-FFA model outperforms the SVM models because it extracts the input patterns comprehensively to produce the targeted values. The PSR-SVM-FFA model is found to be a suitable model for runoff forecasting in arid watersheds. The research carried out over five gauged watersheds will help in predicting runoff for ungauged arid watersheds under similar conditions. The present findings can also be used in irrigation and water resources engineering to design hydraulic structures for agriculture.