## Abstract

This study employed two soft computing techniques, support vector machine (SVM) and Gaussian process regression (GPR), to predict the properties of a scour hole [depth (ds) and length (Ls)] in a diversion channel flow system. The study considered different geometries of diversion channels (angles and bed widths) and different hydraulic conditions. Four kernel functions were evaluated for each technique: the polynomial kernel, the normalized polynomial kernel, the radial basis kernel, and the Pearson VII function kernel. With root mean square error (RMSE) values of 8.3949 for the training dataset and 11.6922 for the testing dataset, the normalized polynomial kernel function-based GP outperformed the other models in predicting Ls. In predicting ds, the polynomial kernel function-based SVM outperformed the other models, recording RMSE values of 0.5175 for the training dataset and 0.6019 for the testing dataset. The sensitivity investigation of the input parameters shows that the diversion angle had a major influence on the prediction of both Ls and ds.

## HIGHLIGHTS

Soft computing implementation for prediction of the properties of scour hole.

Benchmarking of SVM and GP-based data-intelligent models.

The diversion angle had a major influence in predicting the properties of scour hole.

## INTRODUCTION

The use of unlined diversion channels for irrigation, domestic, or hydropower projects has raised concern over the scouring of the beds of such channels. Scouring reduces the cover of the foundations of related hydraulic structures through sediment transport, thereby affecting their stability (Hoffmans & Pilarczyk 1995; Khwairakpam & Mazumdar 2009). Diversion channels may also occur naturally in rivers, where they are known as river bifurcations; however, these are commonly unstable and evolve over time, splitting and merging with the annual dynamics of river geomorphology (Kleinhans *et al.* 2013; Herrero *et al.* 2015; Redolfi *et al.* 2016).

Owing to the importance of flow behavior in diversion channels for water and sediment management, numerous studies (Ramamurthy & Satish 1988; Ramamurthy *et al.* 1990; Hsu *et al.* 2002; Mignot *et al.* 2013, 2014; Seyedian *et al.* 2014; Xu *et al.* 2016; Momplot *et al.* 2017) have been conducted to investigate the different phenomena that accompany diversion flow. Although bed morphology is a very important factor in designing diversion channels (Xu *et al.* 2016), most of these studies considered diversion channels with rigid boundary conditions, and the effects of bed morphology in unlined diversion channels were ignored.

Flow in diversion channels with sand bed conditions was studied by Barkdoll *et al.* (1999), Dehghani *et al.* (2009), Herrero *et al.* (2015), Alomari *et al.* (2018), Abdalhafedh & Alomari (2021), and different diversion angles were considered by Keshavarzi & Habibi (2005) and Alomari *et al.* (2020). Although Barkdoll *et al.* (1999) and Herrero *et al.* (2015) investigated flow in unlined diversion channels with a diversion angle of 90°, Alomari *et al.* (2020) found that the diversion channel received maximum water discharge and minimum sediment discharge when its diversion angle was 30° or 45° among 90°, 75°, 60°, 45°, and 30° diversion angles. Moreover, the optimum diversion angle based on intake separation zone size was found to be at an angle of 55°, as reported by Keshavarzi & Habibi (2005).

Several studies (Nakato 1984; Kerssens & van Urk 1986; Nakato *et al.* 1990; Nakato & Ogden 1998; Michell *et al.* 2006) were conducted to investigate the effect of sediment transport in diversion channels, and several physical hydraulic models were provided as a result. Furthermore, the minimum foundation depth that is safe from the effects of scour in rivers and unlined bed channels has been studied for many types of hydraulic structures, such as rock structures (Pagliara *et al.* 2016), pile groups (Amini *et al.* 2012), complex piers (Amini *et al.* 2011; Solaimani *et al.* 2017), submerged obstacles (Euler & Herget 2012), and spur dykes (Duan *et al.* 2009). Moreover, Barkdoll *et al.* (1999), Herrero *et al.* (2015), and Alomari *et al.* (2018, 2020) observed a scour hole at the bed of the main channel downstream of the diversion channel junction, caused by the secondary vortex.

Since soft computing techniques became available for dealing with time-consuming and difficult engineering problems (Taylor & Meldrum 1994; Duch 2007; Aggarwal *et al.* 2013; Liu *et al.* 2017; Dibs *et al.* 2018), different soft computing models, such as support vector machines (SVM), random forest (RF), and Gaussian process regression (GPR), have been employed to solve various water resources engineering problems (Ehteram *et al.* 2021; Sihag *et al.* 2021; Yaseen *et al.* 2021), and they have been recommended for analyzing scour depth and length problems (Moradi *et al.* 2019).

The main objective of this study is to evaluate the performance of two soft computing techniques, namely, SVM and GPR, in predicting the properties of scour holes [scour depth (ds) and scour length (Ls)] caused by diversion channels, while considering different geometries of diversion channels (angles and bed widths). Moreover, the effects of water discharge, depth, and velocity in the main channel on ds and Ls were considered in this study.

## METHODOLOGY

### Experimental set-up

Sand with a median particle diameter of 0.4 mm (*σ*_{g} = 1.46 and *ρ*_{s} = 2,530) was used for the beds of the diversion and main channels. The sand layer was 0.18 m thick, and prior to starting each experiment, the bed was checked, filled, and flattened. Flow in the diversion channel system was re-circulated by collecting water and sediment at the system's ends and pumping them back into the system. A flow meter was fitted to the pumping pipe to measure the overall discharge, and a control valve was used to regulate it.

To alleviate flow turbulence, a mesh was added at the main channel entry. Moreover, the surface of the soil at the inlet and outlets of the working section was covered with pieces of Perspex to protect against local scour. The discharge of the diversion channel flow was controlled by inserting teeth-shaped pieces with a 30% flow opening and 70% contraction at the ends of both channels. The teeth-shaped components help to control the ratio of diversion to total discharge and let the sand move through the channels without getting stuck at the channel outlets. At selected points, the depth of the water and of the scour was measured with a vertical point gauge to a precision of a fraction of a millimeter. Scour depth was measured at 30-min intervals during the first 2 h and then once every hour.

### Experimental tests

A total of 75 experiments were conducted, as detailed in Table 1. Throughout each experiment, the discharge in the diversion channel was measured using the volumetric approach, and the discharge in the downstream main channel (Qd) was determined using the continuity equation. Before each experiment, the sand bed was smoothed. An experiment was then started by closing the channel outlets, gently supplying water into the channels up to an appropriate depth, and then opening the outlets while setting the desired discharge. Each experiment lasted 12 h. The temperature in the laboratory was kept at around 27 °C ± 1.5 °C throughout the study.

Diversion angle, θ (°) | Bb/Bm = Br (%)^{a} | Total discharge, Qu (L/s) |
---|---|---|
30, 45, 60, 75, and 90 | 29, 38, and 48 | 7.25, 8.5, 9.75, 11, and 12.25 |


^{a}Bb/Bm is the ratio of the diversion channel bed width (Bb) to the main channel bed width (Bm).
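
The test matrix in Table 1 can be enumerated to check the experiment count. The sketch below is a minimal illustration, assuming that every diversion angle was combined with every width ratio and every total discharge (a full factorial design, which agrees with the stated total of 75 experiments). The variable names are illustrative and are not taken from the study.

```python
import math
from itertools import product

# Levels listed in Table 1.
ANGLES_DEG = [30, 45, 60, 75, 90]
WIDTH_RATIOS_PCT = [29, 38, 48]
DISCHARGES_LPS = [7.25, 8.5, 9.75, 11, 12.25]

# Assumption: full factorial design (every angle x every ratio x every discharge).
experiments = list(product(ANGLES_DEG, WIDTH_RATIOS_PCT, DISCHARGES_LPS))

# Sine of each diversion angle, the form in which the angle enters the models.
sin_theta = [math.sin(math.radians(angle)) for angle in ANGLES_DEG]

print(len(experiments))
print([round(value, 4) for value in sin_theta])
```

The sines of 30° and 90° bound the Sin θ input between 0.5 and 1.0.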

### Soft computing analysis

The effectiveness of two soft computing techniques, namely, SVM and GPR, in predicting the length and depth of the scour in the diversion channel flow system was evaluated. The details of the two soft computing techniques are presented below.

#### Support vector machines

As described by Cortes & Vapnik (1995), SVMs are classification and regression methods derived from statistical learning theory. SVM classification is based on the concept of optimum class separation. If the classes are separable, the technique chooses, among the infinite number of possible linear classifiers, the one with the minimum generalization error, following the principle of structural risk minimization. Hence, the chosen hyperplane leaves the greatest margin between the two classes (Cortes & Vapnik 1995).

If the two classes are inseparable, the SVM attempts to locate the hyperplane that simultaneously maximizes the margin and minimizes a quantity proportional to the number of misclassification errors. A positive constant, chosen beforehand, determines the tradeoff between the margin and the misclassification error. The technique can be extended further to nonlinear decision surfaces. Cortes & Vapnik (1995) suggested that this can be accomplished by projecting the original set of variables into a higher dimensional feature space and formulating a linear classification problem in that feature space.

Cortes & Vapnik (1995) suggested support vector regression (SVR) by introducing an alternative *ε*-insensitive loss function. The aim of SVR, as described by Smola (1996), is to find a function that deviates as little as possible from the actual target vectors for all training data and that is as flat as feasible. The concept of a nonlinear support vector kernel function for regression is also presented by Cortes & Vapnik (1995). Readers can refer to Cortes & Vapnik (1995) and Smola (1996) for more details about SVR. SVR requires few user-defined parameters: a kernel must be selected and its kernel-specific parameters configured, and the regularization parameter *C* and the size of the error-insensitive zone *ε* must be determined. The choice of these parameters governs the complexity of the prediction.
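
The roles of *C* and *ε* can be illustrated with a short sketch of the ε-insensitive loss and the regularized objective it enters. This is a generic textbook form, not the implementation used in the study; the function names and the numerical values are hypothetical.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Errors smaller than eps cost nothing; larger errors cost the excess."""
    return max(0.0, abs(y_true - y_pred) - eps)


def svr_objective(weights, residuals, C, eps):
    """Flatness term plus C times the summed eps-insensitive losses."""
    flatness = 0.5 * sum(w * w for w in weights)
    penalty = C * sum(max(0.0, abs(r) - eps) for r in residuals)
    return flatness + penalty


# Hypothetical numbers: a 0.3 cm error falls inside a 0.5 cm insensitive zone.
print(eps_insensitive_loss(10.0, 10.3, eps=0.5))
print(eps_insensitive_loss(10.0, 11.0, eps=0.5))
print(svr_objective([3.0, 4.0], [0.2, -1.5], C=2.0, eps=0.5))
```

A larger *C* penalizes errors outside the insensitive zone more heavily, while a larger *ε* widens the zone in which errors are ignored.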

#### Gaussian process regression

According to Neal (2000), a Gaussian process (GP) may be regarded as a generalization of the Gaussian distribution, for which the mean is a vector and the covariance is a matrix. GP regression is a convenient approach to nonparametric regression owing to its theoretical simplicity and good generalization capability, and it provides a probabilistic output (Williams & Rasmussen 2006).

In GP regression, a random variable $f(x)$, representing the value of the stochastic function $f$ at that point, is attached to every input $x$. It is further assumed that the observation error is independent and identically distributed, with a mean of zero, $\mu(x) = 0$, and a variance of $\sigma^2$, and that $f(x)$ is drawn from the Gaussian process determined by the covariance function $k$:

$$Y = (y_1, \ldots, y_n) \sim N\left(0, K + \sigma^2 I\right)$$

where $K_{ij} = k(x_i, x_j)$ and $I$ is the identity matrix. Because this distribution is normal, the conditional distribution of the testing labels $Y_*$ given the training and testing data is also normal, $Y_* \mid Y, X, X_* \sim N(\mu, \Sigma)$, with

$$\mu = K(X_*, X)\left[K(X, X) + \sigma^2 I\right]^{-1} Y \tag{1}$$

$$\Sigma = K(X_*, X_*) + \sigma^2 I - K(X_*, X)\left[K(X, X) + \sigma^2 I\right]^{-1} K(X, X_*) \tag{2}$$

The data were divided into two sets, training and testing, of sizes $n$ and $n_*$, respectively. The covariance matrix evaluated at all pairs of points from the two sets, $K(X, X_*)$, has size $n \times n_*$, and the same applies to the other quantities $K(X, X)$, $K(X_*, X)$, and $K(X_*, X_*)$; here $X$ and $Y$ are the vectors of the training data and training labels, and $X_*$ is the vector of the testing data (Yetilmezsoy *et al.* 2021).

To generate the covariance matrix $K$, where $K_{ij} = k(x_i, x_j)$, a specified covariance function is necessary. The term kernel function used in SVM and the term covariance function used in GP regression are interchangeable. Once the noise level ($\sigma^2$) and the kernel function are known, Equations (1) and (2) are sufficient for inference. During the training of the GP regression model, the user has to select an appropriate covariance function, its parameters, and the noise level. For GP regression with Gaussian noise of constant variance, a GP model can be trained using Bayesian inference, which leads to minimizing the negative log-posterior:

$$-\log p(\sigma^2, k \mid X, Y) = \frac{1}{2} Y^{\mathsf{T}}\left(K + \sigma^2 I\right)^{-1} Y + \frac{1}{2} \log\left|K + \sigma^2 I\right| - \log p(\sigma^2) - \log p(k) + \text{const} \tag{4}$$

To determine the hyperparameters, the partial derivatives of Equation (4) with respect to $\sigma^2$ and $k$ are taken, and the minimization can be carried out with gradient descent. Kuss (2006) has published a comprehensive account of GP regression and of different covariance functions.
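
The predictive step of GP regression can be illustrated with a small, dependency-free sketch. It assumes a one-dimensional input, a radial basis covariance function with a hypothetical parameter `gamma`, and a fixed noise variance `noise_var`; the data points are invented for illustration, and none of these settings come from the study. The mean is computed as $K(X_*, X)[K(X, X) + \sigma^2 I]^{-1} Y$ and the variance as $k(x_*, x_*) + \sigma^2 - K(X_*, X)[K(X, X) + \sigma^2 I]^{-1} K(X, X_*)$.

```python
import math


def rbf(x1, x2, gamma):
    """Radial basis covariance between two scalar inputs."""
    return math.exp(-gamma * (x1 - x2) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def gp_predict(x_train, y_train, x_test, gamma=1.0, noise_var=1e-6):
    """Predictive mean and variance of a zero-mean GP at one test input."""
    n = len(x_train)
    # K(X, X) + sigma^2 I
    cov = [[rbf(x_train[i], x_train[j], gamma) + (noise_var if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    k_star = [rbf(x_test, xi, gamma) for xi in x_train]  # K(X*, X)
    alpha = solve(cov, y_train)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve(cov, k_star)
    variance = rbf(x_test, x_test, gamma) + noise_var - sum(
        k * vi for k, vi in zip(k_star, v))
    return mean, variance


# Invented one-dimensional data, for illustration only.
x_train, y_train = [0.0, 1.0, 2.0], [1.0, 3.0, 2.0]
mean_near, var_near = gp_predict(x_train, y_train, 1.0)
mean_far, var_far = gp_predict(x_train, y_train, 100.0)
print(mean_near, var_near)
print(mean_far, var_far)
```

Close to a training point the prediction reproduces the observed label with a small variance; far from the data it reverts to the zero prior mean with a variance near the prior value, which is the probabilistic output referred to above.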

#### Details of kernel function

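The kernel function concept is used in the construction of both the SVM- and GP-based regression approaches (Mehdipour *et al.* 2018; Sihag *et al.* 2018, 2019). The four most frequently used kernel functions are the polynomial kernel function, $K(x, x') = ((x \cdot x') + 1)^{d}$; the normalized polynomial kernel function, $K(x, x') = ((x \cdot x') + 1)^{d} / \sqrt{((x \cdot x) + 1)^{d}\,((x' \cdot x') + 1)^{d}}$; the radial basis kernel, $K(x, x') = e^{-\gamma \lVert x - x' \rVert^{2}}$; and the Pearson VII function kernel, $K(x, x') = 1 / \left[1 + \left(2\sqrt{\lVert x - x' \rVert^{2}}\,\sqrt{2^{1/\omega} - 1}\,/\,\sigma\right)^{2}\right]^{\omega}$, where $d$, $\gamma$, $\sigma$, and $\omega$ are kernel-specific parameters.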
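
The four kernels can be written compactly in code. The sketch below assumes the forms in which these kernels are commonly stated (polynomial $((x \cdot x') + 1)^d$, its normalized version, the radial basis kernel $e^{-\gamma \lVert x - x' \rVert^2}$, and the Pearson VII universal kernel with parameters $\sigma$ and $\omega$); the default parameter values are hypothetical and are not the tuned values of the study.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def poly_kernel(u, v, d=2):
    return (dot(u, v) + 1.0) ** d


def normalized_poly_kernel(u, v, d=2):
    # Polynomial kernel scaled so that K(x, x) = 1 for every x.
    return poly_kernel(u, v, d) / math.sqrt(poly_kernel(u, u, d) * poly_kernel(v, v, d))


def rbf_kernel(u, v, gamma=1.0):
    return math.exp(-gamma * sq_dist(u, v))


def puk_kernel(u, v, sigma=1.0, omega=1.0):
    # Pearson VII function kernel; sigma sets the width, omega the tail shape.
    scaled = 2.0 * math.sqrt(sq_dist(u, v)) * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + scaled ** 2) ** omega


a, b = [1.0, 2.0], [3.0, 4.0]
print(poly_kernel(a, b), normalized_poly_kernel(a, b), rbf_kernel(a, b), puk_kernel(a, b))
```

Apart from the plain polynomial kernel, all of these return 1 when the two inputs coincide, so their values can be read as a similarity score between input vectors.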

### Dataset

Data preparation is the first step in developing the soft computing models. The collected dataset was split randomly into training and testing groups. The training dataset is used for model development, whereas the testing dataset is used for model validation. Table 2 provides the range of the data assigned for training and testing. A total of 75 observations were collected, of which 53 were used for training and 22 for testing the developed models. Total discharge (Qu), diversion channel bed width (Bb), the critical velocity for the beginning of motion of the bed material (Vc), the ratio of the diversion channel bed width to the main channel bed width (Br = Bb/Bm), the ratio of diversion to main channel water discharge (Qr), the upstream water depth in the main channel (yu), the upstream flow velocity in the main channel (Vu), and the sine of the diversion channel angle (Sin *θ*) were considered as input parameters, whereas scour length (Ls) and scour depth (ds) were considered as targets.
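
A random 53/22 partition of the 75 observations can be sketched as follows. The seed and the function name are hypothetical; the study does not report how the random assignment was generated, so this only illustrates the idea of a reproducible random split.

```python
import random


def split_indices(n_total, n_train, seed=0):
    """Randomly assign observation indices to training and testing groups."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    return sorted(indices[:n_train]), sorted(indices[n_train:])


train_idx, test_idx = split_indices(n_total=75, n_train=53)
print(len(train_idx), len(test_idx))
```

Fixing the seed keeps the same observations in each group across runs, so that all kernel models are compared on identical data.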

Parameters | Training Min. | Training Max. | Training Mean | Training StD | Training CL (95%) | Testing Min. | Testing Max. | Testing Mean | Testing StD | Testing CL (95%) |
---|---|---|---|---|---|---|---|---|---|---|
Qu (L/s) | 7.250 | 12.250 | 9.7736 | 1.7246 | 0.4754 | 7.250 | 12.250 | 9.693 | 1.947 | 0.8633 |
Bb (cm) | 9.000 | 15.000 | 12.000 | 2.4962 | 0.6880 | 9.000 | 15.000 | 12.000 | 2.4495 | 1.086 |
Vc (m/s) | 0.212 | 0.233 | 0.2236 | 0.0055 | 0.0015 | 0.213 | 0.233 | 0.2233 | 0.0064 | 0.0028 |
Br (%) | 28.571 | 47.619 | 38.095 | 7.9243 | 2.1842 | 28.571 | 47.619 | 38.095 | 7.776 | 3.4478 |
Qr (%) | 18.480 | 31.180 | 24.959 | 4.0168 | 1.1072 | 18.210 | 31.180 | 24.820 | 4.2436 | 1.8815 |
yu (cm) | 8.200 | 12.850 | 10.535 | 1.2631 | 0.3481 | 8.350 | 12.900 | 10.493 | 1.4828 | 0.6574 |
Vu (m/s) | 0.224 | 0.584 | 0.3716 | 0.0997 | 0.0275 | 0.223 | 0.5739 | 0.3752 | 0.1036 | 0.0459 |
Sin θ | 0.500 | 1.000 | 0.8054 | 0.1867 | 0.0515 | 0.500 | 1.000 | 0.8135 | 0.1876 | 0.0832 |
Ls (cm) | 33.25 | 118.05 | 68.759 | 21.5925 | 5.9516 | 31.650 | 113.30 | 71.079 | 23.214 | 10.2925 |
ds (cm) | 7.10 | 17.50 | 12.005 | 2.3036 | 0.6350 | 6.500 | 15.900 | 12.084 | 2.2146 | 0.9819 |

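
The CL (95%) column of Table 2 appears consistent with the half-width of a 95% confidence interval for the mean, $t_{0.975,\,n-1}\, s / \sqrt{n}$. The sketch below checks this reading against the Qu row; the interpretation is an assumption, and the Student *t* critical values are hard-coded rather than computed.

```python
import math

# Assumption: two-sided 95% Student t critical values for n - 1 degrees of freedom.
T_CRIT_52 = 2.0066  # training set, n = 53
T_CRIT_21 = 2.0796  # testing set, n = 22


def confidence_half_width(std, n, t_crit):
    """Half-width of the confidence interval for a sample mean."""
    return t_crit * std / math.sqrt(n)


# Standard deviations of Qu (L/s) from Table 2.
cl_train = confidence_half_width(1.7246, 53, T_CRIT_52)
cl_test = confidence_half_width(1.947, 22, T_CRIT_21)
print(round(cl_train, 4), round(cl_test, 4))
```

Both results agree with the tabulated values of 0.4754 and 0.8633 to within rounding, and the wider testing interval reflects the smaller number of testing observations.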

### Model performance indices parameters

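Five statistical indices were used to assess model performance: the correlation coefficient (CC), root mean square error (RMSE), mean absolute error (MAE), Nash-Sutcliffe efficiency (NS), and scatter index (SI). Each index is computed from the actual and predicted values, where *n* is the number of observed data points.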
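
The indices reported in the result tables (CC, RMSE, MAE, NS, and SI) can be computed as in the sketch below. It assumes the standard definitions of these indices, with SI taken as the RMSE divided by the mean of the actual values; that choice is consistent with the tabulated results (for example, 8.5265/68.7594 ≈ 0.1240 for SVM_Poly on the Ls training data). The sample numbers are invented.

```python
import math


def performance_indices(actual, predicted):
    """Return CC, RMSE, MAE, NS, and SI for paired actual/predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    rmse = math.sqrt(sse / n)
    return {
        "CC": cov / math.sqrt(sst * var_p),
        "RMSE": rmse,
        "MAE": sum(abs(a - p) for a, p in zip(actual, predicted)) / n,
        "NS": 1.0 - sse / sst,      # Nash-Sutcliffe efficiency
        "SI": rmse / mean_a,        # assumption: RMSE scaled by the actual mean
    }


# Invented values: every prediction is 0.5 too high.
scores = performance_indices([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
print(scores)
```

The example shows why several indices are reported together: a constant offset leaves CC at 1 while RMSE, MAE, NS, and SI all register the error.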

Furthermore, two graphical performance assessment methods, the Taylor diagram and the box plot, were used to compare the applied models. Taylor diagrams depict the similarity between two patterns and the degree to which a model pattern corresponds to the actual pattern (Taylor 2001). Box plots were also selected for the assessment; they present five descriptive statistics graphically: the minimum, the lower quartile, the median, the upper quartile, and the maximum.
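
The statistics behind a box plot can be obtained with the Python standard library, as sketched below. The inclusive quartile method is an assumption (it is the convention used by common spreadsheet quartile functions), and the sample values are invented.

```python
import statistics


def box_plot_summary(values):
    """Five-number summary plus the interquartile range (IQR)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "iqr": q3 - q1,
    }


summary = box_plot_summary([1.0, 2.0, 3.0, 4.0, 5.0])
print(summary)
```

Comparing the quartiles and IQR of the predicted values with those of the actual values indicates how well a model reproduces the spread of the data, not only its mean.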

## RESULTS AND DISCUSSION

### Assessment of SVM-based model for the prediction of Ls

Approaches | CC | RMSE | MAE | NS | SI |
---|---|---|---|---|---|
Training dataset | | | | | |
SVM_Poly | 0.9173 | 8.5265 | 6.3916 | 0.8411 | 0.1240 |
SVM_NPoly | 0.8950 | 9.7084 | 6.6587 | 0.7940 | 0.1412 |
SVM_PUK | 1.0000 | 0.0907 | 0.0800 | 1.0000 | 0.0013 |
SVM_RBF | 0.9516 | 6.5801 | 3.7695 | 0.9053 | 0.0957 |
GP_Poly | 0.9167 | 11.9806 | 6.8399 | 0.8402 | 0.1742 |
GP_NPoly | 0.9201 | 8.3949 | 5.9964 | 0.8459 | 0.1221 |
GP_PUK | 1.0000 | 0.2385 | 0.1941 | 0.9999 | 0.0035 |
GP_RBF | 0.9782 | 4.4620 | 3.3245 | 0.9565 | 0.0649 |
Testing dataset | | | | | |
SVM_Poly | 0.8672 | 12.0189 | 9.1556 | 0.7192 | 0.1691 |
SVM_NPoly | 0.8090 | 13.6417 | 10.4960 | 0.6382 | 0.1919 |
SVM_PUK | 0.8643 | 12.9722 | 11.1883 | 0.6729 | 0.1825 |
SVM_RBF | 0.8601 | 12.7411 | 9.5466 | 0.6844 | 0.1793 |
GP_Poly | 0.8614 | 16.5979 | 9.6998 | 0.7200 | 0.2335 |
GP_NPoly | 0.8744 | 11.6922 | 9.0375 | 0.7342 | 0.1645 |
GP_PUK | 0.8643 | 12.9855 | 11.2002 | 0.6722 | 0.1827 |
GP_RBF | 0.8621 | 13.3579 | 10.9985 | 0.6531 | 0.1879 |


Statistic | Minimum | Maximum | First Quartile | Mean | Third Quartile | IQR |
---|---|---|---|---|---|---|
Training dataset | | | | | | |
Actual | 33.2500 | 118.0500 | 53.0000 | 68.7594 | 82.9000 | 29.9000 |
SVM_Poly | 28.7480 | 105.7670 | 55.3790 | 68.7991 | 82.8650 | 27.4860 |
SVM_NPoly | 33.0850 | 98.6050 | 56.9040 | 69.1282 | 82.4800 | 25.5760 |
SVM_PUK | 33.3770 | 117.9000 | 53.0740 | 68.7586 | 82.9820 | 29.9080 |
SVM_RBF | 33.4130 | 106.6770 | 55.2540 | 68.5379 | 84.4980 | 29.2440 |
GP_Poly | 32.5540 | 109.8290 | 53.4510 | 68.9407 | 83.8120 | 30.3610 |
GP_NPoly | 29.3760 | 103.1920 | 55.1060 | 68.7377 | 86.5710 | 31.4650 |
GP_PUK | 33.5230 | 117.3450 | 53.0840 | 68.7599 | 83.2230 | 30.1390 |
GP_RBF | 33.2520 | 114.4150 | 55.4980 | 68.7554 | 81.8270 | 26.3290 |
Testing dataset | | | | | | |
Actual | 31.6500 | 113.3000 | 55.5500 | 71.0795 | 86.2250 | 30.6750 |
SVM_Poly | 28.9200 | 103.5730 | 55.6640 | 67.7903 | 81.4423 | 25.7783 |
SVM_NPoly | 28.1200 | 98.0590 | 55.2615 | 68.2825 | 82.6013 | 27.3398 |
SVM_PUK | 42.7130 | 95.3540 | 55.3488 | 67.5545 | 76.0533 | 20.7045 |
SVM_RBF | 33.8080 | 107.3110 | 50.0443 | 65.9115 | 79.3268 | 29.2825 |
GP_Poly | 35.3810 | 107.5230 | 52.6613 | 67.7335 | 81.0240 | 28.3628 |
GP_NPoly | 28.7720 | 102.8350 | 54.5743 | 67.2309 | 81.6085 | 27.0343 |
GP_PUK | 42.8760 | 95.2580 | 55.4145 | 67.5934 | 76.0918 | 20.6773 |
GP_RBF | 33.0290 | 116.3420 | 43.7145 | 64.5294 | 75.4365 | 31.7220 |


### Assessment of GP-based model for the prediction of Ls

### Assessment of SVM-based model for ds prediction

Approaches | CC | RMSE | MAE | NS | SI |
---|---|---|---|---|---|
Training dataset | | | | | |
SVM_Poly | 0.9746 | 0.5175 | 0.3563 | 0.9486 | 0.0431 |
SVM_NPoly | 0.9647 | 0.6284 | 0.4166 | 0.9241 | 0.0523 |
SVM_PUK | 1.0000 | 0.0106 | 0.0096 | 1.0000 | 0.0009 |
SVM_RBF | 0.9875 | 0.3711 | 0.2192 | 0.9736 | 0.0309 |
GP_Poly | 0.9775 | 0.6772 | 0.3527 | 0.9551 | 0.0564 |
GP_NPoly | 0.9834 | 0.4158 | 0.3193 | 0.9668 | 0.0346 |
GP_PUK | 1.0000 | 0.0228 | 0.0171 | 0.9999 | 0.0019 |
GP_RBF | 0.9956 | 0.2156 | 0.1652 | 0.9911 | 0.0180 |
Testing dataset | | | | | |
SVM_Poly | 0.9710 | 0.6019 | 0.4531 | 0.9226 | 0.0498 |
SVM_NPoly | 0.9258 | 0.8772 | 0.7410 | 0.8356 | 0.0726 |
SVM_PUK | 0.9412 | 1.0725 | 0.7749 | 0.7543 | 0.0888 |
SVM_RBF | 0.9610 | 0.6570 | 0.5412 | 0.9078 | 0.0544 |
GP_Poly | 0.9663 | 0.6033 | 0.5182 | 0.9222 | 0.0499 |
GP_NPoly | 0.9481 | 0.7562 | 0.6475 | 0.8779 | 0.0626 |
GP_PUK | 0.9404 | 1.0796 | 0.7794 | 0.7510 | 0.0893 |
GP_RBF | 0.9540 | 0.6943 | 0.5589 | 0.8970 | 0.0575 |


Statistic | Minimum | Maximum | First Quartile | Mean | Third Quartile | IQR |
---|---|---|---|---|---|---|
Training dataset | | | | | | |
Actual | 7.1000 | 17.5000 | 10.6000 | 12.0047 | 13.4000 | 2.8000 |
SVM_Poly | 7.1010 | 17.0800 | 10.5800 | 11.9252 | 13.3910 | 2.8110 |
SVM_NPoly | 7.4160 | 15.3300 | 10.6200 | 12.0099 | 13.2320 | 2.6120 |
SVM_PUK | 7.1050 | 17.4900 | 10.6110 | 12.0056 | 13.3900 | 2.7790 |
SVM_RBF | 7.2010 | 16.4540 | 10.6000 | 11.9480 | 13.4390 | 2.8390 |
GP_Poly | 7.3370 | 17.1210 | 10.4600 | 12.0312 | 13.3840 | 2.9240 |
GP_NPoly | 7.5080 | 16.3010 | 10.6540 | 12.0063 | 13.4500 | 2.7960 |
GP_PUK | 7.1560 | 17.4390 | 10.6120 | 12.0050 | 13.3920 | 2.7800 |
GP_RBF | 7.1570 | 17.0590 | 10.6180 | 12.0060 | 13.4540 | 2.8360 |
Testing dataset | | | | | | |
Actual | 6.5000 | 15.9000 | 10.9875 | 12.0841 | 13.0625 | 2.0750 |
SVM_Poly | 5.7780 | 15.3610 | 10.6213 | 11.7792 | 12.7130 | 2.0918 |
SVM_NPoly | 7.9930 | 14.7810 | 10.5963 | 11.8257 | 13.2755 | 2.6793 |
SVM_PUK | 9.7290 | 14.2950 | 11.1855 | 11.9570 | 12.5828 | 1.3973 |
SVM_RBF | 7.2620 | 15.7910 | 10.1173 | 11.8129 | 13.1218 | 3.0045 |
GP_Poly | 5.9750 | 15.6080 | 10.5605 | 11.8541 | 12.9935 | 2.4330 |
GP_NPoly | 7.1240 | 15.3840 | 10.3128 | 11.7725 | 13.3313 | 3.0185 |
GP_PUK | 9.7720 | 14.2910 | 11.1998 | 11.9669 | 12.5768 | 1.3770 |
GP_RBF | 7.2440 | 15.8720 | 10.2078 | 11.8367 | 13.2750 | 3.0673 |


### Assessment of GP-based model for the prediction of ds

To evaluate the discrepancy between the predicted and actual ds values, the 25, 50, and 75% quartile values of the actual and predicted ds were assessed using Table 6. Table 6 shows that the quartiles of the GP_Poly model are closer to the actual values than those of the other kernel function-based GP models. The IQR of GP_Poly is also closer to the IQR of the actual data.

### Intercomparing of developed models

The *F*-values were lower than the *F*-critical values and the *P*-values were greater than 0.05, suggesting that the differences between the values predicted by the models and the actual values are insignificant. Taylor diagrams are plotted in Figures 12 and 13 to compare the best models for Ls and ds prediction. These figures confirm that GP_NPoly outperforms the other models for Ls prediction and SVM_Poly outperforms the other models for ds prediction.

No. | Source of variation | F | P-value | F crit | Variation among groups |
---|---|---|---|---|---|
Ls | | | | | |
1 | Between actual value and SVM_Poly | 0.244356 | 0.623653 | 4.072654 | Insignificant |
2 | Between actual and GP_NPoly | 0.329399 | 0.569074 | 4.072654 | Insignificant |
ds | | | | | |
3 | Between actual and SVM_Poly | 0.210644 | 0.64863 | 4.072654 | Insignificant |
4 | Between actual and GP_Poly | 0.120739 | 0.72997 | 4.072654 | Insignificant |

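
The *F* statistic of a single-factor analysis of variance between the actual values and one model's predictions can be computed as sketched below. This is the generic one-way ANOVA calculation, not code from the study, and the sample values are invented. With two groups of 22 testing values each, the degrees of freedom are 1 and 42, which is consistent with the tabulated critical value of about 4.07 at the 5% level.

```python
def one_way_anova_f(*groups):
    """Return the F statistic and its degrees of freedom for the given groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Invented values standing in for actual and predicted scour depths.
f_value, df_between, df_within = one_way_anova_f([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(f_value, df_between, df_within)
```

An *F* value below the critical value means that the difference between the group means is small relative to the scatter within the groups, which is the sense in which the variation is reported as insignificant.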

### Sensitivity investigation

Sensitivity investigations were performed to find the influence of each input parameter used to estimate the targets. Several factors that affect the Ls and ds were included, namely, friction angle, slope angle, and stability numbers (m). The best-performing models were employed for this investigation. The performance of the GP_NPoly model with different input combinations is compared in Table 8, which suggests that Sin *θ* had a significant impact on Ls prediction. Table 9 compares the performance of the SVM_Poly model with different input combinations. From Table 9, Sin *θ* also had a significant impact on ds prediction. Overall, Sin *θ* has a major influence on the prediction of Ls and ds with this dataset.
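
One common way to build the input combinations for such a sensitivity study is to remove one input at a time and retrain the model on the remaining inputs. The sketch below assumes that scheme, which may differ from the combinations actually tested; the input labels follow the parameters listed in the Dataset section, and the function name is hypothetical.

```python
# Input parameters listed in the Dataset section.
INPUTS = ["Qu", "Bb", "Vc", "Br", "Qr", "yu", "Vu", "sin_theta"]


def leave_one_out_combinations(inputs):
    """Pair each removed input with the list of inputs that remain."""
    return [(removed, [name for name in inputs if name != removed])
            for removed in inputs]


combinations = leave_one_out_combinations(INPUTS)
for removed, remaining in combinations:
    print(f"without {removed}: {remaining}")
```

The input whose removal degrades the performance indices the most is taken as the most influential one.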

## CONCLUSIONS

The performance of two soft computing techniques, SVM and GPR, in predicting scour length and scour depth due to diversion flow was evaluated in this study. Fifteen geometries of the diversion channel, represented by five diversion angles between 30° and 90° and three Br values between about 30 and 50%, were considered in modeling the diversion flow. In addition, different hydraulic conditions were considered for each model. The investigation used the polynomial, normalized polynomial, radial basis, and Pearson VII kernel functions for both the SVM and GPR techniques. Based on different model performance indices (CC, MAE, RMSE, NS, and SI) used to evaluate the kernel functions of the SVM and GPR techniques, the GP_NPoly model outperformed the other models in the prediction of Ls and the SVM_Poly model outperformed the other models in the prediction of ds. A sensitivity analysis of the input parameters was undertaken to evaluate the importance of each one for the estimation of the scour length and depth. It suggested that the angle of the diversion channel (*θ*) has a significant impact on Ls and ds prediction with this dataset. For future work, it is worth conducting further investigations and applying soft computing techniques to predict the amounts of diverted water and sediment.

## ACKNOWLEDGEMENTS

Universiti Putra Malaysia funded the experimental tests of this study through its Putra grant (GP-IPS/2015/9453100). The authors extend their sincere gratitude to the University of Mosul.

## DATA AVAILABILITY STATEMENT

Data cannot be made publicly available; readers should contact the corresponding author for details.

## CONFLICT OF INTEREST

The authors declare there is no conflict.

## REFERENCES

*Advances in Water Resources and Hydraulic Engineering: Proceedings of 16th IAHR-APD Congress and 3rd Symposium of IAHR-ISHS* (C. Zhang & H. Tang, eds). Springer Berlin Heidelberg, pp. 821–825. https://doi.org/10.1007/978-3-540-89465-0_144.

*PhD Thesis*.

*Regression Estimation with Support Vector Learning Machines*. *Master's Thesis*.