## Abstract

Energy dissipation in culverts is a complex phenomenon due to the nonlinearity and uncertainties of the process. In the current study, the capability of Gaussian process regression (GPR) and support vector machine (SVM) as kernel-based approaches and the GEP method was assessed in predicting energy losses in culverts. Two types of bend loss in rectangular culverts and entrance loss in circular culverts with different inlet end treatments were considered. Various input combinations were developed and tested using experimental data. The OAT (one-at-a-time), factorial sensitivity analysis and Monte-Carlo uncertainty analysis were used to select the effective parameters in modeling. The results of performance criteria proved the capability of the applied methods (i.e. high R and DC and low RMSE). For rectangular culverts, the model with parameters *Fr* (Froude number) and *θ* (bend angle), and for circular culverts, the model with parameters *Fr* and *Hw/D* (depth ratio), were the superior models. It showed that using the bend downstream Froude number caused an increment in model efficiency. Among the four end inlet treatments, mitered flush to 1.5:1 fill slope inlet yielded more accurate prediction. The sensitivity and uncertainty analysis showed that *θ* and *Hw/D* had the most significant impact on modeling, and *Fr* had the highest uncertainty.

## INTRODUCTION

A culvert is a hydraulically short segment of conduit which conveys stream flow through a roadway embankment or past some other type of flow obstruction. Numerous cross-sectional shapes are available. The most commonly used shapes include circular, box (rectangular), elliptical, pipe-arch, and arch. Prediction of the accurate amount of local loss in the culvert systems is important due to its impact on saving costs and time of construction processes and determination of the size, shape, and diameter of the culverts. In a culvert system, with decreasing energy loss, its effect on the upstream flow profile decreases. The energy loss is divided into two categories: major loss and minor (or local) loss, which in the culvert systems due to its short length, major energy loss is negligible compared with minor loss. In fact, major loss is caused due to the friction between the flow and pipe walls and since culverts are usually used in short lengths in practice, therefore, the frictional or longitudinal loss is negligible in comparison with minor loss. So far, various studies have been conducted to explain the complex phenomenon of energy losses in culvert systems with different geometries. Tullis (2012) investigated the minor loss in buried-invert culverts and determined the optimum section from the point of view of the least loss. Malone & Parr (2008) investigated bend loss in rectangular culverts and proposed some graphs for calculating this parameter. Tullis *et al.* (2005) studied the inlet losses in elliptical culverts and offered a precise method for calculating the outlet loss of the culvert and to identify the best section with minimum loss. Anderson (2006) studied the worn-out culverts and embedding new culverts inside them and determined the local loss empirically. Kotowski *et al.* (2011) studied the inlet and outlet loss coefficient in the conduits and concluded that inlet loss coefficient in the pipes was not constant. However, due to the complexity and uncertainty of the local losses phenomenon, the results of the classical models are not general and under variable conditions do not present the same results. Therefore, it is essential to use other methods with more accuracy in predicting energy loss in culverts with different shapes under varied hydraulic conditions.

In recent years, the application of nonlinear machine learning (ML) (e.g. artificial neural networks (ANNs), neuro-fuzzy models (NF), genetic programming (GP), gene expression programming (GEP), support vector machine (SVM), and Gaussian process regression (GPR)) in water resources engineering has become viable leading to numerous publications in this field. A complete review of all the applications is beyond the scope of this paper and only some studies are mentioned here, such as assessing tree-based methods concepts, uses and limitations (Carvalho *et al.* 2018), modeling historical land use changes using ANN (Tayyebi *et al.* 2017), prediction of groundwater levels using data-driven models (Huang *et al.* 2017; Amaranto *et al.* 2018), estimation of hydraulic jump energy dissipation in channels with rough elements using SVM (Roushangar & Ghasempour 2018), prediction of pile scour using ANN and kernel methods (Ghazanfari-Hashemi *et al.* 2011; Pal *et al.* 2014), computing longitudinal dispersion coefficients in natural streams using SVM (Azamathulla & Wu 2011), real time hydrologic forecasting using EC-SVM (Yu *et al.* 2004), quantify runoff contributions from different land uses in tropical urban environments using GP (Meshgi *et al.* 2015), side weir discharge coefficient using SVM (Azamathulla *et al.* 2017), and forecasting monthly and seasonal streamflow using mixture-kernel GPR approach (Zhu *et al.* 2018).

In artificial intelligence models we are looking for a learning machine capable of ﬁnding an accurate approximation of a natural phenomenon, as well as expressing it in the form of an interpretable equation. However, this bias towards interpretability creates several new issues. The computer-generated hypotheses should take advantage of the already existing body of knowledge about the domain in question. However, the method by which we express our knowledge and make it available to a learning machine remains rather unclear (Babovic 2009). Machine learning, a branch of artificial intelligence, deals with the representation and generalization using a data learning technique. Representation of data instances and functions evaluated on these instances are part of all machine learning systems. Generalization is the property that the system will perform well on unseen data instances; the conditions under which this can be guaranteed are a key object of study in the subfield of computational learning theory. There is a wide variety of machine learning tasks and successful applications (Mitchell 1997). In general, the task of a machine learning algorithm can be described as follows: Given a set of input variables and the associated output variable(s), the objective is learning a functional relationship for the input–output variables set. It should be noted that artificial intelligence models typically do not really represent the physics of a modeled process; they are just devices used to capture relationships between the relevant input and output variables. However, when the interrelationships among the relevant variables are poorly understood, finding the size and shape of the ultimate solution is difficult, and conventional mathematical analysis methods do not (or cannot) provide analytical solutions; these methods can predict the interest variable with more accuracy.

Due to the complexity and uncertainties of the energy losses process, the existing regression models do not show desired accuracy and their output is often associated with large errors. Therefore, the present research proposed kernel based models to predict the energy losses coefficient and also to investigate the best input models and determine the effective parameters of different shaped culverts. To the best of the authors' knowledge there is a lack of research on the comprehensive study of predicting local losses in culverts using artificial intelligence. In all previous studies the local loss coefficient in the culvert was measured and recorded experimentally at various velocities, but the relationship between this coefficient and the Froude and Reynolds numbers, and the geometric parameters, and the dependency rate of this coefficient on these parameters, was not investigated. Therefore, this study aimed to assess the capability of GPR and SVM as kernel-based approaches for modeling the losses of culverts with different geometries. Also, the GEP method was used to develop new equations for predicting the local loss coefficient in different shaped culverts. In order to determine the most effective combination for modeling the losses of culverts, different input combinations were considered under two scenarios (losses due to the culvert bend and culvert entrance) and the impact of hydraulic characteristics and culvert shapes was assessed. In addition, the most important parameters in predicting the energy losses were determined using OAT and factorial sensitivity analysis and Monte Carlo uncertainty sensitivity.

## MATERIALS AND METHODS

### The data sets

The data sets of laboratory experiments of local losses of culverts performed by Malone & Parr (2008) and Tullis (2012) were used in the present study. Malone & Parr (2008) studied bend losses in rectangular culverts. Laboratory experiments were performed in rectangular channels with abrupt bends. Bend angles of approximately 30, 45, 60, 75 and 90° were tested. Tullis (2012) conducted experiments on circular culverts in order to determine the entrance loss coefficient and the inlet control head discharge relationships for circular culverts with invert burial depths of 20, 40, and 50%. All buried-invert culverts were tested with four different end treatments: (1) thin-wall projecting, (2) mitered flush to 1.5:1.0 (horizontal to vertical) fill slope, (3) square-edged inlet with vertical headwall, and (4) 45° beveled entrance with vertical headwall. The ranges of some parameters used in the tests are given in Table 1 in which *Ke*, *θ* and *Fr*, and *Re* represent the entrance loss coefficient, culvert bend angle, Froude number, and Reynolds number respectively.

Bend loss | Entrance loss | |||||
---|---|---|---|---|---|---|

Circular culvert (Tullis 2012) | ||||||

Rectangular culvert (Malone & Parr 2008) | Thin-wall projecting | Mitered to flush 1.5 h:1 v fill slope | Square edge inlet with vertical headwalls | 45° beveled inlet with vertical headwalls | ||

Parameters | Ke | 0.157–1.078 | 0.157–1.03 | 0.42–0.93 | 0.3–0.6 | 0.22–0.38 |

θ (radian) | 0.523–1.578 | – | – | – | – | |

Fr | 0.181–0.86 | 0.0124–1.058 | 0.01–0.81 | 0.43–0.97 | 0.049–1.05 | |

Re | 42,138–140,590 | 14,408–30,711 | 18,743–268,0463 | 9,616–305,469 | 79,175–292,240 | |

No. of data | 190 | 66 | 65 | 45 | 48 |

Bend loss | Entrance loss | |||||
---|---|---|---|---|---|---|

Circular culvert (Tullis 2012) | ||||||

Rectangular culvert (Malone & Parr 2008) | Thin-wall projecting | Mitered to flush 1.5 h:1 v fill slope | Square edge inlet with vertical headwalls | 45° beveled inlet with vertical headwalls | ||

Parameters | Ke | 0.157–1.078 | 0.157–1.03 | 0.42–0.93 | 0.3–0.6 | 0.22–0.38 |

θ (radian) | 0.523–1.578 | – | – | – | – | |

Fr | 0.181–0.86 | 0.0124–1.058 | 0.01–0.81 | 0.43–0.97 | 0.049–1.05 | |

Re | 42,138–140,590 | 14,408–30,711 | 18,743–268,0463 | 9,616–305,469 | 79,175–292,240 | |

No. of data | 190 | 66 | 65 | 45 | 48 |

### Kernel-based approaches

Kernel based approaches, such as GPR and SVM, are a relatively new important method based on the different kernel types which are based on initiation of statistical learning theory. Such models are capable of adapting themselves to predict any variable of interest via sufficient inputs. These methods can model non-linear decision boundaries, and there are many kernels to choose from. They are also fairly robust against overfitting, especially in high-dimensional space. However, the appropriate selection of kernel type is the most important step in the GPR and SVM due to its direct impact on the training and classification precision. In fact, these methods are memory intensive, trickier to tune due to the importance of picking the right kernel, and do not scale well to larger datasets. In these models we will be able to predict the proper behavior of the system, although we will not be able to characterize its intrinsic structure and behavior. In other words, we will be able to say what the model does, but not how. In addition to this, we will not be able to guarantee the behavior of such a model in regions not covered by the data from which the model was constructed. This is due to the fact that the model covers only the relationships found within the given data (Babovic 2009).

### Gaussian process regression

GPR models are based on the assumption that adjacent observations should convey information about each other. Gaussian processes are a way of specifying a priori directly over function space. This is a natural generalization of the Gaussian distribution, whose mean and covariance are a vector and matrix, respectively. The Gaussian distribution is over vectors, whereas the Gaussian process is over functions. Thus, due to prior knowledge about the data and functional dependencies, no validation process is required for generalization and GP regression models are able to understand the predictive distribution corresponding to the test input (Rasmussen & Williams 2006). A GP is defined as a collection of random variables, any finite number of which has a joint multivariate Gaussian distribution. Let represent the domains of inputs and outputs, respectively, from which *n* pairs are drawn independently and identically distributed. For regression, assume that ; then, a GP on is defined by a mean function and a covariance function .

*y*is given by , where . In GP regression, for every input

*x*there is an associated random variable

*f*(

*x*), which is the value of the stochastic function

*f*at that location. In this work, it is assumed that the observational error is normally independent and identically distributed, with a mean value of zero (), a variance of and drawn from the Gaussian process on specified by

*k*. That is, where , and I is the identity matrix. Because is normal, so is the conditional distribution of test labels given the training and test data of . Then, one has , where: If there are

*n*training data and test data, then represents the matrix of covariances evaluated at all pairs of training and test data sets, and this is similarly true for the other values of , and ; here

*X*and

*Y*are the vectors of the training data and training data labels , whereas is the vector of the test data. A specified covariance function is required to generate a positive semi-definite covariance matrix

*K*, where . The term of the kernel function used in SVM is equivalent to the covariance function used in GP regression. With the known values of kernel

*k*and degree of noise , Equations (1) and (2) would be enough for inference. During the training process of GP regression models, one needs to choose a suitable covariance function as well as its parameters. In the case of GP regression with a fixed value of Gaussian noise, a GP model can be trained by applying Bayesian inference, i.e. maximizing the marginal likelihood. This leads to the minimization of the negative log-posterior:

To find the hyperparameters, the partial derivative of Equation (3) can be obtained with respect to and *k*, and minimization can be achieved by gradient descent. For more details about GP regression and different covariance functions, readers are referred to Kuss (2006). The optimal value of capacity constant (C) and the size of error-intensive zone (ɛ) in SVM and Gaussian noise in GPR are required due to their high impact on the accuracy of the mentioned regression approaches. The optimum values of these parameters were obtained after the trial-and-error process.

### Support vector machine

Support vector machines (SVM) as an intelligence approach are used in information categorization and data set classification. This approach, developed by Vapnik (1995), is known as structural risk minimization (SRM), which minimizes an upper bound on the expected risk, as opposed to the traditional empirical risk (ERM) which minimizes the error on the training data. The SVM method is based on the concept of the optimal hyper plane that separates samples of two classes by considering the widest gap between two classes. SVR is an extension of SVM regression. The purpose of the SVR is to find a function having the most deviation from the actual target vectors for all given training data and to have it be as flat as possible (Smola 1996). Vapnik (1995) introduced the concept of kernel function for non-linear support vector regression. The most important step in the SVM is the appropriate selection of kernel type. In general, there are several types of kernel functions, namely linear, polynomial, radial basis function (RBF) and sigmoid functions. Due to the black-box nature of SVM and GPR models, the learned relationship between the inputs and output is not revealed. This requires cautious usage of the new model, such as GEP, and it should not be used beyond the ranges of the data for which it was trained.

### Gene expression programming (GEP)

Gene expression programming (GEP) was developed by Ferreria (2001) using fundamental principles of genetic algorithms (GA) and genetic programming (GP). One strength of the GEP approach is that the creation of genetic diversity is extremely simplified as genetic operators work at the chromosome level. Another strength of GEP consists of its unique, multigenic nature, which allows the evolution of more complex programs composed of several subprograms. GEP as GA mimics the biological evolution to create a computer program for simulating a specified phenomenon. A GEP algorithm begins by selecting five elements such as the function set, terminal set, fitness function, control parameters, and stopping condition. There is a comparison between predicted values and actual values in each subsequent step. When desired results in accordance with previously selected error criteria are found, the GEP process is terminated. If the desired error criteria could not be found, some chromosomes are chosen by a method called roulette wheel sampling and they are mutated to obtain new chromosomes. After the desired fitness score is found, this process terminates and then the chromosomes are decoded for the best solution of the problem. The advantages of a system like GEP are clear from nature, but the most important are (Ferreria 2001): (1) the chromosomes are simple entities: linear, compact, relatively small, easy to manipulate genetically (replicate, mutate, recombine, etc.); (2) the expression trees are exclusively the expression of their respective chromosomes; they are entities upon which selection acts, and according to fitness, they are selected to reproduce with modification.

### Performance criteria

*K*). However, science

_{e}*K*is a non-dimensional parameter, therefore, RMSE is also a dimensional value. Expressions for performance criteria are as follows: where , , , ,

_{e}*N*are, respectively: the measured values, predicted values, mean measured values, mean predicted values and number of data samples.

### Simulation and models development

#### Input variables

*K*) prediction in culverts with different shapes are as follows: where

_{e}*V*is upstream flow velocity,

_{1}*µ*is dynamic viscosity of water,

*g*is acceleration due to gravity,

*ρ*is density of water,

*D*is culvert diameter,

*Hw*is flow depth. Also,

*θ*is a geometric parameter and indicates the bend angle along the culvert. From the dimensional analysis the above parameters can be expressed as follows: where

*Fr*=

*V/*(

*g*×

*Hw*)

^{0.5}is the Froude number. The Froude number is a dimensionless value that describes different flow regimes of open channel flow. The Froude number is a ratio of inertial and gravitational forces. As flow passes through critical conditions its Froude number has a value of 1. For subcritical flow the depth is greater and the velocity lower, therefore the Froude number is always less than 1; for supercritical flow the opposite is true and the Froude number is always greater than 1. Calculation of the Froude number thus provides an immediate check on the type of flow and how near the flow conditions are to those at critical depth.

*Re*

*=*

*VR/ν*is Reynolds number (

*R*: hydraulic radius,

*ν*: kinematic viscosity). The Reynolds number is the ratio of inertial forces to viscous forces and is a convenient parameter for predicting if a flow condition will be laminar or turbulent. In this study, for developing the local loss models in rectangular and circular culverts, two scenarios with different input variables were considered (Figure 1). In the first scenario, the impact of culvert bend was evaluated and in the second scenario, the entrance energy loss was investigated. Figure 1 illustrates the flowchart of considered scenarios. Developed models of GPR and SVM for predicting local losses are given in Table 2. In assessing the local loss in culvert bend, Froude numbers of three points (downstream and upstream of bend and at the bend) were used. It should be noted that for all models 75% of the dataset was used for training goals and the remaining 25% of data were used for testing goals.

Bend energy loss | Entrance loss | ||
---|---|---|---|

Model | Input variables | Model | Input variables |

B(I) | Fr downstream | E(I) | Re |

B(II) | Fr downstream, θ | E(II) | Re, Hw/D |

B(III) | Fr average | E(III) | Fr |

B(IV) | Fr average, θ | E(IV) | Fr, Hw/D |

B (V) | Fr upstream | ||

B(VI) | Fr upstream, θ |

Bend energy loss | Entrance loss | ||
---|---|---|---|

Model | Input variables | Model | Input variables |

B(I) | Fr downstream | E(I) | Re |

B(II) | Fr downstream, θ | E(II) | Re, Hw/D |

B(III) | Fr average | E(III) | Fr |

B(IV) | Fr average, θ | E(IV) | Fr, Hw/D |

B (V) | Fr upstream | ||

B(VI) | Fr upstream, θ |

### SVM, GPR, and GEP models development

The design of GP and SVM-based regression approaches involve the use of the concept of the kernel function. A number of kernels are discussed in the literature, but studies suggest a better performance by radial basis kernels for different civil engineering problems (Gill *et al.* 2006). In this study, for determining the best performance of SVM and GPR and selecting the best kernel function, the model B(II) from Scenario 1 in a rectangular culvert was predicted via SVM and GPR using various kernels. Figure 2 indicates the results of the statistical parameters of different kernels for this model. According to the results, using the kernel function of RBF in the SVM model led to better prediction accuracy in comparison to the other kernels and for the GPR model using the kernel function of Pearson led to better prediction accuracy. Therefore, RBF and Pearson kernels were used as a core tool of SVM and GPR which were applied for the rest of the models.

GEP was trained for energy losses prediction in rectangular and circular culverts. Basic arithmetic operators of (+, –, *, /) and several mathematical functions (exp, X^{2}, X^{3}, √) were utilized as the GEP function set. The architecture of the chromosomes, including number of chromosomes (25-30-35), head size (7-8) and number of genes (3–4), were selected and different combinations of the mentioned parameters were tested. The model was run for a number of generations and was stopped when there was no significant change in the fitness function value and coefficient of correlation. It is observed that the model with the number of chromosomes of 30, head size of 7, and number of genes of 3 yielded better results. Also, addition and multiplication were tested as linking functions and it was found that linking the sub-ETs by addition represented better fitness values. One of the important steps in preparing the GEP model is to choose the set of genetic operators. In the current study, a combination of all genetic operators (recombination, mutation, transposition, and crossover) was used for this aim. Parameters of the optimized GEP model are shown in Table 3.

Description of parameter | Setting of parameter | Description of parameter | Setting of parameter |
---|---|---|---|

Function set | +, –, ×, /, X^{2}, X^{3}, √ | Fitness function error type | Root mean square error |

Chromosomes | 30 | Mutation rate | 0.044 |

Head size | 7 | Inversion, IS and RIS transposition rate | 0.1 |

Number of genes | 3 | One- and two-point recombination rate | 0.3 |

Linking function | Addition | Gene recombination and transposition rate | 0.1 |

Description of parameter | Setting of parameter | Description of parameter | Setting of parameter |
---|---|---|---|

Function set | +, –, ×, /, X^{2}, X^{3}, √ | Fitness function error type | Root mean square error |

Chromosomes | 30 | Mutation rate | 0.044 |

Head size | 7 | Inversion, IS and RIS transposition rate | 0.1 |

Number of genes | 3 | One- and two-point recombination rate | 0.3 |

Linking function | Addition | Gene recombination and transposition rate | 0.1 |

## RESULTS AND DISCUSSION

### Developed models for rectangular culvert with bend (Scenario 1)

*Fr*and

*θ*show better performance than the models which only use

*Fr*as an input variable. According to the results, among the six developed models, the model B(II) led to more accurate results. It could be inferred that adding the geometric

*θ*parameter to the input combination caused an increment in models efficiency and this parameter had a significant impact on local loss prediction process. Also, the Froude number of the bend downstream led to more accurate outcomes. However, considering the results of the model B(VI), using the upstream Froude number did not lead to undesirable results. This issue confirms that the local loss can be estimated using upstream flow parameters when there is no information about hydraulic conditions of bend downstream. According to the results, it can be seen that the results of the GPR models are slightly more accurate than the SVM and GEP models. The comparison of observed and predicted local loss for the superior model is shown in Figure 3. The mathematical expression of GEP for the best model is as follows:

Models | |||||||
---|---|---|---|---|---|---|---|

Train | Test | ||||||

R | DC | RMSE | R | DC | RMSE | ||

B(I) | SVM | 0.601 | 0.503 | 0.243 | 0.564 | 0.487 | 0.271 |

GPR | 0.604 | 0.507 | 0.241 | 0.566 | 0.490 | 0.268 | |

GEP | 0.521 | 0.402 | 0.302 | 0.505 | 0.334 | 0.312 | |

B(II) | SVM | 0.981 | 0.964 | 0.053 | 0.976 | 0.956 | 0.058 |

GPR | 0.985 | 0.971 | 0.051 | 0.981 | 0.961 | 0.055 | |

GEP | 0.973 | 0.947 | 0.064 | 0.972 | 0.943 | 0.068 | |

B(III) | SVM | 0.701 | 0.612 | 0.241 | 0.659 | 0.537 | 0.266 |

GPR | 0.705 | 0.613 | 0.240 | 0.661 | 0.539 | 0.263 | |

GEP | 0.691 | 0.532 | 0.260 | 0.618 | 0.438 | 0.298 | |

B(IV) | SVM | 0.980 | 0.961 | 0.056 | 0.976 | 0.951 | 0.062 |

GPR | 0.984 | 0.965 | 0.053 | 0.981 | 0.955 | 0.059 | |

GEP | 0.972 | 0.945 | 0.065 | 0.971 | 0.942 | 0.075 | |

B(V) | SVM | 0.705 | 0.529 | 0.272 | 0.647 | 0.424 | 0.283 |

GPR | 0.709 | 0.530 | 0.270 | 0.650 | 0.427 | 0.280 | |

GEP | 0.652 | 0.512 | 0.298 | 0.611 | 0.412 | 0.302 | |

B(VI) | SVM | 0.977 | 0.962 | 0.055 | 0.973 | 0.949 | 0.064 |

GPR | 0.982 | 0.964 | 0.056 | 0.978 | 0.953 | 0.062 | |

GEP | 0.965 | 0.931 | 0.074 | 0.962 | 0.926 | 0.086 |

Models | |||||||
---|---|---|---|---|---|---|---|

Train | Test | ||||||

R | DC | RMSE | R | DC | RMSE | ||

B(I) | SVM | 0.601 | 0.503 | 0.243 | 0.564 | 0.487 | 0.271 |

GPR | 0.604 | 0.507 | 0.241 | 0.566 | 0.490 | 0.268 | |

GEP | 0.521 | 0.402 | 0.302 | 0.505 | 0.334 | 0.312 | |

B(II) | SVM | 0.981 | 0.964 | 0.053 | 0.976 | 0.956 | 0.058 |

GPR | 0.985 | 0.971 | 0.051 | 0.981 | 0.961 | 0.055 | |

GEP | 0.973 | 0.947 | 0.064 | 0.972 | 0.943 | 0.068 | |

B(III) | SVM | 0.701 | 0.612 | 0.241 | 0.659 | 0.537 | 0.266 |

GPR | 0.705 | 0.613 | 0.240 | 0.661 | 0.539 | 0.263 | |

GEP | 0.691 | 0.532 | 0.260 | 0.618 | 0.438 | 0.298 | |

B(IV) | SVM | 0.980 | 0.961 | 0.056 | 0.976 | 0.951 | 0.062 |

GPR | 0.984 | 0.965 | 0.053 | 0.981 | 0.955 | 0.059 | |

GEP | 0.972 | 0.945 | 0.065 | 0.971 | 0.942 | 0.075 | |

B(V) | SVM | 0.705 | 0.529 | 0.272 | 0.647 | 0.424 | 0.283 |

GPR | 0.709 | 0.530 | 0.270 | 0.650 | 0.427 | 0.280 | |

GEP | 0.652 | 0.512 | 0.298 | 0.611 | 0.412 | 0.302 | |

B(VI) | SVM | 0.977 | 0.962 | 0.055 | 0.973 | 0.949 | 0.064 |

GPR | 0.982 | 0.964 | 0.056 | 0.978 | 0.953 | 0.062 | |

GEP | 0.965 | 0.931 | 0.074 | 0.962 | 0.926 | 0.086 |

### Developed models for circular culverts with different end inlet treatments (Scenario 2)

For Scenario 2, different models were developed based on flow condition and culvert diameters in order to assess the entrance loss in circular culverts with different end inlet treatments. The obtained results of GPR, SVM, and GEP models are listed in Table 5 and shown in Figure 4. The superior performance for this state and for all end inlet treatments was obtained for the model E(IV), in which the inputs were *Fr* and *Hw/D*. According to Table 5, it seems that for modeling entrance loss in culverts, using relative flow depth as the input parameter improved the efficiency of the models. Comparing the models E(I) and E(III) and considering the RMSE values, the obtained error percentage for the model E(I) is almost 8–22% more than the model E(II), therefore, in models with only one input parameter, using the *Fr* number led to better prediction than using the *Re* number. Among the four end inlet treatments, culverts with a mitered flush to 1.5:1 (horizontal to vertical) fill slope yielded a more accurate prediction. The mathematical expressions of GEP for all cases are as follows:

Model | |||||||
---|---|---|---|---|---|---|---|

Train | Test | ||||||

R | DC | RMSE | R | DC | RMSE | ||

E(I) | SVM | 0.705 | 0.632 | 0.075 | 0.634 | 0.547 | 0.073 |

GPR | 0.716 | 0.635 | 0.073 | 0.653 | 0.574 | 0.071 | |

GEP | 0.622 | 0.62 | 0.088 | 0.617 | 0.592 | 0.092 | |

E(II) | SVM | 0.841 | 0.732 | 0.059 | 0.832 | 0.681 | 0.068 |

GPR | 0.842 | 0.733 | 0.057 | 0.837 | 0.684 | 0.065 | |

GEP | 0.831 | 0.720 | 0.058 | 0.790 | 0.610 | 0.080 | |

E(III) | SVM | 0.820 | 0.692 | 0.065 | 0.695 | 0.612 | 0.071 |

GPR | 0.822 | 0.695 | 0.063 | 0.716 | 0.643 | 0.068 | |

GEP | 0.815 | 0.659 | 0.068 | 0.593 | 0.63 | 0.073 | |

E(IV) | SVM | 0.846 | 0.769 | 0.055 | 0.832 | 0.741 | 0.058 |

GPR | 0.853 | 0.770 | 0.054 | 0.857 | 0.748 | 0.056 | |

GEP | 0.851 | 0.750 | 0.060 | 0.830 | 0.725 | 0.063 | |

E(I) | SVM | 0.852 | 0.747 | 0.073 | 0.833 | 0.741 | 0.075 |

GPR | 0.856 | 0.751 | 0.070 | 0.837 | 0.745 | 0.073 | |

GEP | 0.843 | 0.735 | 0.08 | 0.826 | 0.69 | 0.09 | |

E(II) | SVM | 0.980 | 0.960 | 0.033 | 0.975 | 0.943 | 0.041 |

GPR | 0.985 | 0.962 | 0.030 | 0.979 | 0.947 | 0.039 | |

GEP | 0.964 | 0.924 | 0.046 | 0.962 | 0.88 | 0.057 | |

E(III) | SVM | 0.941 | 0.854 | 0.056 | 0.912 | 0.856 | 0.059 |

GPR | 0.944 | 0.857 | 0.054 | 0.918 | 0.859 | 0.056 | |

GEP | 0.911 | 0.822 | 0.07 | 0.908 | 0.82 | 0.06 | |

E(IV) | SVM | 0.979 | 0.959 | 0.032 | 0.978 | 0.954 | 0.037 |

GPR | 0.984 | 0.963 | 0.029 | 0.979 | 0.959 | 0.035 | |

GEP | 0.975 | 0.950 | 0.033 | 0.962 | 0.92 | 0.040 | |

E(I) | SVM | 0.820 | 0.685 | 0.041 | 0.812 | 0.635 | 0.052 |

GPR | 0.824 | 0.688 | 0.039 | 0.817 | 0.636 | 0.048 | |

GEP | 0.816 | 0.641 | 0.039 | 0.790 | 0.560 | 0.064 | |

E(II) | SVM | 0.887 | 0.791 | 0.031 | 0.876 | 0.671 | 0.036 |

GPR | 0.891 | 0.795 | 0.029 | 0.883 | 0.673 | 0.034 | |

GEP | 0.883 | 0.781 | 0.032 | 0.860 | 0.641 | 0.038 | |

E(III) | SVM | 0.834 | 0.752 | 0.033 | 0.817 | 0.662 | 0.046 |

GPR | 0.835 | 0.755 | 0.031 | 0.821 | 0.666 | 0.043 | |

GEP | 0.820 | 0.661 | 0.036 | 0.810 | 0.592 | 0.058 | |

E(IV) | SVM | 0.905 | 0.817 | 0.028 | 0.894 | 0.742 | 0.034 |

GPR | 0.911 | 0.822 | 0.025 | 0.896 | 0.745 | 0.032 | |

GEP | 0.889 | 0.791 | 0.030 | 0.876 | 0.732 | 0.034 | |

E(I) | SVM | 0.732 | 0.605 | 0.027 | 0.699 | 0.532 | 0.036 |

GPR | 0.736 | 0.608 | 0.026 | 0.702 | 0.533 | 0.034 | |

GEP | 0.687 | 0.504 | 0.029 | 0.664 | 0.470 | 0.035 | |

E(II) | SVM | 0.821 | 0.672 | 0.021 | 0.812 | 0.632 | 0.024 |

GPR | 0.824 | 0.677 | 0.018 | 0.816 | 0.633 | 0.023 | |

GEP | 0.753 | 0.630 | 0.022 | 0.680 | 0.612 | 0.024 | |

E(III) | SVM | 0.795 | 0.671 | 0.022 | 0.714 | 0.602 | 0.029 |

GPR | 0.799 | 0.673 | 0.020 | 0.718 | 0.605 | 0.028 | |

GEP | 0.730 | 0.582 | 0.024 | 0.654 | 0.530 | 0.032 | |

E(IV) | SVM | 0.914 | 0.833 | 0.013 | 0.895 | 0.736 | 0.015 |

GPR | 0.919 | 0.837 | 0.011 | 0.899 | 0.739 | 0.014 | |

GEP | 0.802 | 0.784 | 0.018 | 0.730 | 0.651 | 0.021 |

Model | |||||||
---|---|---|---|---|---|---|---|

Train | Test | ||||||

R | DC | RMSE | R | DC | RMSE | ||

E(I) | SVM | 0.705 | 0.632 | 0.075 | 0.634 | 0.547 | 0.073 |

GPR | 0.716 | 0.635 | 0.073 | 0.653 | 0.574 | 0.071 | |

GEP | 0.622 | 0.62 | 0.088 | 0.617 | 0.592 | 0.092 | |

E(II) | SVM | 0.841 | 0.732 | 0.059 | 0.832 | 0.681 | 0.068 |

GPR | 0.842 | 0.733 | 0.057 | 0.837 | 0.684 | 0.065 | |

GEP | 0.831 | 0.720 | 0.058 | 0.790 | 0.610 | 0.080 | |

E(III) | SVM | 0.820 | 0.692 | 0.065 | 0.695 | 0.612 | 0.071 |

GPR | 0.822 | 0.695 | 0.063 | 0.716 | 0.643 | 0.068 | |

GEP | 0.815 | 0.659 | 0.068 | 0.593 | 0.63 | 0.073 | |

E(IV) | SVM | 0.846 | 0.769 | 0.055 | 0.832 | 0.741 | 0.058 |

GPR | 0.853 | 0.770 | 0.054 | 0.857 | 0.748 | 0.056 | |

GEP | 0.851 | 0.750 | 0.060 | 0.830 | 0.725 | 0.063 | |

E(I) | SVM | 0.852 | 0.747 | 0.073 | 0.833 | 0.741 | 0.075 |

GPR | 0.856 | 0.751 | 0.070 | 0.837 | 0.745 | 0.073 | |

GEP | 0.843 | 0.735 | 0.08 | 0.826 | 0.69 | 0.09 | |

E(II) | SVM | 0.980 | 0.960 | 0.033 | 0.975 | 0.943 | 0.041 |

GPR | 0.985 | 0.962 | 0.030 | 0.979 | 0.947 | 0.039 | |

GEP | 0.964 | 0.924 | 0.046 | 0.962 | 0.88 | 0.057 | |

E(III) | SVM | 0.941 | 0.854 | 0.056 | 0.912 | 0.856 | 0.059 |

GPR | 0.944 | 0.857 | 0.054 | 0.918 | 0.859 | 0.056 | |

GEP | 0.911 | 0.822 | 0.07 | 0.908 | 0.82 | 0.06 | |

E(IV) | SVM | 0.979 | 0.959 | 0.032 | 0.978 | 0.954 | 0.037 |

GPR | 0.984 | 0.963 | 0.029 | 0.979 | 0.959 | 0.035 | |

GEP | 0.975 | 0.950 | 0.033 | 0.962 | 0.92 | 0.040 | |

E(I) | SVM | 0.820 | 0.685 | 0.041 | 0.812 | 0.635 | 0.052 |

GPR | 0.824 | 0.688 | 0.039 | 0.817 | 0.636 | 0.048 | |

GEP | 0.816 | 0.641 | 0.039 | 0.790 | 0.560 | 0.064 | |

E(II) | SVM | 0.887 | 0.791 | 0.031 | 0.876 | 0.671 | 0.036 |

GPR | 0.891 | 0.795 | 0.029 | 0.883 | 0.673 | 0.034 | |

GEP | 0.883 | 0.781 | 0.032 | 0.860 | 0.641 | 0.038 | |

E(III) | SVM | 0.834 | 0.752 | 0.033 | 0.817 | 0.662 | 0.046 |

GPR | 0.835 | 0.755 | 0.031 | 0.821 | 0.666 | 0.043 | |

GEP | 0.820 | 0.661 | 0.036 | 0.810 | 0.592 | 0.058 | |

E(IV) | SVM | 0.905 | 0.817 | 0.028 | 0.894 | 0.742 | 0.034 |

GPR | 0.911 | 0.822 | 0.025 | 0.896 | 0.745 | 0.032 | |

GEP | 0.889 | 0.791 | 0.030 | 0.876 | 0.732 | 0.034 | |

E(I) | SVM | 0.732 | 0.605 | 0.027 | 0.699 | 0.532 | 0.036 |

GPR | 0.736 | 0.608 | 0.026 | 0.702 | 0.533 | 0.034 | |

GEP | 0.687 | 0.504 | 0.029 | 0.664 | 0.470 | 0.035 | |

E(II) | SVM | 0.821 | 0.672 | 0.021 | 0.812 | 0.632 | 0.024 |

GPR | 0.824 | 0.677 | 0.018 | 0.816 | 0.633 | 0.023 | |

GEP | 0.753 | 0.630 | 0.022 | 0.680 | 0.612 | 0.024 | |

E(III) | SVM | 0.795 | 0.671 | 0.022 | 0.714 | 0.602 | 0.029 |

GPR | 0.799 | 0.673 | 0.020 | 0.718 | 0.605 | 0.028 | |

GEP | 0.730 | 0.582 | 0.024 | 0.654 | 0.530 | 0.032 | |

E(IV) | SVM | 0.914 | 0.833 | 0.013 | 0.895 | 0.736 | 0.015 |

GPR | 0.919 | 0.837 | 0.011 | 0.899 | 0.739 | 0.014 | |

GEP | 0.802 | 0.784 | 0.018 | 0.730 | 0.651 | 0.021 |

### Sensitivity and uncertainty analysis

*et al.*2005). In this study, for evaluating the impact of each parameter, the model was run with all input parameters and then one of the input parameters was eliminated and the model was re-run. Based on the results from Figure 5, it could be induced that in the bend loss prediction process the variable

*θ*, and in entrance loss prediction process the variable

*Hw/D*, had the most significant impact on local loss, respectively. With eliminating the

*θ*and

*Hw/D*variables the amount of RMSE error criteria increased to 0.212 and 0.066, respectively. Also, the uncertainty analysis was performed in order to determine the uncertainty of each parameter. Uncertainty is a result-dependent factor that demonstrates the range of values a modeling result can attain. It also represents the possibility that the measured value may fall into the speciﬁed range. In this study, the Monte Carlo uncertainty analysis method was used. In this procedure, two factors are considered to examine the robustness and analyze the uncertainty of the models. The first one is the percentage of the studied outputs that are in the range of 95PPU and the second one is the average distance between the upper (X

_{U}) and lower (X

_{L}) uncertainty bands (Noori

*et al.*2015). To do that, the considered model is developed many times (1,000 in the present study), and the empirical cumulative distribution probability of the models is obtained. After that, the X

_{L}and X

_{U}are considered 2.5 and 97.5% probabilities of the cumulative distribution, respectively. The appropriate conﬁdence level is the level in which two requirements are met: (1) The 95PPU band brackets most of the observations, and (2) the average distance between the upper (at the 97.5% level) and the lower (at the 2.5% level) parts of the 95PPU is small. Quantiﬁcations of the two requirements are problem dependent to an extent. For the second requirement it is essential that the average distance between the upper and the lower 95PPU be smaller than the standard deviation of the measured data (Abbaspour

*et al.*2007). The above two indices were used to quantify the strength of calibration, accounting for the combined parameter, model, and input uncertainties. To evaluate the average width of the conﬁdence interval band, the band width indicator was suggested by Abbaspour

*et al.*(2007) as follows: where

*σx*is the standard deviation of observed data and is the conﬁdence band's average width. The percentage of the data within the conﬁdence band of 95% is determined as follows: where 95PPU indicates 95% predicted uncertainty;

*k*is the number of observed data; l is the current month which changes from 1 to

*k*and

*X*is the current registered data. The results are shown in Figure 6. According to Figure 6, and based on the d-factor and 95PPU values, it can be seen that the

_{reg}*Fr*parameter has higher uncertainty compared with the

*Hw/D*and

*θ*parameters. The

*Fr*parameter higher uncertainty is due to the high value of d-factor (i.e. 0.48 in bend loss state and 0.72 in inlet loss stat) and less value of 95PPU (i.e. 58.33 in bend loss state and 68.78 in inlet loss state).

For investigating the main effects of parameters quantitatively, the factorial analysis (FA) was also performed. FA is originated from experimental design to explore both the main and interaction effects of several factors on a response variable (Tezcan *et al.* 2015). It is particularly useful when there is a curvilinear relationship between design factors and the response variable. In fact, FA attempts to identify underlying variables, or factors, that explain the pattern of correlations within a set of observed variables. It is often used in data reduction to identify a small number of factors that explain most of the variance that is observed in a much larger number of manifest variables. FA can also be used to generate hypotheses regarding causal mechanisms or to screen variables for subsequent analysis. The results of FA are listed in Table 6. According to the results, it could be seen that the correlation coefficients between *K* and *θ* (in bend loss sate), and between *K* and *Hw/D* (in local loss state) are higher than other parameters. Therefore, the *θ* and *Hw/D* variables are more effective in energy losses modeling.

State | Parameters | |||
---|---|---|---|---|

Best model of bend loss in rectangular culvert | Fr | θ | K | |

Fr | 1 | –0.301 | –0.485 | |

θ | 1 | 0.947 | ||

K | 1 | |||

Best model of inlet loss in circular culverts | Fr | Hw/D | K | |

Fr | 1 | –0.719 | –0.779 | |

Hw/D | 1 | 0.947 | ||

K | 1 |

State | Parameters | |||
---|---|---|---|---|

Best model of bend loss in rectangular culvert | Fr | θ | K | |

Fr | 1 | –0.301 | –0.485 | |

θ | 1 | 0.947 | ||

K | 1 | |||

Best model of inlet loss in circular culverts | Fr | Hw/D | K | |

Fr | 1 | –0.719 | –0.779 | |

Hw/D | 1 | 0.947 | ||

K | 1 |

### Combined data

For evaluating the performance of the GPR method for a wide range of data, data series of entrance loss were combined. Two states were considered in the data combining process: pairwise mixing and mixing all data series. Then, for predicting *Ke*, the superior model of Scenario 2 (the model E(IV)) was re-run for the mixed data. The results of this state are given in Figure 7. It could be seen that pairwise mixing of thin-wall projecting and mitered flush to 1.5:1 (horizontal to vertical) fill slope data sets led to better results in comparison with another pairwise mixing. However, according to Figure 7, the results revealed that using the mixed dataset decreased the model accuracy, especially for the state of combining all data series, it could be seen that the error criteria increased significantly. For all mixed data sets the values of R and DC decreased and the RMSE values increased. However, it should be noted that the models based on mixed data sets are able to cover a wider range of data and in this case, entrance loss can be studied without regarding the end inlet treatment shape.

## CONCLUSIONS

In the current study, the capability of the GPR, SVM, and GEP approaches were assessed for predicting local loss in culverts. The culvert experimental data with a different shape was applied for training and testing the models. In the model development process, two scenarios were considered and bend and inlet end treatment losses were evaluated. According to the results, it was found that in Scenario 1, which investigated the bend loss in a rectangular culvert, the model with input parameters of *Fr* and *θ* led to more accurate results. It was observed that using the Froude number of the bend downstream caused an increment in model efficiency. Also, the bend upstream Froude number did not lead to undesirable results, therefore, this parameter can be used when there is no information about flow conditions of bend downstream. It showed that the bend angle had a significant impact on local loss prediction process. The superior performance for Scenario 2 and for all end inlet treatments was obtained for the model E(IV), in which the inputs were *Fr* and *Hw/D*. It was observed that for modeling entrance loss in culverts, using relative flow depth as the input parameter improved the efficiency of the models. For models with only one input variable, using the *Fr* number led to better prediction than the *Re* number. Among the four end inlet treatments, culverts with a mitered flush to 1.5:1 (horizontal to vertical) fill slope yielded more accurate prediction. It was also observed that the mixed data set led to a less accurate outcome. From the obtained results of OAT and factorial sensitivity analysis and Monte Carlo uncertainty analysis, it was found that the correlation coefficients between *K* and *θ* (in bend loss state), and between *K* and *Hw/D* (in local loss state) were higher than other parameters. Therefore, the variable *θ* and *Hw/D* had the most significant impact on local loss prediction. Also, the *Fr* parameter had higher uncertainty compared with *Hw/D* and *θ* parameters. The proposed approaches were found to be able to predict local loss in different shaped culverts successfully, however, it should be noted that the used methods are data-driven models and the SVM, GPR, and GEP-based models are data sensitive, so further studies using data ranges out of this study and field data should be carried out to find the merits of the models to estimate local energy loss in real conditions of flow.