Abstract
Energy dissipation in culverts is a complex phenomenon because the process is nonlinear and uncertain. In the current study, the capability of Gaussian process regression (GPR) and support vector machine (SVM), as kernel-based approaches, and of the gene expression programming (GEP) method was assessed in predicting energy losses in culverts. Two types of loss were considered: bend loss in rectangular culverts and entrance loss in circular culverts with different inlet end treatments. Various input combinations were developed and tested using experimental data. One-at-a-time (OAT) and factorial sensitivity analyses and Monte Carlo uncertainty analysis were used to select the effective parameters in modeling. The performance criteria confirmed the capability of the applied methods (i.e. high correlation coefficient (R) and determination coefficient (DC) and low root mean square error (RMSE)). For rectangular culverts, the model with parameters Fr (Froude number) and θ (bend angle), and for circular culverts, the model with parameters Fr and Hw/D (depth ratio), were the superior models. Using the Froude number downstream of the bend improved model efficiency. Among the four inlet end treatments, the inlet mitered flush to a 1.5:1 fill slope yielded the most accurate prediction. The sensitivity and uncertainty analyses showed that θ and Hw/D had the most significant impact on modeling, and Fr had the highest uncertainty.
INTRODUCTION
A culvert is a hydraulically short segment of conduit which conveys stream flow through a roadway embankment or past some other type of flow obstruction. Numerous cross-sectional shapes are available. The most commonly used shapes include circular, box (rectangular), elliptical, pipe-arch, and arch. Accurate prediction of the local loss in culvert systems is important because it affects the cost and duration of construction and the determination of the size, shape, and diameter of the culverts. The smaller the energy loss in a culvert system, the smaller its effect on the upstream flow profile. Energy loss is divided into two categories: major loss and minor (or local) loss. Major loss is caused by friction between the flow and the conduit walls; because culverts are usually short in practice, this frictional (longitudinal) loss is negligible in comparison with the minor loss. So far, various studies have been conducted to explain the complex phenomenon of energy losses in culvert systems with different geometries. Tullis (2012) investigated the minor loss in buried-invert culverts and determined the optimum section from the point of view of the least loss. Malone & Parr (2008) investigated bend loss in rectangular culverts and proposed some graphs for calculating this parameter. Tullis et al. (2005) studied the inlet losses in elliptical culverts and offered a precise method for calculating the outlet loss of the culvert and for identifying the best section with minimum loss. Anderson (2006) studied worn-out culverts relined by embedding new culverts inside them and determined the local loss empirically. Kotowski et al. (2011) studied the inlet and outlet loss coefficients in conduits and concluded that the inlet loss coefficient in pipes was not constant.
However, due to the complexity and uncertainty of the local loss phenomenon, the results of the classical models are not general and do not hold under variable conditions. Therefore, it is essential to use other, more accurate methods for predicting energy loss in culverts with different shapes under varied hydraulic conditions.
In recent years, the application of nonlinear machine learning (ML) (e.g. artificial neural networks (ANNs), neuro-fuzzy models (NF), genetic programming (GP), gene expression programming (GEP), support vector machine (SVM), and Gaussian process regression (GPR)) in water resources engineering has become viable leading to numerous publications in this field. A complete review of all the applications is beyond the scope of this paper and only some studies are mentioned here, such as assessing tree-based methods concepts, uses and limitations (Carvalho et al. 2018), modeling historical land use changes using ANN (Tayyebi et al. 2017), prediction of groundwater levels using data-driven models (Huang et al. 2017; Amaranto et al. 2018), estimation of hydraulic jump energy dissipation in channels with rough elements using SVM (Roushangar & Ghasempour 2018), prediction of pile scour using ANN and kernel methods (Ghazanfari-Hashemi et al. 2011; Pal et al. 2014), computing longitudinal dispersion coefficients in natural streams using SVM (Azamathulla & Wu 2011), real time hydrologic forecasting using EC-SVM (Yu et al. 2004), quantify runoff contributions from different land uses in tropical urban environments using GP (Meshgi et al. 2015), side weir discharge coefficient using SVM (Azamathulla et al. 2017), and forecasting monthly and seasonal streamflow using mixture-kernel GPR approach (Zhu et al. 2018).
In artificial intelligence modeling, the goal is a learning machine capable of finding an accurate approximation of a natural phenomenon, as well as expressing it in the form of an interpretable equation. However, this bias towards interpretability creates several new issues. The computer-generated hypotheses should take advantage of the already existing body of knowledge about the domain in question. However, the method by which we express our knowledge and make it available to a learning machine remains rather unclear (Babovic 2009). Machine learning, a branch of artificial intelligence, deals with representation and generalization learned from data. Representation of data instances and functions evaluated on these instances are part of all machine learning systems. Generalization is the property that the system will perform well on unseen data instances; the conditions under which this can be guaranteed are a key object of study in the subfield of computational learning theory. There is a wide variety of machine learning tasks and successful applications (Mitchell 1997). In general, the task of a machine learning algorithm can be described as follows: given a set of input variables and the associated output variable(s), the objective is to learn a functional relationship for the input–output variable set. It should be noted that artificial intelligence models typically do not represent the physics of a modeled process; they are just devices used to capture relationships between the relevant input and output variables. However, when the interrelationships among the relevant variables are poorly understood, when finding the size and shape of the ultimate solution is difficult, and when conventional mathematical analysis methods do not (or cannot) provide analytical solutions, these methods can predict the variable of interest with more accuracy.
Due to the complexity and uncertainties of the energy loss process, the existing regression models do not reach the desired accuracy and their output is often associated with large errors. Therefore, the present research proposed kernel-based models to predict the energy loss coefficient, to investigate the best input models, and to determine the effective parameters for culverts of different shapes. To the best of the authors' knowledge, there is a lack of comprehensive research on predicting local losses in culverts using artificial intelligence. In all previous studies the local loss coefficient in the culvert was measured and recorded experimentally at various velocities, but the relationship between this coefficient and the Froude and Reynolds numbers and the geometric parameters, and the degree to which the coefficient depends on these parameters, was not investigated. Therefore, this study aimed to assess the capability of GPR and SVM as kernel-based approaches for modeling the losses of culverts with different geometries. Also, the GEP method was used to develop new equations for predicting the local loss coefficient in culverts of different shapes. In order to determine the most effective combination for modeling the losses of culverts, different input combinations were considered under two scenarios (losses due to the culvert bend and losses due to the culvert entrance) and the impact of hydraulic characteristics and culvert shapes was assessed. In addition, the most important parameters in predicting the energy losses were determined using one-at-a-time (OAT) and factorial sensitivity analysis and Monte Carlo uncertainty analysis.
MATERIALS AND METHODS
The data sets
The data sets of laboratory experiments on local losses of culverts performed by Malone & Parr (2008) and Tullis (2012) were used in the present study. Malone & Parr (2008) studied bend losses in rectangular culverts. Laboratory experiments were performed in rectangular channels with abrupt bends. Bend angles of approximately 30, 45, 60, 75 and 90° were tested. Tullis (2012) conducted experiments on circular culverts in order to determine the entrance loss coefficient and the inlet control head–discharge relationships for circular culverts with invert burial depths of 20, 40, and 50%. All buried-invert culverts were tested with four different end treatments: (1) thin-wall projecting, (2) mitered flush to 1.5:1.0 (horizontal to vertical) fill slope, (3) square-edged inlet with vertical headwall, and (4) 45° beveled entrance with vertical headwall. The ranges of some parameters used in the tests are given in Table 1, in which Ke, θ, Fr, and Re represent the entrance loss coefficient, culvert bend angle, Froude number, and Reynolds number, respectively.
Parameter | Bend loss: rectangular culvert (Malone & Parr 2008) | Entrance loss, circular culvert (Tullis 2012): thin-wall projecting | Entrance loss, circular culvert (Tullis 2012): mitered flush to 1.5 h:1 v fill slope | Entrance loss, circular culvert (Tullis 2012): square-edged inlet with vertical headwall | Entrance loss, circular culvert (Tullis 2012): 45° beveled inlet with vertical headwall
---|---|---|---|---|---
Ke | 0.157–1.078 | 0.157–1.03 | 0.42–0.93 | 0.3–0.6 | 0.22–0.38
θ (radian) | 0.523–1.578 | – | – | – | –
Fr | 0.181–0.86 | 0.0124–1.058 | 0.01–0.81 | 0.43–0.97 | 0.049–1.05
Re | 42,138–140,590 | 14,408–30,711 | 18,743–268,0463 | 9,616–305,469 | 79,175–292,240
No. of data | 190 | 66 | 65 | 45 | 48
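The dimensionless groups in Table 1 and the role of a loss coefficient can be illustrated with a minimal sketch. It assumes the common rectangular-section form Fr = V/√(g·y), a Reynolds number built on an analyst-chosen characteristic length, and the usual minor-loss relation h_L = K·V²/(2g). The function names and the flow state are hypothetical and are not taken from the experiments.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def froude_number(velocity, flow_depth):
    """Fr = V / sqrt(g * y), assuming a rectangular section of depth y."""
    return velocity / math.sqrt(G * flow_depth)


def reynolds_number(velocity, length, kinematic_viscosity=1.0e-6):
    """Re = V * L / nu, with L a characteristic length chosen by the analyst."""
    return velocity * length / kinematic_viscosity


def local_head_loss(loss_coefficient, velocity):
    """h_L = K * V^2 / (2 g), the usual minor-loss relation."""
    return loss_coefficient * velocity ** 2 / (2.0 * G)


# Hypothetical flow state (illustrative values only).
v, y = 0.8, 0.25  # velocity in m/s, flow depth in m
fr = froude_number(v, y)
re = reynolds_number(v, y)
h_loss = local_head_loss(0.5, v)
print(f"Fr = {fr:.3f}, Re = {re:.0f}, h_L = {h_loss:.4f} m")
```

For a circular culvert the hydraulic depth (flow area divided by top width) would normally replace y; that choice is left open here.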
Kernel-based approaches
Kernel-based approaches, such as GPR and SVM, are relatively new and important methods that rely on different kernel types and are rooted in statistical learning theory. Such models are capable of adapting themselves to predict any variable of interest given sufficient inputs. These methods can model non-linear decision boundaries, and there are many kernels to choose from. They are also fairly robust against overfitting, especially in high-dimensional space. However, the appropriate selection of kernel type is the most important step in GPR and SVM because of its direct impact on training and classification precision. These methods are also memory intensive, trickier to tune because of the importance of picking the right kernel, and do not scale well to larger data sets. With these models we are able to predict the behavior of the system, although we are not able to characterize its intrinsic structure. In other words, we can say what the model does, but not how. In addition, we cannot guarantee the behavior of such a model in regions not covered by the data from which the model was constructed, because the model covers only the relationships found within the given data (Babovic 2009).
Gaussian process regression
GPR models are based on the assumption that adjacent observations should convey information about each other. Gaussian processes are a way of specifying a prior directly over function space. This is a natural generalization of the Gaussian distribution, whose mean and covariance are a vector and a matrix, respectively: the Gaussian distribution is over vectors, whereas the Gaussian process is over functions. Because prior knowledge about the data and the functional dependencies is built into the model, no validation process is required for generalization, and GP regression models can provide the predictive distribution corresponding to a test input (Rasmussen & Williams 2006). A GP is defined as a collection of random variables, any finite number of which has a joint multivariate Gaussian distribution. Let X × Y represent the domains of inputs and outputs, respectively, from which n pairs (xᵢ, yᵢ) are drawn independently and identically distributed. For regression, assume that Y ⊆ ℝ; then, a GP on X is defined by a mean function μ: X → ℝ and a covariance function k: X × X → ℝ.
To find the hyperparameters, the partial derivative of Equation (3) can be taken with respect to σ² and k, and the minimization can be carried out by gradient descent. For more details about GP regression and different covariance functions, readers are referred to Kuss (2006). The optimal values of the capacity constant (C) and the size of the error-insensitive zone (ɛ) in SVM, and of the Gaussian noise in GPR, are required because of their strong impact on the accuracy of these regression approaches. The optimum values of these parameters were obtained by trial and error.
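A minimal sketch of how a GP turns training pairs into a predictive mean and variance is given below. It assumes a zero mean function, a squared-exponential (RBF) covariance, and fixed hyperparameters (`length_scale`, `noise_var`) rather than optimized ones; the study itself reports a Pearson kernel for GPR. The training values are hypothetical, and the small linear solver is included only to keep the example free of external libraries.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential covariance between two input vectors."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / (2.0 * length_scale ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gp_predict(X_train, y_train, x_new, noise_var=1e-6, length_scale=1.0):
    """Posterior mean and variance of a zero-mean GP at x_new."""
    n = len(X_train)
    K = [[rbf(X_train[i], X_train[j], length_scale)
          + (noise_var if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    k_star = [rbf(x, x_new, length_scale) for x in X_train]
    alpha = solve(K, y_train)   # (K + noise_var * I)^-1 y
    v = solve(K, k_star)        # (K + noise_var * I)^-1 k_star
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_new, x_new, length_scale) - sum(
        k * vi for k, vi in zip(k_star, v))
    return mean, var


# Hypothetical (Fr, theta) inputs and loss coefficients.
X = [[0.2, 0.5], [0.4, 1.0], [0.6, 1.5]]
y = [0.2, 0.6, 1.0]
mean_at_train, var_at_train = gp_predict(X, y, [0.4, 1.0])
mean_far, var_far = gp_predict(X, y, [10.0, 10.0])
print(mean_at_train, var_at_train, mean_far, var_far)
```

Near a training point the predictive variance collapses towards the noise level, while far from the data the prediction reverts to the prior (zero mean, unit variance). This is the numerical counterpart of the caution against using such models outside the range of the training data.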
Support vector machine
Support vector machines are an intelligent approach used in information categorization and data set classification. This approach, developed by Vapnik (1995), is based on structural risk minimization (SRM), which minimizes an upper bound on the expected risk, as opposed to traditional empirical risk minimization (ERM), which minimizes the error on the training data. The SVM method is based on the concept of the optimal hyperplane that separates samples of two classes with the widest gap between them. Support vector regression (SVR) is the extension of SVM to regression. The purpose of SVR is to find a function that deviates from the actual target values by at most ɛ for all given training data and that is as flat as possible (Smola 1996). Vapnik (1995) introduced the concept of the kernel function for non-linear support vector regression. The most important step in SVM is the appropriate selection of kernel type. In general, there are several types of kernel functions, namely linear, polynomial, radial basis function (RBF) and sigmoid functions. Due to the black-box nature of SVM and GPR models, the learned relationship between the inputs and output is not revealed. This calls for cautious use of any such data-driven model, GEP included; a model should not be used beyond the ranges of the data on which it was trained.
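The trade-off between fitting within a tolerance ɛ and keeping the function flat can be made concrete with a short sketch. It shows the ɛ-insensitive loss and the primal objective of a linear SVR, 0.5·‖w‖² + C·Σ loss. The function names, the weight vector, and the sample values are hypothetical; a full SVR solver is deliberately not reproduced.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-sample loss: zero inside the epsilon tube, linear outside it."""
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]


def svr_objective(weights, losses, C=1.0):
    """Primal objective of a linear SVR: flatness term plus C times the loss."""
    flatness = 0.5 * sum(w * w for w in weights)
    return flatness + C * sum(losses)


# Hypothetical observed and predicted loss coefficients.
observed = [0.30, 0.50, 0.90]
predicted = [0.35, 0.50, 0.60]
losses = epsilon_insensitive_loss(observed, predicted, epsilon=0.1)
objective = svr_objective([0.4, -0.2], losses, C=10.0)
print(losses, objective)
```

Only the third sample lies outside the tube and contributes to the loss. A larger C penalizes such deviations more heavily, while a larger ɛ widens the tube, which is why both settings have to be tuned.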
Gene expression programming
Gene expression programming was developed by Ferreira (2001) using fundamental principles of genetic algorithms (GA) and genetic programming (GP). One strength of the GEP approach is that the creation of genetic diversity is extremely simplified, as genetic operators work at the chromosome level. Another strength of GEP is its unique, multigenic nature, which allows the evolution of more complex programs composed of several subprograms. Like GA, GEP mimics biological evolution to create a computer program for simulating a specified phenomenon. A GEP algorithm begins by selecting five elements: the function set, terminal set, fitness function, control parameters, and stopping condition. In each subsequent step, predicted values are compared with actual values. When results that satisfy the previously selected error criteria are found, the GEP process is terminated. If the error criteria are not met, some chromosomes are chosen by a method called roulette wheel sampling and are mutated to obtain new chromosomes. Once the desired fitness score is reached, the process terminates and the chromosomes are decoded to give the best solution of the problem. The advantages of a system like GEP are clear from nature, but the most important are (Ferreira 2001): (1) the chromosomes are simple entities: linear, compact, relatively small, easy to manipulate genetically (replicate, mutate, recombine, etc.); (2) the expression trees are exclusively the expression of their respective chromosomes; they are the entities upon which selection acts, and according to fitness, they are selected to reproduce with modification.
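Roulette wheel sampling, mentioned above, can be sketched in a few lines. The sketch assumes that fitness values are non-negative and that larger is better, so an error measure such as RMSE would first have to be converted into such a score. The function name and the example chromosomes are hypothetical.

```python
import random


def roulette_wheel_select(population, fitness, n_select, rng=random):
    """Draw n_select individuals with probability proportional to fitness."""
    total = sum(fitness)
    if total <= 0:
        raise ValueError("at least one individual needs positive fitness")
    selected = []
    for _ in range(n_select):
        spin = rng.uniform(0.0, total)
        cumulative = 0.0
        for individual, fit in zip(population, fitness):
            cumulative += fit
            if spin < cumulative:
                selected.append(individual)
                break
        else:
            # Guard against floating-point rounding at the upper end.
            selected.append(population[-1])
    return selected


# Hypothetical chromosomes with hypothetical fitness scores.
chromosomes = ["chrom_A", "chrom_B", "chrom_C"]
scores = [1.0, 3.0, 6.0]
parents = roulette_wheel_select(chromosomes, scores, 4, random.Random(42))
print(parents)
```

Fitter chromosomes occupy a larger slice of the wheel and are therefore drawn more often, but weaker ones keep a non-zero chance, which helps preserve genetic diversity.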
Performance criteria
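The three criteria reported throughout this study are R, DC, and RMSE. A minimal sketch of their usual definitions follows, assuming that R is the Pearson correlation between observed and predicted values and that DC takes the Nash–Sutcliffe form 1 − SSE/SST. The function names and sample values are illustrative only.

```python
import math


def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def correlation_coefficient(observed, predicted):
    """Pearson correlation coefficient R."""
    mo = sum(observed) / len(observed)
    mp = sum(predicted) / len(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (so * sp)


def determination_coefficient(observed, predicted):
    """DC = 1 - SSE / SST (Nash-Sutcliffe form, assumed here)."""
    mo = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mo) ** 2 for o in observed)
    return 1.0 - sse / sst


# Hypothetical observed and predicted loss coefficients.
obs = [0.2, 0.4, 0.6, 0.8]
pred = [0.25, 0.35, 0.65, 0.75]
r_value = correlation_coefficient(obs, pred)
dc_value = determination_coefficient(obs, pred)
rmse_value = rmse(obs, pred)
print(f"R = {r_value:.3f}, DC = {dc_value:.3f}, RMSE = {rmse_value:.3f}")
```

A good model shows R and DC close to 1 and an RMSE close to 0. Unlike R, DC penalizes systematic bias, so the two can differ noticeably for the same predictions.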
Simulation and models development
Input variables
Bend energy loss: model | Bend energy loss: input variables | Entrance loss: model | Entrance loss: input variables
---|---|---|---
B(I) | Fr downstream | E(I) | Re
B(II) | Fr downstream, θ | E(II) | Re, Hw/D
B(III) | Fr average | E(III) | Fr
B(IV) | Fr average, θ | E(IV) | Fr, Hw/D
B(V) | Fr upstream | |
B(VI) | Fr upstream, θ | |
SVM, GPR, and GEP models development
The design of GPR- and SVM-based regression approaches involves the use of the concept of the kernel function. A number of kernels are discussed in the literature, but studies suggest that radial basis kernels perform better for different civil engineering problems (Gill et al. 2006). In this study, to determine the best performance of SVM and GPR and to select the best kernel function, the model B(II) from Scenario 1 (rectangular culvert) was predicted via SVM and GPR using various kernels. Figure 2 shows the statistical parameters obtained with the different kernels for this model. According to the results, the RBF kernel gave the best prediction accuracy for the SVM model, and the Pearson kernel gave the best prediction accuracy for the GPR model. Therefore, the RBF and Pearson kernels were used as the core tools of SVM and GPR, respectively, for the rest of the models.
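The two retained kernels can be written compactly. The sketch below assumes that the "Pearson" kernel refers to the Pearson VII universal kernel (PUK), k(x, z) = 1 / [1 + (2·‖x − z‖·√(2^(1/ω) − 1)/σ)²]^ω. The parameter names `gamma`, `sigma`, and `omega` and their default values are illustrative and are not the tuned values of this study.

```python
import math


def squared_distance(x, z):
    """Squared Euclidean distance between two input vectors."""
    return sum((xi - zi) ** 2 for xi, zi in zip(x, z))


def rbf_kernel(x, z, gamma=1.0):
    """Radial basis function kernel: exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * squared_distance(x, z))


def pearson_vii_kernel(x, z, sigma=1.0, omega=1.0):
    """Pearson VII universal kernel (assumed form of the 'Pearson' kernel)."""
    d = math.sqrt(squared_distance(x, z))
    base = 1.0 + (2.0 * d * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma) ** 2
    return 1.0 / base ** omega


# Hypothetical (Fr, theta) input pairs.
a, b = [0.3, 0.8], [0.5, 1.2]
print(rbf_kernel(a, b), pearson_vii_kernel(a, b))
```

Both kernels equal 1 for identical inputs and decay with distance. In the PUK form, ω controls the tail shape and σ the width, which lets the kernel range between Lorentzian-like and Gaussian-like behavior.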
GEP was trained for energy loss prediction in rectangular and circular culverts. The basic arithmetic operators (+, –, ×, /) and several mathematical functions (exp, x², x³, √) were used as the GEP function set. For the architecture of the chromosomes, different combinations of the number of chromosomes (25, 30, 35), head size (7, 8) and number of genes (3, 4) were tested. The model was run for a number of generations and was stopped when there was no significant change in the fitness function value and the coefficient of correlation. It was observed that the model with 30 chromosomes, a head size of 7, and 3 genes yielded better results. Addition and multiplication were also tested as linking functions, and linking the sub-ETs by addition gave better fitness values. One of the important steps in preparing the GEP model is to choose the set of genetic operators. In the current study, a combination of all genetic operators (recombination, mutation, transposition, and crossover) was used for this purpose. Parameters of the optimized GEP model are shown in Table 3.
Description of parameter | Setting of parameter | Description of parameter | Setting of parameter
---|---|---|---
Function set | +, –, ×, /, x², x³, √ | Fitness function error type | Root mean square error
Chromosomes | 30 | Mutation rate | 0.044
Head size | 7 | Inversion, IS and RIS transposition rate | 0.1
Number of genes | 3 | One- and two-point recombination rate | 0.3
Linking function | Addition | Gene recombination and transposition rate | 0.1
RESULTS AND DISCUSSION
Developed models for rectangular culvert with bend (Scenario 1)
Model | Method | Train R | Train DC | Train RMSE | Test R | Test DC | Test RMSE
---|---|---|---|---|---|---|---
B(I) | SVM | 0.601 | 0.503 | 0.243 | 0.564 | 0.487 | 0.271
B(I) | GPR | 0.604 | 0.507 | 0.241 | 0.566 | 0.490 | 0.268
B(I) | GEP | 0.521 | 0.402 | 0.302 | 0.505 | 0.334 | 0.312
B(II) | SVM | 0.981 | 0.964 | 0.053 | 0.976 | 0.956 | 0.058
B(II) | GPR | 0.985 | 0.971 | 0.051 | 0.981 | 0.961 | 0.055
B(II) | GEP | 0.973 | 0.947 | 0.064 | 0.972 | 0.943 | 0.068
B(III) | SVM | 0.701 | 0.612 | 0.241 | 0.659 | 0.537 | 0.266
B(III) | GPR | 0.705 | 0.613 | 0.240 | 0.661 | 0.539 | 0.263
B(III) | GEP | 0.691 | 0.532 | 0.260 | 0.618 | 0.438 | 0.298
B(IV) | SVM | 0.980 | 0.961 | 0.056 | 0.976 | 0.951 | 0.062
B(IV) | GPR | 0.984 | 0.965 | 0.053 | 0.981 | 0.955 | 0.059
B(IV) | GEP | 0.972 | 0.945 | 0.065 | 0.971 | 0.942 | 0.075
B(V) | SVM | 0.705 | 0.529 | 0.272 | 0.647 | 0.424 | 0.283
B(V) | GPR | 0.709 | 0.530 | 0.270 | 0.650 | 0.427 | 0.280
B(V) | GEP | 0.652 | 0.512 | 0.298 | 0.611 | 0.412 | 0.302
B(VI) | SVM | 0.977 | 0.962 | 0.055 | 0.973 | 0.949 | 0.064
B(VI) | GPR | 0.982 | 0.964 | 0.056 | 0.978 | 0.953 | 0.062
B(VI) | GEP | 0.965 | 0.931 | 0.074 | 0.962 | 0.926 | 0.086
Developed models for circular culverts with different end inlet treatments (Scenario 2)
For Scenario 2, different models were developed based on flow condition and culvert diameters in order to assess the entrance loss in circular culverts with different inlet end treatments. The results of the GPR, SVM, and GEP models are listed in Table 5 and shown in Figure 4. For all inlet end treatments, the best performance was obtained with the model E(IV), in which the inputs were Fr and Hw/D. According to Table 5, using the relative flow depth as an input parameter improved the efficiency of the entrance loss models. Comparing the models E(I) and E(III) in terms of RMSE, the error of the model E(I) is almost 8–22% higher than that of the model E(III); therefore, in models with only one input parameter, using the Fr number led to better prediction than using the Re number. Among the four inlet end treatments, culverts mitered flush to a 1.5:1 (horizontal to vertical) fill slope yielded the most accurate prediction. The mathematical expressions of GEP for all cases are as follows.
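End treatment | Model | Method | Train R | Train DC | Train RMSE | Test R | Test DC | Test RMSE
---|---|---|---|---|---|---|---|---
Thin-wall projecting | E(I) | SVM | 0.705 | 0.632 | 0.075 | 0.634 | 0.547 | 0.073
Thin-wall projecting | E(I) | GPR | 0.716 | 0.635 | 0.073 | 0.653 | 0.574 | 0.071
Thin-wall projecting | E(I) | GEP | 0.622 | 0.62 | 0.088 | 0.617 | 0.592 | 0.092
Thin-wall projecting | E(II) | SVM | 0.841 | 0.732 | 0.059 | 0.832 | 0.681 | 0.068
Thin-wall projecting | E(II) | GPR | 0.842 | 0.733 | 0.057 | 0.837 | 0.684 | 0.065
Thin-wall projecting | E(II) | GEP | 0.831 | 0.720 | 0.058 | 0.790 | 0.610 | 0.080
Thin-wall projecting | E(III) | SVM | 0.820 | 0.692 | 0.065 | 0.695 | 0.612 | 0.071
Thin-wall projecting | E(III) | GPR | 0.822 | 0.695 | 0.063 | 0.716 | 0.643 | 0.068
Thin-wall projecting | E(III) | GEP | 0.815 | 0.659 | 0.068 | 0.593 | 0.63 | 0.073
Thin-wall projecting | E(IV) | SVM | 0.846 | 0.769 | 0.055 | 0.832 | 0.741 | 0.058
Thin-wall projecting | E(IV) | GPR | 0.853 | 0.770 | 0.054 | 0.857 | 0.748 | 0.056
Thin-wall projecting | E(IV) | GEP | 0.851 | 0.750 | 0.060 | 0.830 | 0.725 | 0.063
Mitered flush to 1.5:1 fill slope | E(I) | SVM | 0.852 | 0.747 | 0.073 | 0.833 | 0.741 | 0.075
Mitered flush to 1.5:1 fill slope | E(I) | GPR | 0.856 | 0.751 | 0.070 | 0.837 | 0.745 | 0.073
Mitered flush to 1.5:1 fill slope | E(I) | GEP | 0.843 | 0.735 | 0.08 | 0.826 | 0.69 | 0.09
Mitered flush to 1.5:1 fill slope | E(II) | SVM | 0.980 | 0.960 | 0.033 | 0.975 | 0.943 | 0.041
Mitered flush to 1.5:1 fill slope | E(II) | GPR | 0.985 | 0.962 | 0.030 | 0.979 | 0.947 | 0.039
Mitered flush to 1.5:1 fill slope | E(II) | GEP | 0.964 | 0.924 | 0.046 | 0.962 | 0.88 | 0.057
Mitered flush to 1.5:1 fill slope | E(III) | SVM | 0.941 | 0.854 | 0.056 | 0.912 | 0.856 | 0.059
Mitered flush to 1.5:1 fill slope | E(III) | GPR | 0.944 | 0.857 | 0.054 | 0.918 | 0.859 | 0.056
Mitered flush to 1.5:1 fill slope | E(III) | GEP | 0.911 | 0.822 | 0.07 | 0.908 | 0.82 | 0.06
Mitered flush to 1.5:1 fill slope | E(IV) | SVM | 0.979 | 0.959 | 0.032 | 0.978 | 0.954 | 0.037
Mitered flush to 1.5:1 fill slope | E(IV) | GPR | 0.984 | 0.963 | 0.029 | 0.979 | 0.959 | 0.035
Mitered flush to 1.5:1 fill slope | E(IV) | GEP | 0.975 | 0.950 | 0.033 | 0.962 | 0.92 | 0.040
Square-edged inlet with vertical headwall | E(I) | SVM | 0.820 | 0.685 | 0.041 | 0.812 | 0.635 | 0.052
Square-edged inlet with vertical headwall | E(I) | GPR | 0.824 | 0.688 | 0.039 | 0.817 | 0.636 | 0.048
Square-edged inlet with vertical headwall | E(I) | GEP | 0.816 | 0.641 | 0.039 | 0.790 | 0.560 | 0.064
Square-edged inlet with vertical headwall | E(II) | SVM | 0.887 | 0.791 | 0.031 | 0.876 | 0.671 | 0.036
Square-edged inlet with vertical headwall | E(II) | GPR | 0.891 | 0.795 | 0.029 | 0.883 | 0.673 | 0.034
Square-edged inlet with vertical headwall | E(II) | GEP | 0.883 | 0.781 | 0.032 | 0.860 | 0.641 | 0.038
Square-edged inlet with vertical headwall | E(III) | SVM | 0.834 | 0.752 | 0.033 | 0.817 | 0.662 | 0.046
Square-edged inlet with vertical headwall | E(III) | GPR | 0.835 | 0.755 | 0.031 | 0.821 | 0.666 | 0.043
Square-edged inlet with vertical headwall | E(III) | GEP | 0.820 | 0.661 | 0.036 | 0.810 | 0.592 | 0.058
Square-edged inlet with vertical headwall | E(IV) | SVM | 0.905 | 0.817 | 0.028 | 0.894 | 0.742 | 0.034
Square-edged inlet with vertical headwall | E(IV) | GPR | 0.911 | 0.822 | 0.025 | 0.896 | 0.745 | 0.032
Square-edged inlet with vertical headwall | E(IV) | GEP | 0.889 | 0.791 | 0.030 | 0.876 | 0.732 | 0.034
45° beveled inlet with vertical headwall | E(I) | SVM | 0.732 | 0.605 | 0.027 | 0.699 | 0.532 | 0.036
45° beveled inlet with vertical headwall | E(I) | GPR | 0.736 | 0.608 | 0.026 | 0.702 | 0.533 | 0.034
45° beveled inlet with vertical headwall | E(I) | GEP | 0.687 | 0.504 | 0.029 | 0.664 | 0.470 | 0.035
45° beveled inlet with vertical headwall | E(II) | SVM | 0.821 | 0.672 | 0.021 | 0.812 | 0.632 | 0.024
45° beveled inlet with vertical headwall | E(II) | GPR | 0.824 | 0.677 | 0.018 | 0.816 | 0.633 | 0.023
45° beveled inlet with vertical headwall | E(II) | GEP | 0.753 | 0.630 | 0.022 | 0.680 | 0.612 | 0.024
45° beveled inlet with vertical headwall | E(III) | SVM | 0.795 | 0.671 | 0.022 | 0.714 | 0.602 | 0.029
45° beveled inlet with vertical headwall | E(III) | GPR | 0.799 | 0.673 | 0.020 | 0.718 | 0.605 | 0.028
45° beveled inlet with vertical headwall | E(III) | GEP | 0.730 | 0.582 | 0.024 | 0.654 | 0.530 | 0.032
45° beveled inlet with vertical headwall | E(IV) | SVM | 0.914 | 0.833 | 0.013 | 0.895 | 0.736 | 0.015
45° beveled inlet with vertical headwall | E(IV) | GPR | 0.919 | 0.837 | 0.011 | 0.899 | 0.739 | 0.014
45° beveled inlet with vertical headwall | E(IV) | GEP | 0.802 | 0.784 | 0.018 | 0.730 | 0.651 | 0.021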
Sensitivity and uncertainty analysis
To investigate the main effects of the parameters quantitatively, factorial analysis (FA) was also performed. FA originates from experimental design and explores both the main and interaction effects of several factors on a response variable (Tezcan et al. 2015). It is particularly useful when there is a curvilinear relationship between design factors and the response variable. In fact, FA attempts to identify underlying variables, or factors, that explain the pattern of correlations within a set of observed variables. It is often used in data reduction to identify a small number of factors that explain most of the variance observed in a much larger number of manifest variables. FA can also be used to generate hypotheses regarding causal mechanisms or to screen variables for subsequent analysis. The results of FA are listed in Table 6. The correlation coefficients between K and θ (in the bend loss state) and between K and Hw/D (in the entrance loss state) are higher than those of the other parameters. Therefore, the θ and Hw/D variables are the most effective in energy loss modeling.
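State | Parameter | Fr | θ or Hw/D | K
---|---|---|---|---
Best model of bend loss in rectangular culvert | Fr | 1 | –0.301 | –0.485
Best model of bend loss in rectangular culvert | θ | | 1 | 0.947
Best model of bend loss in rectangular culvert | K | | | 1
Best model of inlet loss in circular culverts | Fr | 1 | –0.719 | –0.779
Best model of inlet loss in circular culverts | Hw/D | | 1 | 0.947
Best model of inlet loss in circular culverts | K | | | 1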
Combined data
To evaluate the performance of the GPR method over a wider range of data, the entrance loss data series were combined. Two states were considered in the combining process: pairwise mixing and mixing of all data series. Then, for predicting Ke, the superior model of Scenario 2 (the model E(IV)) was re-run for the mixed data. The results are given in Figure 7. Pairwise mixing of the thin-wall projecting and mitered flush to 1.5:1 (horizontal to vertical) fill slope data sets led to better results than the other pairwise mixings. However, according to Figure 7, using a mixed data set decreased the model accuracy; for the state of combining all data series in particular, the error criteria increased significantly. For all mixed data sets the values of R and DC decreased and the RMSE values increased. It should be noted, however, that models based on mixed data sets cover a wider range of data, so that entrance loss can be studied without regard to the shape of the inlet end treatment.
CONCLUSIONS
In the current study, the capability of the GPR, SVM, and GEP approaches was assessed for predicting local loss in culverts. Experimental data from culverts of different shapes were used for training and testing the models. In the model development process, two scenarios were considered, and bend losses and inlet end treatment losses were evaluated. In Scenario 1, which investigated the bend loss in a rectangular culvert, the model with input parameters Fr and θ led to the most accurate results. Using the Froude number downstream of the bend improved model efficiency. The Froude number upstream of the bend also gave acceptable results; therefore, this parameter can be used when there is no information about flow conditions downstream of the bend. The bend angle had a significant impact on the local loss prediction process. For Scenario 2 and for all inlet end treatments, the best performance was obtained with the model E(IV), in which the inputs were Fr and Hw/D. For modeling entrance loss in culverts, using the relative flow depth as an input parameter improved the efficiency of the models. For models with only one input variable, using the Fr number led to better prediction than the Re number. Among the four inlet end treatments, culverts mitered flush to a 1.5:1 (horizontal to vertical) fill slope yielded the most accurate prediction. It was also observed that the mixed data set led to a less accurate outcome. From the results of the OAT and factorial sensitivity analyses and the Monte Carlo uncertainty analysis, it was found that the correlation coefficients between K and θ (in the bend loss state) and between K and Hw/D (in the entrance loss state) were higher than those of the other parameters. Therefore, the variables θ and Hw/D had the most significant impact on local loss prediction. Also, the Fr parameter had higher uncertainty than the Hw/D and θ parameters.
The proposed approaches were able to predict local loss in culverts of different shapes successfully. However, the SVM, GPR, and GEP-based models are data-driven and therefore data sensitive, so further studies using field data and data ranges beyond those of this study should be carried out to establish the merits of the models for estimating local energy loss under real flow conditions.