Scour around bridge piers is one of the main causes of bridge failures and is of great importance for hydraulic engineers and scientists. Prediction of the scour depth around piers is complicated, and accurate results are rarely achieved by the existing models. Recently, data mining approaches such as artificial neural networks and fuzzy inference systems have been applied successfully to predict scour depth around hydraulic structures. In this study, an alternative robust data mining approach was used for the predictions of the scour depth around piers, and the results were compared with those of three empirical approaches. Performances of developed models were tested by experimental data sets collected in laboratory experiments and field measurements, together with existing empirical approaches. Statistical measures indicate that the proposed M5′ model provides a better prediction of scour depth than the empirical approaches.
NOTATION
- CC: correlation coefficient
- D: pier diameter
- d50: median sediment diameter
- Fr: Froude number
- g: gravitational acceleration
- Ia: index of agreement
- n: number of measurements
- Re: pier Reynolds number
- S: equilibrium scour depth
- sd: standard deviation
- SDR: standard deviation reduction
- SI: scatter index
- S/Y: dimensionless scour depth
- U: flow velocity
- Uc: critical flow velocity
- x: measured value
- Y: flow depth
- Y/D: relative water depth
- y: predicted value
- ρ: fluid density
- μ: fluid dynamic viscosity
INTRODUCTION
Local scour around piers is one of the most common causes of bridge failure during floods. Numerous cases of bridge damage due to extreme scour around piers have been reported recently (FDOT 2010). Such damage results in huge economic losses and even loss of life (Toth & Brandimarte 2011). Bridges have been damaged by storm- and flood-induced scour around the world, in both developed and developing countries (e.g., Blodgett 1978).
An accurate estimation of the maximum scour depth around bridges is vital in the design of bridge piers in terms of safety and economics (e.g., Muzzammil 2010; Muzzammil & Alam 2011; Khan et al. 2012). Numerous studies have been conducted in recent decades to develop a robust method for estimating the equilibrium scour depth due to currents (e.g., Melville 1997; Bateni et al. 2007; Azmathulla et al. 2010; Ghaemi et al. 2013; Etemad-Shahidi & Rohani 2014).
Numerous small-scale laboratory experiments, mainly on cylindrical piers, underlie the different formulae available in the literature, most of which were derived using dimensional analysis. In these formulae, both the scour depth and influential parameters such as flow velocity and depth are expressed as non-dimensional variables. For example, Shen (1971) suggested a formula based on the pier Reynolds number, while Breusers et al. (1977) used only the relative water depth in their equation. On the other hand, the HEC-18 equation (USDOT 2001) considered the Froude number and relative water depth as the governing parameters. In another approach, Melville (1997) considered the relative sediment size, relative approach velocity, and relative pier diameter. However, these semi-empirical methods differ widely in their estimates of the scour depth (e.g., Breusers & Raudkivi 1991; Bateni et al. 2007). This discrepancy comes from the complexity of the problem, the limited number of variables considered (Ettema et al. 1998), and scaling effects (Lee & Sturm 2009), which are more pronounced in prototype cases (Gulbahar 2009). Gaudio et al. (2013) showed that some of the semi-empirical scour formulae are very sensitive to their input parameters, so that a small error in an input parameter might significantly change the predicted scour depth. However, they did not identify or suggest the most accurate formula.
Traditional statistical analysis is increasingly being replaced by artificial intelligence (AI)-based approaches, which have been applied in different fields of engineering (Muzzammil & Ayyub 2010). Researchers have invoked data mining approaches to resolve the above-mentioned issues, and these approaches have recently been used to tackle various complex problems in hydraulic engineering (e.g., Bhattacharya & Solomatine 2005; Zanganeh et al. 2009; Ayoubloo et al. 2010; Azamathulla & Ghani 2010; Farhoudi et al. 2010; Zanganeh et al. 2011; Azamathulla 2012; Etemad-Shahidi & Taghipour 2012; Pal et al. 2013). Artificial neural networks (ANNs) are the most commonly used method in this category. ANNs have been invoked to estimate scour around culverts (Liriano & Day 2001), downstream of a ski-jump bucket (Azmathulla et al. 2005), scour below pipelines (Kazeminezhad et al. 2010), scour around pile groups (Ghazanfari et al. 2011), local scour depth at bridge piers (Toth & Brandimarte 2011), and scour depth around spur dikes (Karami et al. 2012). Bateni et al. (2007) applied ANNs and adaptive neuro-fuzzy inference systems (ANFIS) to estimate scour depth. They found that ANN outperforms ANFIS and previous empirical approaches and could be a suitable procedure to predict scour depth.
In summary, there have been several attempts to apply data mining methods for the prediction of scour depth around bridge piers (e.g., Bateni et al. 2007; Toth & Brandimarte 2011; Azamathulla 2012; Khan et al. 2012; Pal et al. 2013; Akib et al. 2014). However, the previous models did not provide a transparent and compact relationship between the governing parameters that can give insight into the physics of the process. In addition, most of the previously developed models were based on small-scale laboratory experiments, and their performance in prototype situations was not evaluated against field measurements. An alternative data mining approach called M5′ (Wang & Witten 1997) has recently been applied to provide compact and physically sound formulae in engineering problems. The main advantages of model trees are that they are easily applied and yield comprehensible, compact, and transparent formulae (e.g., Bonakdar & Etemad-Shahidi 2011; Etemad-Shahidi & Jafari 2014). This method has been successfully used in modeling sediment transport (Bhattacharya et al. 2007), wind estimation from waves (Daga & Deo 2009), wave height predictions (Etemad-Shahidi & Mahjoobi 2009), land cover classification (Pal 2006), evapotranspiration (Pal & Deswal 2009), and design of rubble-mound breakwaters (Etemad-Shahidi & Bonakdar 2009; Etemad-Shahidi & Bali 2011; Jafari & Etemad-Shahidi 2012). The aim of this study is to explore the extent to which this method improves scour depth prediction, particularly in terms of accuracy and efficiency. To achieve this goal, different M5′ models are developed, and the results are compared with those of existing formulae and against the available laboratory experimental data.
PREVIOUS APPROACHES AND THE USED DATA SET
Previous approaches
Model | Formula | Notes |
---|---|---|
USDOT (2001) | $S/Y = K K_w (Y/D)^{-0.65} Fr^{0.43}$ | $S_{max} = 3.0D$ for $Fr > 0.8$ |
| | $S_{max} = 2.4D$ for $Fr < 0.8$ |
| | $K = f$(nose shape, current angle of attack, mode of sediment transport, armoring by bed material) |
| | $K_w$ = correction factor when $Y/D < 0.8$, $D/d_{50} > 50$ and $Fr < 1$ |
| | $K_w = 2.58 (Y/D)^{0.34} Fr^{0.65}$ for $U/U_c < 1$ |
| | $K_w = (Y/D)^{0.13} Fr^{0.25}$ for $U/U_c > 1$ |
Breusers et al. (1977) | $S/D = 2 K_V \tanh(Y/D)$ | $K_V = 1$ for $U/U_c > 1$ |
| | $K_V = 2U/U_c - 1$ for $0.5 < U/U_c < 1$ |
| | $K_V = 0$ for $U/U_c < 0.5$ |
Melville (1997) | $S/D = K$ | $K = f$(nose shape, relative water depth, current angle of attack, relative velocity, relative sediment size) |
Conventional nonlinear regression | $S/Y = 1.46 (Y/D)^{-0.36} Fr^{0.37} (U/U_c)^{0.12}$ | |
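Two of the tabulated formulae are simple enough to evaluate directly. The following is a minimal sketch, not the authors' code: the function names are hypothetical, the Froude number is assumed to be based on the flow depth, and the transitional factor KV of Breusers et al. (1977) is assumed to rise linearly from 0 at U/Uc = 0.5 to 1 at U/Uc = 1 so that it is continuous. The USDOT and Melville factors K and Kw depend on tabulated site conditions and are not covered.

```python
import math


def froude_number(U, Y, g=9.81):
    """Froude number, assumed here to be based on the flow depth Y."""
    return U / math.sqrt(g * Y)


def breusers_scour_depth(U, Uc, Y, D):
    """Scour depth S from S/D = 2 KV tanh(Y/D) (Breusers et al. 1977).

    KV is assumed to be 0 below U/Uc = 0.5, 1 above U/Uc = 1,
    and linear (2 U/Uc - 1) in between.
    """
    ratio = U / Uc
    if ratio < 0.5:
        kv = 0.0
    elif ratio < 1.0:
        kv = 2.0 * ratio - 1.0
    else:
        kv = 1.0
    return 2.0 * kv * math.tanh(Y / D) * D


def regression_relative_scour(U, Uc, Y, D, g=9.81):
    """Relative scour depth S/Y from the conventional nonlinear regression."""
    fr = froude_number(U, Y, g)
    return 1.46 * (Y / D) ** -0.36 * fr ** 0.37 * (U / Uc) ** 0.12
```

With these assumptions, `breusers_scour_depth` returns zero whenever U/Uc < 0.5, which is the behavior that leads this formula to underpredict scour at low relative velocities.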
Johnson (1995) applied seven equations to field data in both live bed and clear water conditions. Her results showed that Shen's (1971) formula performs better in shallow conditions while the USDOT formula is better for Y/D > 1.5. She also found that the results of the different formulae differ significantly and that most of the semi-empirical equations overestimate the scour depth. Gulbahar (2009) compared the performances of different equations using field data in different hydrological conditions. This study showed that there is no unique best formula and that the skill of each method varies with the conditions.
Recently, soft computing methods have been widely applied to handle complicated hydraulic engineering problems (e.g., Zanganeh et al. 2009; Yasa & Etemad-Shahidi 2013). For example, Bateni et al. (2007) developed ANN and ANFIS models for predicting the scour depth and its temporal evolution. They compared their results with those of previous empirical approaches and reported that a multi-layer perceptron model outperforms the ANFIS and other regression models in predicting the scour depth. They attributed the superiority of ANN to its ability to solve complex problems. Azmathulla et al. (2010) used genetic programming to predict the scour depth. They also compared their results with those of USDOT (2001) and showed that their model outperforms both ANN and regression equations. Recently, Pal et al. (2012) used the field data of Mueller & Wagner (2005) to develop a model for scour depth prediction using M5 and showed that their formula outperforms previous ones. However, they did not provide a dimensionally homogeneous formula.
Data set
To have a wider range of parameters, 14 data sets, i.e., Chabert & Engeldinger (1956), Hancu (1971), Ettema (1980), Jain & Fischer (1980), Chee (1982), Chiew (1984), Yanmaz & Altinbilek (1991), Kothyari et al. (1992), Graf (1995), Melville (1997), Melville & Chiew (1999), Oliveto & Hager (2002), Sheppard & Miller (2006), and unpublished data from the University of Auckland were used to predict the equilibrium scour depth in this study. The whole data set consists of 283 laboratory data records, which were used for developing the models and evaluating the existing formulae. The distribution and the statistics of the governing dimensionless parameters are shown in Figures A1–A5 (Appendix A, available online at www.iwaponline.com/jh/017/051.pdf). As shown in Appendix A, the flow conditions are mostly subcritical with 75% clear water conditions and 25% live bed tests.
The above-mentioned data sets were first used to evaluate the performances of the existing formulae. As mentioned before, semi-empirical approaches reported in the literature have different forms with different dimensionless numbers. Among these, three formulae that are commonly used in engineering applications, i.e., Breusers et al. (1977) (which considers Y/D and is hyperbolic), Melville (1997) (which considers U/Uc and D/d50), and USDOT (2001) (which considers Fr and Y/D) were selected for the evaluations. Figures 1–3 show that the scatter between the measured scour depths and those predicted by these approaches is large. It is worth noting that the existing models predict more or less constant scour depths for measured values greater than 0.25 m. In addition, the formula of Breusers et al. (1977) tends to underpredict scour depths. This is mainly because in this formula the scour depth is zero for U/Uc < 0.5.
Model | Ia | Bias | SI (%) |
---|---|---|---|
USDOT (2001), all data | 0.92 | 0.28 | 61 |
Breusers et al. (1977), all data | 0.64 | −0.45 | 92.8 |
Melville (1997), all data | 0.91 | 0.245 | 61 |
Conventional nonlinear regression | 0.85 | −0.226 | 67 |
M1, all data | 0.97 | 0.042 | 37 |
M2, all data | 0.95 | 0.003 | 49 |
M2, testing data | 0.95 | 0.004 | 49 |
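The three statistics reported in the table above can be computed along the following lines. This is a minimal sketch under stated assumptions: the defining equations are not reproduced in this text, so the index of agreement is assumed to take Willmott's form, Bias is assumed to be the mean of (predicted − measured), and SI is assumed to be the root mean square error divided by the mean measured value. The function names are hypothetical. Following the notation, x denotes measured values and y predicted values.

```python
import math


def index_of_agreement(x, y):
    """Assumed Willmott form: 1 - sum((y-x)^2) / sum((|y-xm| + |x-xm|)^2)."""
    xm = sum(x) / len(x)
    num = sum((yi - xi) ** 2 for xi, yi in zip(x, y))
    den = sum((abs(yi - xm) + abs(xi - xm)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - num / den


def bias(x, y):
    """Assumed definition: mean of (predicted - measured)."""
    return sum(yi - xi for xi, yi in zip(x, y)) / len(x)


def scatter_index(x, y):
    """Assumed definition: RMSE over the mean measured value, in percent."""
    rmse = math.sqrt(sum((yi - xi) ** 2 for xi, yi in zip(x, y)) / len(x))
    return 100.0 * rmse / (sum(x) / len(x))
```

Under these definitions a negative Bias indicates underprediction, which matches the sign reported for the Breusers et al. (1977) formula.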
DECISION TREE AND M5′ ALGORITHM
A decision tree is one of the most recent data mining methods that can be applied for classifications and predictions. In general, decision trees can be divided into two main types: classification trees and regression trees. The first type classifies instances or data records based on some attributes (input parameters) and is used when the model's output includes non-numeric values, while a regression tree is applied when the model's output includes numeric values. A decision tree resembles an inverted tree with a root node at the top and some leaves at the bottom. In general, decision trees represent a disjunction of conjunctions of constraints on the values of input parameters. Unlike other soft computing methods such as ANNs, decision trees represent rules or formulae. In fact, each path from the tree root to a leaf corresponds to a conjunction of attribute tests and the tree itself to a disjunction of these conjunctions. Decision trees classify instances by sorting them down the tree from the root node to some leaf node. Each node in the tree specifies a test of some attribute of the instance, and each branch descending from that node corresponds to one of the possible values for this attribute (Hand et al. 2001; Kantardzic 2003).
Model trees, which are a type of decision tree with linear regression functions at the leaves, form the basis of a modern technique for predicting continuous numeric values. Structurally, a model tree takes the form of a decision tree with linear regression functions instead of terminal class values at its leaves. The M5 model tree is a numerical prediction algorithm in which each node splits on the attribute (input parameter) that maximizes the expected error reduction, measured as the reduction in the standard deviation of the output parameter (Zhang & Tsai 2007). The M5 model tree was first introduced by Quinlan (1992) and was expanded in a method called M5′ by Wang & Witten (1997). Model trees have a large number of advantages, making them a suitable regression method for performance analysis. The prediction accuracy of model trees is comparable to that of techniques such as ANNs (Etemad-Shahidi & Mahjoobi 2009) and is known to be higher than that of the CART (Classification And Regression Tree) method (Ould-Ahmed-Vall et al. 2007). A further advantage of a model tree is that it can efficiently handle large data sets with a high number of attributes and high dimensions.
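The splitting criterion can be illustrated for a single attribute. The sketch below assumes the form of the standard deviation reduction commonly given for M5, SDR = sd(T) − Σ (|Ti|/|T|) sd(Ti), where T is the set of records reaching a node and Ti are the subsets produced by a candidate split. The function names are hypothetical, the population standard deviation is used for simplicity, and pruning, smoothing, and the leaf regressions of the full algorithm are omitted.

```python
import statistics


def sd(values):
    """Population standard deviation of the output values in a subset."""
    return statistics.pstdev(values)


def sdr(parent, subsets):
    """Standard deviation reduction achieved by splitting parent into subsets."""
    n = len(parent)
    return sd(parent) - sum(len(s) / n * sd(s) for s in subsets)


def best_split(xs, ys):
    """Return (threshold, SDR) of the binary split on xs that maximizes SDR."""
    pairs = sorted(zip(xs, ys))
    best_threshold, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates identical attribute values
        threshold = 0.5 * (pairs[i - 1][0] + pairs[i][0])
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = sdr(ys, [left, right])
        if gain > best_gain:
            best_threshold, best_gain = threshold, gain
    return best_threshold, best_gain
```

In a full tree builder this search would be repeated over every input parameter at every node, and splitting would stop once a node holds too few records or the output varies too little.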
MODELING, RESULTS, AND DISCUSSION
The success of data mining methods such as M5′ depends on the quality and quantity of the data used. In this study, 283 data records from 14 different data sets were used for developing the models. Models based on dimensionless variables have a wider domain of applicability and can be applied to prototype cases. Hence, the governing input parameters considered in the modeling were the dimensionless ones mentioned in Equation (2). This ensures the generalization ability of the results. First, a conventional nonlinear multivariate regression model was developed using the data set as a base prediction model, and a single formula was derived (Table 1). Then, the data set was randomly divided into two parts: 70% of the records were used for training and the rest were used for testing the M5 model. The ranges of the parameters used for training were checked to ensure that they cover those used for testing, as required for proper modeling. The ranges of parameters used for the training and testing phases are shown in Table 3. As seen, the ranges used for training are wide and cover both clear water and live bed conditions. The first developed model (hereafter called M1) was based on all the dimensionless parameters of Equation (2). The comparison between the measured and predicted scour depth using this linear model is presented in Figure 5. As seen, the scatter is smaller than in the previous figures, but the model slightly underestimates high values of scour depth. This could be due to the lack of data records in this range. The error statistics of all models, including the existing ones, the nonlinear regression model, and the developed model trees, are given in Table 2, showing the high performance of M1. In brief, the developed linear model yields accurate results. Nevertheless, the tree and formulae (not shown) produced by this model were complex. In total, 11 formulae were generated for different ranges of Y/D, Fr, and Re. The given formulae were mostly linear combinations of Fr, Y/D, and U/Uc, and the other variables were either neglected or had small coefficients.
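The dimensionless inputs can be derived from the measured quantities as sketched below. Equation (2) is not reproduced in this text, so the definitions are assumptions based on the notation list: Fr is taken as the flow Froude number based on the flow depth, and Re as the pier Reynolds number based on the pier diameter. The function name and the default water properties are hypothetical.

```python
import math


def dimensionless_inputs(U, Uc, Y, D, d50, rho=1000.0, mu=1.0e-3, g=9.81):
    """Return the assumed dimensionless model inputs for one data record.

    U, Uc: flow and critical flow velocity (m/s); Y: flow depth (m);
    D: pier diameter (m); d50: median sediment diameter (m);
    rho, mu: assumed water density (kg/m^3) and dynamic viscosity (Pa s).
    """
    return {
        "U/Uc": U / Uc,                  # relative flow velocity
        "Y/D": Y / D,                    # relative water depth
        "D/d50": D / d50,                # relative pier size
        "Fr": U / math.sqrt(g * Y),      # Froude number (assumed depth-based)
        "Re": rho * U * D / mu,          # pier Reynolds number (assumed)
    }
```

A record with U/Uc below 1 would correspond to clear water conditions and one above 1 to live bed conditions, which is the distinction the model is later reported to recover on its own.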
Parameter | Training data | Testing data |
---|---|---|
U/Uc | 0.32–6.7 | 0.33–5.3 |
Y/D | 0.05–21.05 | 0.05–21.05 |
D/d50 | 3.65–904.76 | 13.33–904.76 |
Fr | 0.08–2.14 | 0.11–1.06 |
Re | 3,408–328,320 | 6,612–228,000 |
S/Y | 0.02–8.17 | 0.02–6.65 |
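The random 70/30 division and the range check summarized in the table above can be sketched as follows. This is an illustration only: the function names, the fixed seed, and the representation of each record as a dictionary keyed by parameter name are assumptions, not details taken from the study.

```python
import random


def split_records(records, train_fraction=0.7, seed=0):
    """Shuffle the records and split them into training and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    k = round(train_fraction * len(shuffled))
    return shuffled[:k], shuffled[k:]


def uncovered_parameters(train, test, keys):
    """List the parameters whose testing range falls outside the training range."""
    uncovered = []
    for key in keys:
        train_values = [r[key] for r in train]
        test_values = [r[key] for r in test]
        if min(test_values) < min(train_values) or max(test_values) > max(train_values):
            uncovered.append(key)
    return uncovered


# Range extremes of Fr and U/Uc taken from the table above.
train_extremes = [{"Fr": 0.08, "U/Uc": 0.32}, {"Fr": 2.14, "U/Uc": 6.7}]
test_extremes = [{"Fr": 0.11, "U/Uc": 0.33}, {"Fr": 1.06, "U/Uc": 5.3}]
problems = uncovered_parameters(train_extremes, test_extremes, ["Fr", "U/Uc"])
```

For the tabulated ranges `problems` is an empty list, meaning the model is not asked to extrapolate beyond its training ranges for these parameters. If a split fails the check, one simple option is to redraw it with a different seed.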
It is apparent from Equation (8) that Y/D, Fr, and U/Uc are the dimensionless parameters that most influence the relative scour depth around piers, while the influences of other parameters such as the Reynolds number are marginal. The form of the developed model is similar to those of USDOT (2001) and the derived nonlinear regression model. However, it reveals the interaction between hydraulics and sediment transport by considering the critical velocity and the relative width of the pier. It is interesting to note that the model tree can distinguish between clear water and live bed conditions automatically and shows that the scour depth becomes independent of U/Uc in live bed conditions, which is in line with the findings of Melville (1997). In addition, M5 successfully yields a different formula for wide piers, and the splitting value is very close to the one used for wide piers (Johnson 1995; Jones & Sheppard 2000).
In terms of dimensional parameters, Equation (8b) implies that in live bed conditions, the scour depth is linearly related to the pier diameter and is independent of water depth in relatively deep waters. On the other hand, Equation (8c) shows that in the case of relatively shallow water and live bed conditions, the scour depth depends on the water depth as well. Both these results are in line with existing knowledge of the physics of the scour process.
In summary, it can be inferred that the nonlinear M5′ model has succeeded in capturing the relationship among the scour governing parameters. Another advantage of M5′ was that it yielded a physically sound and simple equation relating the input variables to the output. This is not the case with traditional data mining methods such as ANN. The performance of Equation (8) was superior to those of other methods, while that of Melville (1997) outperformed the other existing formulae. Among other data mining approaches, the group method of data handling (GMDH) can also be used to provide formulae for scour depth around piers. GMDH, which is based on the principles of heuristic self-organizing, can be improved by a GMDH-back propagation method (GMDH-BP) or another evolutionary algorithm. However, the formulae developed by this method are very complex (e.g., Najafzadeh et al. 2013) and hard to justify physically. The application of GMDH-BP requires accurate determination of several parameters, such as the topology of the network, weightings, and operations, whereas with M5 the only parameter that needs to be determined is the minimum number of data records in each leaf. In addition, the execution of heuristic models is generally computationally expensive, while executing an M5 model usually takes a couple of seconds.
APPLICATION TO THE FIELD MEASUREMENTS
Field data were also used to evaluate the performance of different models. The field data were obtained from the study of Sheppard et al. (2011). This data set contains 791 good-quality field equilibrium local scour data points. A total of 71 field data sets were selected and used to evaluate the performance of different formulae. All these data were for single, circular piers founded in non-cohesive sediments. The error statistics of different models are given in Table 4. As seen, even in this case, the developed model outperforms other formulae in predicting the scour depth. Compared to Table 2, the ‘Bias’ of M2 has increased significantly. This is mainly because the maturity of the scour depth is not known in the field during measurements, which results in a larger ‘Bias’ for most of the models. In addition, the conditions in the field are not ideal, and therefore the measurements could be less accurate compared to those of laboratory experiments. This is in line with the findings of Landers et al. (1999). They evaluated formulae developed in the laboratory by use of transformed data and smoothing techniques to assess general trends in the data. They found only minimal agreement between the field data and laboratory-based relationships. Similar results were obtained by Pal et al. (2012), who also found that the existing formulae may not be suitable for application in the field.
Model | Ia | Bias | SI (%) |
---|---|---|---|
USDOT (2001) | 0.66 | 0.44 | 245 |
Breusers et al. (1977) | 0.71 | 0.40 | 73.1 |
Melville (1997) | 0.47 | 0.81 | 128 |
M2 | 0.88 | 0.13 | 46.0 |
One of the limitations of the present model is that its application is limited to the range of used parameters and cannot be directly used to analyze complexities such as pier geometry and armoring by bed materials. In addition, most of the data points used for developing the formulae were obtained from experiments in the clear water critical conditions, and therefore Equation (8a) is statistically more significant than the others.
SUMMARY AND CONCLUSION
In this study, 14 different laboratory data sets with a wide range of variables were used to develop a model for prediction of the current-induced scour depth around circular piers. Since the selection of input variables is very important for the model's accuracy, all governing dimensionless parameters were first used as the inputs of the model and an accurate but complex model was developed. Then, to establish a simpler model, an appropriate transformation of governing parameters was used. In this way, a simple model was obtained for estimation of relative scour depth based on the Froude number, the relative water depth, and the relative flow velocity. The obtained formulae were transparent and compact and also revealed the physics of the phenomenon by distinguishing between different regimes, goals which are rarely achieved by other data mining methods. Drawing out the physics and knowledge from data mining models is as important as their accuracy. Using the statistical measures, it was shown that the obtained model is superior to the existing empirical approaches for both laboratory and field measurements.
The approach used is very promising considering the time savings in both the development and run-time of the model tree compared with those of other AI-based approaches such as ANN, SVM, GMDH-BP, and especially genetic programming. The appropriate transformation of the governing parameters, combined with rule-based models such as M5, offers a quick alternative way to obtain compact and transparent design formulae with reasonable accuracy.
ACKNOWLEDGEMENT
We would like to thank the University of Waikato, New Zealand for providing WEKA software (http://www.cs.waikato.ac.nz/~ml/weka/).