Development of a local scour prediction model clustered by soil class

Several studies have assessed local scour formulas in order to select the most appropriate one. Confronted with the limits of the existing formulas, further studies have proposed new ones. Generalizing a single scour formula to all soil classes seems too approximate for a complex phenomenon that depends on several parameters, and may lead to considerable uncertainty in scour estimation. This study proposes several new scour formulas for different granulometric classes of the streambed by exploiting a large field database. The new scour formulas are based on multiple nonlinear regression (MNLR) models. Supervised learning is used as an optimization tool to solve the hyperparameters of each new equation with the gradient descent algorithm. The results show that the new formulas proposed in this study perform better than the empirical formulas chosen for comparison. The results are presented as seven new formulas, as well as abacuses for the calculation of local scour by soil class.


INTRODUCTION
Bridges are the main structures connecting cities, so ensuring their serviceability throughout their life cycle is a paramount task. For this purpose, a reliable prior estimate of local scour helps avoid potential damage from this phenomenon, which may affect the stability of bridges (Pizarro et al. 2020). One of the most important types of local scour is that which occurs around bridge piers, threatening the stability of bridges built over watercourses (Lee et al. 2007; Mohammadpour et al. 2019). In the United States, for a set of 600,000 bridges, 1,000 failures were recorded, 50% of them directly related to scour (Briaud et al. 1999). In addition, 500 bridge failures in the United States over the period 1989 to 2000, involving bridges from 1 to 157 years old, were attributed mainly to flooding and collisions (Wardhana & Hadipriono 2003).
In response to this ongoing risk, several scour prediction studies have been carried out. Although the literature offers an abundance of equations to calculate scour, this phenomenon continues to destroy bridges (Wardhana & Hadipriono 2003; Van Leeuwen & Lamb 2014). This has raised concerns about the efficiency of these formulas, and several recent comparative studies have assessed the accuracy of the empirical formulas (Park et al. 2017; Wang et al. 2017; Namaee et al. 2018; Liang et al. 2019).
Other studies took a different approach and estimated scour by proposing new formulas. Among them, Afzali (2016) proposed a new model to calculate scour using an optimization algorithm called 'Honey Bee Mating' (HBM). To validate the new model, it was compared with five empirical scour formulas.
Another study, on the estimation of scour at abutments, was conducted by Mohammadpour et al. (2017), who developed two empirical methods for predicting the temporal variation of local scour at uniform and compound abutments. The models were built using the horseshoe vortex concept and the volumetric rate of sediment transport theory. The proposed models were verified using experiments conducted under clear-water conditions, and the computed results were compared to the observed data. Pizarro et al. (2017) used the first mathematical concept for simulating scour phenomena to develop a bridge-pier scour estimation method based on energy concepts and entropy theory. The proposed BRISENT model was established on the effective flow work parameter and on the principle of maximum entropy (POME). Pandey et al. (2020) developed a scour prediction equation for circular bridge piers by adopting a multiple nonlinear regression (MNLR) model, with a genetic algorithm used for the error optimization. Like that of Mohammadpour et al. (2017), the new scour formula takes into consideration the temporal variation of scour. The formula was established from data generated in an experimental study, complemented by data collected from other studies. Omara et al. (2020) proposed several scour formulas under shallow flow conditions, including the effects of the flow intensity, the inclination angle (at different angles of flow attack on circular piers), and the pier length.
Scour, which is caused by erosion, differs from one type of soil to another. Considered as the movement process of stripped sediment, it is defined in two different ways: clear-water scour and live-bed scour (Arneson et al. 2012). The generalization of a single formula for calculating scour for all soil classes could be among the reasons why the empirical formulas fail. In this study, new scour formulas are presented for each of the seven soil classes available in the 2014 USGS Pier-Scour Database (PSDB-2014) (Benedict & Caldwell 2014), namely Fine soil, Fine sand, Medium sand, Coarse sand, Fine gravel, Medium gravel, and Coarse gravel. The new proposed formulas include the effect of sediment suspension and transport upstream of the pier (i.e., clear-water scour when V/Vc < 1, and live-bed scour when V/Vc > 1). In addition to what has been reported in the literature, the parameters included in the formulas are chosen based on a dimensional analysis and on a statistical analysis called Principal Component Analysis (PCA).

Database
The PSDB-2014 includes 569 laboratory measurements and 1,858 field measurements. For this study, the field observations are classified by soil class according to 'D50', as per the standard ISO 14688-1:2017 (Table 1). Clustering the data by 'D50' assigns a class name to the soil constituting the streambed, according to its granulometry and nature. After clustering, seven soil classes were available in the database, ranging from 'Fine soil' to 'Coarse gravel'. All of them were retained for our study.
Since local scour is mainly related to the sediment characteristics of the streambed (Qi et al. 2016), classifying the field data by 'D50' provides a better estimation of local scour, because it reduces the uncertainty related to the sediment. The classification also makes it possible to identify, among the parameters involved in the scour estimation, those with the most influence for a given soil class but not for another, and therefore to propose a new local scour formula for each soil class.
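The clustering by 'D50' described above can be sketched as a simple lookup. This is a minimal illustration, not the authors' code: the size limits are the particle-size fractions of ISO 14688-1, the function name `soil_class` is hypothetical, and grouping everything finer than 0.063 mm under 'Fine soil' is an assumption, since Table 1 is not reproduced here.

```python
from bisect import bisect_left

# Upper size limit (mm) of each class, following the ISO 14688-1 fractions.
# Assumption of this sketch: every D50 at or below 0.063 mm is "Fine soil".
_UPPER_LIMITS_MM = [0.063, 0.2, 0.63, 2.0, 6.3, 20.0, 63.0]
_CLASS_NAMES = [
    "Fine soil", "Fine sand", "Medium sand", "Coarse sand",
    "Fine gravel", "Medium gravel", "Coarse gravel",
]


def soil_class(d50_mm: float) -> str:
    """Return the soil class name for a median grain size D50 given in mm."""
    if d50_mm <= 0:
        raise ValueError("D50 must be positive")
    # bisect_left puts a value equal to a limit in the finer class.
    index = bisect_left(_UPPER_LIMITS_MM, d50_mm)
    if index >= len(_CLASS_NAMES):
        raise ValueError("D50 is coarser than the coarse gravel fraction")
    return _CLASS_NAMES[index]
```

Each field record can then be grouped by the returned class name before a formula is fitted for that class.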

Dimensional analysis
The formulas for calculating the local scour depth 'ds' around bridge piers generally take in their equations three groups of parameters as follows (Akhlaghi et al. 2020):
• Geometric parameters: Dimensions of the pier 'b' and 'L'; Approach section (or flow depth) 'y'; Angle of attack of the approach flow to the pier 'u' (when the pier is skewed to the flow); and Projected pier width in direction of flow 'b*'.
So, the local scour formula should combine the aforementioned parameters, as shown in Equation (1).
Since 'u' is the angle of attack of the approach flow to the pier, it is mainly related to the pier width 'b'. In other words, when 'u' has to be considered (i.e., when the pier is skewed to the flow), 'b*' should replace 'b'. Because 'u' is an angle, it can enter the formula only through another parameter, which in the case of local scour is 'b*'. Therefore, we replace 'u' in Equation (1) by 'b*', as shown in Equation (2).
As long as the pier is aligned with the flow, the pier length 'L' has no discernible effect on local scour depth. When the pier is skewed to the flow, 'L' has a significant influence on scour depth (Arneson et al. 2012). Even in this last case, the effect of 'L' is already considered in the calculation of 'b*'. 'L' can therefore be neglected, and the parameters affecting the local scour estimation become those shown in Equation (3).
Local scour is the erosion of the sediment surrounding the bridge piers; neglecting the characteristics of the sediment can lead to considerable uncertainty in the scour calculation. According to Briaud et al. (2001), different erosion behaviors were observed for different soils having similar particle sizes. Nevertheless, 'D50' is rarely considered in the establishment of scour formulas, most of which rely on simplifying assumptions regarding the streambed sediment because they were established from reduced-scale laboratory models, where the sediments are small uniform particles (Mohamed et al. 2006). In fact, very few formulas include 'D50' or other sediment characteristics. However, including such parameters through their numerical values could skew the result, because of the significant standard deviation between these parameters and the other parameters.
So, in this study, it was decided to include 'D50' by considering its distribution by soil class; Saad et al. (2021) reported that particle size distribution has a significant effect on local scour depth.
Finally, the parameters affecting the estimation of local scour around bridge piers, as described in Equations (1)-(3) above, can be reduced as indicated below in Equation (4).
In order to simplify the analysis and obtain simple formulas, it is convenient to group the parameters of Equation (4) that have the same dimensions into new dimensionless parameters, using the Vaschy-Buckingham π theorem as described by Link et al. (2017) and Mohammadpour (2017);
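As an illustration of the dimensional analysis above, the sketch below computes a projected pier width and a few dimensionless groups from raw measurements. It rests on assumptions: the projected width uses the usual geometric projection of a rectangular pier, b* = b cos(u) + L sin(u), the Froude number uses its standard definition V / sqrt(g y), and the particular groups returned (ds/b*, y/b*, V/Vc, Fr) are illustrative choices, not necessarily those of Equation (5). All function names are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def projected_width(b: float, length: float, skew_deg: float) -> float:
    """Width of a rectangular pier projected normal to the flow (assumed form)."""
    skew = math.radians(skew_deg)
    return b * math.cos(skew) + length * math.sin(skew)


def froude_number(v: float, y: float) -> float:
    """Standard Froude number of the approach flow."""
    return v / math.sqrt(G * y)


def dimensionless_groups(ds: float, b_star: float, y: float,
                         v: float, vc: float) -> dict:
    """Illustrative dimensionless groups built from the scour variables."""
    return {
        "ds/b*": ds / b_star,          # relative scour depth
        "y/b*": y / b_star,            # relative flow depth
        "V/Vc": v / vc,                # below 1: clear-water, above 1: live-bed
        "Fr": froude_number(v, y),     # below 1: subcritical flow
    }
```

For a pier aligned with the flow (skew of 0 degrees), `projected_width` returns 'b', which matches the statement that 'L' then has no effect.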

Principal component analysis (PCA)
For each of the seven soil classes selected in this study, a principal component analysis (PCA) was performed on the parameters influencing the scour result mentioned in Equation (4). PCA is a data analysis method belonging to the field of multivariate statistics. It consists of transforming variables that are related to each other, called 'correlated', into new decorrelated variables. These new variables are called 'principal components' (Jolliffe & Cadima 2016).
Even if the parameters involved in the calculation of scour seem independent, in reality they are correlated with each other (e.g., the Froude number 'Fr' and the critical flow velocity 'Vc' are both related to the flow depth 'y'). Decorrelating the variables makes it possible to identify the influence of each parameter accurately, by choosing among the six variables of Equation (4) the most preponderant ones in the calculation of the scour depth 'ds'. The result of the analysis is presented in Figure 1, Table 2 and Table 3 below, as correlation circles after varimax rotation and variable contributions.
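To make the idea of decorrelation concrete, here is a minimal sketch that extracts the first principal component from the correlation matrix of a few variables. It is a simplified stand-in for the analysis described above: it uses plain power iteration, returns only the dominant component, and does not perform the varimax rotation used in the study. The function names and the starting vector are choices of this sketch.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long data columns."""
    n = len(columns[0])
    standardized = []
    for col in columns:
        mean = sum(col) / n
        dev = [x - mean for x in col]
        norm = math.sqrt(sum(d * d for d in dev))
        standardized.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in standardized]
            for ci in standardized]


def first_principal_component(corr, iterations=500):
    """Dominant eigenvalue and eigenvector of corr by power iteration."""
    p = len(corr)
    # Non-uniform start to reduce the chance of being orthogonal to the answer.
    vec = [float(i + 1) for i in range(p)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    eigenvalue = 0.0
    for _ in range(iterations):
        nxt = [sum(corr[i][j] * vec[j] for j in range(p)) for i in range(p)]
        eigenvalue = math.sqrt(sum(x * x for x in nxt))
        vec = [x / eigenvalue for x in nxt]
    return eigenvalue, vec


def explained_share(eigenvalue, n_variables):
    """Fraction of the total variance carried by one component."""
    return eigenvalue / n_variables
```

The entries of the returned vector play the role of the variable loadings: variables with loadings of similar sign and size on the same component are the ones that would cluster together on a correlation circle.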
The choice of parameters involved in the scour calculation is carried out in iterative steps. Each time parameters are selected, we rely on the statistical criteria chosen in our study (R2, RSR, PBIAS) and on the variable contributions (%) after varimax rotation to decide whether the chosen parameters are relevant. At the beginning, based on the literature, we tried to select the majority of the parameters without grouping them into new dimensionless parameters (Figure 1); the variable contribution, as indicated in Figure 1, was 63.96%. Then all the parameters of the same nature were grouped into new dimensionless parameters, as mentioned in Equation (5). The result was much better than with our first selection, where the parameters were kept separate (the variable contributions were higher than 89%) (Figure 2). So, to simplify the new equations, the dimensionless parameters of Equation (5) were retained. Indeed, the simpler the formulas, the easier they are to memorize. After applying the Vaschy-Buckingham π theorem, the PCA results for the different parameters involved in the scour calculation, including the scour itself, are presented below in Figure 2.
For all soil classes, two distinct groups are seen in the PCA correlation circles in Figures 1 and 2. The first group contains the parameters of a geometric nature (ds, y, and b*). The second group gathers the parameters related to the characteristics of the flow and the eroded material (V, Vc, and Fr). Since scour is geometric (a depth), including all the parameters of Equation (5) as mathematical variables in the scour equation may lead to considerable uncertainties. Indeed, the standard deviation between some parameters and the scour values might be considered high, and this could skew the final result. Conversely, neglecting parameters such as the velocities 'V' and 'Vc' or the Froude number 'Fr' can have the same negative effect. Therefore, it seems that the parameters influencing the scour presented in Equation (5) should be grouped into two distinct parts:
a. Parameters for the estimation of the scour (b*, y);
b. Parameters for the description of the scour:
• V and Vc (i.e., clear-water scour when V/Vc < 1, and live-bed scour when V/Vc > 1);
• Froude number Fr (i.e., subcritical flow when Fr < 1, and supercritical flow when Fr > 1).
From the correlation circles obtained through PCA in Figure 2, the two parameters 'V/Vc' and 'Fr' seem to be highly correlated; including both of them may lead to unnecessary redundancy in the scour estimation. Also, as the data exploited in this study are in-situ data (i.e., the data are for a subcritical flow, Fr < 1), the variation of the flow pattern is low (Arneson et al. 2012). For this reason, it seems better to include the effect of sediment transport only (i.e., the V/Vc ratio).
Water Practice & Technology Vol 16 No 4, 1164
Based on the PCA results and the Vaschy-Buckingham π theorem, the least significant variables for the scour estimation are eliminated by removing those least correlated with the scour depth observed in the field. The new local scour formulas proposed, based on MNLR models, are presented by soil class in the following Equation (6).
for Fine soil class
The parameter noted Ks defines the sediment transport process (clear-water scour when the bed material is not in movement, V/Vc < 1, or conversely live-bed scour when V/Vc > 1).

Optimization of multiple nonlinear regression (MNLR) models
The regression between the simulated and the observed scour depths is performed using supervised learning, and the hyperparameters of each new equation are solved using the gradient descent algorithm.
Indeed, the gradient descent algorithm is an optimization tool designed to minimize a differentiable real function $f$ defined on a Hilbert space $E$, $x \in E \mapsto f(x)$. We denote by $f'(x)$ the derivative and by $\nabla f(x)$ the gradient of $f$ at $x$, so that for every direction $d \in E$, $f'(x)\,d = \langle \nabla f(x), d \rangle$. The gradient algorithm defines a sequence of iterates $x_1, x_2, \ldots \in E$. Until the stop test is satisfied, it goes from $x_k$ to $x_{k+1}$ by the following steps (Ruder 2016):
a. Simulation: calculation of $\nabla f(x_k)$.
b. Stop test: if $\|\nabla f(x_k)\| \le \varepsilon$, stop.
c. Calculation of the learning rate $\alpha_k > 0$ by a line search rule on $f$ at $x_k$ along the direction $-\nabla f(x_k)$.
d. New iteration: $x_{k+1} = x_k - \alpha_k \nabla f(x_k)$.
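Steps a to d can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used in the study: the line search is a backtracking (Armijo) rule, the tolerance and iteration cap are arbitrary, and the usage example fits a hypothetical power law d = a0 · x^a1 in log space on synthetic data, which is not the form of the proposed scour formulas.

```python
import math


def gradient_descent(f, grad, x0, tol=1e-8, max_iter=10000):
    """Minimize f from x0 by gradient descent with backtracking line search."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)                                   # a. gradient at x_k
        g_norm_sq = sum(gi * gi for gi in g)
        if math.sqrt(g_norm_sq) <= tol:               # b. stop test
            break
        alpha, fx = 1.0, f(x)                         # c. Armijo backtracking
        while f([xi - alpha * gi for xi, gi in zip(x, g)]) > fx - 1e-4 * alpha * g_norm_sq:
            alpha *= 0.5
        x = [xi - alpha * gi for xi, gi in zip(x, g)]  # d. new iterate
    return x


# Hypothetical usage: recover a0 and a1 of d = a0 * x**a1 from synthetic data.
xs = [1.0, 2.0, 4.0, 8.0]
observed = [2.0 * x ** 0.5 for x in xs]               # synthetic, noise-free
log_x = [math.log(x) for x in xs]
log_d = [math.log(d) for d in observed]


def loss(p):
    """Mean squared error in log space; p = [log(a0), a1]."""
    return sum((p[0] + p[1] * u - v) ** 2 for u, v in zip(log_x, log_d)) / len(log_x)


def loss_grad(p):
    """Analytical gradient of the loss with respect to p."""
    res = [p[0] + p[1] * u - v for u, v in zip(log_x, log_d)]
    n = len(res)
    return [2.0 * sum(res) / n,
            2.0 * sum(r * u for r, u in zip(res, log_x)) / n]


log_a0, a1 = gradient_descent(loss, loss_grad, [0.0, 0.0])
a0 = math.exp(log_a0)
```

Working in log space turns the power law into a linear least-squares problem, which keeps the loss convex; fitting the nonlinear form directly would use the same loop with a different `loss` and `loss_grad`.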

Validation criteria
The new scour formulas established for each soil class are tested on 20% of the field data, alongside the empirical formulas, for each soil class, using the statistical criteria (R2, RSR, PBIAS). The method for testing the performance of a simulated model against observations is explained by Moriasi et al. (2007) and Golmohammadi et al. (2014).
The coefficient of determination (R2) describes the degree of similarity between the predicted and measured data, and ranges from 0 to 1. R2 values close to 1 indicate a near-perfect similarity between the simulated and observed values (Maachou et al. 2017).
where:
• $d_{s.o}(i)$ is the scour depth observed in the field (taken from the database).
The RSR is a dimensionless quantity that expresses the dispersion between the simulated and observed values. An RSR close to zero indicates a low residual variability, and therefore a near-perfect simulation by the model.
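The three criteria can be sketched in code as below. Because the equations themselves are not legible in this section, the definitions used here are the standard ones from Moriasi et al. (2007), taken as an assumption: R2 as the squared Pearson correlation, RSR as the root-mean-square error divided by the standard deviation of the observations, and PBIAS as the percentage sum of (observed minus simulated) over the sum of observed values. The function names are hypothetical.

```python
import math


def r2(obs, sim):
    """Coefficient of determination as the squared Pearson correlation."""
    n = len(obs)
    mean_o, mean_s = sum(obs) / n, sum(sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    return cov * cov / (var_o * var_s)


def rsr(obs, sim):
    """RMSE divided by the standard deviation of the observations."""
    mean_o = sum(obs) / len(obs)
    residual = math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)))
    spread = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    return residual / spread


def pbias(obs, sim):
    """Percent bias; negative values mean the model overestimates."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)
```

With this sign convention, a formula that systematically predicts deeper scour than observed yields a negative PBIAS, which is consistent with the way overestimation is reported in the results.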

RESULTS AND DISCUSSION
After the optimization of the hyperparameters (a0, a1) by the gradient descent algorithm, the new local scour formulas for each soil class (defined according to particle size 'D50') are presented below:
where:
• ds is the local scour depth (m);
• b is the bridge pier width (m);
• y is the flow depth (m);
• b* is the projected bridge pier width (when the pier is skewed to the flow; otherwise b* = b);
• Ks is the correction factor for the type of scour or sediment transport (clear-water or live-bed), as mentioned in Table 5 below.
The new proposed formulas are compared with two scour formulas. The first one, presented in Equation (11) below and called 'the Mississippi equation', was established by Van Wilson (1995). This first formula is selected based on the PCA. As previously explained, local scour is a geometric value (a depth) and seems to correlate well with the other geometric parameters (the flow depth 'y' and the bridge pier width 'b').
$d_s = 0.9\, b^{0.6}\, y^{0.9}$ (11)
The second formula, presented in Equation (12), is the HEC-18 equation (Arneson et al. 2012),
where:
• K1 is the correction factor for the nose shape of the pier;
• K2 is the correction factor for the attack angle of the flow;
• K3 is the correction factor for bed condition.
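For reference, the two comparison formulas can be sketched as functions. Two assumptions are made and should be checked against the original sources: Equation (11) is coded with the coefficient and exponents exactly as printed in this text, and the HEC-18 relation is written in its commonly cited form, ds = 2.0 · y · K1 · K2 · K3 · (b/y)^0.65 · Fr^0.43, because Equation (12) is not legible here. Function names and default correction factors of 1.0 are choices of this sketch.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def mississippi_scour(b: float, y: float) -> float:
    """Equation (11), with coefficient and exponents as printed in the text."""
    return 0.9 * b ** 0.6 * y ** 0.9


def hec18_scour(b: float, y: float, v: float,
                k1: float = 1.0, k2: float = 1.0, k3: float = 1.0) -> float:
    """Commonly cited HEC-18 pier scour relation (assumed form, SI units)."""
    froude = v / math.sqrt(G * y)
    return 2.0 * y * k1 * k2 * k3 * (b / y) ** 0.65 * froude ** 0.43
```

Both functions return a depth in metres when 'b' and 'y' are in metres and 'V' in metres per second, so their outputs can be passed directly to the validation criteria alongside the observed depths.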
Very good agreement is observed for soil classes with fine granulometry (Fine soil and Fine sand). Satisfactory agreement is observed for the rest of the soil types. The results obtained are presented in Table 6.
From the R2 values, a similarity ranging from very good to satisfactory (57% < R2 < 98%) is observed between the new proposed formulas and the in-situ data. This similarity is relatively better than that of the empirical formulas (0% < R2 < 82%). Notwithstanding, the difference between the scour calculated by the empirical formulas and the observed scour may be considered acceptable. Graphs by soil type are presented in Figure 3, relating the scour depths observed in-situ to the scour depths simulated by the new formulas and by the two formulas chosen for comparison.
According to the RSR values, the variability between the new proposed formulas and the scour depths observed in-situ is judged good to satisfactory (2 < RSR < 66) for the soil classes having smaller grain sizes (Fine soil, Fine sand, Medium sand, Coarse sand, Fine gravel, and Medium gravel), while it is judged unsatisfactory (RSR min > 86) for the coarsest soil class (Coarse gravel). The observed variability is better than that of the empirical formulas, for which the difference between the calculated scour and the observed scour is considered unsatisfactory (RSR min > 181) for all classes.
In the majority of cases, PBIAS values indicate that the new proposed formulas have a very good tendency to simulate scour for all soil classes (0 < PBIAS < 15). Nevertheless, in some cases, this performance becomes unsatisfactory (PBIAS > 36). The empirical formulas, for their part, tend to overestimate the scour depth considerably (PBIAS << 0). This could be acceptable and beneficial for the safety of the structure. However, overestimation of scour may lead to additional and unnecessary costs for scour risk mitigation. Better performance of the new models is observed, compared with the empirical formulas of HEC-18 (Arneson et al. 2012) and Van Wilson (1995) chosen for validation in this study. This can be explained by different reasons. The formula of Van Wilson (1995) takes into account only the geometric parameters, namely the width of the pier 'b' and the flow depth 'y', and neglects the effects of other parameters such as 'V', 'Vc', and 'Fr'. The PCA carried out in this study showed that scour is mainly estimated from the geometric parameters ('b' and 'y') and that its behavior is explained through 'V', 'Vc', or 'Fr'. Neglecting one of these parameters can affect the accurate estimation of local scour. Neglecting the effect of the particle size distribution of the soil, and proposing a single formula whatever the soil class, can also be among the reasons. It should also be highlighted that the new formulas proposed in this study come from a large and up-to-date database; the empirical formulas (including the two chosen in this study), on the other hand, were mainly based on laboratory tests with overly simplified small-scale models. These models are often considered incapable of simulating a complex phenomenon such as scour (Gaudio et al. 2010).
In addition to the imprecision of the formulas mentioned above, the uncertainty arising from the source can also be among the reasons. The uncertainty may be related to the temporal evolution and the type of storm hydrographs that drive the evolution of the depth and rate of scour. Also, the long-term persistence behavior of precipitation and streamflow causes the clustering of storms and flood events (Dimitriadis et al. 2021). This clustering of storms and floods would have a higher impact on the evolution of scour than temporally independent storms and floods separated by longer time periods.

Scour calculation
Abacuses to calculate the scour depth based on the width of the bridge pier 'b*' and the flow depth 'y' are presented below (Figure 4). The abacuses are plotted from the formulas proposed in Equation (7). As a graphical illustration, the abacuses simplify the interpretation of the local scour compared to the formulas.
The proposed abacuses are intended to serve as new, simple and practical scour calculation tools for engineers, taking into account the distribution of the soil classes. Their objective is to replace the classical methods used for the quantification of local scour.
From the abacuses, it appears that the local scour is more accentuated in the coarse class than in the other classes. Indeed, the performance of the new formulas proposed is better for soils with fine granulometry (Fine soil, Fine sand, Medium sand, and Coarse sand) than for the other classes; this performance deteriorates progressively as the grain size increases (Fine gravel, Medium gravel, and Coarse gravel). This can be explained by the different rheological behavior of each soil type. A threshold shear stress, noted 'τ0', is necessary for the suspension of large solid particles. This threshold shear stress is proportional to the soil granulometry (Arneson et al. 2012). The cohesion 'c' and the grain dispersion coefficient 'σg' may also be among the reasons why the proposed deterministic formulas fail. Indeed, for a subcritical flow regime (according to the database, 0.03 ≤ Fr ≤ 0.75), particles with fine granulometry tend to be compacted. Under the effect of hydrodynamic forces, erosion is therefore localized vertically around the bridge piers, and scour holes are typically smaller because of the higher bed shear resistance (Debnath & Chaudhuri 2011), which makes them easier for deterministic models to estimate. For coarse particles, the scour behavior changes: large sediments tend to disperse under the flow, and deterministic models fail to estimate scour accurately.

CONCLUSION
Sediment transport or local scour around bridge piers differs from one type of soil to another, depending on their characteristics (Fang & Wang 2000). From this observation, the generalization of a single formula for the calculation of scour for all soil classes seems imprecise, and this could probably explain the failure of the empirical formulas. This study aims to establish new formulas for calculating local scour around bridge piers, for each of the soil classes that can constitute the streambed. The study is based on field observations from the PSDB-2014 database. The data used are clustered by soil class according to 'D50', following the standard ISO 14688-1:2017. After clustering, the seven soil classes available in the database (fine soil, fine sand, medium sand, coarse sand, fine gravel, medium gravel, and coarse gravel) are retained for the development of the study. Seven new formulas, one per soil class, based on multiple nonlinear regression (MNLR) models are presented. Supervised learning is used to solve the hyperparameters of each new equation. To achieve efficient models, the gradient descent algorithm is used to minimize the difference between the observed scour and the scour simulated with the new proposed formulas.
Local scour depends on several parameters related to its triggering and influencing its evolution. Principal component analysis (PCA) is carried out so that only the most preponderant parameters enter the new proposed scour formulas. It was observed from the correlation circles obtained from the PCA that the two geometric parameters ('b' and 'y') have more influence on the scour results than the rest of the parameters. To provide simple models, these secondary parameters have been discarded.
Various statistical criteria (R2, RSR, and PBIAS) were used in this study to judge and validate the new formulas. From the validation results obtained, a similarity and variability judged to be good to satisfactory are observed between the new proposed formulas and the data observed in-situ for small grain size classes (Fine soil, Fine sand, Medium sand, and Coarse sand). On the other hand, for the soil classes with coarse granulometry (Fine gravel, Medium gravel, and Coarse gravel), the result is considered unsatisfactory. Indeed, the performance of the new formulas tends to deteriorate progressively as the particle size increases. The performance of the new scour calculation formulas is relatively better than that of the empirical formulas. The new formulas proposed have a very good tendency to simulate scour for all soil classes (0 < PBIAS < 15). Empirical formulas tend to overestimate the scour depth (PBIAS < 0).
A new tool for estimating scour is also proposed in this study, in the form of calculation abacuses per type of eroded soil. These abacuses correspond to the new formulas and allow scour to be estimated with a simpler and more practical methodology.