In natural rivers, flow conditions depend mainly on flow resistance and the type of roughness. The interactions between flow and bedforms are complex, as bedform dynamics primarily regulate the flow resistance. Manning's equation is the most frequently used equation for estimating flow resistance, but its roughness coefficient is difficult to determine when bedforms are present. Therefore, there is a need to develop alternate reliable techniques for adequate prediction of Manning's roughness coefficient (n) in alluvial channels with bedforms. Thus, the main objective of this study is to utilize machine learning (ML) models for predicting ‘n’ based on six input features. The performance of the ML models was assessed using Pearson's coefficient (R2), sensitivity analysis, Taylor's diagram, and box plots, and the K-fold method was used for cross-validation. Based on the output of the current work, models such as random forest, extra trees regression, and extreme gradient boosting performed extremely well (R2 ≥ 0.99), whereas Lasso regression models showed moderate efficiency in predicting roughness. The sensitivity analysis indicated that the energy grade line has a significant impact on the prediction of roughness compared to the other parameters. The alternate approach utilized in the present study provides insights into riverbed characteristics, enhancing the understanding of the complex relationship between roughness and other independent parameters.

  • This study focuses on accurately predicting n in alluvial channels with bedforms.

  • The intricate interplay between flowing water and bedforms adds complexity to flow resistance prediction.

  • A significant observation is that integrating all input parameters results in enhanced accuracy when predicting flow resistance.

  • Leveraging modern techniques, the study employs four machine learning models to predict n.

In natural alluvial river channels, the flowing water mobilizes sediment particles on the bed surface and the particles begin to move, which is referred to as incipient motion. It is a critical threshold at which the forces acting on the particles overcome their resistance to motion, leading to the initiation of sediment transport. The mobile sediment mainly comprises the bed load and the suspended load. The different shapes and geometries developed due to bed load transport under varied flow conditions are known as bedforms. Furthermore, the flow conditions largely depend on these various types of bedforms (Kwoll 2016). The relatively small or large bedforms include dunes, ripples, antidunes, bars, chutes, pools, etc. Dunes and ripples are asymmetric triangular-shaped geometries that develop under lower flow conditions, while other bedforms develop at higher flow conditions (Cardenas & Wilson 2007; Dey 2014a; Lefebvre 2019). The morphology (shape, size, and spacing) of bedforms depends upon various characteristics, such as flow velocity, depth, and attributes of the river bed sediments. Larger bedforms can lead to higher resistance to flow (Venditti 2007, 2013).

In the case of open channel flows, the roughness coefficient (n) is more significant than the friction factor (Alam & Kennedy 1969). Various factors, such as flow hydraulic condition, channel shape, sediment transport, bed load fluctuations, and bedforms, notably affect this parameter (Bridge 1993; Kumar et al. 2023a, 2023b). A comprehensive understanding of these factors and their impact on the value of ‘n’ is essential to obtain an accurate roughness estimate in alluvial channels with bedforms. The parameter ‘n’ plays a crucial role in channel dynamics and is used to characterize the roughness of the channel. The factors mentioned earlier, along with the surface conditions and vegetation cover, each affect flow resistance within the channel. Therefore, by understanding the influence of these factors on the value of ‘n’, engineers and researchers can make more accurate predictions and calculations related to fluid flow and hydraulic systems.

The influence of various bedforms on ‘n’ has been examined by multiple researchers (van der Mark et al. 2008; Shiono et al. 2009; Aberle et al. 2010; Roushangar et al. 2017). One study examined the effect of dune morphology on flow resistance in an alluvial channel (Talebbeydokhti et al. 2006). The influence of dune characteristics on flow resistance within the channel was observed. Moreover, the relationship between dune geometry and flow resistance was analyzed, and insights into the hydraulic behaviour of channels with dune bedforms were provided. Further research was carried out to understand the impact of bed load fluctuations and sediment transport on flow resistance in channels covered with bedforms. The presence of dune formations and their effect on flow dynamics and resistance within the channel were observed. In addition, the interaction between bed load transport and dune bedforms was studied. The work was intended to enhance the understanding of sediment transport processes and hydraulic behaviour in channels with dunes (Omid et al. 2010).

The friction factor in gravel-bed rivers with bedforms has been studied by various researchers (Griffiths 1981; Clifford et al. 1992; Darby 1999; Venditti 2013; Dey 2014a). The contribution of bedforms to the friction factor and its influence on flow resistance was also studied. By examining the relationship between bedforms and friction factor, it was suggested that improved estimation of hydraulic characteristics in gravel-bed rivers with varying bed configurations was required (Afzalimehr et al. 2010). The effects of different bedforms, such as dunes and ripples, were further investigated. Moreover, the impact of bank vegetation on the flow dynamics was observed (Murray & Paola 2003; Gilvear & Willby 2006; Kabiri et al. 2017). It was concluded that bedform configurations, along with the presence of vegetation on the channel banks, influenced flow conditions. The study of the interaction between bedforms, vegetated banks, and flow conditions was intended to provide an understanding of the complex hydraulic behaviour of natural channels (Dehsorkhi et al. 2011). The influence of various flow conditions on bed load transport processes and the formation of bedforms was examined in channels with low, mild, and steep slopes (Lisle 1982; Young & Davies 1991; Harbor 1998; Buffington et al. 2002; Roushangar et al. 2017). By exploring the interconnection between flow conditions and bed load transport, the understanding of sediment dynamics in alluvial river channels with different slopes was enhanced (Chegini & Pender 2012). In other research, velocity and Reynolds stress distributions over sand-bedded dunes were examined, and the flow patterns and turbulence characteristics associated with gravel dunes in a channel were investigated. By studying the velocity and Reynolds stress distributions, insight into the flow dynamics and sediment transport processes influenced by gravel dunes was provided (Kabiri 2014).

Venditti (2007, 2013) conducted a study to observe the consequences of the dune leeside slope on flow resistance and turbulent flow structure. It was found that the slope of dune formations influences flow resistance and flow characteristics. The relationship between the flow resistance, dune leeside slope, and turbulent flow structure was examined to enhance the understanding of flow dynamics in channels with dunes. The numerical model Delft3D was used to find the effects of bedform roughness on various hydrodynamic and sediment transport patterns (Brakenhoff et al. 2020). Another study addressed the bedform friction factor in armoured river channel beds and investigated the influence of armoured bed conditions on flow resistance and hydraulic behaviour (Okhravi & Gohari 2020). Furthermore, a study examined the flow hydrodynamics over two-dimensional dunes (Dey et al. 2020). The main aim was to understand the flow characteristics and hydrodynamic forces that affect the development and behaviour of two-dimensional dune formations (Dey 2014a). A laboratory study was carried out to examine the impact of bedforms with varying particle sizes on parameters such as bed shear stress and flow resistance. In addition, the influence of bedform size on flow resistance and bed shear stress within a channel was observed (Heydari & Yarahmadi 2022).

As discussed earlier, most of the research has been conducted in the laboratory using flume experiments to gather the essential data. An experimental approach in the laboratory is time-consuming and labour-intensive and may not be feasible in some situations. In a channel with various bedforms, where a parameter depends on both hydraulic and sedimentary conditions, soft computing techniques offer a way to make appropriate predictions. One study was conducted to predict values of ‘n’ using various soft computing methods. An artificial neural network (ANN) model was used to capture the non-linear linkage among the various parameters that influence it (Yuhong & Wenxin 2009). The study focused on developing a reliable method to estimate the friction factor by training the ANN model with relevant data. In another study, an adaptive neuro-fuzzy inference system (ANFIS) was developed to establish a strong relationship between the experimental and observed data for estimating the output parameter, i.e., the friction factor (Essays 2011). The ANFIS model was designed to learn adaptively from the input data and provide accurate predictions based on fuzzy logic principles. Moreover, ANFIS models were also used to estimate the friction factor in dune-bed rivers (Roushangar et al. 2014). In addition, models such as feed-forward neural networks (FFNN) and radial-based function neural networks (RBFNN) were utilized. The study compared the performance of these models and found that ANFIS outperformed other models but was less efficient. The sensitivity analysis revealed that parameters such as the Reynolds number (Re) and the ratio of hydraulic radius to median grain size (R/d50) significantly influenced the prediction of the friction factor.

Models such as ANN and genetic programming were used to estimate grain size and Manning's coefficients with higher accuracy than empirical formulas (Niazkar et al. 2019). That study aimed to improve the estimation of these coefficients by employing advanced computational methods. ANN and ANFIS models were used to estimate the roughness coefficient in erodible channels (Zanganeh & Rastegar 2020). The sensitivity analysis indicated that Re had the most important effect on predicting ‘n’ in alluvial channels. The friction factor in rivers with dunes and ripples has also been estimated using various other models (Saghebian et al. 2020). That study also incorporated combined models such as multilayer perceptron firefly neural networks (MLP-FFNN) and multilayer perceptron firefly algorithms (MLP-FFA).

The study emphasized the importance of different parameters, such as Re, Froude number (Fr), and R/d50, in accurately modelling the friction factor based on the bedform characteristics (Yao et al. 2023). Another study demonstrated that a successive approximation-based stepwise optimizing strategy yielded better solutions than other models. The proposed model outperformed models that did not consider the variations of the friction factor with both discharge and sediment conditions. Another recent study investigated the prediction of ‘n’ in rivers with bedforms using soft computing models, namely the multilayer perceptron model, the group method of data handling model, the support vector machines model, and genetic programming (Yarahmadi et al. 2023). The study explored the influence of flow conditions, energy grade line, Froude number, relative submergence, and dimensionless bedform parameters on the estimation of ‘n’. The main aim of these studies was to provide valuable insights into accurately estimating ‘n’ in rivers with bedforms, contributing to the field of hydraulic analysis and design.

Previous studies have made significant contributions to understanding flow conditions, bedforms, sediment transport, and flow resistance in natural channels (Patel et al. 2015, 2016, 2017; Patel 2017; Patel & Kumar 2017; Brakenhoff et al. 2020; Heydari & Yarahmadi 2022; Balachandar & Patel 2008). In addition to these studies, researchers have employed a variety of machine learning (ML) and hybrid models to forecast various parameters, emphasizing the significance of these techniques in practical applications of civil engineering (Chadalawada et al. 2020; Jiang et al. 2022; Bassi et al. 2023a, 2023b; Kumar et al. 2023a, 2023b; Singh & Patel 2023; Wadhawan et al. 2023). In another recent study, various ML techniques were used to determine the friction factor in mobile bed channels (Bassi et al. 2023b). However, these studies did not consider the application of advanced soft computing models such as Lasso regression (LR), extra trees regression (ETR), random forest (RF), and extreme gradient boosting (XGB) to predict the value of ‘n’ in rivers with bedforms. The significance of the present study lies in filling this research gap by employing the above-mentioned alternate techniques to estimate ‘n’ in alluvial rivers with bedforms. By utilizing these models, the current study aims to enhance prediction accuracy by exploring complex interconnections among the hydraulic, sedimentary, and geometric variables influencing ‘n’. The outcomes of the study are expected to contribute to hydraulic analysis by supporting accurate estimation of Manning's roughness coefficient in rivers characterized by bedforms. Furthermore, the findings of the present study provide an improved understanding of flow dynamics and sediment transport mechanisms in natural alluvial channels, thereby refining hydraulic designs and management strategies across diverse river engineering projects.

The experimental data were used to predict the value of ‘n’ using various ML algorithms. This section discusses laboratory equipment, experimental procedure, and a summary of different ML models employed. The data for the analysis in the current study were extracted from a previous study (Yarahmadi et al. 2023). The details of the experimental setup and other important parameters are provided in the subsequent sections.

Experimentation and procedure

Bedforms

Ripples and dunes were the two types of bedforms used in the experiments. The bedforms were built in the shape of asymmetrical triangles. The upstream (u/s) side of each triangular bedform had a gentle slope, while the downstream (d/s) side had a steep slope. The slope of the d/s side was set equal to the angle of repose of the mobile bed sediments, approximately 32°. This design ensured that each bedform resembled a natural asymmetric triangle.

Experimental groups

The experiments were divided into two groups. In the first group, the bedforms had dimensions of 20 cm length, 30 cm width, and 4 cm height, and the angle of the d/s side was maintained at 32°. To roughen the bedform surface, sediments of sizes 0.51, 1.29, and 2.18 mm were applied using an adhesive. In the second group, the bedforms had dimensions of 25 cm length and 30 cm width, and the angle of the d/s side remained at 32°. This group included bedforms with four different heights: 1, 2, 3, and 4 cm. To roughen the surface, sediment of size 0.45 mm was used. Sand with a relative density of 2.65 was used in both cases.

Attachment of bedforms

The bedforms were fixed to the bottom of the experimental flume along the entire length of the test section. This attachment ensured that the bedforms remained stable during the experiments and allowed for controlled flow conditions.

Experimental parameters

The experiments were conducted with different discharges and bed slopes. Discharge rates of 10, 15, 20, 25, and 30 L/s were tested. The bed slopes ranged from 0 to 0.0015. These parameters were chosen to study the influence of flow intensity and bed slope on the behaviour of the bedforms. A total of 215 experiments were conducted, covering a wide range of flow conditions.

Geometric and hydraulic parameters

Several geometric and hydraulic parameters were measured and analyzed in this study. These included the dimensions of the bedforms (length, width, and height), depth of flow, sediment size, flow rates, velocity of flow, and Fr. These parameters provided valuable insights into the characteristics and interactions of the bedforms under different flow conditions.

Dimensional analysis

The dimensional analysis carried out in this study focused on determining the variables that affect the value of ‘n’ in channels with different bedforms. The parameters considered are shown in Table 1.

Table 1

Variables used in this study

| Variables | Symbol | Variables | Symbol |
|---|---|---|---|
| Flow velocity (m/s) | V | Channel width (m) | B |
| Gravitational acceleration (m/s²) | g | Energy grade line (m) | Sf |
| Flow depth (m) | y | Bedform length (m) | λ |
| Specific mass (kg/m³) | ρw | Bedform height (m) | Δ |
| Dynamic viscosity (Pa s) | μ | Bedform u/s angle (degree) | α |
| Specific sediment mass (kg/m³) | ρs | Bedform d/s angle (degree) | θ |
| Average diameter (mm) | d50 | Froude number | Fr |

Applying Buckingham's π theorem, the variables y, V, and ρw were taken as repeating variables. The dimensionless parameters were derived and analyzed, and different combinations were formed, including Δ/y (ratio of bedform height to flow depth) and Δ/λ (ratio of bedform height to bedform length). The parameters Re and Gs (relative density of sediment particles) were removed because the flow was turbulent and the study used sand with a constant relative density of 2.65. In addition, the fixed value of θ (32°) and the relationship between α and Δ/λ allowed for the removal of α as well. Consequently, the dimensionless parameters influencing the value of ‘n’ in channels with bedforms were determined in the following equations:
formula
(1)
formula
(2)

This dimensional analysis helps to identify the critical dimensionless parameters that affect the value of ‘n’ in channels with bedforms, thus providing valuable insights for understanding and predicting flow behaviour in such channels.
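As a rough illustration of how the dimensionless groups named above could be assembled from raw flume measurements, the following minimal sketch computes them for a single reading. The function name, the dictionary keys, and the velocity and depth values are assumptions made for illustration; only the bedform dimensions and grain size are taken from the first experimental group. Sf is already dimensionless and would be passed through unchanged.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)

def dimensionless_groups(V, y, d50, delta, lam):
    """Return the dimensionless inputs built from raw flume measurements.

    V: mean flow velocity (m/s), y: flow depth (m), d50: median grain size (m),
    delta: bedform height (m), lam: bedform length (m).
    """
    return {
        "Fr": V / math.sqrt(G * y),     # Froude number
        "y/d50": y / d50,               # flow depth to grain size
        "delta/d50": delta / d50,       # bedform height to grain size
        "delta/lambda": delta / lam,    # bedform height to bedform length
        "delta/y": delta / y,           # bedform height to flow depth
    }

# Hypothetical reading: first-group bedform (4 cm high, 20 cm long, d50 = 0.51 mm)
# with an ASSUMED velocity of 0.5 m/s and an ASSUMED depth of 0.10 m.
groups = dimensionless_groups(V=0.5, y=0.10, d50=0.51e-3, delta=0.04, lam=0.20)
for name, value in groups.items():
    print(f"{name}: {value:.3f}")
```

All lengths are converted to metres before the ratios are formed, so the grain size in millimetres has to be scaled by 1e-3 first.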

Theoretical analysis

The theoretical analysis for predicting ‘n’ in natural alluvial channels with bedforms using ML techniques draws on several elements: hydraulic principles, sediment transport theory, bedform mechanics, empirical correlations, feature complexity, and ML model interpretability. Hydraulic principles, particularly Manning's equation, serve as the theoretical basis for understanding the role of the roughness coefficient ‘n’ in river flow resistance (Powell 2014). Analyzing how ‘n’ changes with input parameters such as flow velocity, channel geometry, and slope provides critical understanding for predicting and managing flow dynamics in rivers with bedforms. Incorporating these principles into ML models enhances their efficiency in capturing the complex relationships between roughness coefficients and hydraulic parameters, thereby improving predictive accuracy (Ekmekcioğlu et al. 2022). Manning's equation is given as:
$$V = \frac{1}{n} R^{2/3} S^{1/2} \tag{3}$$
where V is the velocity, S is the slope, and R is the hydraulic radius.
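To make the role of ‘n’ concrete, Manning's equation in SI units can be rearranged to back-calculate the roughness coefficient from a measured velocity, hydraulic radius, and slope. The sketch below assumes a rectangular section; the function names and all numerical values are hypothetical and are not taken from the study's dataset.

```python
def hydraulic_radius(width, depth):
    """Hydraulic radius R = A / P of a rectangular section (m)."""
    area = width * depth
    wetted_perimeter = width + 2.0 * depth
    return area / wetted_perimeter

def manning_n(velocity, radius, slope):
    """Solve Manning's equation (SI units) for n: n = R^(2/3) * S^(1/2) / V."""
    return radius ** (2.0 / 3.0) * slope ** 0.5 / velocity

# Assumed values for illustration only (not measurements from the study).
R = hydraulic_radius(width=0.30, depth=0.10)
n = manning_n(velocity=0.5, radius=R, slope=0.005)
print(f"R = {R:.3f} m, n = {n:.4f}")
```

Substituting the computed n back into the equation recovers the input velocity, which is a quick consistency check on the rearrangement.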

In the context of river hydraulics, Manning's equation expresses the relationship between flow velocity and ‘n’ (Bhattacharya et al. 2019; Tuozzolo et al. 2019). This relationship emphasizes the significance of understanding ‘n’ when predicting flow dynamics in rivers. Moreover, R is inherently linked to ‘n’ because it depends upon the shape and dimensions of the channel, so changes in channel geometry can have a direct impact on ‘n’. Furthermore, S plays a vital role within Manning's equation, as variations in this parameter affect flow velocity and, consequently, ‘n’. Variations in S can often be attributed to bedforms or other geomorphic features and have a substantial influence on ‘n’ (Thomas & Nisbet 2007). Understanding these interconnections is pivotal in comprehending and predicting the dynamic nature of river flows.

In addition to hydraulic features, the sediment transport phenomenon constitutes a crucial aspect of the theoretical analysis, which examines the intricate mechanics of sediment movement within river channels (Venditti 2013; Dey 2014b). A number of significant factors, such as sediment particle size, concentration, and the dynamic behaviour of various bedforms, may affect sediment transport. By analyzing these parameters, the theoretical analysis aims to interpret the complex interaction between sediment characteristics and ‘n’. The size and concentration of the sediment particles have a significant impact on ‘n’, which helps in understanding their role in controlling flow resistance. Furthermore, the analysis considers the dynamic nature of bedforms, such as ripples or dunes, which can significantly alter sediment transport patterns and subsequently affect flow resistance (Dey 2014a). In this way, the analysis establishes a robust theoretical basis for understanding the complex connection between sediment transport and ‘n’. This improves the capability to model and interpret these aspects in alluvial channels with bedforms.

Moreover, understanding fluvial hydrodynamics is essential when dealing with alluvial channels, as it requires a comprehensive exploration of natural geomorphic features that take different shapes and evolve over time. This theoretical framework considers the processes that give rise to the formation of different bedforms and their subsequent transformations. The current study thoroughly examines the experimental models that explain the bedforms' behaviour and interaction with the flow in closed environments.

Input parameter combinations

Various combinations of input parameters were formed to observe their effect on the output parameter. The different cases (C1–C6) and their respective combinations of input parameters and output parameter are presented in Table 2. Each case represents a specific scenario considered in the analysis. In C1, the output parameter ‘n’ (Manning's roughness coefficient) is determined by a combination of all input parameters: Sf (slope of the energy grade line in the channel), Fr (Froude number), y/d50 (ratio of flow depth to median grain size), Δ/d50 (ratio of bedform height to median grain size), Δ/λ (ratio of bedform height to wavelength), and Δ/y (ratio of bedform height to flow depth). Case C2 simplifies the combination by excluding Sf while still considering Fr, y/d50, Δ/d50, Δ/λ, and Δ/y as input parameters. In C3, the combination is further reduced by excluding Fr, leaving y/d50, Δ/d50, Δ/λ, and Δ/y as the input parameters. Case C4 additionally excludes y/d50 and uses Δ/d50, Δ/λ, and Δ/y. Case C5 considers only Δ/λ and Δ/y as the input parameters. Finally, C6 uses Δ/y as the only input.

Table 2

Different cases of the input parameter to determine the output parameter

| Cases | Combinations of input parameters | Output parameter |
|---|---|---|
| C1 | f (Sf, Fr, y/d50, Δ/d50, Δ/λ, Δ/y) | Manning's roughness coefficient (n) |
| C2 | f (Fr, y/d50, Δ/d50, Δ/λ, Δ/y) | Manning's roughness coefficient (n) |
| C3 | f (y/d50, Δ/d50, Δ/λ, Δ/y) | Manning's roughness coefficient (n) |
| C4 | f (Δ/d50, Δ/λ, Δ/y) | Manning's roughness coefficient (n) |
| C5 | f (Δ/λ, Δ/y) | Manning's roughness coefficient (n) |
| C6 | f (Δ/y) | Manning's roughness coefficient (n) |
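The nested structure of the six cases in Table 2 (each case drops the leading input of the previous one) can be expressed compactly in code. The following sketch is one possible encoding; the ASCII column names are hypothetical labels chosen for illustration, not identifiers from the study.

```python
# Hypothetical column labels for the six dimensionless inputs, in Table 2 order.
FEATURES = ["Sf", "Fr", "y/d50", "delta/d50", "delta/lambda", "delta/y"]

# C1 uses all six inputs; each later case drops the leading feature.
CASES = {f"C{i + 1}": FEATURES[i:] for i in range(len(FEATURES))}

for case, inputs in CASES.items():
    print(case, "->", ", ".join(inputs))
```

A mapping of this kind could then be used to select the input columns for each model run, so that every model is trained on exactly the same six scenarios.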

Statistical analysis

Statistical analysis was conducted on the data to determine the variability and trends in the measured parameters. The energy grade line (Sf) ranged from 0.004 to 0.006, with an average value of 0.005083 and a standard deviation of 0.000797. The distribution of Sf exhibited a negative skewness of −0.1508, indicating a slight asymmetry with a longer tail towards lower values. The kurtosis of Sf was −1.40819, indicating a platykurtic distribution. The Froude number (Fr) varied from 0.2 to 0.7, with an average value of 0.455349 and a standard deviation of 0.177092. The distribution of Fr showed a slight negative skewness of −0.07631 and a kurtosis of −1.3743, indicating a platykurtic distribution. The ratio of flow depth to sediment size (y/d50) ranged between 70.849 and 287.709, with an average value of 172.6865 and a standard deviation of 62.42025. The distribution of y/d50 was nearly symmetrical, with a slight negative skewness of −0.07039 and a kurtosis of −1.28172, indicating a platykurtic distribution. The ratio of bedform height to sediment size (Δ/d50) varied between 0.06 and 0.18, with an average value of 0.126953 and a standard deviation of 0.036966. The distribution of Δ/d50 showed a negative skewness of −0.34247 and a kurtosis of −1.04904, indicating a platykurtic distribution. The ratio of bedform height to wavelength (Δ/λ) ranged between 0.13 and 0.46, with an average value of 0.293023 and a standard deviation of 0.098083. The distribution of Δ/λ was nearly symmetrical, with a slight negative skewness of −0.05843 and a kurtosis of −1.15867, indicating a platykurtic distribution. The ratio of bedform height to flow depth (Δ/y) ranged between 19.276 and 82.317, with an average value of 49.89475, a standard deviation of 19.58988, a skewness of −0.0669, and a kurtosis of −1.34305. Finally, the output parameter ‘n’ ranged from 0.013 to 0.032, with an average value of 0.023321 and a standard deviation of 0.005703, as shown in Table 3.

Table 3

The statistical description of various input and output parameters

| Parameters | Sf | Fr | y/d50 | Δ/d50 | Δ/λ | Δ/y | n |
|---|---|---|---|---|---|---|---|
| Maximum | 0.006 | 0.7 | 287.709 | 0.18 | 0.46 | 82.317 | 0.032 |
| Minimum | 0.004 | 0.2 | 70.849 | 0.06 | 0.13 | 19.276 | 0.013 |
| Standard deviation | 0.000797 | 0.177092 | 62.42025 | 0.036966 | 0.098083 | 19.58988 | 0.005703 |
| Average | 0.005083 | 0.455349 | 172.6865 | 0.126953 | 0.293023 | 49.89475 | 0.023321 |
| Kurtosis | −1.40819 | −1.3743 | −1.28172 | −1.04904 | −1.15867 | −1.34305 | −1.21499 |
| Skewness | −0.1508 | −0.07631 | −0.07039 | −0.34247 | −0.05843 | −0.0669 | −0.34548 |
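For readers unfamiliar with the shape statistics in Table 3, the sketch below shows how skewness and excess kurtosis can be computed from a sample. It uses the simple moment-based definitions without small-sample correction, which is an assumption: the source does not state which estimator was used, and bias-corrected formulas would give slightly different values. The sample itself is hypothetical and is not the study's data.

```python
import statistics

def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def excess_kurtosis(values):
    """Moment-based excess kurtosis g2 = m4 / m2**2 - 3 (0 for a normal distribution)."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0

# Hypothetical sample: slopes spread evenly between 0.004 and 0.006.
sample = [0.004 + 0.0001 * i for i in range(21)]
print(f"skewness = {skewness(sample):.3f}")
print(f"excess kurtosis = {excess_kurtosis(sample):.3f}")
```

An evenly spread sample such as this one is symmetric and flatter than a normal distribution, so its excess kurtosis is negative, which is what "platykurtic" denotes for the parameters in Table 3.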

Machine learning models

Random forest

RF is a powerful ensemble learning technique that leverages bagging to build a collection of decision trees for making predictions. By training each tree on a different bootstrap subset of the training data, RF reduces the risk of overfitting and enhances the model's ability to generalize to new data. Unlike traditional decision trees, RF also incorporates feature randomness by considering only a random subset of features at each split, which further enhances the model's robustness and reduces the correlation between trees. The final prediction of the RF model is obtained by aggregating the predictions of all individual trees, resulting in a more accurate and reliable prediction without an explicit equation (Belgiu & Drăgu 2016; Yoon 2021). RF has gained immense popularity in the field of ML owing to its predictive capability and resilience against overfitting. It is particularly useful for intricate and high-dimensional datasets and has found applications in various other domains as well.

During the construction of each decision tree, only a random subset of features is considered at each split point. By doing so, the model prevents a single dominant feature from influencing the entire forest, making it less sensitive to noise in the data and improving its generalization performance (Qi 2012; Chang et al. 2018). The final prediction of an RF model is determined through ensemble averaging. Each decision tree within the forest makes its prediction, and for regression tasks, these predictions are averaged. In classification tasks, a majority vote is conducted to determine the final class label. One of the standout advantages of RF is its capability to handle high-dimensional datasets with a substantial number of features without succumbing to overfitting. Additionally, it demonstrates robustness to outliers and offers insights into feature importance, aiding in the identification of the most influential features for making predictions. Due to its versatility and effectiveness, RF has become a staple in real-world applications, such as disease diagnosis, credit risk assessment, and natural language processing tasks. While it lacks an explicit equation like linear regression, RF excels in capturing intricate relationships within the data and stands as a valuable tool in the era of ML and artificial intelligence (AI) (Probst et al. 2019).
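The bagging, feature randomness, and averaging steps described above can be illustrated with scikit-learn's `RandomForestRegressor`. This is a minimal sketch on synthetic data, not the study's configuration: the data, the target function, and all hyperparameter values are assumptions chosen only to make the example run.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for six dimensionless inputs (NOT the study's data).
X = rng.uniform(0.0, 1.0, size=(215, 6))
# Hypothetical target: a nonlinear function of two inputs plus small noise.
y = 0.02 + 0.01 * X[:, 0] + 0.005 * np.sin(3.0 * X[:, 1]) + rng.normal(0.0, 0.0005, 215)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestRegressor(
    n_estimators=200,   # number of bagged trees (assumed value)
    max_features=2,     # random subset of features tried at each split (assumed value)
    bootstrap=True,     # each tree is fitted on a bootstrap sample of the training rows
    random_state=0,
)
model.fit(X_train, y_train)

# For regression, the forest prediction is the average of the tree predictions.
forest_pred = model.predict(X_test)
tree_mean = np.mean([tree.predict(X_test) for tree in model.estimators_], axis=0)
print("matches tree average:", np.allclose(forest_pred, tree_mean))
print("R^2 on held-out data:", round(model.score(X_test, y_test), 3))
print("feature importances:", np.round(model.feature_importances_, 3))
```

The comparison against the manual tree average makes the ensemble-averaging step explicit, and `feature_importances_` is one way to obtain the feature-importance insight mentioned in the text.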

Lasso regression

LR, also known as L1-regularized regression or regression with the Lasso penalty, is a linear regression technique that adds a penalty term to the least squares objective function. Its primary purpose is to perform feature selection by shrinking the coefficients of less significant features towards zero. This regularization helps mitigate the risk of overfitting and can yield a more interpretable model by excluding irrelevant features. While the equation for LR is similar to that of linear regression, it incorporates an additional penalty term that encourages sparsity in the coefficient estimates (Alhamzawi & Ali 2018; Pai et al. 2021). The objective of LR is twofold. First, it aims to minimize the residual sum of squares, which is the same objective function as in ordinary linear regression, so that a linear relationship is fitted between the dependent variable and the independent variables. Its second objective is feature selection. The Lasso penalty term is designed to shrink the coefficients of less significant features toward zero. In practical terms, this means that Lasso can effectively eliminate or reduce the impact of irrelevant or less important variables in the model. By encouraging sparsity in the coefficient estimates, Lasso helps to select a subset of the most influential features while disregarding those that do not significantly contribute to the prediction. This feature selection aspect makes Lasso particularly valuable when working with high-dimensional datasets, where identifying and utilizing relevant features can be challenging (Chang et al. 2018; Jun & ZeXin 2021).

LR is widely used in domains such as economics, where interpretable models are of paramount importance. The equation for LR resembles that of linear regression, with the addition of the L1 regularization term, which penalizes the absolute values of the coefficients. The regularization strength, often denoted as λ, determines the degree of shrinkage applied to the coefficients. A larger λ value results in greater shrinkage and, consequently, more features with coefficients reduced to zero. LR is therefore a valuable tool for linear modelling when the goal is not only to predict accurately but also to identify the most relevant features and simplify the model. Its ability to balance model complexity against predictive performance makes it a suitable choice in scenarios where interpretable models are desired (Alanazi 2022).
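The shrinkage behaviour described above can be illustrated with a minimal sketch. It assumes the objective (1/(2N)) Σ (y − Xβ)² + λ Σ |β|, centred features with no intercept, and cyclic coordinate descent as the solver; the function names and the four-point dataset are hypothetical and are not taken from the study.

```python
def soft_threshold(rho, lam):
    """Soft-thresholding operator: shrink rho toward zero by lam."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimise (1/(2N)) * sum((y - X b)^2) + lam * sum(|b|) by cyclic updates.

    Assumes centred columns with non-zero variance and no intercept term.
    """
    n_samples, n_features = len(X), len(X[0])
    beta = [0.0] * n_features
    for _ in range(n_sweeps):
        for j in range(n_features):  # visit coordinates in cyclic order
            rho = 0.0
            z = 0.0
            for row, target in zip(X, y):
                # prediction from every feature except feature j
                partial = sum(row[k] * beta[k] for k in range(n_features) if k != j)
                rho += row[j] * (target - partial)
                z += row[j] ** 2
            beta[j] = soft_threshold(rho / n_samples, lam) / (z / n_samples)
    return beta


# Hypothetical data: y = 2.0 * x1 + 0.1 * x2, i.e. x2 is only weakly relevant.
X = [[-1.5, 1.0], [-0.5, -1.0], [0.5, -1.0], [1.5, 1.0]]
y = [-2.9, -1.1, 0.9, 3.1]

beta_ols = lasso_coordinate_descent(X, y, lam=0.0)    # no penalty
beta_lasso = lasso_coordinate_descent(X, y, lam=0.5)  # L1 penalty applied
print(beta_ols, beta_lasso)
```

With λ = 0 both coefficients are retained, whereas with λ = 0.5 the coefficient of the weakly relevant feature is set exactly to zero and the dominant coefficient is shrunk, which is the sparsity effect discussed in the text.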

Extra trees regression

ETR is another ensemble learning method similar to RF. It also combines multiple decision trees through the technique of bagging. However, ETR introduces additional randomness by selecting random splits for each feature rather than searching for the best split. This randomness helps to increase the diversity among the trees and reduces overfitting. The equation for ETR is not explicitly defined as it combines multiple decision trees (Jaiswal & Lohani 2023). In standard decision trees, when selecting the best split for a node, the algorithm considers all available features and evaluates various splitting criteria to find the optimal one. In contrast, ETR considers only a random subset of features at each node and, instead of exhaustively searching for the best split, makes a randomized choice. This adds diversity to the ensemble because different trees can choose different features for splitting at each node. The added randomness reduces the correlation between individual trees within the ensemble, making the model less susceptible to overfitting, which occurs when a model captures noise in the data rather than the underlying patterns. As with RF, the final prediction is obtained by combining the predictions of all decision trees in the ensemble, which leads to more robust and accurate predictions (Qi 2012; Liu et al. 2015).

XGBoost

XGBoost (extreme gradient boosting) is an optimized gradient boosting algorithm that combines the concepts of gradient boosting and regularization techniques. It builds an ensemble of weak prediction models and updates them using a gradient descent algorithm (Alerskans et al. 2022). The XGBoost prediction, which aggregates the predictions of multiple weak models, is given in Equation (4):
formula
(4)
where y is the predicted value and the bias term represents the global average of the target variable.

The model XGBoost is a cutting-edge gradient-boosting algorithm that is renowned for its remarkable predictive accuracy and versatility. It seamlessly integrates gradient boosting principles with regularization techniques to make an ensemble of weak prediction models, typically decision trees, and iteratively enhances their predictive capabilities. It is different from other models as it has advanced regularization methods, including L1 and L2 regularization, which significantly counteract overfitting and promote robust model generalization. These regularization terms are thoughtfully incorporated into the loss function algorithm, allowing for precise control through hyperparameters. XGBoost is further enhanced by a wide range of capabilities, including the assessment of feature importance, the ability to halt training early, and an efficient mechanism for trimming trees, all of which collectively strengthen its power and effectiveness. With its ensemble approach, regularization capabilities, and efficient optimization, XGBoost emerges as a formidable tool for diverse ML applications, spanning from classification and regression to ranking and recommendation systems, making it a top choice among data scientists seeking both exceptional predictive performance and interpretability (Tang et al. 2015; Sharma et al. 2022).
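The additive structure of boosting (a bias term equal to the global average of the target, plus the contributions of many weak learners) can be sketched as follows. This is a plain gradient boosting illustration with squared-error loss and single-split stumps on one feature; it deliberately omits the regularization terms, second-order information, subsampling, and pruning that distinguish XGBoost, and all names and data values are hypothetical.

```python
def fit_stump(x, residuals):
    """Least-squares regression stump (one split) on a single feature."""
    best = None
    levels = sorted(set(x))
    for lo, hi in zip(levels, levels[1:]):
        threshold = (lo + hi) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def stump_predict(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def fit_boosting(x, y, n_rounds=50, learning_rate=0.1):
    """Start from the global mean, then repeatedly fit stumps to residuals."""
    base = sum(y) / len(y)  # bias term: global average of the target
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # for squared-error loss the negative gradient is the residual
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        preds = [pi + learning_rate * stump_predict(stump, xi)
                 for xi, pi in zip(x, preds)]
    return base, learning_rate, stumps


def predict(model, xi):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, xi) for s in stumps)


# Hypothetical one-feature example (values are illustrative only).
x = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
y = [0.020, 0.022, 0.025, 0.029, 0.034, 0.040]

model = fit_boosting(x, y)
base_mse = sum((yi - model[0]) ** 2 for yi in y) / len(y)
boost_mse = sum((yi - predict(model, xi)) ** 2 for xi, yi in zip(x, y)) / len(y)
print(base_mse, boost_mse)
```

Each round reduces the training error of the running prediction; the learning rate plays the role of the Eta hyperparameter listed for XGB in Table 4.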

Data preprocessing

During the data preprocessing stage for the four regression models (LR, ETR, RF, and XGB), addressing the variations in parameter ranges is essential. A normalization technique is applied to ensure that numerical columns are transformed to a standard scale. This step helps achieve consistency and comparability among the different parameters used in the models. By scaling the values of the dataset, a scale is established, typically with a mean of 0 and a standard deviation of 1. This normalization process ensures that each parameter contributes equally to the overall analysis, preventing any bias arising from variations in their original scales. Data normalization plays a crucial role in preparing the data for regression modelling and enables accurate and meaningful interpretations of the results.
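As a minimal sketch of the scaling described above, the following standardizes one numerical column to zero mean and unit standard deviation. The function name and sample values are hypothetical, and the population standard deviation is assumed as the divisor, since the text does not state which form was used.

```python
import statistics


def standardize(values):
    """Scale a column to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mean) / std for v in values]


froude = [0.21, 0.35, 0.48, 0.52, 0.77]  # hypothetical Fr values
scaled = standardize(froude)
print(scaled)
```

In practice the mean and standard deviation would be computed on the training set only and then reused for the testing set, so that no information from the test data leaks into the scaling.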

Data normalization is a crucial preprocessing step in ML, and effective preprocessing plays a pivotal role in enhancing the performance of the ML models. It prevents features with larger scales from dominating the training process and helps the models to perform effectively. Standardization ensures that each input feature contributes equally to the analysis and prevents bias due to differences in the original scales of the variables. Each variable has the same number of data points, and the normalization process itself does not affect the distribution of data points across variables. Instead, it ensures that the values within each variable are scaled consistently. The distribution of data points across variables is determined by the dataset itself and the process by which it is collected. The flow chart of this study is shown in Figure 1.
Figure 1

The methodology adopted in the current study.


Data split

In the data splitting process, the initial dataset, comprising 215 data points, is divided into two sets: the training set and the testing set. The splitting is performed in a ratio of 80:20, meaning that 80% of the data (172 data points) is allocated to the training set, while the remaining 20% (43 data points) forms the testing set. The purpose of this division is to have a dedicated portion of the data used solely for training the ML models. This allows the models to learn patterns, relationships, and features present in the data during the training phase. The training set serves as the basis for building and optimizing the models, and the testing set is utilized to evaluate the performance and generalization capabilities of the trained models. After being trained on the training set, the models are applied to the testing set to make predictions or classifications. By comparing the predicted outcomes with the actual known outcomes in the testing set, the models' performance metrics, such as accuracy, precision, recall, or mean squared error (MSE), can be computed to assess how well they generalize to unseen data. Splitting the data into training and testing sets is a common practice in ML to ensure unbiased model performance evaluations. It helps to assess the models' ability to handle new, unseen data and provides a realistic estimation of their effectiveness in real-world scenarios.
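The 80:20 split can be sketched as below. The text does not say whether the split was random or which seed was used, so the shuffle and the seed value are assumptions made only for illustration.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle sample indices and return (train_indices, test_indices)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary assumption
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(215)
print(len(train_idx), len(test_idx))  # 172 43
```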

Criteria for model evaluation

To ensure the models' reliability, a range of measures is employed to evaluate the precision of the proposed models in predicting the ‘n’ of the movable bed channel. These measures include root mean squared error (RMSE), mean absolute error (MAE), MSE, relative absolute error (RAE), root relative squared error (RRSE), and R2. The MAE is calculated by taking the average absolute difference between predicted and actual values. The MSE is computed by averaging the squared differences between the predicted and actual values (Al-Rousan & Trajkovic 2012; Khan 2018; Cihan 2019; Gkerekos et al. 2019). The measures are defined in Equations (5)–(9).
formula
(5)
formula
(6)
formula
(7)
formula
(8)
formula
(9)

The RMSE is derived as the square root of the MSE, providing a measure of the average magnitude of the prediction errors. The RAE is determined by calculating the relative absolute difference between the forecasted and actual values, normalized by the mean of the actual values. This metric allows for the evaluation of the prediction accuracy relative to the scale of the data. The RRSE is obtained by dividing the RMSE by the range of the observed data, expressed as a percentage. The range is determined by subtracting the minimum value from the maximum value of the observed data. This metric provides a relative measure of the prediction error compared to the overall range of the data. Furthermore, the coefficient of determination (R2) is employed to assess the proportion of the variance in the dependent variable that the independent variables in the model can explain. A higher R2 value indicates a better fit of the model to the data.
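For orientation, the error measures can be computed as in the sketch below. It uses the common textbook definitions, in which RAE and RRSE are normalized by the deviations of the observed values from their mean; this is an assumption and may differ from the normalization used in this study (the text above, for example, describes RRSE relative to the range of the observed data). The sample values are hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE, RAE, RRSE and R2 (textbook definitions)."""
    n = len(actual)
    mean_actual = sum(actual) / n
    errors = [p - a for a, p in zip(actual, predicted)]
    sum_abs_err = sum(abs(e) for e in errors)
    sum_sq_err = sum(e * e for e in errors)
    sum_abs_dev = sum(abs(a - mean_actual) for a in actual)
    sum_sq_dev = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "MAE": sum_abs_err / n,
        "MSE": sum_sq_err / n,
        "RMSE": math.sqrt(sum_sq_err / n),
        "RAE": sum_abs_err / sum_abs_dev,
        "RRSE": math.sqrt(sum_sq_err / sum_sq_dev),
        "R2": 1 - sum_sq_err / sum_sq_dev,
    }


actual = [0.020, 0.025, 0.030, 0.035]     # hypothetical observed n
predicted = [0.021, 0.024, 0.031, 0.034]  # hypothetical predicted n
metrics = regression_metrics(actual, predicted)
print(metrics)
```

Under these definitions RRSE and R2 are linked by RRSE² = 1 − R2, which gives a quick consistency check when both are reported.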

In addition to these evaluation measures, heatmap data visualization and parametric analysis are conducted to examine the correlation between various parameters and the output parameter. These analyses help to identify the relationships and dependencies among the variables and provide insights into the predictive capabilities of the models. With the use of these evaluation measures and analytical techniques, the reliability and accuracy of the models in predicting the n of the movable bed channel can be assessed, allowing for informed decision-making and further refinement of the models if necessary. In addition to this, Table 4 presents hyperparameter settings and configurations for various ML models used in this study. It includes all the models along with their optimized hyperparameter values such as N-estimators, max-depth, learning-rate, and other metrics. These configurations serve as guidelines for setting hyperparameters while using these models in a predictive task.

Table 4

Optimized hyperparameters and configuration of all ML models used in the present study

| Models | Hyperparameter | Optimized value | Models | Hyperparameter | Optimized value |
| --- | --- | --- | --- | --- | --- |
| XGB | N_estimators | 1,800 | RF | Max_depth | |
| | Eta | 0.01 | | Min_sample_split | |
| | Max_depth | | | Max_features | Auto |
| | Subsample | 0.5 | | | |
| | Colsample_bytree | | | | |
| ETR | N_estimators | 100 | LR | Alpha | |
| | Max_features | Auto | | Selection | Cyclic |

The equations for the determination of ‘n’ in sand bed with bedforms were developed by Talebbeydokhti et al. (2006) as in Equations (10) and (11):
formula
(10)
formula
(11)

In the equations provided, the variable ‘n’ is the Manning's roughness coefficient related to the bedforms and n′ represents the overall ‘n’. Statistical error indices, such as R2 and RMSE, were calculated to assess the accuracy of these equations using the experimental data. R2 and RMSE were likewise obtained for all the models under the various input scenarios. Each input case (C1, C2, C3, C4, C5, and C6) was formed by removing one parameter. The error indices measure the fit and precision of the equations in relation to the observed data from the laboratory study. The assessment of these error metrics validates the accuracy of the equations in estimating ‘n’ for alluvial river channels with bedforms such as dunes and ripples. However, considering the significance of ‘n’ in river hydraulic studies and the requirement for more adequate estimations, the current study emphasizes developing and evaluating different ML models to predict this parameter. The main aim is to explore the ‘n’ estimation accuracy in alluvial river channel applications.

The correlation coefficients provide insights into the relationships between different variables in the dataset. Each variable is, by definition, perfectly correlated with itself (correlation coefficient of 1). Sf (energy grade line) is only weakly and negatively correlated with the other variables. Fr (Froude number) exhibits a negative correlation of −0.283 with Sf. The y/d50 (flow depth relative to sediment size) shows a negative correlation of −0.3 with Sf and a positive correlation of 0.882 with Fr. Moving on to Δ/d50, it demonstrates a negative correlation of −0.335 with Sf, a positive correlation of 0.961 with Fr, and a positive correlation of 0.936 with y/d50. Similarly, Δ/λ is negatively correlated with Sf (−0.291) and positively correlated with Fr (0.936), y/d50 (0.919), and Δ/d50 (0.97). Δ/y shows a negative correlation with Sf (−0.388) and positive correlations with Fr (0.876), y/d50 (0.938), Δ/d50 (0.94), and Δ (0.962). Finally, ‘n’ exhibits a negative correlation with Sf (−0.281) and positive correlations with Fr (0.934), y/d50 (0.955), Δ/d50 (0.973), Δ (0.978), and Δ/y (0.974). Overall, the inputs other than Sf are strongly and positively correlated with one another and with ‘n’, whereas Sf is weakly and negatively correlated with all of them.

Heatmap visualization

The heatmap displays the correlation coefficients between different parameters in a system, providing valuable insights into their relationships. The correlation coefficient measures the strength and direction of the linear relationship between two variables. The diagonal cells of the heatmap are shaded with the darkest colour, indicating a perfect positive correlation. This is expected, since each parameter is perfectly correlated with itself, and it serves as a reference point for the other cells in the heatmap. The darker colours signify strong negative correlations, while the lighter colours indicate strong positive correlations. These correlation coefficients reveal the degree to which two parameters are related. For instance, the intersection of Sf and Fr in the heatmap shows a correlation coefficient of −0.283, denoted by a darker colour. This indicates a weak negative correlation between these two parameters. In other words, Fr tends to decrease as Sf increases, and vice versa. Conversely, the cell where y/d50 intersects with Δ has a correlation coefficient of 0.91, represented by a lighter colour. This indicates a strong positive correlation between these parameters. As y/d50 increases, Δ also tends to increase, demonstrating a positive linear relationship as shown in Figure 2.
Figure 2

Heatmap of correlation coefficient values of each input parameter with respect to the output parameter.

By interpreting the heatmap, we can identify the strengths and directions of the correlations between different parameters. Strong correlations, whether positive or negative, indicate that changes in one parameter are likely to be accompanied by corresponding changes in the other parameter. On the other hand, weaker or near-zero correlations suggest a lack of a linear relationship between the parameters. The heatmap visually represents the correlation structure among the parameters, enabling us to discern which parameters are more strongly related and potentially influencing each other within the system. This information is valuable for understanding the interplay of variables and can guide further analysis or decision-making processes. The correlation coefficient matrix was calculated using Origin Pro software. The equation used to calculate the Pearson correlation coefficient between two variables, Xi and Yi, with n data points is as follows:
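$$r = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{\sqrt{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2}}$$
(12)
where $\bar{X}$ and $\bar{Y}$ are the means of variables X and Y, respectively.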
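The Pearson coefficient can be computed directly from its definition, as in the short sketch below. The function name and the sample values are hypothetical; the study itself reports using Origin Pro for this step.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)


r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])  # perfectly increasing relation
r_down = pearson_r([1, 2, 3], [3, 2, 1])      # perfectly decreasing relation
print(r_up, r_down)
```

Applying this function to every pair of columns yields the symmetric matrix that the heatmap visualizes, with ones on the diagonal.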

In this study, various soft computing models, namely LR, ETR, RF, and XGB, were evaluated to assess their effectiveness in estimating ‘n’ in alluvial channels with bedforms. The initial step of the soft computing modelling process involved data preparation, which included dividing the dataset into training and testing sets. The allocation of percentages for these sets was determined by trial and error and taking into account the previous research studies (Bassi et al. 2023a; Wadhawan et al. 2023). In this study, 80% of the data were allocated for training (calibration), while the remaining 20% was used for testing (verification). It is important to note that the statistical characteristics of the dimensionless parameters in both the training and testing datasets exhibited similar patterns, ensuring the representativeness and suitability of the datasets for model evaluation and comparison.

Parametric analysis

Scatter plots

Scatter plots were used to analyze the relationship between these variables. A scatter plot is a graphical representation that displays the relationship between two variables by plotting data points on a two-dimensional plane. Scatter plots were created for each input parameter against ‘n’ to visualize any potential patterns or correlations. Examining the scatter plots gives insight into trends, clusters, or outliers in the data, which helps in understanding the relationship between the variables and in making further interpretations or predictions. The scatter plots of the various input parameters are depicted in Figure 3.
Figure 3

Scatter plots of Manning's coefficient with various input parameters (a) energy grade line (Sf), (b) Froude number (Fr), (c) y/d50, (d) Δ/d50, (e) Δ, and (f) Δ/y.


Box plot

Box plots are commonly used to compare the actual and predicted values of different parameters, providing a visual representation of their distribution. These plots display the parameter values on the horizontal axis and the actual/predicted values on the vertical axis. By examining the separate boxes for actual and predicted values, we can assess the accuracy of models generated by different techniques or algorithms. A well-fitting model will exhibit a box plot with a small interquartile range (IQR), indicating that the majority of predicted values are close to the actual parameter values. Additionally, the median value of the predicted values should closely align with the actual parameter value. Conversely, if the predicted values deviate significantly from the actual values, the IQR will be large, and the median value will differ substantially from the actual parameter value. In Figure 4, a visualization of the parameters and their variation within the normalized range is presented. The plot also indicates the outliers, mean, and range of data for all the variables used in this study.
Figure 4

Box plot of input and output parameters with mean and outliers.


Figure 4 presents a boxplot illustrating the distribution of riverbed roughness values and their corresponding predictions generated by the ML models. The inclusion of two median lines, one for the input roughness values and another for the predicted values, serves to highlight the central tendencies of both datasets. The absence of outliers in the plot indicates a high level of consistency between predicted and actual values. Whiskers of 1.5 times the IQR are used to identify potential outliers. This approach gives a balanced representation of the data's central tendency and dispersion, capturing the majority of the distribution while still flagging extreme values.

Furthermore, the boxplot up to the 75th percentile is selected because it emphasizes the main body of the data distribution while minimizing the impact of potential outliers beyond the whiskers. This concentration on the lower and upper quartiles allows for a clearer visualization of the central tendencies and spreads in both the input and predicted riverbed roughness values. The careful design of the boxplot and the utilization of 1.5 times IQR and the 75th percentile, contribute to a refined understanding of the data distribution and the reliability of our predictions.
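The 1.5 × IQR rule can be sketched as follows. The quartile convention (inclusive linear interpolation) is an assumption, since plotting packages differ in how they compute quartiles, and the roughness values are hypothetical.

```python
import statistics


def box_plot_summary(values, whisker=1.5):
    """Quartiles, IQR fences and outliers as used for a standard box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "low_fence": low_fence,
        "high_fence": high_fence,
        "outliers": outliers,
    }


# Hypothetical roughness values with one unusually large observation.
roughness = [0.018, 0.020, 0.021, 0.022, 0.023, 0.024, 0.025, 0.060]
summary = box_plot_summary(roughness)
print(summary["outliers"])
```

Points beyond the fences are drawn individually as outliers, while the whiskers extend to the most extreme values that still lie within the fences.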

Histogram

The provided data represent a histogram with values ranging from 0 to 1. Each row in the histogram corresponds to a specific category or bin, while the columns represent the frequency or height of each bin. The histogram as in Figure 5 displays a distribution of values, where higher columns indicate a higher frequency or concentration of values in that particular bin.
Figure 5

Histogram of various parameters representing each as (a) energy grade line (Sf), (b) Froude number (Fr), (c) y/d50, (d) Δ/d50, (e) Δ, and (f) Δ/y.


As per Figure 5, the frequencies vary across the bins, with some bins containing many more data points than others. The histograms represent the distribution of six parameters that are significant in the present study: (a) Sf represents the slope of the energy grade line in a hydraulic system. It is crucial in fluid dynamics and hydraulic engineering because it helps determine the energy losses and head distribution in open channels, and the histogram shows the range of its values in the dataset; (b) Fr values help to classify the flow regimes within the data, showing whether most of the data points fall within the subcritical, critical, or supercritical flow categories; (c) y/d50 values describe the relationship between flow depth and sediment size. They can indicate the capability of the flow to entrain sediment particles when (y/d50 > 1) or (y/d50 < 1), providing valuable information for sediment transport studies; (d) Δ/d50 values illustrate the range of bedform heights relative to sediment size. They represent the dominance of bedforms over the landscape and the mixing of different bedform sizes, which helps in the characterization of bedforms; (e) Δ depicts the distribution of various bedform shapes and indicates that bedforms have a specific wavelength relative to height; and (f) Δ/y values show that bedform heights are relatively small compared to the flow depth yet are significant features within the alluvial channel bed.

Relationship between different errors and ML models

Table 5 gives an overview of the R2 values obtained from the regression models (LR, ETR, RF, and XGB) in the six cases (C1, C2, C3, C4, C5, C6). The values range from relatively high (close to 1, indicating good performance) to low (around 0.1). The main purpose is to compare the performance of each model across the different scenarios.

Table 5

The R2 values obtained from the models for all six different cases of input variables

| Models | C1 | C2 | C3 | C4 | C5 | C6 |
| --- | --- | --- | --- | --- | --- | --- |
| LR | 0.822368321 | 0.822368321 | 0.822368321 | 0.109620603 | 0.109620603 | 0.822368321 |
| ETR | 0.99928393 | 0.999230522 | 0.999248134 | 0.99928393 | 0.9866761 | 0.99928393 |
| RF | 0.999544496 | 0.999559321 | 0.999516037 | 0.999544496 | 0.997588874 | 0.999544496 |
| XGB | 0.997261388 | 0.997261388 | 0.997261388 | 0.997195934 | 0.990703555 | 0.997261388 |

Table 6 presents RMSE values for various regression algorithms across six scenarios. The RMSE values represent the average magnitude of prediction errors, with lower values indicating better model performance. It provides a concise comparison of algorithmic performance in terms of prediction accuracy across different scenarios.

Table 6

The RMSE values obtained from the models for all six different cases of input variables

| Algorithms | C1 | C2 | C3 | C4 | C5 | C6 |
| --- | --- | --- | --- | --- | --- | --- |
| LR | 0.049011 | 0.049011 | 4.90E-02 | 0.073335 | 0.073335 | 0.073335 |
| ETR | 0.000153 | 0.000158 | 1.56E-04 | 1.53E-04 | 0.000213 | 0.000658 |
| RF | 0.000121 | 0.000119 | 1.12E-02 | 1.21E-04 | 0.011718 | 0.016715 |
| XGB | 0.017256 | 0.017256 | 1.73E-02 | 0.017358 | 0.018514 | 0.023423 |

Table 7 displays MSE values for four regression algorithms across six distinct cases. The MSE values, represented in scientific notation, indicate the average of the squared prediction errors, with smaller values suggesting higher accuracy. For instance, LR exhibits low MSE values across all scenarios, ranging from 5.77 × 10−6 to 2.89 × 10−5. ETR, RF, and XGB demonstrate even smaller MSE values, of the order of 10−8 to 10−7, reflecting the high accuracy of these models in capturing the target variable across diverse situations. The table provides a concise summary of the models' performance in minimizing prediction errors across the scenarios.

Table 7

The MSE values obtained from the models for all six different cases of input variables

| Algorithms | C1 | C2 | C3 | C4 | C5 | C6 |
| --- | --- | --- | --- | --- | --- | --- |
| LR | 2.89E-05 | 5.77E-06 | 5.77E-06 | 5.77E-06 | 2.89E-05 | 2.89E-05 |
| ETR | 4.33E-07 | 2.33E-08 | 2.50E-08 | 2.44E-08 | 2.33E-08 | 4.54E-08 |
| RF | 7.81E-08 | 1.47E-08 | 1.43E-08 | 1.57E-08 | 1.47E-08 | 1.89E-08 |
| XGB | 3.01E-07 | 8.87E-08 | 8.87E-08 | 8.87E-08 | 9.08E-08 | 1.17E-07 |

Table 8 presents MAE values for four distinct regression algorithms across six scenarios. The MAE values indicate the average absolute difference between predicted and actual values. LR exhibits MAE values ranging from 0.001944 to 0.00474 across the scenarios. ETR, RF, and XGB demonstrate considerably lower MAE values, with ETR showing the smallest value in every scenario, emphasizing its accuracy in capturing target variable variations. The table provides a concise overview of the performance of each algorithm across the scenarios, facilitating a comparative analysis of predictive capabilities in terms of minimizing absolute errors.

Table 8

The MAE values obtained from the models for all six different cases of input variables

| Algorithms | C1 | C2 | C3 | C4 | C5 | C6 |
| --- | --- | --- | --- | --- | --- | --- |
| LR | 0.001944 | 0.001944 | 1.94E-03 | 0.00474 | 0.00474 | 0.00474 |
| ETR | 4.00E-05 | 4.19E-05 | 4.33E-05 | 4.00E-05 | 6.86E-05 | 3.94E-04 |
| RF | 0.000149 | 0.000149 | 1.49E-04 | 0.000451 | 0.000451 | 0.000451 |
| XGB | 0.000281 | 0.000226 | 2.87E-04 | 0.000364 | 0.000487 | 0.000888 |

Table 9 presents RRSE percentage values for four different algorithms across six scenarios. The RRSE expresses the prediction error relative to the range of the observed data, as a percentage. LR exhibits RRSE values ranging from 13.300 to 29.878%, indicating a relatively high percentage error in its predictions. ETR, RF, and XGB, on the other hand, show much lower RRSE values, with RF giving the smallest percentage error in every scenario. Table 9 offers a concise summary of the algorithms' performance in terms of percentage accuracy, allowing for a quick comparison of the predictive capabilities across the scenarios.

Table 9

The RRSE values obtained from the models for all six different cases of input variables

| Algorithms | C1 | C2 | C3 | C4 | C5 | C6 |
| --- | --- | --- | --- | --- | --- | --- |
| LR | 13.345% | 13.345% | 13.300% | 29.878% | 29.878% | 29.878% |
| ETR | 0.847% | 0.878% | 0.868% | 0.847% | 1.184% | 3.655% |
| RF | 0.639% | 0.629% | 0.659% | 0.639% | 0.723% | 1.471% |
| XGB | 1.567% | 1.567% | 1.570% | 1.586% | 1.804% | 2.888% |

Table 10 shows RAE values for four different regression algorithms in six cases. Lower values indicate better performance. LR consistently shows the highest values, suggesting less accurate predictions, while ETR, XGB, and RF exhibit much lower RAE values, indicative of better predictive accuracy. Notably, ETR gives the smallest values in C1–C4, whereas RF gives the smallest values in C5 and C6. The table allows for a quick comparison of the algorithms' effectiveness in capturing the underlying patterns in the data across the scenarios.

Table 10

The RAE values obtained from the models for all six different cases of input variables

Algorithms  C1        C2        C3        C4        C5        C6
LR          0.107978  0.107978  1.08E-01  0.263359  0.263359  0.26335906
ETR         0.002222  0.002326  2.40E-03  2.22E-03  0.003811  0.02189922
XGB         0.009764  0.009764  9.76E-03  0.009093  0.011017  0.0202274
RF          0.002421  0.002353  2.56E-03  2.42E-03  0.003084  0.00788005
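
The two relative error measures reported in Tables 9 and 10 can be sketched as follows. This is a minimal illustration that assumes the standard definitions of RRSE and RAE (model error divided by the error of a predictor that always returns the mean of the observations); the formulas used by the authors are not restated in this section, and the function names and sample values below are hypothetical.

```python
import math

def rrse(actual, predicted):
    """Root relative squared error: squared error relative to a mean-only predictor."""
    mean_actual = sum(actual) / len(actual)
    model_error = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    baseline_error = sum((a - mean_actual) ** 2 for a in actual)
    return math.sqrt(model_error / baseline_error)

def rae(actual, predicted):
    """Relative absolute error: absolute error relative to a mean-only predictor."""
    mean_actual = sum(actual) / len(actual)
    model_error = sum(abs(p - a) for a, p in zip(actual, predicted))
    baseline_error = sum(abs(a - mean_actual) for a in actual)
    return model_error / baseline_error

# Hypothetical roughness values, for illustration only (not the study's data).
n_actual = [0.020, 0.024, 0.030, 0.026]
n_predicted = [0.021, 0.023, 0.029, 0.027]

print(f"RRSE = {100 * rrse(n_actual, n_predicted):.2f}%")
print(f"RAE  = {rae(n_actual, n_predicted):.4f}")
```

Under these definitions a value of zero corresponds to a perfect prediction, and a value of one (100%) means the model is no better than predicting the mean of the observations.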

Across the six scenarios (C1–C6), the ML models differ markedly in how effectively they predict ‘n’ for an alluvial channel with bedforms. The LR model shows moderate accuracy, with an R2 value of 0.822 in the normal scenarios (C1–C3), which indicates that it captures part of the relationship between riverbed roughness and depth. However, its limitations become evident in scenarios with variations (C4 and C5) and extreme changes (C6), where its R2 value drops to 0.109. In contrast, ETR and RF consistently outperform the other models across all scenarios, with R2 values of about 0.99 in normal and varying conditions. The XGB model also performs well, with R2 values exceeding 0.997 in the normal scenarios and remaining at about 0.99 in scenarios with variations and extreme changes. These findings emphasize the resilience of ensemble models such as ETR, RF, and XGB in capturing the complex relationships within alluvial channel data, making them promising choices for diverse prediction scenarios. Based on these scenarios, ETR, RF, and XGB are considered the best-fit models for predicting roughness in alluvial channels with bedforms. Figure 6 presents the R2, RMSE, MSE, MAE, RRSE, and RAE values for the four ML models considered in the current study.
Figure 6

Values of (a) R2, (b) RMSE, (c) MSE, (d) MAE, (e) RRSE, and (f) RAE for four models along with the six input cases.

Figure 6(a) shows the R2 values for the different models across the six cases (C1–C6). R2 represents the proportion of the variance of the dependent variable that is explained by the independent variables in a regression model, so higher values indicate a better fit. For instance, the LR model in case C4 has a low R2 value of 0.109, suggesting a weak relationship. Figure 6(b) displays the RMSE values for the different models across the six cases. The RMSE is the square root of the mean squared prediction error and is expressed in the same units as ‘n’, with smaller values indicating better model performance. For example, the LR model in case C4 shows a higher RMSE value of 2.89E-05, suggesting relatively larger prediction errors. Similarly, Figure 6(c) presents the MSE values for the different models across the six cases. The MSE is the mean of the squared prediction errors, again with smaller values indicating better model performance. For instance, the LR model in case C4 has a higher MSE value of 0.073, suggesting larger prediction errors.

Figure 6(d) displays the MAE values for the different algorithms across the six cases. The MAE is the average absolute difference between predicted and actual values. In case C1, the LR algorithm exhibits an exceptionally low MAE value of 2.89E-05, and the MAE values remain small in the other cases, indicating accurate predictions. Figure 6(e) shows the RRSE values for the different algorithms across the six cases. The RRSE is the square root of the ratio between the summed squared errors of the model and those of a predictor that always returns the mean of the observed values, expressed here as a percentage. The LR algorithm shows the highest RRSE values, ranging from 13.30 to 29.878% across the cases, whereas ETR, RF, and XGB remain below 4% in every case (Table 9). Figure 6(f) presents the RAE values for the different algorithms in each of the six cases. The RAE is the ratio between the total absolute error of the model and that of the same mean-only predictor. The LR algorithm again exhibits the highest values, ranging from 0.107 to 0.263, whereas ETR, RF, and XGB remain below 0.022 in every case (Table 10).
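
The four absolute measures shown in Figure 6(a)–(d) can be computed as in the sketch below. It assumes that R2 is the coefficient of determination (one minus the ratio of residual to total sum of squares), which matches the "proportion of variance explained" reading above; the helper name and the sample values are hypothetical and are not taken from the study's dataset.

```python
import math

def regression_metrics(actual, predicted):
    """Return R2, MSE, RMSE, and MAE for paired observations and predictions."""
    n = len(actual)
    mean_actual = sum(actual) / n
    errors = [p - a for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n          # mean of squared errors
    rmse = math.sqrt(mse)                         # same units as the target
    mae = sum(abs(e) for e in errors) / n         # mean of absolute errors
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    return {"R2": r2, "MSE": mse, "RMSE": rmse, "MAE": mae}

# Hypothetical roughness values, for illustration only.
n_actual = [0.020, 0.024, 0.030, 0.026]
n_predicted = [0.021, 0.023, 0.029, 0.027]

metrics = regression_metrics(n_actual, n_predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.6g}")
```

Because RMSE is the square root of MSE, the two are tied together: whenever the errors are smaller than one, as they are for roughness coefficients, the RMSE is numerically larger than the MSE.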

Relationship between actual and predicted values of ‘n’ using different models

The actual and predicted values of the roughness coefficient were plotted for the four models. Six cases were considered, each with a different input scenario, as shown in Figures 7–12. Most of the models performed very well, with R2 values of 0.99. The ETR model outperformed all the other models, with an R2 value equal to unity.
Figure 7

Visualization of input and output parameters of all the models in C1 input criteria.

Figure 8

Visualization of input and output parameters of all the models in C2 input criteria.

Figure 9

Visualization of input and output parameters of all the models in C3 input criteria.

Figure 10

Visualization of input and output parameters of all the models in C4 input criteria.

Figure 11

Visualization of input and output parameters of all the models in C5 input criteria.

Figure 12

Visualization of input and output parameters of all the models in C6 input criteria.

In the current study, the relationship between the predicted and experimental values of ‘n’ was examined for the various ML models, and the models show distinct patterns of performance. The predictions of the LR model deviate significantly from the best-fit line, which implies a significant variation between the predicted and the experimental values. The R2 values for this model are substantially below 1, which is a strong indication of the deviation in the results. In contrast, the predictions of the RF, ETR, and XGB models closely align with the experimental values, and these models exhibit high R2 values (approaching 1). This indicates that these three models effectively capture the intricate relationship between the predictor and target parameters. These findings are consistent with prior research (Azamathulla et al. 2013, 2016; Kitsikoudis et al. 2015; Baharvand et al. 2021; Roushangar & Shahnazi 2021).

Sensitivity analysis

Sensitivity analysis is used to assess the impact of changes in the input variables on the output of a model. It shows how responsive the model is to variations in the input parameters and which variables have the most significant impact on the output. It can therefore assist in decision-making, for example by identifying critical factors or optimizing the input parameters for a desired outcome. In this study, the four regression models (LR, RF, ETR, and XGB) and their associated values for the input parameters (Sf, Fr, y/d50, Δ/d50, Δ, Δ/y) were analyzed. Each regression model provides a different prediction for the same input parameters. One approach to sensitivity analysis is to compare the output variable ‘n’ for a baseline scenario with the output for scenarios in which one input parameter is varied while the others are kept constant. The change in the output variable for each variation then gives the sensitivity of the model to that input parameter. Each parameter was varied in turn, and the corresponding changes in the output variable were observed. The analysis was repeated for all the models, as shown in Figure 13. Notably, LR and ETR consistently display higher sensitivity values for several parameters, suggesting a heightened responsiveness to variations in these factors. This insight into the relative importance of the input parameters provides guidance for refining model performance and improving the interpretability of the predictions.
Figure 13

Contribution of various input parameters on the output parameter.
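
The one-at-a-time procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: `oat_sensitivity` is a hypothetical helper, the 10% perturbation is an arbitrary choice, and `toy_model` is a made-up linear stand-in for a fitted regressor whose coefficients have no physical meaning.

```python
def oat_sensitivity(model, baseline, delta=0.10):
    """Vary one input at a time by a relative step `delta`, holding the others
    at their baseline values, and return the relative change in the output."""
    base_output = model(baseline)
    sensitivity = {}
    for name, value in baseline.items():
        perturbed = dict(baseline)            # copy so other inputs stay fixed
        perturbed[name] = value * (1.0 + delta)
        sensitivity[name] = (model(perturbed) - base_output) / base_output
    return sensitivity

def toy_model(inputs):
    """Hypothetical stand-in for a trained model predicting 'n'."""
    return 0.01 + 2.0 * inputs["Sf"] + 0.001 * inputs["Fr"]

# Hypothetical baseline values for two of the six inputs.
baseline = {"Sf": 0.002, "Fr": 0.5}
sensitivity = oat_sensitivity(toy_model, baseline)

for name, change in sensitivity.items():
    print(f"{name}: {100 * change:.2f}% change in n for a 10% change in input")
```

With a real fitted model, `model` would wrap its prediction call, and ranking the inputs by the magnitude of the returned changes gives a contribution chart of the kind shown in Figure 13.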

Taylor's diagram

In this study, the Taylor diagram was used to evaluate and compare the performance of the four models. A Taylor diagram summarizes, in a single plot, the standard deviation of each model's predictions and their correlation coefficient with the observations, which gives insight into the accuracy and reliability of the models. The diagram is shown in Figure 14, where each model is represented by a distinct marker with a different colour. The models whose standard deviation lay close to that of the observations and whose correlation coefficient was high were identified as the most accurate and reliable. The diagram served as a useful tool for comparing the models and can guide future model selection and development.
Figure 14

Taylor's diagram represents the behaviour of each model which performed best against each input case.
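
The quantities plotted on a Taylor diagram can be computed as in the sketch below, which assumes the usual formulation: the standard deviations of the reference and model series, their correlation coefficient R, and the centred root-mean-square difference E', related by E'^2 = σ_m^2 + σ_r^2 − 2 σ_m σ_r R. The function name and the sample values are hypothetical.

```python
import math

def taylor_stats(reference, model):
    """Return (std_ref, std_model, correlation, centred RMS difference)."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_m = sum(model) / n
    dev_r = [r - mean_r for r in reference]   # anomalies of the observations
    dev_m = [m - mean_m for m in model]       # anomalies of the predictions
    std_r = math.sqrt(sum(d * d for d in dev_r) / n)
    std_m = math.sqrt(sum(d * d for d in dev_m) / n)
    corr = sum(a * b for a, b in zip(dev_r, dev_m)) / (n * std_r * std_m)
    crmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_r, dev_m)) / n)
    return std_r, std_m, corr, crmsd

# Hypothetical roughness values, for illustration only.
n_actual = [0.020, 0.024, 0.030, 0.026]
n_predicted = [0.021, 0.023, 0.029, 0.027]

std_r, std_m, corr, crmsd = taylor_stats(n_actual, n_predicted)
print(f"std (observed)  = {std_r:.6f}")
print(f"std (predicted) = {std_m:.6f}")
print(f"correlation     = {corr:.4f}")
print(f"centred RMSD    = {crmsd:.6f}")
```

On the diagram, a model plots close to the reference point when its standard deviation matches the observed one and its correlation is near one, which is equivalent to a small centred RMS difference.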

Insights based on ML approach

The selected ensemble models, such as RF and XGB, demonstrated exceptional performance because of their ensemble nature. These models capture complex relationships within the data by combining multiple weaker models into a stronger one. The ensemble approach is particularly important when dealing with complex hydraulic phenomena, as it allows the models to collectively consider a wide range of factors and interactions, resulting in more accurate predictions of ‘n’. Moreover, RF and XGB provide feature importance scores, which highlight the relative importance of the input parameters in predicting ‘n’ and allow attention to be focused on the variables with the most significant impact on the predictions. Identifying such influential factors gives a deeper understanding of the mechanisms driving variations in ‘n’, which is crucial for both prediction accuracy and physical interpretation. The ensemble models utilized in the present study are inherently suited to handling non-linear relationships, which often exist in hydraulic data. Their performance in the present analysis can be attributed to this capacity to adapt to complex relationships within the data, ultimately leading to more reliable predictions.

However, the LR model is appropriate for mitigating overfitting, a common problem in predictive modeling. Its regularization reduces model complexity and emphasizes the most relevant features. The performance of LR in the present study can be attributed to its ability to select important input parameters while avoiding unnecessary complexity. This feature selection enhances the generalization of the model to new data. The XGB model builds its trees sequentially, each one correcting the errors of the trees before it, and performs well when data points with similar characteristics tend to have similar values of ‘n’. In the present analysis, the effectiveness of this model can be interpreted in terms of its capacity to identify and exploit such patterns within the dataset. By grouping data points with similar input characteristics, XGB can provide accurate predictions, especially when local patterns strongly influence ‘n’. In addition, the success of the selected ML models in predicting ‘n’ can be attributed to their ability to handle complexity, capture non-linear relationships, adapt to data patterns, mitigate overfitting, and benefit from effective data preprocessing. These models, along with the sensitivity analysis, provide valuable insight into the mechanisms governing roughness predictions and advance our understanding of hydraulic processes in alluvial channels with bedforms.

Several ML models, such as RF and ETR, leveraged ensemble techniques to enhance predictive performance. These ensemble methods aggregate the predictions of multiple models, which reduces overfitting and supports robustness. The XGB model excelled through gradient-based optimization of its predictions, which was particularly effective when local patterns were significant. In contrast, a model such as LR showed moderate efficiency in predicting ‘n’ because of its limited ability to capture dataset intricacies and non-linearity. It assumes a linear relationship, which may not represent complex patterns adequately. The model LaR, while using L1 regularization to curb overfitting, struggled in feature selection, potentially discarding vital information. This underlines the importance of employing advanced techniques such as ensemble methods and non-linear regression for datasets with diverse and intricate roughness coefficient patterns.
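
The effect of aggregating many weak models can be illustrated with a small bagging sketch. It is not the RF or ETR implementation used in the study: the weak learner is a one-split regression tree written from scratch, the data are synthetic, and the error is measured on the training points for simplicity.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression tree (a deliberately weak learner)."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: fall back to the mean
        mean_y = sum(ys) / len(ys)
        return lambda x: mean_y
    _, split, low, high = best
    return lambda x: low if x <= split else high

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

random.seed(0)
xs = [i / 20 for i in range(41)]                    # synthetic inputs on [0, 2]
ys = [x ** 2 + random.gauss(0, 0.1) for x in xs]    # synthetic non-linear target

stumps = []
for _ in range(25):
    idx = [random.randrange(len(xs)) for _ in xs]   # bootstrap resample
    stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))

def ensemble(x):
    """Average the predictions of all stumps (bagging)."""
    return sum(stump(x) for stump in stumps) / len(stumps)

avg_member_mse = sum(mse(s, xs, ys) for s in stumps) / len(stumps)
ensemble_mse = mse(ensemble, xs, ys)
print(f"mean single-stump MSE: {avg_member_mse:.4f}")
print(f"bagged ensemble MSE:   {ensemble_mse:.4f}")
```

For squared error, the averaged prediction can never have a larger MSE than the average MSE of its members, which is one reason aggregation tends to stabilize predictions. A linear model has no comparable mechanism for representing the curvature in the synthetic target.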

Comparison of results with previous studies

In a previous study, Roushangar et al. (2018) used different models (FFNN, RBFNN, and ANFIS) to estimate ‘n’ in rivers with dunes and observed R2 values ≤ 0.9 during the verification stage. In another study, Saghebian et al. (2020) reported R2 = 0.56 and RMSE = 0.0034 in predicting ‘n’ using MLP-FFA and FFNN algorithms for the best scenario. In addition, Yarahmadi et al. (2023) reported R2 values of 0.982, 0.979, 0.999, and 0.926 for the multilayer perceptron neural network (MLPNN), group method of data handling (GMDH), support vector machine (SVM), and GP models, respectively, during the verification phase. The corresponding RMSE values were 0.0006, 0.0006, 0.0000, and 0.0013. A more in-depth comparison of the statistical error indices, specifically R2 and RMSE, between the ML models in this study and those established by previous researchers (Roushangar et al. 2018; Saghebian et al. 2020; Yarahmadi et al. 2023) reveals a clear trend of improved accuracy in the present models. Accordingly, the ML models in this study outperformed the previously proposed soft computing models in their ability to predict and represent the roughness coefficients in alluvial channels with bedforms. The enhanced accuracy is crucial for improving our understanding of flow dynamics in these complex environments and has the potential to contribute significantly to river management and engineering practices.

K-fold cross-validation

K-fold cross-validation is a fundamental technique in ML modeling used to assess and validate the performance of predictive models. It serves as a robust method for estimating the ability of ML models to generalize to new, unseen data. The process involves partitioning the dataset into K equally sized subsets, or folds. The model is trained and evaluated K times, each time using a different fold as the validation set while the remaining K−1 folds are used for training. This ensures that every data point is used for validation exactly once. The results from these K iterations are typically averaged to provide a more stable and representative evaluation of the model's performance. K-fold cross-validation helps detect issues such as overfitting, where a model performs exceptionally well on the training data but poorly on new data, and it provides a more accurate estimate of model performance.
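
The partitioning step can be sketched as below. This is a minimal illustration with contiguous folds and no shuffling (in practice the data are usually shuffled first); `k_fold_indices` is a hypothetical helper, and the sample count of 23 is arbitrary and chosen only to show how uneven fold sizes are handled.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds.

    When n_samples is not divisible by k, the first n_samples % k folds
    receive one extra sample so that every sample is validated exactly once.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size

folds = list(k_fold_indices(n_samples=23, k=10))
for fold_number, (train, validation) in enumerate(folds, start=1):
    print(f"fold {fold_number}: train on {len(train)} samples, "
          f"validate on {validation}")
```

A model would be fitted on the `train` indices and scored on the `validation` indices in each iteration, and the K scores would then be averaged.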

In the cross-validation analysis, the performance of the four regression models (LR, ETR, RF, and XGB) was evaluated. Across all folds, the results show the strong predictive capabilities of these models. The RMSE values are very small, indicating that the models make precise predictions that closely align with the actual values. The R2 values are consistently high and approach 1, which signifies that the models capture and explain a substantial portion of the variance in the dependent variable. Although minor fluctuations in performance are observed between folds, the values emphasize the robust and reliable predictive power of these ML techniques. The choice among these models may depend on specific considerations such as model complexity and significance; however, the results collectively confirm the capability of the models to elucidate the underlying patterns within the dataset. The K-fold cross-validation results for all the models are provided in Table 11.

Table 11

K-fold cross-validation of various ML models used in this study

Fold  XGB RMSE  XGB R2 score  RF RMSE  RF R2 score  LR RMSE  LR R2 score  ETR RMSE  ETR R2 score
1     0.0003    0.9974        0.0002   0.9987       0.0004   0.9944       0.0002    0.9988
2     0.0002    0.9985        0.0000   0.9999       0.0003   0.9967       0.0000    1.0000
3     0.0002    0.9987        0.0002   0.9983       0.0003   0.9961       0.0001    0.9998
4     0.0006    0.9885        0.0004   0.9959       0.0004   0.9957       0.0003    0.9964
5     0.0004    0.9950        0.0003   0.9977       0.0004   0.9942       0.0002    0.9990
6     0.0011    0.9607        0.0006   0.9892       0.0004   0.9940       0.0006    0.9868
7     0.0004    0.9952        0.0002   0.9987       0.0007   0.9883       0.0002    0.9985
8     0.0004    0.9968        0.0003   0.9981       0.0004   0.9965       0.0003    0.9980
9     0.0003    0.9960        0.0002   0.9982       0.0004   0.9932       0.0003    0.9972
10    0.0007    0.9839        0.0005   0.9932       0.0005   0.9931       0.0004    0.9950
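
The averaging step of K-fold cross-validation can be applied directly to the R2 scores in Table 11, as in the sketch below. The values are transcribed from the table on the assumption that its columns follow the header order XGB, RF, LR, ETR; the summary statistics are simple arithmetic on those values and are not results reported by the authors.

```python
from statistics import mean, stdev

# R2 scores per fold (folds 1 to 10), transcribed from Table 11.
r2_by_model = {
    "XGB": [0.9974, 0.9985, 0.9987, 0.9885, 0.9950,
            0.9607, 0.9952, 0.9968, 0.9960, 0.9839],
    "RF":  [0.9987, 0.9999, 0.9983, 0.9959, 0.9977,
            0.9892, 0.9987, 0.9981, 0.9982, 0.9932],
    "LR":  [0.9944, 0.9967, 0.9961, 0.9957, 0.9942,
            0.9940, 0.9883, 0.9965, 0.9932, 0.9931],
    "ETR": [0.9988, 1.0000, 0.9998, 0.9964, 0.9990,
            0.9868, 0.9985, 0.9980, 0.9972, 0.9950],
}

# Mean and sample standard deviation of the fold scores for each model.
summary = {model: (mean(scores), stdev(scores))
           for model, scores in r2_by_model.items()}

for model, (average, spread) in summary.items():
    print(f"{model}: mean R2 = {average:.4f}, std across folds = {spread:.4f}")
```

The standard deviation across folds is a useful companion to the mean, since a single weak fold (fold 6 for XGB, for example) affects the spread more visibly than the average.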

In this study, four ML models (LR, ETR, RF, and XGB) were utilized to predict the value of ‘n’ in alluvial channels while considering the resistance due to bedforms. The prediction of ‘n’ was based on six input parameters: Sf, Fr, y/d50, Δ/d50, Δ, and Δ/y. The R2 values were used to assess the performance of the models in capturing the relationship between the input parameters and the output parameter ‘n’. The correlation matrix revealed significant relationships between the input parameters. The main conclusions drawn from the study are listed below:

  • The parameter Sf showed a negative correlation coefficient of −0.281, indicating an inverse relationship with ‘n’. The input parameter Fr exhibited a positive correlation coefficient of 0.934, indicating a strong positive relationship, as also depicted in the heatmap. Similarly, y/d50, Δ/d50, Δ, and Δ/y demonstrated positive correlations of 0.882, 0.961, 0.936, and 0.876, respectively, suggesting their significant influence on ‘n’.

  • In the specific input scenario C1, all the parameters (Sf, Fr, y/d50, Δ/d50, Δ, and Δ/y) were found to be essential for predicting the parameter ‘n’ in the alluvial channel with bedforms. Moreover, Taylor's diagram showed that ETR, RF, and XGB models have high accuracy and strong correlation coefficients, indicating reliable predictions with minimal errors.

  • The sensitivity analysis showed that all the parameters were important in the LR model, while Sf had a more significant impact than the other parameters in the ETR model. These findings emphasize the importance of considering all these parameters when predicting and analyzing the behaviour of ‘n’ in alluvial channels.

  • The R2 values achieved by the ML models varied with the input features. The ETR, XGB, and RF models outperformed LR on the different error measures, with R2 values very close to one (ranging from 0.997 to 0.999), which demonstrates their effectiveness in predicting ‘n’. The LR model showed an R2 value of 0.82, indicating a moderate relationship between the input parameters and ‘n’.

  • K-fold cross-validation was carried out to assess the predictive performance of the ML models. The results confirmed the reliability of the models and their ability to generalize to new data, supporting the validity of the findings.

This study highlights the successful application of various ML models in predicting the value of ‘n’ in alluvial channels based on the given input parameters. The models demonstrated strong relationships and high accuracy, as evidenced by the high R2 values. The identified correlations between the input parameters and ‘n’ provide insights into the underlying relationships and can be valuable for understanding and managing alluvial channel dynamics. A potential limitation of the present study is that it does not incorporate actual field data, because the predictions were carried out on an experimental dataset. Further studies could combine field data, hydrodynamic simulations, and computational fluid dynamics modelling with ML algorithms to provide a more in-depth understanding of flow resistance. The methodology employed in this study performs robustly in predicting riverbed characteristics when applied to experimental datasets characterized by noticeable patterns and clear relationships. However, it is sensitive to datasets that lack distinct patterns: where the input data have no clear structure or exhibit significant random variations, the ML models may struggle, resulting in prediction failures. When the models are validated against real-world field data, these limitations become more pronounced, and the performance of the ML models can be reduced significantly. This emphasizes the need for care when applying the method to datasets with inherent randomness, especially when extrapolating the findings to real field scenarios. The reduced effectiveness of the models with field data reflects the difficulty of adapting the methodology to the unpredictable variations in riverbed characteristics encountered under actual field conditions. Further study that accounts for the complexities of field data is therefore needed for the adequate prediction of roughness in streams.

The authors would like to thank their fellow researchers for providing valuable insights that helped to enhance the quality of the data presented in the manuscript.

The authors are grateful to the Core Research Grant, SERB Government of India (CRG/2021/002119), for their generous financial assistance, which made it possible to conduct the research presented in this paper.

A. A. M. wrote the original draft, conceptualized the whole article, developed the methodology, and reviewed and edited the article, and M. P. rendered support in data curation, prepared the article, visualized the data, investigated and supervised the work.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Aberle
J.
,
Nikora
V.
,
Henning
M.
&
Ettmer
B.
2010
Statistical characterization of bed roughness due to bed forms: A field study in the Elbe River at Aken, Germany
.
Water Resources Research
46
(
3
),
1
11
.
https://doi.org/10.1029/2008WR007406
.
Afzalimehr
H.
,
Singh
V. P.
&
Najafabadi
E. F.
2010
Determination of form friction factor
.
Journal of Hydrologic Engineering
15
(
3
),
237
243
.
https://doi.org/10.1061/(ASCE)HE.1943-5584.0000175
.
Alam
A. M. Z.
&
Kennedy
J. F.
1969
Friction factors for flow in sand-bed channels
.
Journal of the Hydraulics Division
95
(
6
),
1973
1992
.
https://doi.org/10.1061/JYCEAJ.0002200
.
Alanazi
A.
2022
Using machine learning for healthcare challenges and opportunities
.
Informatics in Medicine Unlocked
30
,
100924
.
https://doi.org/10.1016/J.IMU.2022.100924
.
Alerskans
E.
,
Zinck
A. S. P.
,
Nielsen-Englyst
P.
&
Høyer
J. L.
2022
Exploring machine learning techniques to retrieve sea surface temperatures from passive microwave measurements
.
Remote Sensing of Environment
281
,
113220
.
https://doi.org/10.1016/J.RSE.2022.113220
.
Alhamzawi
R.
&
Ali
H. T. M.
2018
The Bayesian adaptive lasso regression
.
Mathematical Biosciences
303
,
75
82
.
https://doi.org/10.1016/J.MBS.2018.06.004
.
Al-Rousan
N. M.
&
Trajkovic
L.
2012
Machine learning models for classification of BGP anomalies
. In
2012 IEEE 13th International Conference on High Performance Switching and Routing, HPSR 2012
, pp.
103
108
.
https://doi.org/10.1109/HPSR.2012.6260835
.
Azamathulla
H. M.
,
Ahmad
Z.
&
Ab. Ghani
A.
2013
An expert system for predicting Manning's roughness coefficient in open channels by using gene expression programming
.
Neural Computing and Applications
23
(
5
),
1343
1349
.
https://doi.org/10.1007/S00521-012-1078-Z
.
Azamathulla
H. M.
,
Haghiabi
A. H.
&
Parsaie
A.
2016
Prediction of side weir discharge coefficient by support vector machine technique
.
Water Science and Technology: Water Supply
16
(
4
),
1002
1016
.
https://doi.org/10.2166/WS.2016.014
.
Baharvand
S.
,
Jozaghi
A.
,
Fatahi-Alkouhi
R.
,
Karimzadeh
S.
,
Nasiri
R.
&
Lashkar-Ara
B.
2021
Comparative study on the machine learning and regression-based approaches to predict the hydraulic jump sequent depth ratio
.
Iranian Journal of Science and Technology – Transactions of Civil Engineering
45
(
4
),
2719
2732
.
https://doi.org/10.1007/S40996-020-00526-2/METRICS
.
Balachandar
R.
&
Patel
V. C.
2008
Flow over a fixed rough dune
.
Canadian Journal of Civil Engineering
.
https://doi.org/10.1139/L08-004
.
Bassi
A.
,
Manchanda
A.
,
Singh
R.
&
Patel
M.
2023a
A comparative study of machine learning algorithms for the prediction of compressive strength of rice husk ash-based concrete
.
Natural Hazards
118
(
1
),
209
238
.
https://doi.org/10.1007/S11069-023-05998-9/METRICS
.
Bassi
A.
,
Mir
A. A.
,
Kumar
B.
&
Patel
M.
2023b
A comprehensive study of various regressions and deep learning approaches for the prediction of friction factor in mobile bed channels
.
Journal of Hydroinformatics
00
,
1
.
https://doi.org/10.2166/HYDRO.2023.246
.
Belgiu
M.
&
Drăgu
L.
2016
Random forest in remote sensing: A review of applications and future directions
.
ISPRS Journal of Photogrammetry and Remote Sensing
114
,
24
31
.
https://doi.org/10.1016/J.ISPRSJPRS.2016.01.011
.
Bhattacharya
R.
,
Dolui
G.
&
Das Chatterjee
N.
2019
Effect of instream sand mining on hydraulic variables of bedload transport and channel planform: An alluvial stream in South Bengal basin, India
.
Environmental Earth Sciences
78
(
10
),
1
24
.
https://doi.org/10.1007/S12665-019-8267-3/FIGURES/16
.
Brakenhoff
L.
,
Schrijvershof
R.
,
Van Der Werf
J.
,
Grasmeijer
B.
,
Ruessink
G.
&
Van Der Vegt
M.
2020
From ripples to large-scale sand transport: The effects of bedform-related roughness on hydrodynamics and sediment transport patterns in delft3d
.
Mdpi.Com.
https://doi.org/10.3390/jmse8110892
.
Bridge
J. S.
1993
The interaction between channel geometry, water flow, sediment transport and deposition in braided rivers
.
Geological Society Special Publication
75
,
13
71
.
https://doi.org/10.1144/GSL.SP.1993.075.01.02
.
Buffington
J. M.
,
Lisle
T. E.
,
Woodsmith
R. D.
&
Hilton
S.
2002
Controls on the size and occurrence of pools in coarse-grained forest rivers
.
Wiley Online Library
18
(
6
),
507
531
.
https://doi.org/10.1002/rra.693
.
Cardenas
M. B.
&
Wilson
J. L.
2007
Hydrodynamics of coupled flow above and below a sediment–water interface with triangular bedforms
.
Advances in Water Resources
30
(
3
),
301
313
.
https://doi.org/10.1016/J.ADVWATRES.2006.06.009
.
Chadalawada
J.
,
Herath
H. M. V. V.
&
Babovic
V.
2020
Hydrologically informed machine learning for rainfall-runoff modeling: A genetic programming-based toolkit for automatic model induction
.
Water Resources Research
56
(
4
),
e2019WR026933
.
https://doi.org/10.1029/2019WR026933
.
Chang
L.
,
Roberts
S.
&
Welsh
A.
2018
Robust lasso regression using Tukey's biweight criterion
.
Technometrics
60
(
1
),
36
47
.
https://doi.org/10.1080/00401706.2017.1305299
.
Chegini
A.
&
Pender
G.
2012
Determination of small size bed load sediment transport and its related bed form under different uniform flow conditions
.
WSEAS Transactions on Environment and Development
8
(
4
),
158
167
.
Cihan
M.
2019
Prediction of concrete compressive strength and slump by machine learning methods
.
Advances in Civil Engineering
2019
,
1
11
.
https://doi.org/10.1155/2019/3069046
.
Clifford
N. J.
,
Robert
A.
&
Richards
K. S.
1992
Estimation of flow resistance in gravel-bedded rivers: A physical explanation of the multiplier of roughness length
.
Earth Surface Processes and Landforms
17
(
2
),
111
126
.
https://doi.org/10.1002/ESP.3290170202
.
Darby
S. E.
1999
Effect of riparian vegetation on flow resistance and flood potential
.
Journal of Hydraulic Engineering
125
(
5
),
443
454
.
https://doi.org/10.1061/(ASCE)0733-9429(1999)125:5(443)
.
Dehsorkhi
E. N.
,
Afzalimehr
H.
,
Singh
V. P.
&
Asce
F.
2011
Effect of bed forms and vegetated banks on velocity distributions and turbulent flow structure
.
Ascelibrary.Org
16
(
6
),
495
507
.
https://doi.org/10.1061/(ASCE)HE.1943-5584.0000337
.
Dey
S.
2014a
Bedforms
.
GeoPlanet: Earth and Planetary Sciences
4
,
453
528
.
https://doi.org/10.1007/978-3-642-19062-9_8
.
Dey
S.
2014b
Fluvial Hydrodynamics
, Vol.
1
.
Springer
,
Berlin, Heidelberg
.
https://doi.org/10.1007/978-3-642-19062-9
.
Dey
S.
,
Paul
P.
,
Fang
H.
&
Padh
E.
2020
Hydrodynamics of flow over two-dimensional dunes
.
Physics of Fluids
32
(
2
).
https://doi.org/10.1063/1.5144552
.
Ekmekcioğlu
Ö.
,
Başakın
E. E.
&
Özger
M.
2022
Developing meta-heuristic optimization based ensemble machine learning algorithms for hydraulic efficiency assessment of storm water grate inlets
.
Urban Water Journal
19
(
10
),
1093
1108
.
https://doi.org/10.1080/1573062X.2022.2134806
.
Essays
2011
A model of adaptive neural-based fuzzy inference system (ANFIS) for prediction of friction coefficient in open channel flow
.
Serkansubasi.Net
6
(
5
),
1020
1027
.
Gilvear
D.
&
Willby
N.
2006
Channel dynamics and geomorphic variability as controls on gravel bar vegetation; River Tummel, Scotland
.
River Research and Applications
22
(
4
),
457
474
.
https://doi.org/10.1002/RRA.917
.
Gkerekos
C.
,
Lazakis
I.
&
Theotokatos
G.
2019
Machine learning models for predicting ship main engine fuel oil consumption: A comparative study
.
Ocean Engineering
188
,
106282
.
https://doi.org/10.1016/J.OCEANENG.2019.106282
.
Griffiths
G. A.
1981
Flow resistance in coarse gravel bed rivers
.
Journal of the Hydraulics Division, ASCE
107
(
HY7, Proc. Paper 16363
),
899
918
.
https://doi.org/10.1061/JYCEAJ.0005699
.
Harbor D. J. 1998 Dynamics of bedforms in the lower Mississippi River. Journal of Sedimentary Research 68 (5), 750–762. https://doi.org/10.2110/JSR.68.750.
Heydari M. & Yarahmadi M. B. 2022 Experimental study of the effect of bed forms on Darcy-Weisbach friction coefficient in straight open channels. Journal of Hydraulics. https://doi.org/10.30482/JHYD.2021.296873.1542.
Jaiswal R. & Lohani A. K. 2023 A framework to assess the dynamics of climate extremes on irrigation water requirement using machine learning techniques. Journal of Earth System Science 132 (1). https://doi.org/10.1007/S12040-022-02044-3.
Jiang S., Zheng Y., Wang C. & Babovic V. 2022 Uncovering flooding mechanisms across the contiguous United States through interpretive deep learning on representative catchments. Water Resources Research 58 (1), e2021WR030185. https://doi.org/10.1029/2021WR030185.
Jun H. & ZeXin Z. 2021 Screening of pyroptosis-related genes influencing the therapeutic effect of dehydroabietic acid in liver cancer and construction of a survival nomogram. Biochemical and Biophysical Research Communications 585, 103–110. https://doi.org/10.1016/J.BBRC.2021.11.027.
Kabiri F. 2014 Flow over gravel dunes. British Journal of Applied Science & Technology 4 (6), 905–911. https://doi.org/10.9734/BJAST/2014/7456.
Kabiri F., Afzalimehr H. & Sui J. 2017 Flow structure over a wavy bed with vegetation cover. International Journal of Sediment Research 32 (2), 186–194. https://doi.org/10.1016/j.ijsrc.2016.07.004.
Khan G. M. 2018 Artificial neural network (ANNs). In: Studies in Computational Intelligence, Vol. 725. Springer Verlag, Berlin, Germany, pp. 39–55. https://doi.org/10.1007/978-3-319-67466-7_4.
Kitsikoudis V., Sidiropoulos E., Iliadis L. & Hrissanthou V. 2015 A machine learning approach for the mean flow velocity prediction in alluvial channels. Water Resources Management 29 (12), 4379–4395. https://doi.org/10.1007/S11269-015-1065-0.
Kumar R., Rathore A., Singh R., Mir A. A., Tipu R. K. & Patel M. 2023a Prognosis of flow of fly ash and blast furnace slag-based concrete: Leveraging advanced machine learning algorithms. Asian Journal of Civil Engineering 1–15. https://doi.org/10.1007/S42107-023-00922-9.
Kumar S., Pradhan A., Khuntia J. R. & Khatua K. K. 2023b Evaluation of flow resistance using multi-gene genetic programming for bed-load transport in gravel-bed channels. Water Resources Management 37 (8), 2945–2967. https://doi.org/10.1007/S11269-022-03409-5.
Kwoll 2016 Flow structure and resistance over subaqueous high- and low-angle dunes. Wiley Online Library 121 (3), 545–564. https://doi.org/10.1002/2015JF003637.
Lefebvre A. 2019 Three-dimensional flow above river bedforms: Insights from numerical modeling of a natural dune field (Río Paraná, Argentina). Journal of Geophysical Research: Earth Surface. https://doi.org/10.1029/2018JF004928.
Lisle T. E. 1982 Effects of aggradation and degradation on riffle-pool morphology in natural gravel channels, northwestern California. Water Resources Research 18 (6), 1643–1651. https://doi.org/10.1029/WR018I006P01643.
Liu H., Tian H. Q., Li Y. F. & Zhang L. 2015 Comparison of four Adaboost algorithm based artificial neural networks in wind speed predictions. Energy Conversion and Management 92, 67–81. https://doi.org/10.1016/J.ENCONMAN.2014.12.053.
Murray B. A. & Paola C. 2003 Modelling the effect of vegetation on channel pattern in bedload rivers. Earth Surface Processes and Landforms 28 (2), 131–143. https://doi.org/10.1002/ESP.428.
Niazkar M., Talebbeydokhti N. & Afzali S. H. 2019 Novel grain and form roughness estimator scheme incorporating artificial intelligence models. Water Resources Management 33 (2), 757–773. https://doi.org/10.1007/S11269-018-2141-Z.
Okhravi S. & Gohari S. 2020 Form friction factor of armored riverbeds. Canadian Journal of Civil Engineering 47 (11), 1238–1248. https://doi.org/10.1139/CJCE-2019-0103.
Omid M. H., Karbasi M. & Farhoudi J. 2010 Effects of bed-load movement on flow resistance over bed forms. Sadhana – Academy Proceedings in Engineering Sciences 35 (6), 681–691. https://doi.org/10.1007/S12046-010-0045-6.
Pai S., Visaria D. & Weibel J. 2021 A machine-learning-based surrogate model for internal flow Nusselt number and friction factor in various channel cross sections. In 2021 20th IEEE Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems (ITherm), pp. 1024–1029. https://doi.org/10.1109/ITherm51669.2021.9503289.
Patel M. 2017 Flow and Bed-Features Dynamics in Seepage Affected Alluvial Channels.
Patel M. & Kumar B. 2017 Flow and bedform dynamics in an alluvial channel with downward seepage. CATENA 158, 219–234. https://doi.org/10.1016/J.CATENA.2017.07.009.
Patel M., Deshpande V. & Kumar B. 2015 Turbulent characteristics and evolution of sheet flow in an alluvial channel with downward seepage. Geomorphology 248, 161–171. https://doi.org/10.1016/j.geomorph.2015.07.042.
Patel M., Deshpande V. & Kumar B. 2016 Effect of seepage on the friction factor in an alluvial channel. In Sustainable Hydraulics in the Era of Global Change – Proceedings of the 4th European Congress of the International Association of Hydroenvironment Engineering and Research, IAHR 2016, pp. 473–477. https://doi.org/10.1201/B21902-82.
Patel M., Majumder S. & Kumar B. 2017 Effect of seepage on flow and bedforms dynamics. Earth Surface Processes and Landforms 42 (12), 1807–1819. https://doi.org/10.1002/ESP.4134.
Powell D. M. 2014 Flow resistance in gravel-bed rivers: Progress in research. Earth-Science Reviews 136, 301–338. https://doi.org/10.1016/J.EARSCIREV.2014.06.001.
Probst P., Wright M. N. & Boulesteix A. L. 2019 Hyperparameters and tuning strategies for random forest. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 9 (3). https://doi.org/10.1002/WIDM.1301.
Qi Y. 2012 Random forest for bioinformatics. Ensemble Machine Learning 307–323. https://doi.org/10.1007/978-1-4419-9326-7_11.
Roushangar K. & Shahnazi S. 2021 Insights into the prediction capability of roughness coefficient in current ripple bedforms under varied hydraulic conditions. Journal of Hydroinformatics 23 (6), 1182–1196. https://doi.org/10.2166/HYDRO.2021.161.
Roushangar K., Mouaze D. & Shiri J. 2014 Evaluation of genetic programming-based models for simulating friction factor in alluvial channels. Journal of Hydrology 517, 1154–1161. https://doi.org/10.1016/J.JHYDROL.2014.06.047.
Roushangar K., Saghebian S. M., Mouazé D., Saghebian M. & Mouaze D. 2017 Predicting characteristics of dune bedforms using PSO-LSSVM. International Journal of Sediment Research 32 (4), 515–526. https://doi.org/10.1016/j.ijsrc.2017.09.005.
Roushangar K., Alipour S. M. & Mouaze D. 2018 Linear and non-linear approaches to predict the Darcy-Weisbach friction factor of overland flow using the extreme learning machine approach. International Journal of Sediment Research 33 (4), 415–432. https://doi.org/10.1016/j.ijsrc.2018.04.006.
Saghebian S., Roushangar K., Kirca V. S. O. & Ghasempour R. 2020 Modeling total resistance and form resistance of movable bed channels via experimental data and a kernel-based approach. Journal of Hydroinformatics 22 (3), 528–540. https://doi.org/10.2166/hydro.2020.094.
Sharma R., Kim M. & Gupta A. 2022 Motor Imagery Classification in Brain-Machine Interface with Machine Learning Algorithms: Classical Approach to Multi-Layer Perceptron Model. Elsevier, London, UK.
Shiono K., Chan T. L., Spooner J., Rameshwaran P. & Chander J. H. 2009 The effect of floodplain roughness on flow structures, bedforms and sediment transport rates in meandering channels with overbank flows: Part I. Journal of Hydraulic Research 47 (1), 5–19. https://doi.org/10.3826/JHR.2009.2944-I.
Talebbeydokhti N., Hekmatzadeh A. A. & Rakhshandehroo G. R. 2006 Experimental modeling of dune bed form in a sand-bed channel. Iranian Journal of Science & Technology, Transaction B, Engineering 30 (B4).
Tang J., Deng C. & Huang G. 2015 Extreme learning machine for multilayer perceptron. Ieeexplore.Ieee.Org. Available from: https://ieeexplore.ieee.org/abstract/document/7103337/
Thomas H. & Nisbet T. R. 2007 An assessment of the impact of floodplain woodland on flood flows. Water and Environment Journal 21 (2), 114–126. https://doi.org/10.1111/J.1747-6593.2006.00056.X.
Tuozzolo S., Langhorst T., de Moraes Frasson R. P., Pavelsky T., Durand M. & Schobelock J. J. 2019 The impact of reach averaging Manning's equation for an in-situ dataset of water surface elevation, width, and slope. Journal of Hydrology 578, 123866. https://doi.org/10.1016/J.JHYDROL.2019.06.038.
van der Mark C. F., Blom A. & Hulscher S. J. M. H. 2008 Quantification of variability in bedform geometry. Journal of Geophysical Research 113 (3). https://doi.org/10.1029/2007JF000940.
Venditti J. G. 2007 Turbulent flow and drag over fixed two- and three-dimensional dunes. Journal of Geophysical Research: Earth Surface 112 (F4), 4008. https://doi.org/10.1029/2006JF000650.
Venditti J. G. 2013 Bedforms in Sand-Bedded rivers. Treatise on Geomorphology 9, 137–162. https://doi.org/10.1016/B978-0-12-374739-6.00235-9.
Wadhawan S., Bassi A., Singh R. & Patel M. 2023 Prediction of compressive strength for fly ash-based concrete: Critical comparison of machine learning algorithms. Journal of Soft Computing in Civil Engineering 7 (3), 68–110. https://doi.org/10.22115/SCCE.2023.353183.1493.
Yao L., Peng Y., Yu X., Zhang Z. & Luo S. 2023 Optimal inversion of Manning's roughness in unsteady open flow simulations using adaptive parallel genetic algorithm. Water Resources Management 37 (2), 879–897. https://doi.org/10.1007/S11269-022-03411-X.
Yarahmadi M. B., Parsaie A., Shafai-Bejestan M., Heydari M. & Badzanchin M. 2023 Estimation of Manning roughness coefficient in alluvial rivers with bed forms using soft computing models. Water Resources Management 1–22. https://doi.org/10.1007/S11269-023-03514-Z.
Yoon J. 2021 Forecasting of real GDP growth using machine learning models: Gradient boosting and random forest approach. Computational Economics 57 (1), 247–265. https://doi.org/10.1007/S10614-020-10054-W.
Young W. J. & Davies T. R. H. 1991 Bedload transport processes in a braided gravel-bed river model. Earth Surface Processes and Landforms 16 (6), 499–511. https://doi.org/10.1002/ESP.3290160603.
Yuhong Z. & Wenxin H. 2009 Application of artificial neural network to predict the friction factor of open channel flow. Communications in Nonlinear Science and Numerical Simulation 14 (5), 2373–2378. https://doi.org/10.1016/J.CNSNS.2008.06.020.
Zanganeh M. & Rastegar A. 2020 Estimation of roughness coefficient in erodible channels by ANNs and the ANFIS methods. Amirkabir Journal of Civil Engineering 52 (2), 131–134. https://doi.org/10.22060/ceej.2018.14532.5678.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).