Sewer systems are usually designed to be self-cleaning, i.e., to keep the channel bottom free of sediment and so limit the effects of continuous sediment buildup. Accurately predicting the particle Froude number (Fr) is therefore important when designing sewer systems. Five different sets of input variables were examined for predicting Fr. To prevent overfitting, a 10-fold cross-validation methodology was used to train and test the machine learning (ML) models. The M5Prime (M5P) model was used as a standalone model and Bagging-M5P (BA-M5P) as a hybrid model, and the results were compared with empirical equations proposed in the literature. The models performed best when all input variables were used for training and testing. The hybrid BA-M5P model outperformed both the M5P model and the empirical equations. A sensitivity analysis based on MAE and MSE values showed that, for the best-performing model (BA-M5P), sediment concentration (Svc) is the most important variable for predicting the particle Froude number under non-deposition with deposited bed conditions. Hence, for self-cleaning design, we recommend the BA-M5P ML model, with Svc as the most essential input variable.

  • A novel prediction model was developed to estimate the particle Froude number in a sewage system.

  • For the first time, the results of the proposed models were compared with those of empirical equations in this analysis.

  • Different performance metrics, including MSE, MAE, NSE, and CC were employed to illustrate the effectiveness of the proposed models.

  • A sensitivity analysis was used to identify the most sensitive input variable for prediction.

Dgr: dimensionless grain size of particles

Un: mean velocity of non-deposition flow

ρ: density

ρs: sediment density

S: sediment relative density

g: acceleration due to gravity

Sm: median sediment size

ν: kinematic viscosity

Rh: hydraulic radius

Svc: volumetric concentration of sediment

Wb: deposited bed width

td: thickness of deposited bed

ts: thickness of sediment deposited

dp: diameter of pipe

fc: friction factor of channel

Sediment buildup is a major technical and financial challenge in the design of irrigation channels, urban drainage networks, rigid boundary channels, and sewer systems. In designing rigid boundary channels, self-cleansing criteria are vital for preventing continuous sediment deposition (Kargar et al. 2019). Managing the sediment transport process is a major part of designing smooth-bed channels, which are fundamental in environmental engineering (Safari et al. 2017). For these reasons, channels are designed using the self-cleansing principle. According to the self-cleansing definition, the flow can remove immobile particles deposited on the channel bed and convey sediments without permanently depositing them (Safari et al. 2018).

Sewers are now designed using the self-cleaning principle, under which sediments are expected to be transported continuously without deposition. However, because the flow is intermittent, particles can still deposit in sewers, particularly at low flows such as those during receding flow or in dry weather. This applies to both rigid and loose boundary conditions (Ab Ghani 1993).

The non-deposition with deposited bed condition, in which a thin layer of deposited sediment remains at the channel bottom, reduces the required design velocity. Channel design depends strongly on channel size, and the self-cleansing velocity depends on the channel design. Larger channels therefore require an improved estimate of the design self-cleansing velocity. Designing these channels for non-deposition with a clean bed (NCB) is not economical, since it requires a steeper channel bed slope (Nalluri et al. 1997; Ota & Nalluri 2003).

The hydraulic parameters of flow in channels with circular cross-sections and various sediment (rigid) bed thicknesses, as well as the effect of bed thickness and roughness on sediment transport capacity, have been thoroughly investigated (El-Zaemey 1991; Nalluri et al. 1994). Yang & Su (2008) used image processing methods, including the wavelet transform and co-occurrence matrices, to describe the textures of pipe defects. Back-propagation neural networks, radial basis networks, and support vector machines (SVMs) were then used to identify pipe defect patterns, and their performances were compared and analyzed. The SVM achieved the best diagnostic accuracy, 60%.

Asset management requires regular evaluation of the condition of the sewer network. Because of high inspection costs and scarce funding, only a small portion of sewer systems is evaluated. Mashford et al. (2011) examined the application of SVM models to sewer condition prediction, and model testing demonstrated that the SVM made accurate predictions. To predict the Froude number of three-phase flow in sewer channels, Yosefvand et al. (2019) proposed the extreme learning machine (ELM) model and compared its results with those of artificial neural network (ANN) and SVM models. The ELM model simulated the target function with the highest accuracy.

The prediction of the particle Froude number is crucial for numerous applications in fluid dynamics, sediment transport, and other engineering fields. This dimensionless parameter, defined as $Fr = U_n / \sqrt{g\, S_m (S - 1)}$, where Un is the mean velocity of the non-deposition flow, g is the gravitational acceleration, Sm is the median sediment size (particle diameter), and S is the sediment relative density (specific gravity), provides essential insight into the movement and behavior of particles in a fluid. In hydraulic engineering, precise forecasting facilitates the analysis of sedimentation and scouring near structures such as dams and spillways, thereby helping to ensure their stability and durability (Danandeh Mehr & Safari 2020; Shakya et al. 2022a).
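A minimal sketch of the definition Fr = Un / √(g Sm (S − 1)), assuming SI units; the function name and the input values in the example are hypothetical and serve only to show the arithmetic.

```python
import math


def particle_froude_number(u_n: float, s_m: float, s: float, g: float = 9.81) -> float:
    """Return Fr = Un / sqrt(g * Sm * (S - 1)).

    u_n: mean velocity of the non-deposition flow (m/s)
    s_m: median sediment size (m)
    s:   sediment relative density (specific gravity); must exceed 1
    g:   gravitational acceleration (m/s^2)
    """
    if s <= 1.0 or s_m <= 0.0:
        raise ValueError("require S > 1 and Sm > 0")
    return u_n / math.sqrt(g * s_m * (s - 1.0))


# Hypothetical inputs: Un = 0.5 m/s, Sm = 1 mm, quartz-like sediment (S = 2.65)
print(round(particle_froude_number(0.5, 1.0e-3, 2.65), 3))
```

Because Fr is linear in Un, doubling the velocity doubles Fr, while Fr falls with the square root of the grain size.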

In the field of environmental engineering, a thorough comprehension of the particle Froude number is crucial for effectively controlling the movement of sediment and the spread of pollutants in water bodies. This knowledge is essential for safeguarding the health and stability of aquatic ecosystems. This parameter is crucial in civil engineering projects, particularly those that include construction in close proximity to water, as it helps to limit any negative impacts on sediment dynamics. The field of coastal and marine engineering uses these forecasts to enhance the effectiveness of beach nourishment and coastal defence projects (Safari & Mehr 2018; Scott Winton et al. 2019).

In addition, mining and dredging activities utilize the particle Froude number to optimize sediment extraction and relocation procedures, thereby minimizing environmental consequences. Essentially, in the field of fluid mechanics research, this parameter plays a crucial role in describing flow patterns and the interactions between particles and fluids, hence improving our theoretical and practical comprehension of intricate fluid systems. Hence, accurately forecasting the particle Froude number is crucial for developing efficient engineering solutions, maximizing operational effectiveness, and ensuring the protection of the environment and structural stability (Peacock & Ouillon 2023; Liu et al. 2024).

Kargar et al. (2019) used neuro-fuzzy (NF) and gene expression programming (GEP) models to estimate the particle Froude number under the non-deposition condition of sediment transport in rigid boundary channels. They found that the NF model gave the best results compared with the GEP model and empirical equations. Studies of hybrid machine learning (ML) models for predicting the Froude number have also been carried out to assess the effectiveness and accuracy of hybrid models relative to empirical equations.

Hybrid models have proved very useful for predicting the sediment transport rate over deposited beds because of their improved predictive accuracy. Recent research has used the Bagging (BA) model to improve a wide range of base learners, such as trees (Mert et al. 2014), naive Bayes trees (Pham et al. 2018), and SVM (Pham et al. 2019). Khosravi et al. (2020) employed the BA model to train M5P, random forest (RF), regression tree, and reduced error pruning tree (REPT) base learners for predicting the bedload transport rate.

Danandeh Mehr & Safari (2020) examined the efficacy of three soft computing techniques, namely multilayer perceptron, multigene genetic programming (MGGP), and GEP, in forecasting sedimentation in sewer networks under the non-deposition with deposited bed self-cleansing condition. In terms of statistical performance metrics, the MGGP models outperformed the others.

Arya Azar et al. (2021) used an ANN to predict the Froude number and improved its results with evolutionary algorithms such as the particle swarm algorithm, the firefly algorithm, and the differential evolution (DE) algorithm. They found that the evolutionary algorithms increased the accuracy of the ANN model and concluded that such models should be used to forecast the Froude number instead of empirical equations. Ebtehaj et al. (2016) developed a model combining a feed-forward neural network with an extreme learning machine (FFNN-ELM) and found that it outperformed previous empirical models.

To predict Fr, Shakya et al. (2022a, 2022b) employed boosting models such as the LightGBM, CatBoost, gradient boosting, and AdaBoost regressors. Kumar et al. (2022) used a variety of standalone and hybrid algorithms, including Kstar, AR-Kstar, RF, and AR-RF, to predict Fr under non-deposition with deposited bed (NDB) conditions. These studies found that boosting and hybrid algorithms predicted Fr better than standalone approaches. Kumar et al. (2023a, 2023b, 2024) also used hybrid ML models to improve on the accuracy of standalone ML models for predicting particle Froude numbers under NCB and NDB conditions.

Research gaps: Predicting the particle Froude number is critical for various applications in fluid dynamics, sediment transport, and engineering. Previous studies often rely on a limited number of datasets, typically two or three, which restricts the generalizability and robustness of their models. Objectives: This research aims to improve prediction accuracy by integrating diverse datasets and employing advanced ML techniques. The primary objective is to develop a comprehensive and adaptable predictive module by compiling and combining five datasets with varying pipe diameters and sediment sizes. Specifically, datasets from El-Zaemey (1991), Perrusquia (1991), Ab Ghani (1993), May (1993), and Montes et al. (2020) are used, covering sediment sizes from 0.47 to 8.4 mm and channel diameters from 225 to 595 mm. This research is intended to provide a more generalized and reliable tool for hydraulic engineers, environmental scientists, and researchers, supporting better-informed decisions and more effective management of sediment-related challenges. As a secondary objective, a sensitivity analysis was performed to determine the most sensitive input parameter, giving researchers better insight into the importance of each input parameter.

Using this comprehensive dataset, the study develops and tests a hybrid ML model, BA-M5P, against a standalone model (M5P) and compares them with models proposed in the literature (i.e., REPT, MGGP, ELM, and SVM). Although conventional ML models and empirical equations can predict the particle Froude number in sewer systems, conventional ML models often handle complexity and non-linear interactions poorly, are prone to underfitting and overfitting, and can suffer from scalability issues. The hybrid ML model proposed in this study is intended to overcome these deficiencies.

Hybrid models combine the predictive power of different algorithms, such as linear regression, random trees, and neural networks, to leverage their respective strengths. By combining such ML models, hybrid approaches can capture intricate patterns and relationships in the data more accurately than individual models alone. By reducing bias and variance, they produce regression predictions that are more accurate and dependable across a range of scenarios and datasets. Combining different techniques also makes hybrid ML models less prone to overfitting sparse or noisy data.

The performance of the proposed BA-M5P model is evaluated in terms of accuracy, robustness, and adaptability to different environmental scenarios. The results show that the hybrid ML model outperforms the standalone model in prediction accuracy and robustness, yielding a highly reliable predictive module. In terms of CC values, the proposed BA-M5P model improves on current state-of-the-art models and empirical equations by approximately 0.5–24% (Table 3).

This article's contribution is as follows:

  • In this work, Fr in sewer systems under NDB conditions was estimated using a standalone M5P model and a hybrid BA-M5P ML model, and the results of BA-M5P, M5P, and empirical equations were compared.

  • The following independent factors were examined for calculating the particle Froude number (Fr) under NDB conditions: median sediment size (Sm), pipe diameter (dp), hydraulic radius (Rh), thickness of sediment deposited (ts), volumetric concentration of sediment (Svc), dimensionless grain size of particles (Dgr), and friction factor of channel (fc).

  • Different performance metrics, including MSE, MAE, NSE, and CC, were employed to illustrate the effectiveness of the proposed models.

  • A sensitivity analysis was used to identify the most sensitive input variable for predicting Fr.

Dependency analysis

According to the literature, the self-cleansing phenomenon is influenced by the following factors: density of flow (ρ), sediment density (ρs), sediment relative density (S), mean velocity of non-deposition flow (Un), hydraulic radius (Rh), pipe diameter (dp), acceleration due to gravity (g), median sediment size (Sm), kinematic viscosity (ν), volumetric concentration of sediment (Svc), thickness of deposited bed (td), deposited bed width (Wb), thickness of sediment deposited (ts), and friction factor of channel (fc). The self-cleansing concept has been examined for two non-deposition situations, with and without a deposited bed (Safari et al. 2018). As shown in Equation (1) (Safari et al. 2018), Un can be expressed as a functional dependency on these parameters. Using Buckingham's Pi theorem, Equation (1) is reduced in a series of steps to the final form shown in Equation (4).
(1)
(2)
Equation (2) may be easily transformed into the non-dimensional form indicated in Equation (3).
(3)
where Dgr is the dimensionless grain size. Following the self-cleaning design criteria, the influence of the variables can be assumed to remain constant. The left-hand side of Equation (3) is the particle Froude number (Fr), so the equation can be rewritten as:
(4)
The dimensionless variables in Equation (4) are used as the independent (input) variables for predicting Fr with the proposed models. The value of is computed as (Danandeh Mehr & Safari 2020)
(5)
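The dimensionless grain size Dgr is one of the groups produced by this dimensional analysis. As an illustration only, the sketch below uses the conventional definition Dgr = Sm((S − 1)g/ν²)^(1/3) found in the sediment-transport literature; this standard form, the default kinematic viscosity of 1.0 × 10⁻⁶ m²/s, and the function name are assumptions rather than a transcription of Equation (5).

```python
def dimensionless_grain_size(s_m: float, s: float, nu: float = 1.0e-6, g: float = 9.81) -> float:
    """Return Dgr = Sm * ((S - 1) * g / nu**2) ** (1/3), all inputs in SI units.

    nu defaults to 1.0e-6 m^2/s (water at roughly 20 degC), an assumed value.
    """
    return s_m * ((s - 1.0) * g / nu ** 2) ** (1.0 / 3.0)


# Hypothetical check over the 0.47-8.4 mm sediment-size range of the compiled data
for size_mm in (0.47, 8.4):
    print(size_mm, round(dimensionless_grain_size(size_mm / 1000.0, 2.65), 1))
```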

Workflow

Figure 1 depicts the workflow of the models used in this study to predict the particle Froude number. Each step of this workflow is described in the following subsections.
Figure 1

Proposed workflow for this study to predict the particle Froude number.


Dataset

In this work, the particle Froude number in sewage systems was predicted using five NDB datasets: El-Zaemey (1991) [290 runs], Perrusquia (1991) [38 runs], Ab Ghani (1993) [26 runs], May (1993) [46 runs], and Montes et al. (2020) [54 runs], for a total of 454 runs.

El-Zaemey (1991) investigated sediment deposition in sewer networks, addressing how sediment characteristics change over time and the circumstances that lead to sediment cementation, particularly under dry weather flow. Experiments in a circular channel demonstrated that flow characteristics depend considerably on bed roughness, thickness, and depth. These experiments also produced new techniques for predicting sediment transport and flow friction, which have aided the design of non-deposition sewers and helped resolve sediment buildup problems in existing systems.

Perrusquia (1991) studied bedload transport over permanently deposited sediment beds in partially filled concrete pipes. Key findings included accurate prediction of flow resistance with the Engelund-Hansen and van Rijn models, determination of the critical shear stress from the Shields diagram, and characterization of the vertical velocity distribution. Furthermore, Perrusquia (1991) presented a new relationship for predicting sediment transport rates that is effective for specific pipe diameters, sand sizes, pipe slopes, and sediment thicknesses, but requires further research before broader application.

Despite self-cleaning design, sediment can still accumulate in sewers at low flows. Ab Ghani's (1993) analysis of the available data on sediment transport in both clean and sediment-bedded pipes showed that the existing design criteria are insufficient for pipes larger than 300 mm. According to the new formulas and design charts, sewers up to 1.0 m in diameter can be self-cleaning, whereas larger sewers should be designed to allow for an optimal sediment deposit.

May (1993) presented laboratory data on non-cohesive sediment transport in sewer pipes, focusing on flow resistance and sediment transport rates. Tests using a 450 mm concrete pipe and various sand grades determined the flow conditions needed to prevent sediment deposition. May (1993) developed new equations and theoretical models for bedload transport and flow resistance to improve predictions of sediment transport under various flow conditions. These models showed that rough pipes transport sediment at a lower rate than smooth pipes.

Many existing models for predicting sediment transport in sewers were developed from data for pipes smaller than 500 mm in diameter, which limits their applicability. To address this, Montes et al. (2020) used experimental data from a 595 mm pipe at the University of Los Andes to develop two new self-cleaning models. These models apply to both small and large sewer pipes, with or without a deposited bed, and show better prediction precision than previous models.

Table 1 summarizes the main statistics of the variables used in this investigation. In addition, empirical Equation (6), proposed by Ab Ghani (1993), and empirical Equation (7), proposed by Nalluri et al. (1997), were used to predict Fr under NDB conditions.

Table 1

Statistical summary of the dataset used in this study

Dimensionless group | Minimum | Maximum | Standard deviation | Mean
 | 11.81 | 209.777 | 65.981 | 73.355
 | 0.005 | 0.233 | 0.05 | 0.057
 |  | 0.02 | 0.001 | 0.002
 | 0.001 | 0.4 | 0.113 | 0.213
 | 0.003 | 0.065 | 0.01 | 0.027
 | 1.263 | 17.453 | 2.651 | 4.424

Data split

A 10-fold cross-validation procedure was used for training and testing. It is an instance of k-fold cross-validation, in which the full dataset is divided into k subsamples (folds). The model is trained on k − 1 subsamples and evaluated on the remaining one, and the process is repeated k times so that each subsample is used exactly once as test data. In this article, k = 10. Finally, the accuracy is evaluated with all metrics over the predictions for all subsamples obtained from the 10-fold cross-validation procedure, as illustrated in Figure 1.
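The fold bookkeeping of k-fold cross-validation can be sketched with the Python standard library. This is an illustration under stated assumptions (a single shuffle with an arbitrary fixed seed, and the hypothetical function name `k_fold_indices`), not the implementation used in the study; the example uses the 454 runs of the five compiled datasets.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Indices are shuffled once and split into k nearly equal folds; each fold
    serves exactly once as the test set while the other k - 1 folds train the model.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Spread the remainder so that fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


# Example: 454 runs split into 10 folds; test-fold predictions would be pooled for the metrics
folds = list(k_fold_indices(454, k=10))
print([len(test) for _, test in folds])
```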

Input combination (IC)

Five distinct input combinations (ICs) were developed and examined, as described in Table 2, to select the best IC according to the CC value. The input variable with the highest CC value with Fr was taken as the first input (IC A), on the assumption that a higher CC between an input and the output variable leads to a more accurate prediction. The variable with the next-highest CC value was then added to this input (IC B). Svc and one other variable have equal CC values, as shown in Figure 2; therefore, as indicated in Table 2, these two variables were added to IC B one at a time (IC C and IC D). For the last combination, the variable with the lowest CC value was added, so that IC E contains all input variables.
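A sketch of the ranking idea behind the ICs: inputs are sorted by their correlation with the output and added one at a time to form nested combinations. Ranking by the absolute value of the Pearson CC is an assumption, the variable names and data in the example are hypothetical, and the paper's separate handling of two equally correlated variables (IC C and IC D) is not reproduced.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def nested_input_combinations(features, target):
    """Rank inputs by |CC| with the target and grow nested input sets.

    features: dict mapping variable name -> list of values
    target:   list of output values (e.g. Fr)
    Returns a list of input-name lists, most correlated variable first.
    """
    ranked = sorted(features, key=lambda name: abs(pearson(features[name], target)), reverse=True)
    return [ranked[:i] for i in range(1, len(ranked) + 1)]


# Hypothetical toy data: x1, x2, x3 have CC = 1.0, 0.8 and -0.3 with the target
toy_features = {"x1": [1, 2, 3, 4, 5], "x2": [2, 1, 4, 3, 5], "x3": [3, 5, 1, 4, 2]}
print(nested_input_combinations(toy_features, [1, 2, 3, 4, 5]))
```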
Table 2

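Input combinations, defined from the CC values between the input variables and the output variable, and the CC values of the resulting predictions

Models | IC # | Input variables | Output variable | CC
BA-M5P | IC A |  | Fr | 0.882
 | IC B |  | Fr | 0.890
 | IC C |  | Fr | 0.932
 | IC D |  | Fr | 0.938
 | IC E |  | Fr | 0.944
M5P | IC A |  | Fr | 0.836
 | IC B |  | Fr | 0.838
 | IC C |  | Fr | 0.880
 | IC D |  | Fr | 0.898
 | IC E |  | Fr | 0.936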

Bold values indicate the best performance.

Figure 2

Heat map of CC value of all used input variables with respect to output variable Fr.


Model description

M5P: The standalone model used here is M5P, a decision-tree learner for regression problems whose terminal nodes (leaves) contain linear regression functions, so that it produces continuous numerical predictions. The M5 tree model can handle tasks with very high dimensionality and is designed for continuous rather than discrete class problems. It approximates the non-linear relationships in the dataset piecewise, with a separate linear model for each region of the input space defined by the tree partitioning. Building a model tree involves two steps. In the first step, a decision tree is grown using a splitting criterion: the standard deviation of the class values that reach a node is treated as a measure of the error at that node, and the expected reduction in this error from testing each attribute at that node is calculated and used to choose the branch (Quinlan 1992). In the second step, the grown tree is pruned back, with linear regression models replacing subtrees where this does not increase the estimated error. Recent studies have used M5P models to estimate dissolved oxygen levels and suspended sediment loads (Heddam & Kisi 2018; Khosravi et al. 2018).

One of the primary advantages of M5P models is that they provide interpretable results due to their tree structure, which aids in understanding the influence of various input variables. They also efficiently handle both continuous and categorical data, making them versatile for different datasets. Additionally, M5P can capture non-linear relationships through piecewise linear functions, enhancing predictive accuracy.

However, M5P models can become complex and prone to overfitting if not properly pruned, which reduces their generalizability to unseen data. Training can be computationally intensive for large datasets, which affects scalability. Furthermore, the performance of M5P models depends heavily on the quality and quantity of the input data, which may require extensive preprocessing. Despite these challenges, when properly managed, M5P is an effective ML model for particle Froude number prediction.

Bagging: Bagging, or bootstrap aggregation, is a popular ensemble learning method used to reduce variance in noisy datasets. During bagging, data points are sampled randomly with replacement from the training set, so the same data point can be selected more than once. A base learner (the weak learner) is then trained independently on each of these bootstrap samples. The predictions of the base learners are combined by averaging for regression or by majority vote for classification, which yields a more accurate estimate. According to Breiman (1996), bagging is carried out in three major steps: (1) bootstrapping, (2) parallel training, and (3) aggregation. Bauer & Kohavi (1999) describe the BA algorithm as follows:

  • Data are drawn at random and independently from the main training dataset. This is repeated to produce a predetermined number of sub-datasets.

  • The chosen base learning model is trained on every sub-dataset, producing a sequence of predictive functions.

  • The final result is determined by voting, selecting the outcome with the most votes (for classification), or by averaging the predictions (for regression).
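The three steps can be sketched as a generic bagging regressor. For brevity, a one-variable least-squares line stands in for the M5P base learner; that stand-in, the function names, and the `bag_size` fraction (1.0 corresponding to bags the same size as the training set) are assumptions for illustration. Because this is regression, aggregation averages the member predictions rather than voting.

```python
import random
import statistics


def fit_line(x, y):
    """Least-squares line y = a + b*x (a hypothetical stand-in for an M5P base learner)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda q: a + b * q


def bagging_fit(x, y, base_fit=fit_line, n_bags=10, bag_size=1.0, seed=1):
    """Train `n_bags` base models on bootstrap samples; predict with their mean."""
    rng = random.Random(seed)
    n = len(x)
    m = max(1, round(bag_size * n))
    models = []
    for _ in range(n_bags):
        # (1) Bootstrapping: draw m indices uniformly with replacement
        idx = [rng.randrange(n) for _ in range(m)]
        # (2) Training: fit one base model per bootstrap sample (independently)
        models.append(base_fit([x[i] for i in idx], [y[i] for i in idx]))
    # (3) Aggregation: average the member predictions (regression)
    return lambda q: statistics.fmean(model(q) for model in models)


# Hypothetical noise-free data y = 2x + 1
model = bagging_fit(list(range(10)), [2 * v + 1 for v in range(10)])
print(round(model(4.5), 6))
```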

Ab Ghani (1993) proposed equations for both non-deposition and deposition boundary conditions. The experiments were performed in a 450 mm diameter pipe, and the analysis covered various pipe diameters and deposited bed thicknesses of up to 23% under deposition boundary conditions. In addition, data from Alvarez-Hernandez (1990), Perrusquia (1992), and May (1993) were used.

One of the primary benefits of bagging is its ability to reduce variance and prevent overfitting by combining predictions from multiple models, resulting in more robust and stable outputs. It works particularly well with high-variance models, such as decision trees, improving overall performance. Furthermore, bagging is typically simple to implement and parallelize, which speeds up the training process.

However, bagging can be computationally expensive and memory-intensive because it requires training multiple models on different subsets of the data. Furthermore, bagging reduces variance but does not address bias, so it may not improve the performance of inherently biased models. The technique also requires enough data for each subset to be representative of the overall distribution, which may be a limitation for small datasets. Despite these challenges, bagging remains an effective ensemble technique for predicting particle Froude numbers.

Proposed Bagging-M5P (BA-M5P): The Bagging-M5P model is an ensemble learning technique that combines the predictive power of multiple M5P models to enhance overall prediction accuracy and robustness. It involves training multiple M5P models on different bootstrapped subsets of the original dataset. Each iteration creates diverse training sets for each individual model by sampling a random subset of the training data with replacement. This diversity helps to capture different aspects of the underlying data distribution and reduces the risk of overfitting.

Once trained, the individual M5P models make predictions on unseen data points. We then aggregate the predictions of all the individual models, typically through averaging or weighted averaging, to obtain the final prediction. This approach effectively utilizes the combined knowledge of multiple models, resulting in enhanced prediction performance and generalization capability. In addition, Bagging-M5P is highly versatile in handling various types of data, making it suitable for a wide range of prediction tasks across different domains.
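The aggregation step can be sketched as follows. Plain averaging is what standard bagging uses; the inverse-error weighting shown as the weighted-averaging option is a hypothetical scheme chosen for illustration, not one stated in this study, and the member predictions are made-up values.

```python
def aggregate(predictions, weights=None):
    """Combine the ensemble members' predictions for one instance.

    predictions: list of floats, one per member model
    weights:     optional non-negative weights; plain averaging if omitted
    """
    if weights is None:
        return sum(predictions) / len(predictions)
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * p for w, p in zip(weights, predictions)) / total


def inverse_error_weights(errors, eps=1e-12):
    """Hypothetical weighting: each member weighted by 1 / its validation MSE."""
    return [1.0 / (e + eps) for e in errors]


member_preds = [4.1, 4.4, 3.9]  # hypothetical Fr predictions from three M5P members
print(aggregate(member_preds))
print(aggregate(member_preds, inverse_error_weights([0.5, 1.0, 0.25])))
```

With weighting, the member with the smallest validation error pulls the combined prediction toward its own value.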

Ab Ghani (1993) also used data from their own experiments to develop the model given in Equation (6).
(6)
where Rh/Sm denotes the ratio of the hydraulic radius to the median sediment size.
Nalluri et al. (1997) investigated the available information in the literature and proposed Equation (7) using the datasets from El-Zaemey (1991) and Alvarez-Hernandez (1990).
(7)

Model evaluation criteria

Model evaluation criteria allow different models to be compared on the same dataset, which is useful when selecting the best model for a particular problem. They can also highlight a model's strengths and weaknesses, informing further development, and they provide objective metrics for communicating a model's performance to others, such as colleagues or stakeholders. In this study, four statistical metrics were used to identify the best model: MSE, MAE, NSE, and CC, with ideal values of 0, 0, 1, and 1, respectively. They are calculated as follows.

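  • Mean squared error (MSE):
    $$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Fr_i^{a} - Fr_i^{p}\right)^2 \tag{8}$$
  • Mean absolute error (MAE):
    $$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|Fr_i^{a} - Fr_i^{p}\right| \tag{9}$$
  • Nash–Sutcliffe efficiency (NSE):
    $$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(Fr_i^{a} - Fr_i^{p}\right)^2}{\sum_{i=1}^{n}\left(Fr_i^{a} - \overline{Fr^{a}}\right)^2} \tag{10}$$
  • Correlation coefficient (CC):
    $$\mathrm{CC} = \frac{\sum_{i=1}^{n}\left(Fr_i^{a} - \overline{Fr^{a}}\right)\left(Fr_i^{p} - \overline{Fr^{p}}\right)}{\sqrt{\sum_{i=1}^{n}\left(Fr_i^{a} - \overline{Fr^{a}}\right)^2 \sum_{i=1}^{n}\left(Fr_i^{p} - \overline{Fr^{p}}\right)^2}} \tag{11}$$

Here $Fr_i^{a}$ is the ith actual value of the particle Froude number, $\overline{Fr^{a}}$ is the mean of the actual values, $Fr_i^{p}$ is the ith value predicted by the proposed models, $\overline{Fr^{p}}$ is the mean of the predicted values, and n is the number of samples.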
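A minimal sketch of the four metrics using their standard definitions: MSE and MAE as the mean squared and mean absolute differences between actual and predicted values, NSE as one minus the ratio of the residual sum of squares to the sum of squared deviations of the actual values from their mean, and CC as the Pearson correlation between actual and predicted values. The function names and example values are illustrative.

```python
import math


def mse(obs, pred):
    """Mean squared error."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - SS_residual / SS_around_observed_mean."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def cc(obs, pred):
    """Pearson correlation coefficient between actual and predicted values."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    num = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    den = math.sqrt(sum((o - mo) ** 2 for o in obs) * sum((p - mp) ** 2 for p in pred))
    return num / den


actual = [1.0, 2.0, 3.0, 4.0]      # hypothetical Fr values
predicted = [1.1, 1.9, 3.2, 3.8]
print(mse(actual, predicted), mae(actual, predicted), nse(actual, predicted), cc(actual, predicted))
```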

Sensitivity analysis

A sensitivity analysis is carried out to find the input variable that most affects the performance of the proposed model. One input variable is removed at a time, the model is retrained and evaluated without it, and the variable is restored before the next one is removed, until every input variable has been removed once. Thus, for k inputs, the model performance is evaluated k times, with only one input removed at each step. In this study, the input variable least correlated with the output variable is removed first, the next least correlated in the next iteration, and the most correlated in the final iteration. The most sensitive input variable is identified by comparing the metric values obtained in these runs.
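The leave-one-input-out procedure can be sketched as below. The `evaluate` callable is a hypothetical placeholder for retraining the model (e.g., BA-M5P under 10-fold cross-validation) on a given input set and returning an error metric such as MAE or MSE; the toy evaluator, its penalties, and the variable names exist only to make the example runnable.

```python
def one_at_a_time_sensitivity(inputs, evaluate):
    """Leave-one-input-out sensitivity analysis.

    inputs:   input names, ordered from least to most correlated with the output
    evaluate: callable taking a list of input names and returning an error
              metric (lower is better), e.g. cross-validated MAE or MSE
    Returns (most_sensitive_input, {removed_input: error_increase}).
    """
    baseline = evaluate(list(inputs))
    increase = {}
    for name in inputs:
        reduced = [v for v in inputs if v != name]  # remove exactly one input
        increase[name] = evaluate(reduced) - baseline
    most_sensitive = max(increase, key=increase.get)
    return most_sensitive, increase


# Hypothetical evaluator: each missing input adds a fixed error penalty
HYPOTHETICAL_PENALTY = {"x1": 0.05, "x3": 0.1, "x2": 0.4}


def toy_evaluate(names):
    return 0.5 + sum(p for v, p in HYPOTHETICAL_PENALTY.items() if v not in names)


print(one_at_a_time_sensitivity(["x1", "x3", "x2"], toy_evaluate))
```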

This section provides an in-depth analysis of the outcomes of the proposed models. To compare the accuracy of the proposed models with existing empirical equations based on performance metrics, we first identify the best input variable combination and then evaluate the accuracy of each model. Finally, we identify the most sensitive input variable, i.e., the one that most significantly affects the model's performance.

Best input combination

Based on the CC values between the input variables and the output variable, five ICs were defined (IC A, IC B, …, IC E). The CC values between the actual and predicted values were then compared for each IC and each proposed model. As shown in Table 2, the most accurate predictions were obtained when all variables were included (IC E), and the BA-M5P model produced the most accurate Fr predictions of all the models.

Model analysis and hyperparameter tuning

Here, we describe the hyperparameter tuning of various models used.

M5Prime: The final M5P hyperparameter values after tuning were: (a) batch size = 100; (b) unsmoothed = False; (c) unpruned = False; and (d) minimum number of instances = 4.

Bagging: The following hyperparameters were used for BA, with values determined by trial and error: (a) seed = 7; (b) batch size = 100; (c) bag size = 100; (d) number of iterations = 10; (e) base learner = M5P; and (f) instance weights used = False.
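The parameter names above (batch size, unsmoothed, unpruned, minimum number of instances, bag size, seed, iterations) match the options of the Weka implementations of Bagging and M5P, although the text does not name the toolkit. Assuming Weka was used, the sketch assembles an equivalent command line in Python. The flag mapping follows Weka's documented options and should be verified against the installed version; `weka.jar` and `data.arff` are placeholders, and the batch size (100) and instance weighting (False) are left at Weka's defaults.

```python
def weka_bagging_m5p_args(seed=7, bag_size_percent=100, iterations=10,
                          min_instances=4, unpruned=False, unsmoothed=False,
                          folds=10, train_file="data.arff"):
    """Build a Weka command line for Bagging with an M5P base learner (assumed mapping)."""
    args = ["java", "-cp", "weka.jar", "weka.classifiers.meta.Bagging",
            "-t", train_file,              # dataset (placeholder file name)
            "-x", str(folds),              # k-fold cross-validation
            "-S", str(seed),               # random seed for bootstrapping
            "-P", str(bag_size_percent),   # bag size as % of the training set
            "-I", str(iterations),         # number of bagging iterations
            "-W", "weka.classifiers.trees.M5P",
            "--",                          # options after this go to M5P
            "-M", str(min_instances)]      # minimum number of instances per leaf
    if unpruned:
        args.append("-N")
    if unsmoothed:
        args.append("-U")
    return args


print(" ".join(weka_bagging_m5p_args()))
```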

Model performance

Table 3 presents the performance metrics for Fr prediction. The CC metric is used to discuss the findings consistently because it directly shows how well each model's predictions fit the observations, which makes the models easy to compare. When all input variables (IC E) were used, the CC values showed that BA-M5P predicted Fr most accurately (0.944), followed by the state-of-the-art REPT model (0.939), the other proposed model, M5P (0.936), MGGP (0.856), the empirical equation of Ab Ghani (1993) (0.856), ELM (0.785), the empirical equation of Nalluri et al. (1997) (0.733), and SVM (0.719).

Table 3

Performance metrics of the models in this study (within each group, in decreasing order of CC)

| Group | Model | MSE | MAE | NSE | CC | % improvement of BA-M5P w.r.t. CC |
|---|---|---|---|---|---|---|
| Proposed ML models | BA-M5P | 0.770 | 0.551 | 0.890 | 0.944 | – |
| Proposed ML models | M5P | 0.870 | 0.573 | 0.876 | 0.936 | 0.838 |
| State-of-the-art ML models | REPT | 0.833 | 0.634 | 0.881 | 0.939 | 0.540 |
| State-of-the-art ML models | MGGP | 2.079 | 0.925 | 0.703 | 0.856 | 9.310 |
| State-of-the-art ML models | ELM | 2.691 | 1.265 | 0.616 | 0.785 | 16.786 |
| State-of-the-art ML models | SVM | 3.531 | 1.211 | 0.496 | 0.719 | 23.850 |
| Empirical equations | Ab Ghani (1993) | 5.537 | 2.017 | 0.210 | 0.856 | 9.326 |
| Empirical equations | Nalluri et al. (1997) | 7.158 | 2.005 | −0.021 | 0.733 | 22.313 |

BA-M5P gives the best value of all four metrics. For benchmarking purposes, the last column reports the percentage improvement in CC obtained by the proposed BA-M5P model over each state-of-the-art ML model and empirical equation.
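The four metrics in Table 3 and its improvement column can be computed as in the sketch below, which uses the standard definitions of MSE, MAE, NSE, and Pearson CC. The improvement formula, (CC of BA-M5P − CC of model) / CC of BA-M5P × 100, is our assumption: it reproduces the reported percentages to within the rounding of the three-decimal CC values (e.g., (0.944 − 0.856) / 0.944 × 100 ≈ 9.32, versus 9.310 for MGGP and 9.326 for Ab Ghani (1993)). The function names are illustrative.

```python
from math import sqrt


def mse(obs, pred):
    """Mean squared error."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - SSE / total sum of squares about the observed mean."""
    m = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - m) ** 2 for o in obs)
    return 1.0 - sse / sst


def cc(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    return cov / sqrt(sum((o - mo) ** 2 for o in obs) * sum((p - mp) ** 2 for p in pred))


def cc_improvement_percent(cc_best, cc_other):
    """Assumed form of the improvement column, relative to the best model's CC."""
    return (cc_best - cc_other) / cc_best * 100.0


obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.0, 2.5, 4.5]
```

For the toy data, MSE = 0.1875, MAE = 0.375, and NSE = 0.85. An NSE below zero, as for Nalluri et al. (1997), means the predictions are less accurate than simply using the mean of the observations.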

For a visual comparison, Figure 3 shows line and scatter plots of actual versus predicted Fr values for the proposed models and the empirical equations. In the scatter plot for the BA-M5P model, the regression line lies close to the diagonal (1:1) line, indicating that the model predicts Fr accurately. In contrast, the regression lines for the empirical equations of Ab Ghani (1993) and Nalluri et al. (1997) lie far from the diagonal. According to Figure 3, the BA-M5P model outperforms both the M5P model and the empirical equations.
Figure 3

Line and scatter plot of actual vs. predicted Fr by proposed ML models and empirical equations.

Figure 4 compares, in box-and-whisker form, the actual Fr values with those predicted by the ML models and the empirical equations. The ML models reproduce the first and second quartiles of the actual values quite well, and the box for the ensemble BA-M5P model has the shape most similar to that of the actual values. The quartile lines of the empirical equations do not align with those of the actual values, indicating that the empirical equations predict Fr poorly. Overall, the BA-M5P model outperforms the others.
Figure 4

Box-and-whisker plot used for visual comparison of the Fr value of this study.

Taylor diagrams have the advantage of combining two widely used statistics, the standard deviation (SD) and CC, in a single plot (Taylor 2001). The closer a model's point lies to the reference point (CC = 1 and SD equal to that of the actual values), the higher its predictive accuracy. BA-M5P shows the closest match to the actual values in both CC and SD, indicating superior performance compared with the other models, whereas the equation of Nalluri et al. (1997) performs worst (Figure 5).
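A Taylor diagram places each model using three linked statistics: the SD of its predictions, its CC with the actual values, and the centred root-mean-square difference (RMSD) between the two, which satisfy RMSD² = SD_pred² + SD_ref² − 2 · SD_pred · SD_ref · CC (Taylor 2001). The sketch below computes these statistics with population (1/n) normalisation so that the identity holds exactly; the function name and toy data are illustrative.

```python
from math import sqrt


def taylor_statistics(ref, pred):
    """Return (SD of reference, SD of prediction, CC, centred RMSD), all with 1/n normalisation."""
    n = len(ref)
    mr, mp = sum(ref) / n, sum(pred) / n
    sd_ref = sqrt(sum((r - mr) ** 2 for r in ref) / n)
    sd_pred = sqrt(sum((p - mp) ** 2 for p in pred) / n)
    corr = sum((r - mr) * (p - mp) for r, p in zip(ref, pred)) / (n * sd_ref * sd_pred)
    # Centred RMSD: RMS difference after removing each series' mean (bias is excluded).
    crmsd = sqrt(sum(((p - mp) - (r - mr)) ** 2 for r, p in zip(ref, pred)) / n)
    return sd_ref, sd_pred, corr, crmsd


ref = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [1.2, 1.9, 3.4, 3.8, 5.3]
sd_ref, sd_pred, corr, crmsd = taylor_statistics(ref, pred)
```

Because of this law-of-cosines relation, the distance from a model's point to the reference point on the diagram equals its centred RMSD, which is why a point close to the reference indicates good agreement.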
Figure 5

Comparison of results with the Taylor diagram.


Sensitivity analysis

Table 4 presents the results of the sensitivity analysis. Starting from the best input combination (IC E), we removed one input variable at a time (in order from the lowest to the highest CC) and recorded the resulting metrics. The best proposed ML model, BA-M5P, was trained and tested on each of these ICs to predict Fr. Comparing the metrics shows that Svc is the most important variable: removing it gives the worst prediction performance (MSE = 1.156 and MAE = 0.694), indicating that Svc is the most sensitive variable and must be considered when predicting Fr.

Table 4

Sensitivity analysis of all input variables on the prediction of particle Froude number

| IC # | Removed variable | MSE | MAE |
|---|---|---|---|
| IC 1 | | 0.842 | 0.559 |
| IC 2 | | 0.851 | 0.571 |
| IC 3 | Svc | 1.156 | 0.694 |
| IC 4 | | 0.909 | 0.660 |
| IC 5 | | 1.003 | 0.622 |
| IC 6 | – (none) | 0.801 | 0.547 |

Removing Svc from the input combination (IC 3) gives the worst prediction performance, indicating that Svc is the most sensitive variable.
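The drop-one-variable procedure behind Table 4 can be sketched as follows. The sketch rests on two assumptions. First, the study's BA-M5P model is replaced by a hypothetical 1-nearest-neighbour regressor (`one_nn`) so that the example is self-contained. Second, errors are estimated with interleaved k-fold cross-validation: the study uses 10-fold cross-validation, but its fold assignment is not described. All function names are illustrative.

```python
from statistics import mean


def kfold_indices(n, k):
    """Interleaved k-fold split: fold f holds the indices i with i % k == f."""
    return [[i for i in range(n) if i % k == f] for f in range(k)]


def cv_errors(X, y, fit_predict, k=10):
    """Return (MSE, MAE) of out-of-fold predictions from k-fold cross-validation."""
    sq, ab = [], []
    for test_idx in kfold_indices(len(X), k):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in test_set]
        preds = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        for i, p in zip(test_idx, preds):
            sq.append((y[i] - p) ** 2)
            ab.append(abs(y[i] - p))
    return mean(sq), mean(ab)


def drop_one_sensitivity(X, y, names, fit_predict, k=10):
    """Remove each input in turn; map removed name -> (MSE, MAE), with None for no removal."""
    results = {None: cv_errors(X, y, fit_predict, k)}
    for j, name in enumerate(names):
        X_drop = [row[:j] + row[j + 1:] for row in X]
        results[name] = cv_errors(X_drop, y, fit_predict, k)
    return results


def one_nn(X_train, y_train, X_test):
    """Hypothetical stand-in model: 1-nearest-neighbour regression (squared Euclidean distance)."""
    preds = []
    for x in X_test:
        d = [sum((a - b) ** 2 for a, b in zip(x, xt)) for xt in X_train]
        preds.append(y_train[d.index(min(d))])
    return preds


# Toy data: the target depends only on x0; x1 is an unrelated, repeating input.
X = [[float(i), float((2 * i) % 5)] for i in range(20)]
y = [float(i) for i in range(20)]
res = drop_one_sensitivity(X, y, ["x0", "x1"], one_nn, k=5)
most_sensitive = max(["x0", "x1"], key=lambda name: res[name][0])
```

In this toy case, removing x1 leaves every out-of-fold error at exactly 1 (MSE = MAE = 1.0), whereas removing x0 inflates the errors, so `most_sensitive` is "x0". This mirrors how the removal of Svc produced the largest MSE and MAE in Table 4.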

Discussion

Self-cleaning sewer systems are designed to maintain the flow of wastewater through a sewer system by removing debris that can block pipes and cause backups. This matters because blockages in the sewer system can cause wastewater to overflow, leading to environmental contamination and public health hazards. In addition, manually clearing blockages can be costly, so a self-cleaning system can help reduce maintenance costs. The particle Froude number (Fr) is a dimensionless number that compares the inertial forces with the gravitational forces acting on a particle in a fluid flow. In a self-cleaning sewer system, Fr can be used to predict the movement of solid particles in the wastewater flow. A high Fr indicates that the inertial forces acting on the particles exceed the gravitational forces, so the particles tend to move with the flow of the wastewater. Conversely, a low Fr indicates that the gravitational forces dominate, so the particles tend to settle out of the flow. By understanding Fr in a self-cleaning sewer system, engineers can design the system to optimize the flow of wastewater and remove debris effectively.
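For reference, the sewer-sediment literature commonly writes the particle Froude number at the non-deposition limit as Fr = Un / √(g(S − 1)Sm), and the dimensionless grain size as Dgr = Sm[(S − 1)g/ν²]^(1/3), with ν the kinematic viscosity. This section does not restate these definitions, so treat the sketch below as an assumption about their form. The defaults g = 9.81 m/s² and ν = 1.0 × 10⁻⁶ m²/s (water at about 20 °C) and the function names are illustrative.

```python
from math import sqrt


def particle_froude(u_n, s, d_median, g=9.81):
    """Assumed densimetric form: Fr = Un / sqrt(g (S - 1) Sm), SI units."""
    return u_n / sqrt(g * (s - 1.0) * d_median)


def dimensionless_grain_size(d_median, s, nu=1.0e-6, g=9.81):
    """Assumed form: Dgr = Sm * ((S - 1) g / nu**2) ** (1/3), SI units."""
    return d_median * ((s - 1.0) * g / nu ** 2) ** (1.0 / 3.0)


# Illustrative values: Un = 0.5 m/s, quartz-like sediment (S = 2.65), Sm = 1 mm.
fr = particle_froude(0.5, 2.65, 0.001)         # about 3.93
dgr = dimensionless_grain_size(0.001, 2.65)    # about 25.3
```

Written this way, a larger Fr means that the flow velocity is large relative to the particle's submerged-weight scale √(g(S − 1)Sm), which matches the qualitative interpretation given above.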

To predict Fr under non-deposition with deposited bed (NDB) conditions, we used the median sediment size (Sm), hydraulic radius (Rh), volumetric concentration of sediment (Svc), pipe diameter (dp), channel friction factor (fc), and dimensionless grain size of particles (Dgr). We proposed two ML models, BA-M5P and M5P, and compared them with the empirical equations of Ab Ghani (1993) and Nalluri et al. (1997). Model performance was examined using four metrics: MSE, MAE, NSE, and CC. All models used in this study gave better results when all variables were combined.

We evaluated different combinations of input variables and compared the resulting models using several performance measures. The highest predictive power was achieved when all input variables were included, and the best-performing model was BA-M5P (Table 3). Its accuracy is also visible in the close match between the actual and predicted values in the line and scatter plots and in the box-and-whisker plot. The Taylor diagram further illustrates the resemblance between the actual and predicted values by comparing their CC and SD values.

We performed a sensitivity analysis on the best model to determine the impact of each input variable on the prediction of the dependent variable, Fr. One input variable at a time was removed from IC E, and the model's performance was evaluated using the MSE and MAE metrics. Svc was the most sensitive variable: the model's prediction performance was most affected when Svc was removed. This suggests that Svc is an important variable to consider when predicting Fr.

Limitations

Predicting Fr under NDB conditions is a complex problem involving independent variables such as the mean velocity of non-deposition flow (Un), density, sediment density, sediment relative density (S), acceleration due to gravity (g), median sediment size (Sm), kinematic viscosity, hydraulic radius (Rh), volumetric concentration of sediment (Svc), thickness of sediment deposited (ts), pipe diameter (dp), and channel friction factor (fc). Other parameters, such as the bed shear stress, the bed form of the channel, and the channel discharge, can also influence Fr. Because these parameters are absent from our dataset, the proposed models cannot account for their effects. In addition, the range of each variable plays a significant role in model training: although our dataset is derived from a variety of field and laboratory investigations, input values encountered in practice may fall outside the ranges it covers. In either situation, the proposed model may perform poorly. Most ML models face these difficulties, as their efficacy depends largely on the training dataset and its characteristics.

In this investigation, we assessed how well BA-M5P and M5P predict Fr in a sewage system. The following variables were used as predictors: dimensionless grain size of particles (Dgr), median sediment size (Sm), hydraulic radius (Rh), volumetric concentration of sediment (Svc), thickness of sediment deposited (ts), pipe diameter (dp), and channel friction factor (fc). We evaluated the proposed ensemble and standalone ML models against empirical equations derived from theoretical considerations, and we compared different ICs to find the best IC and the most sensitive variable. Compared with the empirical equations and the standalone ML model, the ensemble ML model performs better for each IC: the ensemble BA-M5P model outperforms both the standalone M5P model and the empirical equations, and the standalone M5P model in turn outperforms the empirical equations. Of all the models employed in this study, BA-M5P (MSE = 0.770, MAE = 0.551, NSE = 0.890, and CC = 0.944) is the most accurate for predicting Fr in a sewage system. The most accurate predictions are obtained when all variables are combined in IC E, and among these variables, the sensitivity analysis identified Svc as the most sensitive.

S.K.: Formal analysis, Writing – original draft, Data curation, Data processing and analysis, Original draft preparation, and Writing; B.K.: Reviewing and Editing; M.A.: Research profile design, Writing, Reviewing, and Revising; V.D.: Research profile design, Reviewing, and Revising; U.R.: Reviewing and Revising, Supervision.

This research received no external funding.

All relevant data are included in the paper or its Supplementary Information.

The authors declare that there is no conflict of interest.

Ab Ghani A. 1993 Sediment Transport in Sewers. Newcastle University, Newcastle Upon Tyne, UK.

Alvarez-Hernandez E. M. 1990 The Influence of Cohesion on Sediment Movement in Channels of Circular Cross-Section. Newcastle University, Newcastle Upon Tyne, UK.

Arya Azar N., Ghordoyee Milan S. & Kardan N. 2021 Development of a hybrid ANN-evolutionary algorithms models to predict the Froude number in open channel flows in modeling of sediment transport. Environment and Water Engineering 7 (1), 73–87.

Breiman L. 1996 Bagging predictors. Machine Learning 24, 123–140.

Danandeh Mehr A. & Safari M. J. S. 2020 Application of soft computing techniques for particle Froude number estimation in sewer pipes. Journal of Pipeline Systems Engineering and Practice 11 (2), 4020002.

Ebtehaj I., Bonakdari H. & Shamshirband S. 2016 Extreme learning machine assessment for estimating sediment transport in open channels. Engineering with Computers 32 (4), 691–704.

El-Zaemey A. K. S. 1991 Sediment Transport Over Deposited Beds in Sewers. Newcastle University, Newcastle Upon Tyne, UK.

Kargar K., Safari M. J. S., Mohammadi M. & Samadianfard S. 2019 Sediment transport modeling in open channels using neuro-fuzzy and gene expression programming techniques. Water Science and Technology 79 (12), 2318–2327.

Khosravi K., Cooper J. R., Daggupati P., Pham B. T. & Bui D. T. 2020 Bedload transport rate prediction: Application of novel hybrid data mining techniques. Journal of Hydrology 585, 124774.

Kumar S., Kirar B., Agarwal M. & Deshpande V. 2022 Application of novel hybrid machine learning techniques for particle Froude number estimation in sewer pipes. Natural Hazards 116, 1–20.

Kumar S., Agarwal M. & Deshpande V. 2023a Estimation of particle Froude number in deposited bed condition using hybrid machine learning models. In: International Conference on Advances in Data-Driven Computing and Intelligent Systems, pp. 193–205.

Kumar S., Agarwal M. & Deshpande V. 2023b Radial basis function regression (RBFR), ARRBFR models for estimation of particle Froude number in sewer pipes under deposited conditions. In: 2023 6th International Conference on Information Systems and Computer Networks (ISCON), pp. 1–6.

Liu S., Yang J., Lyu H., Sun P. & Zhang B. 2024 Experimental and numerical investigation of the effect of deep-sea mining vehicles on the discharge plumes. Physics of Fluids 36 (3), 033358.

Mashford J., Marlow D., Tran D. & May R. 2011 Prediction of sewer condition grade using support vector machines. Journal of Computing in Civil Engineering 25 (4), 283–290.

May R. W. P. 1993 Sediment Transport in Pipes and Sewers with Deposited Beds (Report SR 320). HR Wallingford, Wallingford.

Montes C., Vanegas S., Kapelan Z., Berardi L. & Saldarriaga J. 2020 Non-deposition self-cleansing models for large sewer pipes. Water Science and Technology 81 (3), 606–621.

Nalluri C., Ghani A. A. & El-Zaemey A. K. S. 1994 Sediment transport over deposited beds in sewers. Water Science and Technology 29 (1–2), 125–133.

Nalluri C., El-Zaemey A. K. & Chan H. L. 1997 Sediment transport over fixed deposited beds in sewers – An appraisal of existing models. Water Science and Technology 36 (8–9), 123–128.

Ota J. J. & Nalluri C. 2003 Urban storm sewer design: approach in consideration of sediments. Journal of Hydraulic Engineering 129 (4), 291–297.

Peacock T. & Ouillon R. 2023 The fluid mechanics of deep-sea mining. Annual Review of Fluid Mechanics 55, 403–430.

Perrusquia G. 1991 Bedload Transport in Storm Sewers. Stream Traction in Pipe Channels. Chalmers University of Technology, Goteborg, Sweden.

Pham B. T., Tien Bui D. & Prakash I. 2018 Bagging based support vector machines for spatial prediction of landslides. Environmental Earth Sciences 77 (4), 1–17.

Pham B. T., Prakash I., Khosravi K., Chapi K., Trinh P. T., Ngo T. Q., Hosseini S. V. & Bui D. T. 2019 A comparison of support vector machines and Bayesian algorithms for landslide susceptibility modelling. Geocarto International 34 (13), 1385–1407.

Quinlan J. R. 1992 Learning with continuous classes. In: 5th Australian Joint Conference on Artificial Intelligence, Vol. 92, pp. 343–348.

Safari M. J. S., Aksoy H., Unal N. E. & Mohammadi M. 2017 Experimental analysis of sediment incipient motion in rigid boundary open channels. Environmental Fluid Mechanics 17 (6), 1281–1298.

Safari M. J. S., Mohammadi M. & Ab Ghani A. 2018 Experimental studies of self-cleansing drainage system design: A review. Journal of Pipeline Systems Engineering and Practice 9 (4), 4018017. https://doi.org/10.1061/(asce)ps.1949-1204.0000335.

Scott Winton R., Calamita E. & Wehrli B. 2019 Reviews and syntheses: Dams, water quality and tropical reservoir stratification. Biogeosciences 16 (8), 1657–1671. https://doi.org/10.5194/bg-16-1657-2019.

Shakya D., Agarwal M., Deshpande V. & Kumar B. 2022a Estimating particle Froude number of sewer pipes by boosting machine-learning models. Journal of Pipeline Systems Engineering and Practice 13 (2), 4022012.

Shakya D., Deshpande V., Agarwal M. & Kumar B. 2022b Standalone and ensemble-based machine learning techniques for particle Froude number prediction in a sewer system. Neural Computing and Applications. https://doi.org/10.1007/s00521-022-07237-x.

Taylor K. E. 2001 Summarizing multiple aspects of model performance in a single diagram. Journal of Geophysical Research: Atmospheres 106 (D7), 7183–7192.

Yang M.-D. & Su T.-C. 2008 Automated diagnosis of sewer pipe defects based on machine learning approaches. Expert Systems with Applications 35 (3), 1327–1337.

Yosefvand F., Shabanlo S. & Izadbakhsh M. A. 2019 Prediction of Froude number of three phases flow in sewer systems using extreme learning machines. Journal of Water and Wastewater; Ab va Fazilab (in Persian) 30 (5), 121–126.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).