ABSTRACT
Sewer systems are usually designed to be self-cleaning, so that the channel bottom stays free of sediment despite the constant buildup of sediment particles. Accurate prediction of the particle Froude number (Fr) is therefore important in sewer design. Five different sets of input variables were examined for predicting Fr. The machine learning (ML) models were trained and tested with 10-fold cross-validation to limit overfitting. The M5Prime (M5P) model was used as a standalone model and Bagging-M5P (BA-M5P) as a hybrid model, and their results were compared with empirical equations proposed in the literature. The models perform best when all input variables are used for training and testing. The hybrid BA-M5P model performed better than the M5P model and the empirical equations. A sensitivity analysis based on MAE and MSE values, carried out with the best-performing BA-M5P model, showed that sediment concentration (Svc) is the most important variable for predicting the particle Froude number under non-deposition with deposited bed conditions. Hence, for self-cleaning design, we recommend the BA-M5P model with Svc as the most essential input variable.
HIGHLIGHTS
A novel prediction model was developed to estimate the particle Froude number in a sewage system.
For the first time, the results of empirical equations were compared in this analysis.
Different performance metrics, including MSE, MAE, NSE, and CC were employed to illustrate the effectiveness of the proposed models.
The sensitivity analysis was used to find out the most sensitive input variable in prediction.
NOMENCLATURE
Dgr: dimensional grain size of particles
Un: mean velocity of non-deposition flow
density
sediment density
S: sediment relative density
g: acceleration due to gravity
Sm: median sediment size
kinematic viscosity
Rh: hydraulic radius
Svc: volumetric concentration of sediment
deposited bed width
the thickness of deposited bed
ts: the thickness of sediment deposited
dp: the diameter of pipe
fc: friction factor of channel
INTRODUCTION
Sediment buildup is a major technical and financial challenge when designing irrigation channels, urban drainage networks, rigid boundary channels, and sewer systems. In designing rigid boundary channels, the use of self-cleaning criteria is vital to prevent continuous sediment deposition (Kargar et al. 2019). A major part of building smooth bed channels, fundamental in environmental engineering, requires managing the sediment movement process (Safari et al. 2017). Based on the aforementioned issues, channels are created using the self-cleaning principle. According to the self-cleansing definition, flow can remove immobile deposited particles of the channel bed and convey sediments without permanently depositing them (Safari et al. 2018).
Sewers are now built using the self-cleaning principle, under which sediments are expected to travel continuously without depositing. However, because of the intermittent nature of the flow, particle deposition in sewers can still happen, particularly at low flows such as those that occur during receding flow or dry weather. This applies to both rigid and loose boundary conditions (Ab Ghani 1993).
The non-deposition bed sediment transport condition reduces the required design velocity due to the thin layer of deposited sediments at the channel bottom. The channel design significantly depends on the channel size, and the self-cleansing velocity relies on the channel design. In order to build larger channels, it is necessary to have an improved design self-cleansing velocity. Therefore, designing these channels using a non-deposition with clean bed (NCB) is not an affordable procedure since it requires a steeper channel bed slope (Nalluri et al. 1997; Ota & Nalluri 2003).
The hydraulic parameters of the flow in channels with circular cross-sections and various sediment (rigid) bed thicknesses, as well as the impact of bed thickness and roughness on sediment transport capacity, have been thoroughly investigated (El-Zaemey 1991; Nalluri et al. 1994). Yang & Su (2008) employed image processing methods to describe the textures of the pipe faults, which included the wavelet transform and the generation of co-occurrence matrices. Then, to identify pipe defect patterns, back-propagation neural networks, radial basis networks, and support vector machines (SVMs) were used. Their performances were compared and analyzed. The outcome demonstrates that the diagnostic accuracy obtained using SVM, which is 60%, is the best.
Asset management requires regular evaluation of the state of the sewage network. Only a limited portion of sewage systems is evaluated because of expensive inspection fees and scarce financing. Mashford et al. (2011) looked at the application of SVM models to sewer condition prediction. The results of the model testing demonstrated the SVM's success in making accurate predictions. To predict the Fr of three-phase flow within sewage channels, Yosefvand et al. (2019) proposed the extreme learning machine (ELM) model and compared its results to those of artificial neural network (ANN) and SVM models. The results showed that the ELM model simulated the target function with the highest accuracy.
The prediction of the particle Froude number is crucial for numerous applications in fluid dynamics, sediment transport, and various engineering fields. This dimensionless parameter, defined as $Fr = U_n / \sqrt{g (S - 1) S_m}$, where Un is the particle velocity, g is the gravitational acceleration, Sm is the particle diameter, and S is the specific gravity, provides essential insights into the movement and behavior of particles in a fluid medium. Precise forecasting in hydraulic engineering facilitates the analysis of sedimentation and scouring phenomena in the vicinity of structures such as dams and spillways, thereby guaranteeing their stability and durability (Danandeh Mehr & Safari 2020; Shakya et al. 2022a).
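As a worked illustration of this definition, the sketch below evaluates the particle Froude number in its commonly used form, Fr = Un / sqrt(g (S − 1) Sm). The function name, argument names, and the sample inputs are assumptions made for illustration; they are not values from the datasets used in this study.

```python
import math

def particle_froude(u_n, s_m, s, g=9.81):
    """Particle Froude number Fr = U_n / sqrt(g * (S - 1) * S_m).

    u_n: velocity in m/s, s_m: sediment size in m,
    s: sediment relative density (dimensionless), g: gravity in m/s^2.
    """
    if s <= 1.0:
        raise ValueError("relative density S must exceed 1")
    if s_m <= 0.0:
        raise ValueError("sediment size S_m must be positive")
    return u_n / math.sqrt(g * (s - 1.0) * s_m)

# Hypothetical inputs: 0.6 m/s velocity, 1 mm sand, relative density 2.65
fr = particle_froude(u_n=0.6, s_m=0.001, s=2.65)
print(round(fr, 3))
```

Keeping the sediment size in metres and the velocity in metres per second makes the result dimensionless, as the definition requires.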
In the field of environmental engineering, a thorough comprehension of the particle Froude number is crucial for effectively controlling the movement of sediment and the spread of pollutants in water bodies. This knowledge is essential for safeguarding the health and stability of aquatic ecosystems. This parameter is crucial in civil engineering projects, particularly those that include construction in close proximity to water, as it helps to limit any negative impacts on sediment dynamics. The field of coastal and marine engineering uses these forecasts to enhance the effectiveness of beach nourishment and coastal defence projects (Safari & Mehr 2018; Scott Winton et al. 2019).
In addition, mining and dredging activities utilize the particle Froude number to optimize sediment extraction and relocation procedures, thereby minimizing environmental consequences. Essentially, in the field of fluid mechanics research, this parameter plays a crucial role in describing flow patterns and the interactions between particles and fluids, hence improving our theoretical and practical comprehension of intricate fluid systems. Hence, accurately forecasting the particle Froude number is crucial for developing efficient engineering solutions, maximizing operational effectiveness, and ensuring the protection of the environment and structural stability (Peacock & Ouillon 2023; Liu et al. 2024).
In the non-deposition situation of sediment movement in rigid boundary channels, Kargar et al. (2019) have used neuro-fuzzy (NF) and gene expression programming (GEP) to estimate the particle Froude number. They discovered that, when compared with the GEP model and empirical equations, the NF model yields the best results. Studies about hybrid machine learning (ML) models to predict the Froude number have been carried out to check and prove the effectiveness and accuracy of the hybrid models over experimental equations.
The application of hybrid models has been of great use when it comes to the prediction of the transportation rate for deposited beds due to improved predictability and increased accuracy. According to recent research, the Bagging (BA) model has been used to improve a wide range of base learners, such as trees (Mert et al. 2014), naive Bayes trees (Pham et al. 2018), and SVM (Pham et al. 2019). In the study conducted by Khosravi et al. (2020), the BA model was employed to train the M5P, random forest (RF), regression tree, and reduced error pruning tree (REPT) base learners for the prediction of bedload transport rate.
Danandeh Mehr & Safari (2020) examined the efficacy of three soft computing techniques, namely multilayer perceptron, multigene genetic programming (MGGP), and GEP, in forecasting sedimentation in sewage networks under the non-deposition with deposited bed self-cleansing condition. The outcomes demonstrated that, in terms of statistical performance metrics, the MGGP models outperformed the others.
In a study by Arya Azar et al. (2021), ANN was used to predict the Froude number using as the input variables, and the results were improved by evolutionary algorithms such as the particle swarm algorithm, the firefly algorithm, and the differential evolution (DE) algorithm. The authors found that evolutionary algorithms increased the accuracy of the ANN model and concluded that these models should be employed to forecast the Froude number instead of experimental equations. Ebtehaj et al. (2016) developed a model by combining a feed-forward neural network with an extreme learning machine (FFNN-ELM), and they discovered that it outperformed previous empirical models.
To predict the Fr value, Shakya et al. (2022a, 2022b) employed boosting models such as the lightGBM regressor, catboost regressor, gradient boosting regressor, and adaboost regressor. Kumar et al. (2022) used a variety of hybrid algorithms to predict the Fr value under non-deposition with deposited bed (NDB) conditions, including Kstar, AR-Kstar, RF, and AR-RF. They discovered that boosting and hybrid algorithms predicted the Fr value better than standalone approaches. Kumar et al. (2023a, 2023b, 2024) also used hybrid ML models to improve the accuracy of the standalone ML models proposed to predict particle Froude numbers under NCB and NDB conditions.
Research Gaps: Predicting the particle Froude number is critical for various applications in fluid dynamics, sediment transport, and engineering. Traditional studies often rely on limited datasets, typically two to three, restricting the generalizability and robustness of their models. Objectives: This research proposes to enhance prediction accuracy by integrating diverse datasets and employing advanced ML techniques. The primary objective is to develop a comprehensive and adaptable predictive module by compiling and ensembling five datasets with varying diameters and sediment sizes. Specifically, datasets from El-Zaemey (1991), Perrusquia (1991), Ab Ghani (1993), May (1993), and Montes et al. (2020) are being used, encompassing sediment sizes from 0.47 to 8.4 mm and channel diameters from 225 to 595 mm. This research will provide a more generalized and reliable tool for hydraulic engineers, environmental scientists, and researchers, ultimately leading to better-informed decision-making and more effective management of sediment-related challenges. In terms of secondary objectives, we have provided the sensitivity analysis for determining the most sensitive parameter. This provides a better insight for the researchers to determine the importance of each input parameter.
By leveraging this comprehensive dataset, the study develops and tests the hybrid ML model BA-M5P against the standalone M5P model and compares both with models proposed in the literature (i.e., REPT, MGGP, ELM, and SVM). Although conventional ML models or empirical equations can predict the particle Froude number in sewage systems, conventional ML models often handle complexity and non-linear interactions poorly, are prone to underfitting and overfitting, and suffer from scalability issues. The hybrid ML model proposed in this study is intended to overcome these deficiencies.
Hybrid models combine the predictive power of different algorithms, such as linear regression, random trees, and neural networks, leveraging their strengths. Hybrid approaches can more correctly capture intricate patterns and relationships in the data than individual ML models alone by combining these ML models. By lowering the possibility of bias and variation, this model produces regression predictions which are more accurate and dependable across a range of scenarios and datasets. Hybrid ML models are more resilient to overfitting to sparse or noisy data when different techniques are combined.
The performance of the proposed BA-M5P model is evaluated based on accuracy, robustness, and adaptability to different environmental scenarios. We conclude that hybrid ML models outperform standalone models in terms of prediction accuracy and robustness, resulting in a highly reliable predictive module. Our proposed BA-M5P model improves the CC value by about 0.5–24% when compared with the current state-of-the-art models and empirical equations (Table 3).
This article's contribution is as follows:
In this work, the Fr value in a sewage system under the NDB condition was estimated using a standalone M5P model along with a hybrid BA-M5P ML model. The results of BA-M5P, M5P, and empirical equations were compared.
The following independent factors were examined to calculate the particle Froude number (Fr): median sediment size (Sm), the diameter of pipe (dp), hydraulic radius (Rh), the thickness of sediment deposited (ts), volumetric concentration of sediment (Svc), dimensional grain size of particles (Dgr), and friction factor of channel (fc) in the condition of NDB.
Different performance metrics, including MSE, MAE, NSE, and CC, were employed to illustrate the effectiveness of the proposed models.
We used sensitivity analysis to find the most sensitive input variable for predicting the Fr value.
METHODOLOGY
Dependency analysis
Workflow
Dataset
In this work, the particle Froude number in sewage systems was predicted using five datasets for NDB derived from El-Zaemey (1991) [290 runs], Perrusquia (1991) [38 runs], Ab Ghani (1993) [26 runs], May (1993) [46 runs], and Montes et al. (2020) [54 runs].
El-Zaemey (1991) looked into the deposition of sediment in sewer networks, addressing how the characteristics of sediments change over time and the circumstances that lead to sediment cementation, particularly in dry weather flow. The results of experiments conducted in a circular channel demonstrated the considerable dependence of flow characteristics on bed roughness, thickness, and depth. These experiments also produced novel techniques for predicting sediment transport and flow friction. These discoveries have aided in the design of non-deposition sewers and resolved sediment building problems in existing systems.
Perrusquia (1991) looked at bedload transport with permanently deposited sediment beds in partially filled concrete pipes. Key findings included how to predict flow resistance correctly using the Engelund-Hansen and van Rijn models, how to find critical shear stress using Shields' diagram, and how to find vertical velocity distribution using Shields' diagram. Furthermore, Perrusquia (1991) presented a novel relationship for predicting sediment transport rates that is effective for specific pipe diameters, sand sizes, pipe slopes, and sediment thicknesses, but requires further research for broader application.
Even in sewers designed to be self-cleaning, silt can still accumulate at low flows. Ab Ghani's (1993) analysis of the available data on sediment transport in both clean and sediment-filled pipes showed that the current design criteria are insufficient for pipes larger than 300 mm. According to the new formulas and design charts, sewers with a diameter of up to 1.0 m may be self-cleaning, whereas larger sewers should allow for an optimal amount of sediment deposition.
May (1993) focused on flow resistance and sediment transport rates when presenting laboratory data on non-cohesive sediment movement in sewer pipes. Tests utilizing a 450-mm concrete pipe and diverse sand grades determined the necessary flow parameters to prevent sediment deposition. May (1993) created new equations and theoretical models for bedload movement and flow resistance to enhance predictions for sediment transport under various flow conditions. These models demonstrated that rough pipes transport at a lower rate than smooth pipes.
Many existing models for predicting sediment flow in sewers are limited in their applicability to pipes with a diameter of less than 500 mm. To address this, Montes et al. (2020) used experimental data from a 595 mm pipe at the University of Los Andes to create two new self-cleaning models. These models can be used for both small and large sewer pipes, with or without deposited bed conditions, and demonstrate better prediction precision than previous models.
Table 1 lists the primary statistical values of the variables utilized in this investigation. Furthermore, empirical Equation (6) suggested by Ab Ghani (1993) and empirical Equation (7) suggested by Nalluri et al. (1997) were used to predict the Fr value under the NDB condition.
| Dimensionless group | Minimum | Maximum | Standard deviation | Mean |
|---|---|---|---|---|
| | 11.81 | 209.777 | 65.981 | 73.355 |
| | 0.005 | 0.233 | 0.05 | 0.057 |
| | 0 | 0.02 | 0.001 | 0.002 |
| | 0.001 | 0.4 | 0.113 | 0.213 |
| | 0.003 | 0.065 | 0.01 | 0.027 |
| | 1.263 | 17.453 | 2.651 | 4.424 |
Data split
The 10-fold cross-validation method was used for training and testing. It is a special case of k-fold cross-validation, which divides the total dataset into k divisions or subsamples. The proposed model is trained on (k − 1) subsamples and evaluated on the remaining subsample. This procedure is repeated k times so that each subsample is used exactly once as testing data. In this article, we look at how the proposed models perform when k = 10. Finally, we evaluate the accuracy with all metrics over all subsamples produced by the 10-fold cross-validation procedure, as illustrated in Figure 1.
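The splitting step described above can be sketched in a few lines of standard-library Python. This is a minimal illustration and not the tooling used in the study; the function name, the shuffling seed, and the use of 454 samples (the sum of the run counts listed in the Dataset section) are assumptions for the example.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

# 454 = 290 + 38 + 26 + 46 + 54 experimental runs
folds = list(k_fold_indices(454, k=10))
for train_idx, test_idx in folds:
    # A model would be fitted on train_idx and scored on test_idx here
    assert not set(train_idx) & set(test_idx)
```

Metrics can then be computed either per fold and averaged, or over the pooled out-of-fold predictions; the text above suggests the pooled approach.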
Input combination (IC)
| Models | IC # | Input variables | Output variable | CC |
|---|---|---|---|---|
| BA-M5P | A | | | 0.882 |
| | B | | | 0.890 |
| | C | | | 0.932 |
| | D | | | 0.938 |
| | E | | | **0.944** |
| M5P | A | | | 0.836 |
| | B | | | 0.838 |
| | C | | | 0.880 |
| | D | | | 0.898 |
| | E | | | 0.936 |
Bold values indicate the best performance.
Model description
M5P: The standalone model used here is M5P, a decision tree learner for regression problems that places linear regression functions at the terminal nodes (leaves) so that it can predict continuous numerical values. The M5 tree model can handle tasks with very high dimensionality and works with continuous rather than discrete class problems. It reports each of the linear models that were built to approximate, piece by piece, the non-linear relationships in the dataset, and predictions are drawn from the models associated with the partitions of the tree. Two steps are needed to produce a tree model. In the first stage, a decision tree is created using a splitting criterion: the standard deviation of the class values that reach a node is treated as a measure of the error at that node, and the expected reduction in this error from testing each attribute at that node is calculated and serves as the branching criterion (Quinlan 1992). In the second stage, the grown tree is pruned and linear regression functions are fitted at the remaining nodes. Recent research has used M5P models to estimate dissolved oxygen levels and suspended sediment loads (Heddam & Kisi 2018; Khosravi et al. 2018).
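The branching criterion can be made concrete with a small sketch of the standard deviation reduction, SDR = sd(T) − Σ (|Ti| / |T|) · sd(Ti), where T is the set of target values reaching a node and Ti are the subsets produced by a candidate split. The helper names below are hypothetical, and the use of the population standard deviation is an assumption of this sketch rather than a detail taken from the study.

```python
import statistics

def sd_reduction(parent, subsets):
    """Standard deviation reduction: sd(T) - sum(|T_i| / |T| * sd(T_i))."""
    n = len(parent)
    weighted = sum(len(s) / n * statistics.pstdev(s) for s in subsets if s)
    return statistics.pstdev(parent) - weighted

def best_split(x, y):
    """Return (threshold, sdr) of the best binary split on one attribute."""
    pairs = sorted(zip(x, y))
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal attribute values
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        sdr = sd_reduction(y, [left, right])
        if sdr > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, sdr)
    return best

# Toy data: the target jumps between x = 2 and x = 3
threshold, sdr = best_split([1, 2, 3, 4], [1, 1, 5, 5])
```

A full M5P learner would repeat this search over every attribute at every node, then prune the tree and fit a linear model at each leaf; only the split selection is shown here.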
One of the primary advantages of M5P models is that they provide interpretable results due to their tree structure, which aids in understanding the influence of various input variables. They also efficiently handle both continuous and categorical data, making them versatile for different datasets. Additionally, M5P can capture non-linear relationships through piecewise linear functions, enhancing predictive accuracy.
However, M5P models can become complex and prone to overfitting if not properly pruned, leading to reduced generalizability on unseen data. The training process can be computationally intensive for large datasets, impacting scalability. Furthermore, the performance of M5P models relies heavily on the quality and quantity of input data, which may require extensive preprocessing. Despite these challenges, when properly managed, M5P remains a well-performing ML model for particle Froude number prediction.
Bagging: Bagging, or bootstrap aggregation, is a popular ensemble learning method used to lower variance in noisy datasets. During bagging, data are sampled randomly from the training set with replacement, so the same data point can be selected more than once. After several data samples have been formed, a base learner (the weak learner) is trained independently on each of them. The predictions made by the base learners are then averaged (for regression) or combined by majority vote (for classification), which results in a more accurate estimate. According to Breiman (1996), bagging is carried out in three major steps: (1) bootstrapping, (2) parallel training, and (3) aggregation. Bauer & Kohavi (1999) further explained that the BA algorithm proceeds as follows:
Picking data from the main training dataset at random and independently. This process is repeated multiple times to produce a predetermined number of sub-datasets.
Applying the base learning model to every sub-dataset to develop a sequence of predictive functions.
Conducting a vote to determine the outcome and selecting the final result based on the highest number of votes.
Ab Ghani (1993) proposed equations for both non-deposition and deposition boundary conditions. The studies were performed in a channel with a 450 mm diameter, with various pipe diameters and bed thicknesses of up to 23% under deposition boundary conditions. In addition, information gathered from Alvarez-Hernandez (1990), Perrusquia (1992), and May (1993) was used.
One of the primary benefits of bagging is its ability to reduce variance and prevent overfitting by combining predictions from multiple models, resulting in more robust and stable outputs. It works particularly well with high-variance models, such as decision trees, improving overall performance. Furthermore, bagging is typically simple to implement and parallelize, which speeds up the training process.
However, one disadvantage is that bagging can be computationally expensive and memory-intensive because it requires training multiple models on different subsets of data. Furthermore, bagging reduces variance but does not address bias, so it may not improve the performance of inherently biased models. The technique also requires a large amount of data to ensure that each subset is representative of the overall distribution, which may be a limitation in datasets with fewer samples. Despite these challenges, bagging remains a well-performing ML technique for predicting particle Froude numbers.
Proposed Bagging-M5P (BA-M5P): The Bagging-M5P model is an ensemble learning technique that combines the predictive power of multiple M5P models to enhance overall prediction accuracy and robustness. It involves training multiple M5P models on different bootstrapped subsets of the original dataset. Each iteration creates diverse training sets for each individual model by sampling a random subset of the training data with replacement. This diversity helps to capture different aspects of the underlying data distribution and reduces the risk of overfitting.
Once trained, the individual M5P models make predictions on unseen data points. We then aggregate the predictions of all the individual models, typically through averaging or weighted averaging, to obtain the final prediction. This approach effectively utilizes the combined knowledge of multiple models, resulting in enhanced prediction performance and generalization capability. In addition, Bagging-M5P is highly versatile in handling various types of data, making it suitable for a wide range of prediction tasks across different domains.
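The bootstrap, train, and aggregate cycle can be sketched as follows. To keep the example short and self-contained, an ordinary least-squares line is swapped in for the M5P base learner, so this illustrates the bagging wrapper only and is not an implementation of BA-M5P. All function names and default values here are assumptions made for illustration.

```python
import random
import statistics

def fit_line(xs, ys):
    """Least-squares line; a simple stand-in for an M5P base learner."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def bagging_fit(xs, ys, n_models=10, seed=7, fit=fit_line):
    """Train n_models base learners, each on a bootstrap resample."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def bagging_predict(models, x):
    """Aggregate by simple averaging, the usual choice for regression."""
    return statistics.fmean(m(x) for m in models)

xs = list(range(10))
ys = [2 * x + 1 for x in xs]
models = bagging_fit(xs, ys)
prediction = bagging_predict(models, 4)
```

Because each base learner sees a different resample, their individual errors partly cancel when averaged, which is the variance reduction that the text attributes to bagging.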
Model evaluation criteria
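Model evaluation criteria enable the comparison of different models on the same dataset, which is useful when selecting the best model for a particular problem. They can highlight the strengths and weaknesses of a model, informing further development and improvements, and they provide an objective way to communicate model performance to others, such as colleagues or stakeholders. Overall, model evaluation criteria are crucial for assessing the effectiveness and reliability of a model. In this study, we used the MSE, MAE, NSE, and CC statistical metrics, whose ideal values are 0, 0, 1, and −1 or 1, respectively, to identify the best model. These metrics are calculated as follows.

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(Fr_i^{a} - Fr_i^{p}\right)^{2}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|Fr_i^{a} - Fr_i^{p}\right|$$

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N}\left(Fr_i^{a} - Fr_i^{p}\right)^{2}}{\sum_{i=1}^{N}\left(Fr_i^{a} - \overline{Fr^{a}}\right)^{2}}$$

$$\mathrm{CC} = \frac{\sum_{i=1}^{N}\left(Fr_i^{a} - \overline{Fr^{a}}\right)\left(Fr_i^{p} - \overline{Fr^{p}}\right)}{\sqrt{\sum_{i=1}^{N}\left(Fr_i^{a} - \overline{Fr^{a}}\right)^{2}\,\sum_{i=1}^{N}\left(Fr_i^{p} - \overline{Fr^{p}}\right)^{2}}}$$

Here N is the number of observations, $Fr_i^{a}$ is the ith actual value of the particle Froude number, $\overline{Fr^{a}}$ is the mean of the actual values of the particle Froude number, $Fr_i^{p}$ is the ith predicted value of the particle Froude number, and $\overline{Fr^{p}}$ is the mean of the values of the particle Froude number predicted by the proposed models.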
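For readers who want to reproduce these metrics, the following sketch computes them from two equal-length sequences of actual and predicted values using their standard definitions. The function names and the sample numbers are illustrative assumptions, not output from the models in this study.

```python
import math

def mse(actual, predicted):
    """Mean squared error (ideal value 0)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mae(actual, predicted):
    """Mean absolute error (ideal value 0)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def nse(actual, predicted):
    """Nash-Sutcliffe efficiency (ideal value 1)."""
    mean_a = sum(actual) / len(actual)
    residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    spread = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - residual / spread

def cc(actual, predicted):
    """Pearson correlation coefficient between actual and predicted values."""
    mean_a = sum(actual) / len(actual)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)

# Made-up values purely to exercise the functions
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
scores = {name: fn(actual, predicted)
          for name, fn in [("MSE", mse), ("MAE", mae), ("NSE", nse), ("CC", cc)]}
```

Note that NSE can be negative when a model predicts worse than the mean of the observations, which is how a value such as −0.021 can arise.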
Sensitivity analysis
Sensitivity analysis is carried out to find the most sensitive input variable, that is, the one that most affects the performance of the proposed model. In this analysis, we remove one input variable at a time and evaluate the performance of the model, repeating the process until every input variable has been removed once. Therefore, for k inputs, we evaluate the model performance k times, with only one input removed at each step. In this study, we first remove the input variable that is least correlated with the output variable, then the next least correlated variable in the next iteration, and finally the most correlated variable in the final iteration. We then compare the metric values across these runs to identify the most sensitive input variable.
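The procedure can be sketched as a simple loop. In the sketch below, `evaluate` is a hypothetical callback standing in for "train and cross-validate the model on these inputs and return an error metric such as MSE"; the toy penalty values are invented solely to exercise the loop and are not results.

```python
def leave_one_out_sensitivity(features, evaluate):
    """Return {removed_feature: error} after dropping one input at a time."""
    results = {}
    for removed in features:
        kept = [f for f in features if f != removed]
        results[removed] = evaluate(kept)
    return results

def most_sensitive(results):
    """The input whose removal produces the largest error."""
    return max(results, key=results.get)

# Invented penalties: how much the toy error grows when an input is missing
toy_penalty = {"x1": 0.05, "x2": 0.40, "x3": 0.10}

def toy_evaluate(kept):
    """Toy error: a base error plus a penalty for every missing input."""
    return 1.0 + sum(p for name, p in toy_penalty.items() if name not in kept)

results = leave_one_out_sensitivity(list(toy_penalty), toy_evaluate)
print(most_sensitive(results))
```

Since every run removes a single variable from the full input set, the order in which variables are removed does not change the outcome; it only determines the order of the rows in the results table.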
RESULTS AND DISCUSSION
An in-depth study of the outcomes of the proposed models is provided in this section. To compare the accuracy of the proposed models with current empirical equations based on performance metrics, we first identify the best input variable combination and then evaluate the accuracy of each model. We also identify the most sensitive input variable, i.e., the one that most significantly affects the model's performance.
Best input combination
Based on the CC values between the input and output variables, we formed a total of five different ICs (i.e., IC A, IC B,…, IC E). We compare the CC value between the actual and predicted values obtained with the proposed models for each IC. We find that the most accurate predictions are obtained when all variables are combined in input combination IC E, as shown in Table 2. Among all the models, the BA-M5P model produces the most accurate predictions of the Fr value.
Model analysis and hyperparameter tuning
Here, we describe the hyperparameter tuning of various models used.
M5Prime: The final parameter values after tuning the M5P model were: (a) batch size: 100; (b) unsmoothed predictions: False; (c) unpruned tree: False; and (d) minimum number of instances: 4.
Bagging: The following settings were used for BA. As with M5P, these values were determined using a trial-and-error approach. (a) Seed value: 7; (b) batch size: 100; (c) bag size: 100; (d) number of iterations: 10; (e) base learner: M5P; (f) instance weights used: False.
Model performance
Table 3 displays the performance measures for Fr value prediction. We use the CC metric to explain the findings consistently and to contrast the models. When all input variables (i.e., IC E) were used to predict the Fr value, the CC values showed that BA-M5P predicted most accurately (0.944), followed by the state-of-the-art ML model REPT (0.939), the other proposed ML model M5P (0.936), MGGP (0.856), the empirical equation of Ab Ghani (1993) (0.856), ELM (0.785), the empirical equation of Nalluri et al. (1997) (0.733), and SVM (0.719).
| Category | Model | MSE | MAE | NSE | CC | BA-M5P % improvement w.r.t. CC |
|---|---|---|---|---|---|---|
| Proposed ML models | BA-M5P | **0.770** | **0.551** | **0.890** | **0.944** | – |
| | M5P | 0.870 | 0.573 | 0.876 | 0.936 | 0.838 |
| State-of-the-art ML models | REPT | 0.833 | 0.634 | 0.881 | 0.939 | 0.540 |
| | MGGP | 2.079 | 0.925 | 0.703 | 0.856 | 9.310 |
| | ELM | 2.691 | 1.265 | 0.616 | 0.785 | 16.786 |
| | SVM | 3.531 | 1.211 | 0.496 | 0.719 | 23.850 |
| Empirical equations | Ab Ghani (1993) | 5.537 | 2.017 | 0.210 | 0.856 | 9.326 |
| | Nalluri et al. (1997) | 7.158 | 2.005 | −0.021 | 0.733 | 22.313 |
Bold values indicate the best performance. For benchmarking purposes, we have also compared the improvement obtained using the proposed BA-M5P model with the state-of-the-art ML models and empirical equations.
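The improvement column of Table 3 appears to be consistent with expressing each CC difference relative to the BA-M5P value, i.e., (CC_BA-M5P − CC_model) / CC_BA-M5P × 100. This formula is an inference from the tabulated numbers rather than something stated in the text, so the sketch below should be read as an assumption; the small differences from the tabulated percentages presumably stem from the CC values being rounded to three decimals.

```python
def cc_improvement_percent(cc_reference, cc_other):
    """Relative CC gain of the reference model over another model, in percent."""
    return (cc_reference - cc_other) / cc_reference * 100.0

# CC values as reported in Table 3 (rounded to three decimals)
cc_values = {
    "M5P": 0.936,
    "REPT": 0.939,
    "MGGP": 0.856,
    "ELM": 0.785,
    "SVM": 0.719,
    "Ab Ghani (1993)": 0.856,
    "Nalluri et al. (1997)": 0.733,
}

improvements = {name: cc_improvement_percent(0.944, value)
                for name, value in cc_values.items()}
for name, gain in improvements.items():
    print(f"{name}: {gain:.2f}%")
```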
Sensitivity analysis
The outcome of the sensitivity analysis is displayed in Table 4, which shows the metric results for the different ICs obtained by removing one input variable at a time (from lowest to highest CC) from the best IC (i.e., IC E). For this analysis, we trained and tested the best proposed ML model (i.e., BA-M5P) on each IC to predict the Fr value. After comparing the metrics, we found that Svc is the most important variable: when Svc is removed from the IC, the model gives its worst prediction performance (MSE = 1.156 and MAE = 0.694), indicating that Svc is the most sensitive variable and must be considered when predicting the Fr value.
| IC # | Input variables | Removed | MSE | MAE |
|---|---|---|---|---|
| IC 1 | | | 0.842 | 0.559 |
| IC 2 | | | 0.851 | 0.571 |
| IC 3 | | Svc | **1.156** | **0.694** |
| IC 4 | | | 0.909 | 0.660 |
| IC 5 | | | 1.003 | 0.622 |
| IC 6 | | – | 0.801 | 0.547 |
On removal of Svc from the input combination, the prediction performance of the model gives the worst result, indicating that Svc is the most sensitive variable (given in bold).
Discussion
Self-cleaning sewer systems are designed to maintain the flow of wastewater through a sewer system by removing debris that can block pipes and cause backups. This is important because blockages in the sewer system can cause wastewater to overflow, leading to environmental contamination and public health hazards. In addition, the cost of manually cleaning blockages in the sewer system can be high, so a self-cleaning system can help to reduce maintenance costs. The Fr value is a dimensionless number that compares the inertial forces with the gravitational forces acting on an object in a fluid flow. In the context of a self-cleaning sewer system, the Fr value can be used to predict the movement of solid particles in the wastewater flow. A high Fr value indicates that the inertial forces acting on the particles are greater than the gravitational forces, which means that the particles will tend to move with the flow of the wastewater. A low Fr value, on the other hand, indicates that the gravitational forces acting on the particles are greater than the inertial forces, which means that the particles will tend to settle out of the flow. By understanding the Fr value in a self-cleaning sewer system, engineers can design the system to optimize the flow of wastewater and remove debris effectively.
To predict the Fr value under NDB conditions, we used median sediment size (Sm), hydraulic radius (Rh), volumetric concentration of sediment (Svc), the diameter of pipe (dp), friction factor of channel (fc), and dimensional grain size of particles (Dgr). We proposed two ML models, BA-M5P and M5P, and compared them with two empirical equations proposed by Ab Ghani (1993) and Nalluri et al. (1997). The performance of the models was examined using four performance metrics: MSE, MAE, NSE, and CC. All models gave their best results when all input variables were combined.
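The four metrics can be written compactly as follows. This is a sketch that assumes NSE denotes the Nash-Sutcliffe efficiency and CC the Pearson correlation coefficient; the function names and the sample values are illustrative and not drawn from the study.

```python
import math

def mse(obs, pred):
    """Mean squared error."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 minus residual variance over observed variance."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / spread

def cc(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    mean_obs = sum(obs) / len(obs)
    mean_pred = sum(pred) / len(pred)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov / math.sqrt(var_obs * var_pred)

# Made-up observed and predicted values for illustration.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
print(mse(observed, predicted), mae(observed, predicted))
print(nse(observed, predicted), cc(observed, predicted))
```

Lower MSE and MAE indicate smaller errors, whereas NSE and CC approach 1 as the predictions approach the observations.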
This study evaluated different combinations of input variables and compared the resulting models using several performance measures. The highest prediction power was achieved when all input variables were included, and BA-M5P was the best-performing model in comparison with the others, as shown in Table 3. Its accuracy is also demonstrated by the close match between the actual and predicted Fr values in the line scatter plots and the box-and-whisker plot. The Taylor diagram illustrates the resemblance between the actual and predicted values by comparing their CC and SD values.
We performed a sensitivity analysis on the best model to determine the impact of the different input variables on the prediction of the dependent variable Fr. The analysis involved removing one input variable at a time from IC E and evaluating the model's performance using the MSE and MAE metrics. The results showed that Svc was the most sensitive variable, meaning that the model's prediction performance was most affected when Svc was removed. This suggests that Svc is an important variable to consider when predicting the Fr value.
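The remove-one-variable procedure can be sketched generically. The example below is not the BA-M5P model used in this study: it substitutes a simple 1-nearest-neighbour regressor as a stand-in, uses synthetic data with hypothetical variable names, and scores each reduced input set by 10-fold cross-validated MSE.

```python
import random

def knn1_predict(train_x, train_y, query):
    """Predict with the target of the nearest training row (stand-in model)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return train_y[best]

def cv_mse(rows, targets, k=10):
    """Mean squared error accumulated over k contiguous folds."""
    n = len(rows)
    sq_err = 0.0
    for fold in range(k):
        test_idx = set(range(fold * n // k, (fold + 1) * n // k))
        train_x = [rows[i] for i in range(n) if i not in test_idx]
        train_y = [targets[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            sq_err += (knn1_predict(train_x, train_y, rows[i]) - targets[i]) ** 2
    return sq_err / n

def ablation(rows, targets, names, k=10):
    """CV MSE for the full input set and for each single-variable removal."""
    scores = {"none": cv_mse(rows, targets, k)}
    for j, name in enumerate(names):
        reduced = [r[:j] + r[j + 1:] for r in rows]
        scores[name] = cv_mse(reduced, targets, k)
    return scores

# Synthetic data (not the study's): the target depends mostly on "x_a".
rng = random.Random(0)
names = ["x_a", "x_b", "x_c"]
rows = [[rng.random() for _ in names] for _ in range(100)]
targets = [5.0 * r[0] + 0.5 * r[1] + rng.gauss(0.0, 0.05) for r in rows]

scores = ablation(rows, targets, names)
most_sensitive = max(names, key=lambda nm: scores[nm])
print(most_sensitive, scores)
```

The variable whose removal produces the largest cross-validated error is reported as the most sensitive, which mirrors how Svc is identified in Table 4.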
Limitations
Prediction of the Fr value under NDB conditions is a complex phenomenon involving independent variables such as mean velocity of non-deposition flow (Un), density, sediment density, sediment relative density (S), acceleration due to gravity (g), median sediment size (Sm), kinematic viscosity, hydraulic radius (Rh), volumetric concentration of sediment (Svc), the thickness of sediment deposited (ts), the diameter of pipe (dp), and friction factor of channel (fc). Other parameters, such as the bed shear stress, the bed form of the channel, and the channel discharge, can also influence the prediction of the Fr value. Because these parameters are absent from our dataset, the proposed models could not account for their effects. In addition, the range of each variable plays a significant role in training the model. Although our dataset is derived from a variety of field and laboratory investigations, input variables may in practice fall outside the ranges it covers. In either situation, the proposed model may perform poorly. Most ML models face these difficulties, because their efficacy depends largely on the training dataset and its characteristics.
CONCLUSIONS
In this investigation, we assess how well BA-M5P and M5P predict the Fr value in the sewage system. The following variables are used to make predictions: dimensional grain size of particles (Dgr), median sediment size (Sm), hydraulic radius (Rh), volumetric concentration of sediment (Svc), the thickness of sediment deposited (ts), the diameter of pipe (dp), and friction factor of channel (fc). We evaluate the performance of the proposed ensemble and standalone ML models in comparison with empirical equations derived from theoretical considerations. We also compared different ICs to find the best IC and the most sensitive variable. The ensemble ML model performs better than the standalone ML model and the empirical equations with each IC: BA-M5P, an ensemble ML model, outperforms M5P, a standalone ML model, and the standalone ML model in turn outperforms the empirical equations. Of all the models employed in this study, BA-M5P (MSE = 0.770, MAE = 0.551, NSE = 0.890, and CC = 0.944) is the most accurate for predicting Fr values in a sewage system. The most accurate predictions are obtained when all variables are combined in input combination IC E. Among these variables, Svc is the most sensitive variable observed in the sensitivity analysis.
CREDIT AUTHORSHIP CONTRIBUTION STATEMENT
S.K.: Formal analysis, Writing – original draft, Data curation, Data processing and analysis, Original draft preparation, and Writing; B.K.: Reviewing and Editing; M.A.: Research profile design, Writing, Reviewing, and Revising; V.D.: Research profile design, Reviewing, and Revising; U.R.: Reviewing and Revising, Supervision.
FUNDING
This research received no external funding.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.