Abstract
Manning's roughness coefficient (n) is widely used to determine depth and discharge in open channels and canals. This study explores the potential of the Random Forest (RF), M5P, and Random Tree (RT) approaches to predict Manning's roughness coefficient for hydraulic design. For this purpose, 42 observations were collected from high-gradient streams in Colorado, USA. All observations were from cobble- and boulder-bed, high-gradient (S > 0.002 m/m) streams with within-bank flows. To identify the best model, the approaches were compared using three performance evaluation indices: mean absolute error (MAE), coefficient of correlation (CC), and root mean square error (RMSE). The indices show that the pruned M5P approach outperformed the other applied models, with CC = 0.8533 and 0.7910, RMSE = 0.0193 and 0.0195, and MAE = 0.0133 and 0.0165 for the model development and validation stages, respectively. Furthermore, the Taylor diagram and box plot also suggest that the M5P-based approach works better than the RF- and RT-based approaches for predicting Manning's roughness coefficient for high-gradient streams using the given data set.
HIGHLIGHTS
Three soft computing-based modelling approaches (M5P, RF and RT) were developed for the prediction of Manning's roughness coefficient.
The performance of the modelling approaches was compared using mean absolute error (MAE), coefficient of correlation (CC), and root mean square error (RMSE).
The total data set was divided into training and testing subsets in a ratio of 70:30 to develop and validate the models.
The M5P modelling approach performed best in the prediction of Manning's roughness coefficient.
INTRODUCTION
Equations (1) and (3) were formulated for conditions of relatively uniform flow, in which the energy gradient, friction slope, and water-surface slope are parallel to the stream bed, and the hydraulic radius, area, and slope remain relatively constant. However, the energy, friction, and water-surface slopes may differ in many natural channels, especially in high-gradient streams. Therefore, several authors have selected reaches with relatively uniform, gradually varied flow to investigate flow resistance in natural channels (e.g. Barnes 1967; Limerinos 1970; Jarrett 1984). Moreover, the velocity distribution in natural channels is usually assumed to be logarithmic (Chow 1959). Jarrett (1992) found that vertical-velocity profiles in higher-gradient channels are S-shaped, which can affect sediment transport and hydraulic interpretations that assume a logarithmic profile.
Manning's equation remains the most extensively used approach for computing flow depths or discharges in natural channels, for instance in studies of morphodynamics, sediment transport, flood inundation mapping, and riverine ecosystems. Despite the availability of equations, guidelines and data to help in the estimation of n, a single n value is usually chosen for the entire range of flow depth. Costa & Jarrett (2008), Jiang & Li (2010), and Wohl (2000) have remarked that there is still no exact method for determining n values in natural rivers; research based on physical observations, with greater reliance on field data, is therefore required to reduce the uncertainty in estimating n values.
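To illustrate why the choice of n matters, the following minimal sketch evaluates Manning's equation in its standard SI form, V = (1/n) R^(2/3) S^(1/2) and Q = V A. The equations of the paper are not reproduced in this section, so the formula, the function names and the numerical values below are assumptions made for illustration only.

```python
def manning_velocity(n, hydraulic_radius_m, slope):
    """Mean velocity (m/s) from Manning's equation in SI units."""
    return (1.0 / n) * hydraulic_radius_m ** (2.0 / 3.0) * slope ** 0.5


def manning_discharge(n, area_m2, hydraulic_radius_m, slope):
    """Discharge (m^3/s) as mean velocity times flow area."""
    return area_m2 * manning_velocity(n, hydraulic_radius_m, slope)


if __name__ == "__main__":
    # Hypothetical reach: R = 0.65 m, S = 0.012 m/m, A = 10 m^2 (illustrative values).
    for n in (0.03, 0.06, 0.12):
        q = manning_discharge(n, area_m2=10.0, hydraulic_radius_m=0.65, slope=0.012)
        print(f"n = {n:.2f} -> Q = {q:.2f} m^3/s")
```

Because discharge is inversely proportional to n, doubling the roughness coefficient halves the computed discharge for the same geometry and slope, which is why uncertainty in n carries directly into depth and discharge estimates.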
Several modelling approaches have been used effectively for scour estimation and n-value estimation, e.g. the multilayer perceptron induced with the firefly algorithm (MLP_FFA) (Diop et al. 2020; Roushangar et al. 2020), terrestrial remote sensing and flood modelling using FLO-2D (Demir & Keskin 2020), adaptive neuro-fuzzy inference systems (ANFIS) (Azamathulla et al. 2009; Bahramifar et al. 2013; Moharana & Khatua 2014; Singh et al. 2020) and artificial neural networks (ANN) (Azamathulla et al. 2008; Singh et al. 2020; Zounemat-Kermani et al. 2020). ANNs have also been found to give reasonably good solutions to problems in hydraulic and hydrologic engineering, especially where the relationship between inputs and outputs in the data is complex and nonlinear (Guven 2009; Azamathulla et al. 2010; Guven & Talu 2010; Azamathulla & Ghani 2011; Traore & Guven 2012; Singh et al. 2020; Zounemat-Kermani et al. 2020).
Over the last few decades, several researchers have resorted to modelling approaches for analysing field and laboratory data and have obtained significantly better results than with conventional statistical methods (Giustolisi 2004; Azmathullah et al. 2005; Azamathulla et al. 2010; Singh et al. 2018; Mohanty et al. 2019; Singh Nain et al. 2019; Sihag et al. 2020; Rani et al. 2021; Sihag et al. 2021). Several researchers have used ANN to analyse data obtained downstream of hydraulic structures and scour around them (Azmathullah et al. 2005; Azamathulla et al. 2008). Recently, M5P has attracted the attention of researchers for the prediction of hydraulic characteristics. The present study presents RF, RT and M5P as alternative tools for estimating Manning's n value(s).
M5 tree model
Random forest (RF)
The random forest, introduced by Breiman (1996), is an adaptable ensemble of decision trees that manages variance and bias for both linear and nonlinear predictions. The approach is versatile and has been preferred for solving various nonlinear or complex engineering problems. It grows a large number of trees, with the root node of each tree receiving a different bootstrap (bagging) sample of the original data set (Breiman 1996, 2001). At each node, the split is made using a randomly chosen subset of the predictor variables; each tree produces its own prediction, and the predictions are combined into a single aggregate output (Liaw & Wiener 2002). The model has two user-defined parameters: the number of input variables (q) considered at each node when growing a tree, and the number of trees to be generated (k). At each node, only the selected variables are searched for the best split. The random forest regression therefore consists of k trees, and the random forest predictor is formed by averaging over these k trees. The random forest algorithm is simple, relatively insensitive to the characteristics of the training set, and capable of achieving high prediction accuracy (Breiman 1999, 2001). A trial-and-error process was employed to develop the model. WEKA 3.9 software was used to develop the random forest-based model in the current investigation.
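The roles of k and q can be sketched with a deliberately simplified forest. This is not the WEKA implementation used in the study: each "tree" here is a single-split regression stump, so the random subset of q variables is drawn once per tree rather than at every node, and the toy data are invented for illustration.

```python
import random
from statistics import mean


def fit_stump(rows, targets, feature_ids):
    """Best single split (feature, threshold) among feature_ids by squared error."""
    best = None
    for f in feature_ids:
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [t for r, t in zip(rows, targets) if r[f] <= thr]
            right = [t for r, t in zip(rows, targets) if r[f] > thr]
            ml, mr = mean(left), mean(right)
            sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr, ml, mr)
    if best is None:  # no split possible: predict the sample mean everywhere
        m = mean(targets)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    f, thr, ml, mr = stump
    return ml if row[f] <= thr else mr


def fit_forest(rows, targets, k=50, q=2, seed=0):
    """Grow k stumps, each on a bootstrap sample and a random subset of q variables."""
    rng = random.Random(seed)
    n_rows, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(k):
        idx = [rng.randrange(n_rows) for _ in range(n_rows)]  # sampling with replacement
        feats = rng.sample(range(n_features), q)
        forest.append(fit_stump([rows[i] for i in idx], [targets[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Regression output: the average of the individual tree predictions."""
    return mean(predict_stump(s, row) for s in forest)


# Invented toy data: columns stand for (R, S, D84); targets stand for n.
ROWS = [(0.2, 0.004, 0.15), (0.4, 0.006, 0.20), (0.6, 0.010, 0.30),
        (0.8, 0.015, 0.35), (1.0, 0.020, 0.45), (1.2, 0.030, 0.60)]
TARGETS = [0.03, 0.04, 0.05, 0.07, 0.09, 0.12]

if __name__ == "__main__":
    forest = fit_forest(ROWS, TARGETS, k=50, q=2, seed=0)
    print(predict_forest(forest, (0.7, 0.012, 0.32)))
```

In a full random forest the trees are grown much deeper and the q variables are re-drawn at every node; the stump version only keeps the two ingredients described above, bootstrap sampling and averaging over k trees.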
Random tree (RT)
The random tree algorithm considers a given number of randomly chosen attributes at every node and performs no pruning. It relies on randomization, particularly bagging (Hamoud et al. 2018). Every node is split using the best split among a subset of predictors chosen at random at that node. The method handles both classification and regression problems. Random trees are an ensemble of tree predictors known as a forest. Classification with RT works as follows: the random trees classifier takes the input feature vector, classifies it with each tree in the forest, and outputs the class label that receives the maximum number of votes. In regression, the response is the average of the responses of all the trees in the forest (Cutler et al. 2012). RTs are essentially a combination of two existing machine learning algorithms: RF principles and single model trees. Model trees are decision trees in which each leaf holds a linear model fitted to the local subdomain that the leaf represents. RFs have been shown to improve the performance of single decision trees considerably. Tree diversity is created in two ways: first, the training data are sampled with replacement for each tree, as in bagging; second, instead of always computing the best possible split at each node when growing a tree, only a random subset of all attributes is considered at each node, and the best split within that subset is determined. Random model trees combine random forests and model trees for the first time. RTs use this approach for the splitting criterion and thus produce reasonably balanced trees in which a single global ridge setting works across all leaves, which simplifies the optimization procedure (Barddal et al. 2019).
Performance evaluation indices
Data set
A total of 42 observations were used in this investigation. The complete data set was separated into two portions: the first portion comprises 70% of the data and the second portion the remainder (Bhoria et al. 2021). The first portion (29 observations) was used for model development and the second portion (13 observations) for validation. The range, features and statistical description of both portions are shown in Table 1. R, S and D84 are independent variables and were selected as input variables, whereas n was selected as the output in the model development and validation stages. Figure 2 shows the correlation plot of all input and output variables.
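The 70:30 partition of 42 observations into 29 and 13 can be reproduced with a short sketch. The paper does not state how individual observations were assigned to each portion, so the random shuffle, the seed and the function name `split_70_30` are assumptions for illustration.

```python
import random


def split_70_30(n_obs, train_fraction=0.70, seed=42):
    """Return (development, validation) index lists for a shuffled 70:30 split."""
    indices = list(range(n_obs))
    random.Random(seed).shuffle(indices)  # assumed random assignment
    n_train = int(n_obs * train_fraction)  # 42 * 0.70 = 29.4, truncated to 29
    return sorted(indices[:n_train]), sorted(indices[n_train:])


if __name__ == "__main__":
    development, validation = split_70_30(42)
    print(len(development), len(validation))
```

Truncating 29.4 to 29 leaves 13 observations for validation, which matches the portion sizes reported above.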
Table 1. Statistical description of the first and second portions of the data set
Statistic | R (first portion) | S (first portion) | D84 (first portion) | n (first portion) | R (second portion) | S (second portion) | D84 (second portion) | n (second portion) |
---|---|---|---|---|---|---|---|---|
Minimum | 0.1500 | 0.0020 | 0.0910 | 0.0280 | 0.1800 | 0.0020 | 0.0910 | 0.0300 |
Maximum | 1.6800 | 0.0340 | 0.7920 | 0.1590 | 1.2300 | 0.0310 | 0.6100 | 0.1030 |
Mean | 0.6507 | 0.0122 | 0.3647 | 0.0667 | 0.6015 | 0.0102 | 0.3040 | 0.0528 |
Standard deviation | 0.3948 | 0.0097 | 0.1737 | 0.0378 | 0.3711 | 0.0096 | 0.1498 | 0.0208 |
Kurtosis | −0.1845 | −0.7745 | 0.8270 | 0.0211 | −0.8385 | 0.6385 | −0.1525 | 1.5271 |
Skewness | 0.5913 | 0.6759 | 0.9192 | 1.0281 | 0.7582 | 1.3583 | 0.6203 | 1.2408 |
Confidence level (95%) | 0.1502 | 0.0037 | 0.0661 | 0.0144 | 0.2242 | 0.0058 | 0.0905 | 0.0126 |
RESULTS AND DISCUSSION
Soft computing and regression-based modelling approaches are used in the current investigation to predict Manning's roughness coefficient for hydraulic design. The performance of all the implemented models was assessed using three standard statistical parameters: CC, RMSE and MAE. Lower MAE and RMSE values and higher CC values indicate higher model accuracy. CC ranges from −1 to 1. WEKA 3.9 software was used for model development and validation in this study. A trial-and-error method was employed for model preparation, and the ideal values of the primary parameters were obtained after a number of trials. There are well-defined statistical criteria for selecting and defining the primary parameters, which are specific to each model.
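The three indices can be computed as in the sketch below. The defining equations are not shown in this section, so the usual definitions are assumed: MAE as the mean absolute difference, RMSE as the square root of the mean squared difference, and CC as the Pearson correlation coefficient. The function names are illustrative.

```python
import math


def mean_absolute_error(actual, predicted):
    """MAE: average magnitude of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_square_error(actual, predicted):
    """RMSE: square root of the mean squared prediction error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def correlation_coefficient(actual, predicted):
    """CC: Pearson correlation between actual and predicted values (-1 to 1)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)
```

Note that CC measures only how well predictions track the pattern of the observations: a model that overpredicts every value by a constant still attains CC = 1, which is why CC is read together with MAE and RMSE.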
Assessment of M5P tree based modelling approach
The M5P modelling approach was developed by a trial-and-error process. The model uses linear regression models to define the input-output relationship, based on dividing the parameter space of the data into several subspaces. In this investigation, both pruned and unpruned models were developed for predicting Manning's roughness coefficient for hydraulic design. The linear equations developed using the pruned and unpruned M5P based models are listed in Tables 2 and 3, respectively.
Table 2. Linear equation developed using the pruned M5P based model
LM num | Equation |
---|---|
1 | n = 0.0258 * R + 3.1029 * S + 0.012 |
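As a quick check of how the pruned equation is applied, the sketch below evaluates it at the mean R and S of the first portion reported in Table 1. The function name is illustrative; the coefficients are copied from the pruned M5P equation.

```python
def predict_n_pruned(R, S):
    """Pruned M5P linear model: n = 0.0258 * R + 3.1029 * S + 0.012."""
    return 0.0258 * R + 3.1029 * S + 0.012


if __name__ == "__main__":
    # Mean R and S of the first portion (Table 1).
    n_at_means = predict_n_pruned(R=0.6507, S=0.0122)
    print(f"{n_at_means:.4f}")  # prints 0.0666
```

The result, about 0.0666, is close to the mean n of the first portion in Table 1 (0.0667), as expected for a single linear model fitted to those data. D84 does not appear in the pruned equation.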
Table 3. Linear equations developed using the unpruned M5P based model

M5 unpruned model tree (using smoothed linear models):

S <= 0.008 :
|   R <= 0.615 :
|   |   S <= 0.004 :
|   |   |   R <= 0.45 : LM1 (3/10.087%)
|   |   |   R > 0.45 : LM2 (2/2.696%)
|   |   S > 0.004 : LM3 (2/4.044%)
|   R > 0.615 :
|   |   D84 <= 0.259 : LM4 (2/17.522%)
|   |   D84 > 0.259 :
|   |   |   R <= 1.01 : LM5 (3/6.354%)
|   |   |   R > 1.01 : LM6 (2/5.391%)
S > 0.008 :
|   R <= 0.37 :
|   |   R <= 0.165 : LM7 (2/12.131%)
|   |   R > 0.165 : LM8 (3/33.016%)
|   R > 0.37 :
|   |   S <= 0.025 :
|   |   |   R <= 0.89 :
|   |   |   |   R <= 0.625 : LM9 (2/26.957%)
|   |   |   |   R > 0.625 : LM10 (2/53.915%)
|   |   |   R > 0.89 : LM11 (3/1.271%)
|   |   S > 0.025 : LM12 (3/46.501%)

LM num | Equation |
---|---|
1 | n = 0.0197 * R + 2.3019 * S + 0.0188 |
2 | n = 0.0197 * R + 2.3019 * S + 0.0188 |
3 | n = 0.0202 * R + 2.4249 * S + 0.0184 |
4 | n = 0.017 * R + 1.6049 * S + 0.0234 |
5 | n = 0.0157 * R + 1.6049 * S + 0.0244 |
6 | n = 0.0156 * R + 1.6049 * S + 0.0245 |
7 | n = 0.0286 * R + 2.4454 * S + 0.0196 |
8 | n = 0.0286 * R + 2.4454 * S + 0.0201 |
9 | n = 0.0254 * R + 2.6391 * S + 0.0253 |
10 | n = 0.0254 * R + 2.6391 * S + 0.0253 |
11 | n = 0.0254 * R + 2.6391 * S + 0.025 |
12 | n = 0.0254 * R + 2.7219 * S + 0.0251 |
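The unpruned tree can be read as a set of nested threshold tests that route an observation to one of the twelve linear models. The sketch below transcribes the splits and coefficients listed for the unpruned M5P model; the function names are illustrative, and it is assumed that the listed equations (already smoothed, according to the table heading) can be evaluated directly at the selected leaf.

```python
# Coefficients (a, b, c) of n = a * R + b * S + c for LM1..LM12.
LINEAR_MODELS = {
    1: (0.0197, 2.3019, 0.0188), 2: (0.0197, 2.3019, 0.0188),
    3: (0.0202, 2.4249, 0.0184), 4: (0.0170, 1.6049, 0.0234),
    5: (0.0157, 1.6049, 0.0244), 6: (0.0156, 1.6049, 0.0245),
    7: (0.0286, 2.4454, 0.0196), 8: (0.0286, 2.4454, 0.0201),
    9: (0.0254, 2.6391, 0.0253), 10: (0.0254, 2.6391, 0.0253),
    11: (0.0254, 2.6391, 0.0250), 12: (0.0254, 2.7219, 0.0251),
}


def select_leaf(R, S, D84):
    """Follow the splits of the unpruned tree and return the linear model number."""
    if S <= 0.008:
        if R <= 0.615:
            if S <= 0.004:
                return 1 if R <= 0.45 else 2
            return 3
        if D84 <= 0.259:
            return 4
        return 5 if R <= 1.01 else 6
    if R <= 0.37:
        return 7 if R <= 0.165 else 8
    if S <= 0.025:
        if R <= 0.89:
            return 9 if R <= 0.625 else 10
        return 11
    return 12


def predict_n_unpruned(R, S, D84):
    """Evaluate the linear model of the leaf reached by (R, S, D84)."""
    a, b, c = LINEAR_MODELS[select_leaf(R, S, D84)]
    return a * R + b * S + c


if __name__ == "__main__":
    print(select_leaf(0.5, 0.03, 0.3), predict_n_unpruned(0.5, 0.03, 0.3))
```

D84 enters the unpruned model only through the split between LM4 and LM5/LM6; none of the leaf equations uses it as a regressor. Each leaf was fitted on only two or three of the 29 development observations, as the counts in parentheses show.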
The results of the M5P model for predicting n are shown in Figure 3 for the model development and validation stages. The performance evaluation indices indicate that, in the validation stage, the pruned M5P based modelling approach performs marginally better than the unpruned one in terms of CC (0.7910 against 0.7858), with identical RMSE (0.0195) and comparable MAE (0.0165 against 0.0157). For the pruned model, CC = 0.8533 and 0.7910, RMSE = 0.0193 and 0.0195, and MAE = 0.0133 and 0.0165 for the model development and validation stages, respectively. Overall, Figure 3 and Table 4 (performance evaluation parameters) suggest that both the pruned and unpruned M5P based modelling approaches are suitable for predicting n for hydraulic design.
Table 4. Performance evaluation parameters for the first portion (model development) and the second portion (validation stage)
Models | CC (model development) | MAE (model development) | RMSE (model development) | CC (validation) | MAE (validation) | RMSE (validation) |
---|---|---|---|---|---|---|
M5P_pruned | 0.8533 | 0.0133 | 0.0193 | 0.7910 | 0.0165 | 0.0195 |
M5P_unpruned | 0.8938 | 0.0115 | 0.0167 | 0.7858 | 0.0157 | 0.0195 |
RF | 0.9706 | 0.0064 | 0.0096 | 0.7581 | 0.0165 | 0.0203 |
RT | 1.0000 | 0.0001 | 0.0004 | 0.7755 | 0.0221 | 0.0281 |
Assessment of RF based modelling approach
Figure 4 plots the agreement between actual and predicted values of Manning's n for hydraulic design obtained with the RF based modelling approach for the model development and validation stages. The values predicted by the RF based modelling approach lie very close to the agreement line. Table 4 lists the results of the model development and validation stages in terms of CC, RMSE and MAE, which indicate that the RF approach is suitable for the prediction of n for hydraulic design, with CC, RMSE and MAE values of 0.9706, 0.0096 and 0.0064 for the model development stage and 0.7581, 0.0203 and 0.0165 for the validation stage, respectively.
Assessment of RT based modelling approach
The random tree based model development process is similar to that of the M5P and RF based models. This model was also developed using WEKA 3.9. Figure 5 plots the agreement between predicted and actual values of n for hydraulic design obtained with the RT based modelling approach for the model development and validation stages. The values predicted by the RT based modelling approach lie very close to the agreement line. The performance evaluation indices indicate that the RT approach is suitable for the prediction of n for hydraulic design, with CC, RMSE and MAE values of 1.0000, 0.0004 and 0.0001 for the model development stage and 0.7755, 0.0281 and 0.0221 for the validation stage, respectively.
Comparative assessment of soft computing based applied modelling approaches
A comparison of the soft computing-based modelling approaches (Table 4 and Figure 6) shows that the M5P based modelling approaches work better than the other applied modelling approaches. To assess the potential of the soft computing-based modelling approaches for predicting n for hydraulic design, agreement and performance graphs are plotted in Figure 6 for the model development and validation stages. It is evident from the plots that the values predicted by the M5P based modelling approaches are very close to the actual values of n and follow a similar pattern. The pruned M5P based modelling approach works better than the unpruned M5P, and the RF based modelling approach works better than the RT based modelling approach for predicting n with this data set. A box plot comparing the actual and predicted values of the applied models for the validation stage is shown in Figure 7. Descriptive statistics of the applied models for the validation stage are listed in Table 5. Figure 7 and Table 5 suggest that the pruned M5P modelling approach outperforms the other applied modelling approaches. The minimum and maximum of the values predicted by the pruned M5P model are very close to those of the actual values. In Figure 7, the widths of the lower and upper quartiles are almost the same, which suggests that the pruned M5P modelling approach is the most suitable for predicting n for hydraulic design.
Table 5. Descriptive statistics of the applied models for the validation stage
Statistic | M5P_pruned | M5P_unpruned | RF | RT |
---|---|---|---|---|
Minimum | −0.0350 | −0.0370 | −0.0340 | −0.0560 |
Maximum | 0.0260 | 0.0270 | 0.0320 | 0.0300 |
1st quartile | −0.0220 | −0.0210 | −0.0240 | −0.0330 |
Median | −0.0030 | −0.0040 | −0.0060 | −0.0060 |
3rd quartile | 0.0070 | 0.0040 | 0.0020 | 0.0020 |
Mean | −0.0066 | −0.0058 | −0.0078 | −0.0127 |
The Taylor diagram, a graphical illustration of the performance of the developed models in terms of correlation, RMSE and standard deviation, is shown in Figure 8. Figure 8 indicates that the pruned M5P is the best performing model and that the RT model performs worst in predicting Manning's roughness coefficient for hydraulic design. Furthermore, the results were compared with previously published studies (Jarrett 1984; Azamathulla & Jarrett 2012) on the basis of the CC and RMSE values. The details of the comparison are given in Figure 9, which suggests that the results obtained from the M5P are much better than those of Jarrett (1984) and Azamathulla & Jarrett (2012). The values of CC and RMSE for M5P_pruned are 0.7910 and 0.0195, which are better than those of M5P_unpruned (0.7858 and 0.0195), Azamathulla & Jarrett (2012) (0.745 and 0.978) and Jarrett (1984) (0.58 and 1.205). Thus, the M5P model is a general model for the study area and gives values closer to the observed values than the other models and the previous studies. The M5P model can be used to represent Manning's roughness coefficient in the study area, and it saves time and effort in comparison with experimentation and conventional models.
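A Taylor diagram can place correlation, standard deviation and RMS difference on a single plot because the three quantities are linked by a law-of-cosines identity: E'^2 = sd_a^2 + sd_p^2 - 2 * sd_a * sd_p * CC, where E' is the centred RMS difference (computed after removing the means). The sketch below checks this identity numerically on invented values; the function name and data are illustrative and are not taken from the study.

```python
import math


def taylor_statistics(actual, predicted):
    """Return (sd_actual, sd_predicted, cc, centred_rms_difference)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    dev_a = [a - mean_a for a in actual]
    dev_p = [p - mean_p for p in predicted]
    sd_a = math.sqrt(sum(d * d for d in dev_a) / n)
    sd_p = math.sqrt(sum(d * d for d in dev_p) / n)
    cc = sum(x * y for x, y in zip(dev_a, dev_p)) / (n * sd_a * sd_p)
    crmsd = math.sqrt(sum((y - x) ** 2 for x, y in zip(dev_a, dev_p)) / n)
    return sd_a, sd_p, cc, crmsd


if __name__ == "__main__":
    # Invented n values for illustration only.
    actual = [0.03, 0.05, 0.08, 0.10]
    predicted = [0.04, 0.05, 0.07, 0.11]
    sd_a, sd_p, cc, crmsd = taylor_statistics(actual, predicted)
    identity = sd_a ** 2 + sd_p ** 2 - 2.0 * sd_a * sd_p * cc
    print(abs(crmsd ** 2 - identity) < 1e-12)
```

The centred RMS difference on a Taylor diagram excludes any constant bias, so it can be smaller than the RMSE values listed in Table 4 when a model systematically over- or underpredicts.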
CONCLUSION
In this study, M5P, RT and RF based modelling approaches were developed for the prediction of Manning's roughness coefficient for hydraulic design. The comparison using performance evaluation indices concludes that the pruned M5P approach outperformed the RF and RT based modelling approaches on the given data set, with CC = 0.8533 and 0.7910, RMSE = 0.0193 and 0.0195, and MAE = 0.0133 and 0.0165 for the model development and validation stages, respectively. The training and testing results of the M5P model support the utility of this method relative to the other tested methods and show good potential for representing the Manning coefficient of the study area. Among the M5P based models, the pruned model works better than the unpruned model. The Taylor diagram and box plot also suggest that the M5P based modelling approaches work better than the RT and RF based modelling approaches for predicting Manning's roughness coefficient for hydraulic design using this data set.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.