## Abstract

Manning's roughness coefficient (*n*) is widely used to determine flow depth and discharge in open channels and canals. This study explores the potential of Random Forest (RF), M5P and Random Tree (RT) approaches for predicting Manning's roughness coefficient for hydraulic design. To this end, 42 observations were collected from high-gradient streams in Colorado, USA. All observations were from boulder- and cobble-bed, high-gradient (S > 0.002 m/m) streams under within-bank flows. To identify the best model, the approaches were evaluated and compared using mean absolute error (MAE), the coefficient of correlation (CC) and root mean square error (RMSE). The pruned M5P approach outperformed the other models for predicting Manning's roughness coefficient, with CC = 0.8533 and 0.7910, RMSE = 0.0193 and 0.0195, and MAE = 0.0133 and 0.0165 for the model development and validation stages, respectively. A Taylor diagram and a box plot also suggest that the M5P-based approach works better than the RF- and RT-based approaches for predicting Manning's roughness coefficient for high-gradient streams using the given data set.

## HIGHLIGHTS

Three soft computing-based modelling approaches (M5P, RF and RT) were developed to predict Manning's roughness coefficient.

The performance of the modelling approaches was compared using mean absolute error (MAE), the coefficient of correlation (CC) and root mean square error (RMSE).

The total data set was divided into training and testing subsets in a 70:30 ratio.

The M5P modelling approach performed best in predicting Manning's roughness coefficient.


## INTRODUCTION

*n* denotes the flow resistance or relative roughness of floodplains and channels. The symbol *n* is taken from Manning's equation for open channel flow, which is as follows:

$$V = \frac{1}{n} R^{2/3} S^{1/2} \tag{1}$$

where *V* is the average cross-sectional velocity in m/s, *R* is the hydraulic mean depth in m, *S* is the friction slope or energy gradient in m/m, and *n* is Manning's roughness coefficient.

…$^{0.5}$/s. For consistency and simplicity, the unit of *n* is ignored. Substituting Equation (1) into the continuity equation,

$$Q = VA \tag{2}$$

gives the parent equation:

$$Q = \frac{1}{n} A R^{2/3} S^{1/2} \tag{3}$$

where *Q* is the discharge in m³/s and *A* is the cross-sectional area in m².
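As a minimal sketch (Python, SI units, illustrative values that are not from the study's data set), Equations (1) and (3) can be evaluated directly, and Equation (1) can be rearranged to back-calculate the *n* implied by a measured mean velocity, which is the usual way field values of *n* are obtained. The function names are hypothetical.

```python
import math

# Hypothetical helper names; SI units assumed (V in m/s, R in m, S in m/m, A in m^2).


def manning_velocity(n, R, S):
    """Mean velocity from Equation (1): V = (1/n) R^(2/3) S^(1/2)."""
    return R ** (2.0 / 3.0) * math.sqrt(S) / n


def manning_discharge(n, A, R, S):
    """Discharge from Equation (3): Q = (1/n) A R^(2/3) S^(1/2)."""
    return A * manning_velocity(n, R, S)


def back_calculate_n(V, R, S):
    """Equation (1) rearranged: the roughness implied by a measured mean velocity."""
    return R ** (2.0 / 3.0) * math.sqrt(S) / V


# Illustrative values only
V = manning_velocity(n=0.05, R=1.0, S=0.01)          # 1.0 * 0.1 / 0.05 = 2.0 m/s
Q = manning_discharge(n=0.05, A=3.0, R=1.0, S=0.01)   # 3.0 * 2.0 = 6.0 m^3/s
print(V, Q, back_calculate_n(V, R=1.0, S=0.01))
```

Because *n* enters Equation (1) only as a divisor, the back-calculation is exact: recomputing *n* from the velocity it produced returns the original value.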

Equations (1) and (3) were formulated for relatively uniform flow, in which the energy gradient, friction slope and water-surface slope are parallel to the streambed, and the hydraulic radius, area and slope remain relatively constant. However, the energy, friction and water-surface slopes may differ in many natural channels, especially in high-gradient streams. Therefore, several authors have selected relatively uniform reaches with gradually varied flow for investigating flow resistance in natural channels (e.g. Barnes 1967; Limerinos 1970; Jarrett 1984). Moreover, the velocity distribution in natural channels is usually assumed to be logarithmic (Chow 1959). Jarrett (1992), however, found that vertical-velocity profiles in higher-gradient channels were S-shaped, which can affect sediment-transport and hydraulic interpretations that assume a logarithmic profile.

Although equations, guidelines and data are available to help estimate *n*, Manning's equation remains the most widely used approach for computing flow depths or discharges in natural channels, for instance in morphodynamics, sediment transport, flood inundation mapping and riverine ecosystem studies, and a single *n* value is typically chosen for the entire range of flow depths. Costa & Jarrett (2008), Jiang & Li (2010) and Wohl (2000) have remarked that there is still no exact method for determining *n* values in natural rivers; therefore, research based on physical observations, relying more heavily on field data, is required to reduce the uncertainty in estimating *n*.

Several modelling approaches have been used effectively to estimate scour and *n* values, e.g. the multilayer perceptron integrated with the firefly algorithm (MLP_FFA) (Diop *et al.* 2020; Roushangar *et al.* 2020), terrestrial remote sensing combined with flood modelling using FLO-2D (Demir & Keskin 2020), adaptive neuro-fuzzy inference systems (ANFIS) (Azamathulla *et al.* 2009; Bahramifar *et al.* 2013; Moharana & Khatua 2014; Singh *et al.* 2020) and artificial neural networks (ANN) (Azamathulla *et al.* 2008; Singh *et al.* 2020; Zounemat-Kermani *et al.* 2020). ANNs have also been found to give reasonably good solutions to hydraulic and hydrological engineering problems, especially where the relationships between input and output data are complex and nonlinear (Guven 2009; Azamathulla *et al.* 2010; Guven & Talu 2010; Azamathulla & Ghani 2011; Traore & Guven 2012; Singh *et al.* 2020; Zounemat-Kermani *et al.* 2020).

Over the last few decades, several researchers have used modelling approaches to analyze field and laboratory data and have obtained significantly better results than with conventional statistical methods (Giustolisi 2004; Azmathullah *et al.* 2005; Azamathulla *et al.* 2010; Singh *et al.* 2018; Mohanty *et al.* 2019; Singh Nain *et al.* 2019; Sihag *et al.* 2020; Rani *et al.* 2021; Sihag *et al.* 2021). Several researchers have used ANNs to analyze scour downstream of and around hydraulic structures (Azmathullah *et al.* 2005; Azamathulla *et al.* 2008). Recently, M5P has attracted researchers' attention for predicting hydraulic characteristics. The present study presents RF, RT and M5P as alternative tools for estimating Manning's *n*.

### M5 tree model

The M5 model tree, on which M5P is based, was proposed by Quinlan (1992). It is a binary decision tree with linear regression functions at the terminal (leaf) nodes, which allows it to predict continuous numerical attributes; the M5 tree is thus a combination of linear regression and regression trees. The data are divided into subsets, and a linear regression model, called a leaf model, is fitted to each. The split applied at each node is chosen so that the target values within each resulting branch vary as little as possible: the standard deviation of the target values reaching a node is treated as a measure of the error at that node, and the split that maximizes the expected reduction of this error is selected, which generates the basic tree. Linear functions are then fitted at the nodes, and the estimated error at the terminal nodes is measured using the standard deviation. Construction of the M5P model involves three stages: tree growth, pruning and smoothing; pruning reduces the chance of overfitting. In this way the model tree divides the parameter space into subspaces and fits a linear regression model in each of them (Quinlan 1992). Equation (4) gives the standard deviation reduction criterion used at each node (Sihag *et al.* 2018; Kumar & Sihag 2019; Sihag *et al.* 2019):

$$\mathrm{SDR} = sd(T) - \sum_{i} \frac{|T_i|}{|T|}\, sd(T_i) \tag{4}$$

where SDR is the standard deviation reduction, *T* is the set of data reaching the node (the input to the tree branches), *T_i* is the subset of data passed to the *i*th branch (leaf), and *sd* is the standard deviation. Figure 1 shows a schematic of M5 model tree development. In Figure 1(a), *X*₁ and *X*₂ are the input (independent) variables and *Y* is the output (dependent) variable. Figure 1(b) shows the tree developed to map the input data to the output data. Four linear models were developed at the leaf nodes in Figure 1; each applies to the region of the input space defined by threshold conditions on the input variables (greater than, or less than or equal to, a split value).
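The split criterion of Equation (4) can be sketched in a few lines of Python. This covers only the tree-growth step for a single numeric input and assumes population standard deviations; leaf regression, pruning and smoothing in M5P are not shown, and the function names are hypothetical.

```python
from statistics import pstdev

# Assumption: population standard deviation (pstdev) is used for sd(.) in Equation (4).


def sdr(parent, subsets):
    """Equation (4): sd(T) - sum(|T_i| / |T| * sd(T_i))."""
    return pstdev(parent) - sum(len(s) / len(parent) * pstdev(s) for s in subsets)


def best_split(x, y):
    """Threshold t on one numeric input that maximizes SDR for the split x <= t / x > t."""
    best_t, best_sdr = None, float("-inf")
    for t in sorted(set(x))[:-1]:  # the largest value would leave the right side empty
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = sdr(y, [left, right])
        if score > best_sdr:
            best_t, best_sdr = t, score
    return best_t, best_sdr


# Toy example: the target jumps between x = 2 and x = 3
print(best_split([1, 2, 3, 4], [1, 1, 5, 5]))  # both halves constant, so SDR = sd(T)
```

When a split leaves both branches with constant targets, the weighted term vanishes and SDR reaches its maximum possible value, sd(*T*).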

### Random forest (RF)

Random forest, introduced by Breiman (2001) on the basis of bagging (Breiman 1996), is an adaptable ensemble of decision trees that balances variance and bias for both linear and nonlinear prediction. The approach is versatile and has been preferred for solving various nonlinear and complex engineering problems. It grows a large number of trees, each from a different bootstrap (bagging) sample of the original data set (Breiman 1996, 2001). At each node, the split is chosen from a randomly selected subset of the predictor variables, and each tree makes its own prediction; the individual predictions are then combined into a single vote (Liaw & Wiener 2002). The model has two main user-defined parameters: the number of input variables (*q*) considered at each node, and the number of trees to be grown (*k*). To maximize diversity among the trees, only the randomly selected subset of variables is considered at each node. A random forest regression thus consists of *k* trees, and its prediction is obtained by averaging the predictions of the *k* trees, which reduces the overall generalization error. The random forest algorithm is simple, relatively insensitive to the characteristics of the training set, and capable of high prediction accuracy (Breiman 1999, 2001). A trial-and-error process was used for model development, and WEKA 3.9 software was used to develop the random forest model in this investigation.
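The two ingredients described above, bootstrap samples and a random subset of *q* inputs at each node, with the *k* tree outputs averaged for regression, can be illustrated in plain Python. This is a simplified sketch on toy data, not WEKA's RandomForest implementation; the split criterion (sum of squared errors), the depth limit and all names are assumptions.

```python
import random

# Assumptions: SSE split criterion, max_depth/min_leaf limits and all names are illustrative.


def _sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def _grow(X, y, q, rng, depth, max_depth=6, min_leaf=2):
    """Grow one regression tree; every node tries only q randomly chosen inputs."""
    if len(y) <= min_leaf or depth == max_depth or len(set(y)) == 1:
        return sum(y) / len(y)                        # leaf: mean target value
    best = None
    for f in rng.sample(range(len(X[0])), q):         # random subset of q inputs
        for t in sorted({row[f] for row in X})[:-1]:  # candidate thresholds
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            score = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:                                  # sampled inputs constant here: stop
        return sum(y) / len(y)
    _, f, t, left, right = best
    return (f, t,
            _grow([X[i] for i in left], [y[i] for i in left], q, rng, depth + 1, max_depth, min_leaf),
            _grow([X[i] for i in right], [y[i] for i in right], q, rng, depth + 1, max_depth, min_leaf))


def random_forest(X, y, k=50, q=1, seed=0):
    """Grow k trees, each on its own bootstrap (bagging) sample of the training data."""
    rng = random.Random(seed)
    trees = []
    for _ in range(k):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]
        trees.append(_grow([X[i] for i in idx], [y[i] for i in idx], q, rng, depth=0))
    return trees


def predict_forest(trees, x):
    """Regression output: the average of the k individual tree predictions."""
    total = 0.0
    for node in trees:
        while isinstance(node, tuple):                # internal node: (input, threshold, left, right)
            f, t, left, right = node
            node = left if x[f] <= t else right
        total += node
    return total / len(trees)


# Toy data (not the study's data): two inputs, target jumps from 0.03 to 0.08
X = [[i, 2 * i] for i in range(20)]
y = [0.03 if i < 10 else 0.08 for i in range(20)]
forest = random_forest(X, y, k=25, q=1, seed=1)
print(predict_forest(forest, [2, 4]), predict_forest(forest, [17, 34]))
```

Here `k` and `q` play the roles of *k* and *q* in the text; in practice both would be tuned by trial and error, as was done in this study.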

### Random tree (RT)

The random tree algorithm builds a tree that evaluates a given number of randomly chosen attributes at each node and performs no pruning. Randomness is introduced in a way related to bagging (Hamoud *et al.* 2018). In a random forest, each node is split using the best split among a randomly chosen subset of the predictors at that node. The method handles both classification and regression problems, and a collection of random trees forms a forest of tree predictors. In classification, the random-trees classifier takes the input feature vector, classifies it with every tree in the forest and outputs the class label that receives the most votes; in regression, the response is the average of the responses of all trees in the forest (Cutler *et al.* 2012). Random model trees combine two existing machine-learning algorithms: random forest principles and single model trees. Model trees are decision trees in which each leaf holds a linear model fitted to the local subdomain that the leaf represents. RFs have been shown to improve significantly on the performance of single trees. Two mechanisms create diversity among the trees: first, the training data are sampled separately for each tree, as in bagging; second, instead of always computing the best possible split at each node, only a random subset of all attributes is considered at each node and the best split within that subset is chosen. Random model trees, the first method to merge random forests and model trees, use this randomized split criterion and thereby encourage fairly balanced trees for which a single global ridge-regression setting works for all leaves, which simplifies the optimization procedure (Barddal *et al.* 2019).
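A single unpruned random tree can be sketched as follows (plain Python; the variance-based split criterion and the names are assumptions, and WEKA's RandomTree may differ in detail). Because the tree is grown until every leaf is pure and is never pruned, it reproduces its training targets exactly whenever every input column has distinct values, as in the toy data below. This behaviour is consistent with the CC of 1.0000 reported for RT in the model development stage in Table 4.

```python
import random

# Assumptions: SSE (variance) split criterion; k_attrs and all names are illustrative.


def _sse(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def grow_random_tree(X, y, k_attrs, rng):
    """Unpruned regression tree; each node evaluates only k_attrs random attributes."""
    if len(set(y)) == 1:                              # pure node: stop (no pruning afterwards)
        return y[0]
    best = None
    for f in rng.sample(range(len(X[0])), k_attrs):
        for t in sorted({row[f] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            score = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:                                  # sampled attributes constant here
        return sum(y) / len(y)
    _, f, t, left, right = best
    return (f, t,
            grow_random_tree([X[i] for i in left], [y[i] for i in left], k_attrs, rng),
            grow_random_tree([X[i] for i in right], [y[i] for i in right], k_attrs, rng))


def predict(node, x):
    """Follow the thresholds down to a leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


# Toy training data (R, S) -> n; illustrative only, every column has distinct values
X = [[0.2, 0.003], [0.5, 0.010], [0.9, 0.020], [1.3, 0.006], [0.4, 0.030], [0.7, 0.015]]
y = [0.03, 0.05, 0.07, 0.04, 0.11, 0.06]
tree = grow_random_tree(X, y, k_attrs=1, rng=random.Random(0))
print([predict(tree, x) for x in X] == y)             # training targets reproduced exactly
```

A perfect fit on the training data says little about predictive skill; the validation-stage indices are the meaningful comparison.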

### Performance evaluation indices

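The performance of the models was evaluated with the coefficient of correlation (CC), root mean square error (RMSE) and mean absolute error (MAE):

$$\mathrm{CC} = \frac{\sum_{i=1}^{N}(P_i - \bar{P})(R_i - \bar{R})}{\sqrt{\sum_{i=1}^{N}(P_i - \bar{P})^2 \sum_{i=1}^{N}(R_i - \bar{R})^2}} \tag{5}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(P_i - R_i)^2} \tag{6}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|P_i - R_i\right| \tag{7}$$

where *P_i* are the actual values and *R_i* are the predicted values of Manning's coefficient, $\bar{P}$ and $\bar{R}$ are the means of the actual and predicted values, and *N* is the number of observations.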
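The three indices can be computed as follows. This is a minimal sketch in plain Python in which `actual` and `predicted` correspond to *P_i* and *R_i*; the function names and the toy values are hypothetical.

```python
import math

# Hypothetical function names; toy values below are illustrative, not from the study.


def cc(actual, predicted):
    """Coefficient of correlation between actual (P_i) and predicted (R_i) values."""
    p_bar = sum(actual) / len(actual)
    r_bar = sum(predicted) / len(predicted)
    num = sum((p - p_bar) * (r - r_bar) for p, r in zip(actual, predicted))
    den = math.sqrt(sum((p - p_bar) ** 2 for p in actual)
                    * sum((r - r_bar) ** 2 for r in predicted))
    return num / den


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(p - r) for p, r in zip(actual, predicted)) / len(actual)


actual = [0.03, 0.05, 0.07]
predicted = [0.04, 0.05, 0.06]
print(cc(actual, predicted), rmse(actual, predicted), mae(actual, predicted))
```

In this toy case the predictions are a linear function of the actual values, so CC equals 1 even though RMSE and MAE are nonzero. CC measures only linear association, which is why it is reported together with the two error indices.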

### Data set

A total of 42 observations were used in this investigation. The complete data set was divided into two portions: the first portion contains 70% of the data and the second portion the remainder (Bhoria *et al.* 2021). The first portion (29 observations) was used for model development and the second portion (13 observations) for validation. The range and statistical description of both portions are shown in Table 1. The hydraulic mean depth R, the friction slope S and D84 (the bed-material particle size for which 84% of the material is finer) are independent variables and were therefore selected as inputs, whereas *n* was selected as the output in the model development and validation stages. Figure 2 shows the correlation plot of all input and output variables.

| Statistic | First portion: R (m) | First portion: S (m/m) | First portion: D84 | First portion: n | Second portion: R (m) | Second portion: S (m/m) | Second portion: D84 | Second portion: n |
|---|---|---|---|---|---|---|---|---|
| Minimum | 0.1500 | 0.0020 | 0.0910 | 0.0280 | 0.1800 | 0.0020 | 0.0910 | 0.0300 |
| Maximum | 1.6800 | 0.0340 | 0.7920 | 0.1590 | 1.2300 | 0.0310 | 0.6100 | 0.1030 |
| Mean | 0.6507 | 0.0122 | 0.3647 | 0.0667 | 0.6015 | 0.0102 | 0.3040 | 0.0528 |
| Standard deviation | 0.3948 | 0.0097 | 0.1737 | 0.0378 | 0.3711 | 0.0096 | 0.1498 | 0.0208 |
| Kurtosis | −0.1845 | −0.7745 | 0.8270 | 0.0211 | −0.8385 | 0.6385 | −0.1525 | 1.5271 |
| Skewness | 0.5913 | 0.6759 | 0.9192 | 1.0281 | 0.7582 | 1.3583 | 0.6203 | 1.2408 |
| Confidence level (95%) | 0.1502 | 0.0037 | 0.0661 | 0.0144 | 0.2242 | 0.0058 | 0.0905 | 0.0126 |
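The bookkeeping behind the split and Table 1 can be checked with a short sketch. The 'Confidence level (95%)' row appears consistent with the half-width of a two-sided t-interval for the mean, t·s/√n. This interpretation is an assumption, not something stated in the text, and the t critical values used below (2.0484 for 28 and 2.1788 for 12 degrees of freedom) are standard table values. Function names are hypothetical.

```python
import math

# Assumption: the "Confidence level (95%)" row equals t * s / sqrt(n).


def split_sizes(n_total, train_fraction=0.7):
    """Sizes of the model-development and validation portions for a given fraction."""
    n_train = round(n_total * train_fraction)
    return n_train, n_total - n_train


def ci_half_width(sd, n, t_crit):
    """Half-width of a two-sided confidence interval for the mean: t * s / sqrt(n)."""
    return t_crit * sd / math.sqrt(n)


print(split_sizes(42))                    # 70:30 split of the 42 observations
print(ci_half_width(0.3948, 29, 2.0484))  # R, first portion (Table 1 lists 0.1502)
print(ci_half_width(0.3711, 13, 2.1788))  # R, second portion (Table 1 lists 0.2242)
```

The computed half-widths agree with the tabulated values to within the rounding of the listed standard deviations.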


## RESULTS AND DISCUSSION

Soft computing and regression-based modelling approaches are used in this investigation to predict Manning's roughness coefficient for hydraulic design. The performance of all models was tested using three standard statistical parameters: CC, RMSE and MAE. Lower MAE and RMSE values and higher CC values indicate higher model accuracy; CC ranges from −1 to 1. WEKA 3.9 software was used for model development and validation. A trial-and-error procedure was used to prepare the models, and the optimal values of the primary (user-defined) parameters were obtained after a number of trials. Well-defined statistical criteria, specific to each model, exist for selecting and defining these parameters.

### Assessment of M5P tree based modelling approach

The M5P model was developed by a trial-and-error process. It uses linear regression models to define the input-output relationship, based on dividing the parameter space of the data into several subspaces. Both pruned and unpruned models were developed in this investigation for predicting Manning's roughness coefficient. The linear equations developed by the pruned and unpruned M5P models are listed in Tables 2 and 3, respectively.

| LM num | Equation |
|---|---|
| 1 | n = 0.0258 * R + 3.1029 * S + 0.012 |


M5 unpruned model tree (using smoothed linear models):

- S ≤ 0.008
  - R ≤ 0.615
    - S ≤ 0.004
      - R ≤ 0.45: LM1 (3/10.087%)
      - R > 0.45: LM2 (2/2.696%)
    - S > 0.004: LM3 (2/4.044%)
  - R > 0.615
    - D84 ≤ 0.259: LM4 (2/17.522%)
    - D84 > 0.259
      - R ≤ 1.01: LM5 (3/6.354%)
      - R > 1.01: LM6 (2/5.391%)
- S > 0.008
  - R ≤ 0.37
    - R ≤ 0.165: LM7 (2/12.131%)
    - R > 0.165: LM8 (3/33.016%)
  - R > 0.37
    - S ≤ 0.025
      - R ≤ 0.89
        - R ≤ 0.625: LM9 (2/26.957%)
        - R > 0.625: LM10 (2/53.915%)
      - R > 0.89: LM11 (3/1.271%)
    - S > 0.025: LM12 (3/46.501%)

| LM num | Equation |
|---|---|
| 1 | n = 0.0197 * R + 2.3019 * S + 0.0188 |
| 2 | n = 0.0197 * R + 2.3019 * S + 0.0188 |
| 3 | n = 0.0202 * R + 2.4249 * S + 0.0184 |
| 4 | n = 0.017 * R + 1.6049 * S + 0.0234 |
| 5 | n = 0.0157 * R + 1.6049 * S + 0.0244 |
| 6 | n = 0.0156 * R + 1.6049 * S + 0.0245 |
| 7 | n = 0.0286 * R + 2.4454 * S + 0.0196 |
| 8 | n = 0.0286 * R + 2.4454 * S + 0.0201 |
| 9 | n = 0.0254 * R + 2.6391 * S + 0.0253 |
| 10 | n = 0.0254 * R + 2.6391 * S + 0.0253 |
| 11 | n = 0.0254 * R + 2.6391 * S + 0.025 |
| 12 | n = 0.0254 * R + 2.7219 * S + 0.0251 |
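Tables 2 and 3 can be transcribed directly into code to apply the M5P models. The sketch below (Python; function names hypothetical) uses only the split thresholds and coefficients listed in the tables, with R in m and S in m/m. Like any fitted model, it is meaningful only within the data ranges shown in Table 1.

```python
# Transcribed from Tables 2 and 3; function and variable names are hypothetical.

# (coefficient of R, coefficient of S, intercept) for LM1-LM12 of the unpruned tree
LEAF_MODELS = {
    1: (0.0197, 2.3019, 0.0188), 2: (0.0197, 2.3019, 0.0188),
    3: (0.0202, 2.4249, 0.0184), 4: (0.017, 1.6049, 0.0234),
    5: (0.0157, 1.6049, 0.0244), 6: (0.0156, 1.6049, 0.0245),
    7: (0.0286, 2.4454, 0.0196), 8: (0.0286, 2.4454, 0.0201),
    9: (0.0254, 2.6391, 0.0253), 10: (0.0254, 2.6391, 0.0253),
    11: (0.0254, 2.6391, 0.025), 12: (0.0254, 2.7219, 0.0251),
}


def m5p_pruned(R, S):
    """Pruned M5P (Table 2): a single linear model, LM1."""
    return 0.0258 * R + 3.1029 * S + 0.012


def m5p_unpruned(R, S, D84):
    """Unpruned M5P (Table 3): route the inputs down the tree, then apply the leaf model."""
    if S <= 0.008:
        if R <= 0.615:
            if S <= 0.004:
                lm = 1 if R <= 0.45 else 2
            else:
                lm = 3
        elif D84 <= 0.259:
            lm = 4
        else:
            lm = 5 if R <= 1.01 else 6
    elif R <= 0.37:
        lm = 7 if R <= 0.165 else 8
    elif S <= 0.025:
        if R <= 0.89:
            lm = 9 if R <= 0.625 else 10
        else:
            lm = 11
    else:
        lm = 12
    a, b, c = LEAF_MODELS[lm]
    return a * R + b * S + c


print(m5p_pruned(0.6507, 0.0122))       # inputs at the first-portion means of R and S
print(m5p_unpruned(1.2, 0.01, 0.3))     # falls in LM11
```

D84 enters the unpruned model only through the split at 0.259; none of the leaf equations use it directly.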


Results of the M5P models for predicting *n* are shown in Figure 3 for both the model development and validation stages. The performance evaluation indices indicate that the pruned M5P approach generalizes slightly better than the unpruned one: in the validation stage it gives a higher CC (0.7910 versus 0.7858) with the same RMSE (0.0195), although its MAE is slightly larger (0.0165 versus 0.0157). For the pruned model, CC = 0.8533 and 0.7910, RMSE = 0.0193 and 0.0195, and MAE = 0.0133 and 0.0165 for the model development and validation stages, respectively. Overall, Figure 3 and Table 4 (performance evaluation parameters) suggest that both pruned and unpruned M5P approaches are suitable for predicting *n* for hydraulic design.

| Models | CC (model development) | MAE (model development) | RMSE (model development) | CC (validation) | MAE (validation) | RMSE (validation) |
|---|---|---|---|---|---|---|
| M5P_pruned | 0.8533 | 0.0133 | 0.0193 | 0.7910 | 0.0165 | 0.0195 |
| M5P_unpruned | 0.8938 | 0.0115 | 0.0167 | 0.7858 | 0.0157 | 0.0195 |
| RF | 0.9706 | 0.0064 | 0.0096 | 0.7581 | 0.0165 | 0.0203 |
| RT | 1.0000 | 0.0001 | 0.0004 | 0.7755 | 0.0221 | 0.0281 |


### Assessment of RF based modelling approach

Figure 4 plots the agreement between the actual and predicted values of *n* from the RF approach for the model development and validation stages. The RF predictions lie close to the line of agreement. Table 4 gives the model development and validation results in terms of CC, RMSE and MAE, which indicate that the RF approach is suitable for predicting *n* for hydraulic design, with CC, RMSE and MAE values of 0.9706, 0.0096 and 0.0064 for the model development stage and 0.7581, 0.0203 and 0.0165 for the validation stage, respectively.

### Assessment of RT based modelling approach

The RT model was developed in the same way as the M5P and RF models, also using WEKA 3.9. Figure 5 plots the agreement between the predicted and actual values of *n* from the RT approach for the model development and validation stages. The RT predictions lie close to the line of agreement. The performance evaluation indices indicate that the RT approach is suitable for predicting *n* for hydraulic design, with CC, RMSE and MAE values of 1.000, 0.0004 and 0.0001 for the training stage and 0.7755, 0.0281 and 0.0221 for the testing stage, respectively. The near-perfect training fit, combined with the largest validation errors of all the models, suggests that the unpruned RT overfits the training data.

### Comparative assessment of soft computing based applied modelling approaches

A comparison of the soft computing-based approaches (Table 4 and Figure 6) shows that the M5P-based approaches work better than the other applied approaches. To assess their potential for predicting *n* for hydraulic design, agreement and performance graphs are plotted in Figure 6 for the model development and validation stages. The plots show that the values predicted by the M5P-based approaches are very close to the actual values of *n* and follow a similar pattern. The pruned M5P approach works better than the unpruned M5P, and the RF approach works better than the RT approach on this data set. Figure 7 shows a box plot comparing the actual and predicted values from the applied models for the validation stage, and the corresponding descriptive statistics are listed in Table 5. Figure 7 and Table 5 suggest that the pruned M5P approach outperforms the other applied approaches: its minimum and maximum values are very close to those of the actual values, and in Figure 7 the widths of its lower and upper quartiles are almost the same, which suggests that the pruned M5P approach is the most suitable for predicting *n* for hydraulic design.

| Statistic | M5P_pruned | M5P_unpruned | RF | RT |
|---|---|---|---|---|
| Minimum | −0.0350 | −0.0370 | −0.0340 | −0.0560 |
| Maximum | 0.0260 | 0.0270 | 0.0320 | 0.0300 |
| 1st quartile | −0.0220 | −0.0210 | −0.0240 | −0.0330 |
| Median | −0.0030 | −0.0040 | −0.0060 | −0.0060 |
| 3rd quartile | 0.0070 | 0.0040 | 0.0020 | 0.0020 |
| Mean | −0.0066 | −0.0058 | −0.0078 | −0.0127 |
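The statistics in Table 5 form the usual box-plot summary. A minimal sketch follows, assuming the inclusive quartile convention; the text does not say which convention was used, and different conventions give slightly different quartiles. The function name is hypothetical.

```python
from statistics import mean, quantiles

# Assumption: inclusive quartile convention; the study's convention is not stated.


def box_plot_summary(values):
    """Minimum, quartiles, median, maximum and mean of one validation-stage series."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values), "mean": mean(values)}


print(box_plot_summary([1, 2, 3, 4, 5]))  # illustrative input only
```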


Figure 8 shows a Taylor diagram, which graphically summarizes the performance of the developed models in terms of correlation, RMSE and standard deviation. It indicates that pruned M5P is the best-performing model and RT the poorest for predicting Manning's roughness coefficient for hydraulic design. The results were also compared with previously published studies (Jarrett 1984; Azamathulla & Jarrett 2012) on the basis of CC and RMSE. This comparison, given in Figure 9, suggests that the M5P results are much better than those of Jarrett (1984) and Azamathulla & Jarrett (2012). The CC and RMSE values are 0.791 and 0.0195 for M5P_pruned, compared with 0.785 and 0.0195 for M5P_unpruned, 0.745 and 0.978 for Azamathulla & Jarrett (2012), and 0.58 and 1.205 for Jarrett (1984). Thus, for the study area, the M5P model gives values closer to the observed values than the other models and the previous studies. The M5P model can be used to represent Manning's roughness coefficient of the study area, saving time and effort compared with experimentation and conventional models.
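A Taylor diagram places each model using three linked statistics: the standard deviation of its predictions σ_f, its correlation *R* with the observations (standard deviation σ_r), and the centred RMS difference *E′*. These satisfy E′² = σ_r² + σ_f² − 2σ_rσ_f*R*. The minimal Python sketch below (population statistics, hypothetical names, illustrative data) computes them for one model; plotting is omitted. The centred RMS difference removes the mean bias, so it differs from the RMSE in Table 4 whenever a model is biased.

```python
import math

# Assumptions: population (1/n) statistics; names and data are illustrative.


def taylor_stats(observed, predicted):
    """Return (sigma_obs, sigma_pred, R, centred RMS difference) for a Taylor diagram."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    so = math.sqrt(sum((o - mo) ** 2 for o in observed) / n)
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted) / n)
    r = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted)) / (n * so * sp)
    e = math.sqrt(sum(((p - mp) - (o - mo)) ** 2 for o, p in zip(observed, predicted)) / n)
    return so, sp, r, e


obs = [0.03, 0.05, 0.07, 0.04, 0.11]
pred = [0.035, 0.048, 0.06, 0.05, 0.09]
so, sp, r, e = taylor_stats(obs, pred)
# Law of cosines underlying the diagram: E'^2 = so^2 + sp^2 - 2 so sp R
print(e ** 2, so ** 2 + sp ** 2 - 2 * so * sp * r)
# Polar position of the model point: radius sp, angle arccos(R) from the horizontal axis
print(sp, math.degrees(math.acos(r)))
```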

## CONCLUSION

In this study, M5P-, RT- and RF-based modelling approaches were developed for predicting Manning's roughness coefficient for hydraulic design. The performance evaluation indices show that the pruned M5P approach outperformed the RF- and RT-based approaches on the given data set, with CC = 0.8533 and 0.7910, RMSE = 0.0193 and 0.0195, and MAE = 0.0133 and 0.0165 for the model development and validation stages, respectively. The training and testing results of the M5P model support its use relative to the other tested methods and show good potential for representing the Manning coefficient of the study area. Among the M5P-based models, the pruned model works slightly better than the unpruned model in terms of validation CC. The Taylor diagram and box plot also suggest that the M5P-based approaches work better than the RT- and RF-based approaches for predicting Manning's roughness coefficient for hydraulic design using this data set.

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.