In this research, discharge in compound open channels with convergent and divergent floodplains was estimated using soft computing methods, namely the neuro-fuzzy group method of data handling (NF-GMDH), support vector regression (SVR), and the M5 tree algorithm. For this purpose, the geometric and hydraulic characteristics of the flow, including relative roughness (ff), relative area (Ar), relative hydraulic radius (Rr), relative dimension of the flow aspects (δ*), relative width (β), relative flow depth (Dr), relative longitudinal distance (Xr), convergent or divergent angle (θ) of the floodplain, and longitudinal slope (So) of the bed, were used as input variables, and discharge was the target (output) variable. In the testing stage, the statistical indices were RMSE = 0.004 and R² = 0.923 for NF-GMDH, RMSE = 0.002 and R² = 0.941 for SVR, and RMSE = 0.002 and R² = 0.931 for the M5 tree algorithm. Examination of the structure of the M5 tree algorithm showed that the most effective parameters are ff, Dr, Rr, δ*, and θ, which confirms the important parameters identified by the MARS, GMDH, and GEP algorithms used by previous researchers.

  • Comparing the NF-GMDH, ANFIS, SVM, GEP, MARS, and M5 Algorithm for prediction of discharge in compound channels with convergent and divergent floodplains.

Q: flow discharge
GMDH: group method of data handling
NF-GMDH: neuro-fuzzy group method of data handling
SVM: support vector machine
SVR: support vector regression
MLPNN: multilayer perceptron neural networks
MARS: multivariate adaptive regression splines
GEP: gene expression programming
Dr: relative flow depth
Xr: relative longitudinal distance
Ar: relative area
ff: relative roughness
Rr: relative hydraulic radius
δ*: relative dimension of the flow aspects
θ: angle of divergence and convergence of the section
β: relative width
So: longitudinal slope
R²: coefficient of determination
RMSE: root mean square error

The study of the flow characteristics of rivers has always been one of the most important issues in hydraulic engineering. The cross-section of a river changes continually as it passes through different terrain, both in the mountains and in the plains. Normally, the flow in rivers is steady and nonuniform. However, with the occurrence of floods, the flow conditions become unsteady and nonuniform. This makes the hydraulics of river flow more complicated (Graf & Altinakar 1998). In addition, rivers, especially in the plains, twist and turn along their course, adding to the aforementioned complications. Nowadays, the concept of the compound open channel is used for hydraulic modeling of rivers, as this approach considers both the main channel and the floodplain (Sahu 2011). Usually, the flow velocity in the floodplain is lower than in the main channel; this causes sedimentation and, as a result, the floodplain is rougher than the main channel. Due to the velocity difference between the main channel and the floodplains, shear stress develops at their interface, which causes the formation of eddies (Mohanta et al. 2020). Numerous studies have been carried out on the hydraulics of compound open channels (Singh & Tang 2020; Kumar Singh et al. 2022), starting with the flow structure in open channels with prismatic floodplains (Naik et al. 2017), followed by nonprismatic floodplains (Singh et al. 2019a, 2019b), including skewed, convergent, and divergent floodplains; nowadays, meandering compound open channels are of interest. Sellin (1964) showed that the interaction of the flow in the main channel and the floodplains causes eddies at their boundaries, which results in a loss of flow energy and a corresponding decrease in total discharge.

Mohanty et al. (2011) studied the shear stress variations in the compound channel, focusing on the boundary between the main channel and the floodplain; their studies showed that the shear layer depends on the geometric and hydraulic conditions of the flow. They stated that as the relative width ratio (ratio of the width of the floodplain to that of the main channel) increases, the shear stress decreases. Bousmar et al. (2006) studied the flow hydraulics in a compound open channel with convergent floodplains. Their results showed that at high relative depths, lateral mass transfer in the last half of the convergent region is greater than in the first half. Naik et al. (2017) studied the hydraulics of flow in a compound channel with a nonprismatic floodplain. Their results showed that the depth-averaged velocity and boundary shear stress increase along the channel convergence. In addition to laboratory studies, the numerical modeling of flow in prismatic and nonprismatic open channels has also received attention. Rezaei & Knight (2009) investigated the accuracy of the Shiono and Knight model (SKM) in compound open channels with nonprismatic floodplains. They found that this method was not accurate enough for hydraulic modeling of flow in such sections. They therefore modified the SKM and presented the modified Shiono-Knight model (M-SKM) to estimate flow parameters, including depth-averaged velocity and boundary shear stress, and to determine the stage–discharge relationship.

Nowadays, owing to the limited accuracy of numerical models, researchers have used soft computing methods to model and estimate the hydraulic parameters of flow, especially in compound open channels with nonprismatic floodplains (Das & Khatua 2018; Kaushik & Kumar 2022, 2023; Naik et al. 2022; Bijanvand et al. 2023). For example, the flow discharge in compound open channels with prismatic floodplains has been predicted by an artificial neural network (Sahu 2011), an adaptive neuro-fuzzy inference system (ANFIS) (Parsaie et al. 2017; Das et al. 2020), multivariate adaptive regression splines (Parsaie & Haghiabi 2017), and gene expression programming (Das et al. 2021). The discharge in meandering open channels has been predicted with the MARS model by Mohanta et al. (2020) and Pradhan & Khatua (2019). Finally, the discharge in compound channels with convergent and divergent floodplains was estimated using soft computing models by Yonesi et al. (2022).

The literature review shows that the hydraulic study of compound open channels is mainly based on laboratory experiments. Numerical modeling has also been used; however, according to the reports, it is not sufficiently accurate for floodplains with complex geometry. On the other hand, researchers have tried to use soft computing methods to estimate the flow characteristics in such waterways. According to the reports, their accuracy was reasonable in all types of compound open channels. For example, MLPNN, ANFIS, MARS, and GEP models have been used for flow discharge prediction in compound open channels.

An essential point in the development of the neuro-fuzzy model is the use of fuzzy logic in the development of the neural network model to increase its reliability. The neuro-fuzzy and GMDH models have been successfully used separately to estimate flow in compound open channels with divergent and convergent floodplains. In addition, tree family algorithms such as MARS have been used successfully. The remarkable point in the development of the GMDH model is the simplicity and clarity of its structure. Considering the confirmation of the proper accuracy of the GMDH model, this research has tried to use the concept of fuzzy logic to increase its reliability. In addition to the tree models, the M5 model, which develops a simpler structure in modeling complex processes, was also investigated.

Therefore, in this research, the neuro-fuzzy group method of data handling (NF-GMDH), the support vector machine (SVM) model, and the M5 tree algorithm were developed to estimate the discharge in compound open channels with convergent and divergent floodplains. In this regard, two scenarios are considered: development of these soft computing models based on the important parameters, and development based on all parameters involved.

In this part, the parameters involved in predicting the discharge in the compound open channels with divergent and convergent floodplains are reviewed. Then, the statistical characteristics of the collected data are calculated. The soft computing models used in this research including the NF-GMDH, the SVR, and the M5 algorithm are reviewed. Finally, the strategies considered for the modeling of discharges are presented.

Compound open channels with convergent and divergent floodplains

A view of compound open channels with nonprismatic floodplains is shown in Figure 1. When studying the hydraulics of flow in such compound open channels, the flow discharge is a function of the relative roughness (ff), relative area (Ar), relative hydraulic radius (Rr), relative dimension of the flow aspects (δ*), relative width (β), relative flow depth (Dr), relative longitudinal distance (Xr), convergent or divergent angle (θ) of the floodplain, and longitudinal slope (So) of the bed. Therefore, in the present study, the nine dimensionless input parameters presented in Equation (1) were considered for the development of the SVR, NF-GMDH, and M5 algorithms.
$$Q = f\left(f_f,\ A_r,\ R_r,\ \delta^{*},\ \beta,\ D_r,\ X_r,\ \theta,\ S_o\right) \quad (1)$$
Figure 1

A view of the compound channel with non-prismatic floodplains (Mohanta 2014).


To develop the mentioned soft computing methods, the data related to the mentioned parameters were collected from Bousmar (2002), Bousmar et al. (2006), Rezaei (2006), Yonesi et al. (2013) and Naik & Khatua (2016) and their statistical characteristics are given in Table 1.

Table 1

Range of collected dataset

| Source | Range | ff | Ar | Rr | Dr | S0 × 10⁻³ | δ* | β | Xr | θ | Q |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Rezaei (2006) | Max | 0.830 | 9.760 | 4.590 | 0.522 | 2.003 | 6.540 | 3.020 | 1.000 | −3.81 | 0.040 |
| | Min | 0.070 | 0.873 | 0.869 | 0.114 | | 0.905 | 0.366 | 0.000 | | 0.004 |
| | Std dev | 0.278 | 2.275 | 1.219 | 0.143 | | 1.940 | 0.900 | 0.308 | | 0.009 |
| | Avg | 0.591 | 2.716 | 2.505 | 0.305 | | 4.313 | 2.043 | 0.482 | | 0.018 |
| | Median | 0.719 | 2.240 | 2.610 | 0.348 | | 5.150 | 2.263 | 0.500 | | 0.017 |
| Bousmar (2002) | Max | 0.837 | 10.720 | 4.400 | 0.538 | 0.9 | 6.360 | 3.000 | 0.833 | −11.3 | 0.020 |
| | Min | 0.059 | 0.930 | 0.591 | 0.101 | | 0.808 | 0.571 | 0.000 | −3.81 | 0.003 |
| | Std dev | 0.295 | 3.505 | 1.185 | 0.161 | | 1.831 | 0.866 | 0.253 | | 0.005 |
| | Avg | 0.605 | 3.734 | 2.248 | 0.345 | | 3.980 | 1.884 | 0.256 | | 0.012 |
| | Median | 0.747 | 2.514 | 2.392 | 0.416 | | 4.585 | 2.167 | 0.188 | | 0.012 |
| Bousmar et al. (2006) | Max | 0.832 | 12.800 | 4.200 | 0.539 | 0.9 | 6.290 | 3.000 | 1.000 | 5.71 | 0.020 |
| | Min | 0.052 | 0.950 | 0.561 | 0.102 | | 0.819 | 0.380 | 0.167 | 3.81 | 0.003 |
| | Std dev | 0.289 | 4.084 | 1.187 | 0.152 | | 1.926 | 0.805 | 0.297 | | 0.006 |
| | Avg | 0.592 | 4.591 | 2.361 | 0.321 | | 4.166 | 1.694 | 0.541 | | 0.013 |
| | Median | 0.722 | 3.016 | 2.663 | 0.347 | | 4.931 | 1.834 | 0.583 | | 0.016 |
| Yonesi et al. (2013) | Max | 0.806 | 20.600 | 35.090 | 0.364 | 0.88 | 1.900 | 3.000 | 1.000 | 11.31 | 0.062 |
| | Min | 0.143 | 1.370 | 1.910 | 0.103 | | 0.229 | 0.576 | 0.096 | 3.81 | 0.011 |
| | Std dev | 0.236 | 5.825 | 10.046 | 0.096 | | 0.622 | 0.866 | 0.285 | | 0.018 |
| | Avg | 0.482 | 6.128 | 10.192 | 0.224 | | 1.373 | 1.903 | 0.359 | | 0.043 |
| | Median | 0.552 | 4.224 | 6.600 | 0.252 | | 1.656 | 2.165 | 0.257 | | 0.051 |
| Naik & Khatua (2016) | Max | 0.716 | 22.590 | 7.260 | 0.325 | 1.1 | 4.450 | 1.800 | 0.595 | −13.38 | 0.045 |
| | Min | 0.047 | 3.545 | 0.850 | 0.059 | | 0.293 | 0.178 | 0.000 | −5 | 0.003 |
| | Std dev | 0.248 | 6.134 | 1.849 | 0.094 | | 1.489 | 0.622 | 0.190 | | 0.015 |
| | Avg | 0.519 | 8.665 | 3.604 | 0.199 | | 3.138 | 1.353 | 0.212 | | 0.031 |
| | Median | 0.634 | 7.060 | 3.595 | 0.227 | | 3.795 | 1.647 | 0.193 | | 0.037 |

Neuro-fuzzy group method of data handling

To explain the NF-GMDH model, it is first necessary to explain the GMDH model and then review the changes made in the NF-GMDH model. The GMDH method was first proposed by Ivakhnenko (1971) to analyze systems of high complexity. The model includes input, hidden, and output layers. This approach is structurally similar to multilayer perceptron neural networks (MLPNN), except that the number of layers and neurons is determined by a predetermined criterion. The GMDH algorithm has been widely used to solve various hydraulic engineering problems (Yarahmadi et al. 2023). Ivakhnenko (1971) developed the GMDH theory using Kolmogorov–Gabor polynomials. The relationship between the input and output parameters of a system can be expressed by a Volterra functional series, whose discrete form is the Kolmogorov–Gabor polynomial given in the following equation
$$y = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij} x_i x_j + \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n} w_{ijk} x_i x_j x_k + \cdots \quad (2)$$
where $X = (x_1, x_2, \ldots, x_n)$ and $W = (w_0, w_i, w_{ij}, \ldots)$ are the vectors of input parameters and weight coefficients, respectively. Building on the MLPNN, Ivakhnenko proposed a second-degree polynomial for each pair of input parameters. He also found that quadratic polynomials arranged in a network of perceptrons can form a Kolmogorov–Gabor polynomial. This method is more accurate than the MLPNN because, in the GMDH algorithm, the results computed in each neuron are classified as useful or nonuseful. The GMDH structure is created in the form of a multilayer feed-forward neural network with some support neurons. Each neuron has two inputs. The relationship between the input and output variables in each neuron can be a linear or nonlinear polynomial, using the activation function described in the following equation.
$$\hat{y} = w_0 + w_1 x_i + w_2 x_j + w_3 x_i x_j + w_4 x_i^{2} + w_5 x_j^{2} \quad (3)$$
In the GMDH model, pairs of input parameters are considered to produce second-degree polynomials in the neurons of the first layer. Therefore, the number of neurons in the first layer is equal to $\binom{n}{2} = n(n-1)/2$, where $n$ is the number of input parameters. In each layer, the following criterion is used to select the best neurons.
$$E = \frac{1}{N}\sum_{s=1}^{N}\left(y_s - \hat{y}_s\right)^{2} \rightarrow \min \quad (4)$$
where $y_s$ and $\hat{y}_s$ are the observed and predicted output of sample $s$, respectively, and $N$ is the number of samples.
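The neuron-level steps described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the data are synthetic, the helper names (`quad_terms`, `fit_neuron`, `neuron_mse`) are hypothetical, the weights are obtained by ordinary least squares, and the selection criterion is assumed to be the mean squared error on held-out samples.

```python
import itertools
import random

def quad_terms(a, b):
    """Terms of the two-input quadratic polynomial of one GMDH neuron."""
    return [1.0, a, b, a * b, a * a, b * b]

def solve(A, y):
    """Solve A w = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, y)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * n
    for r in reversed(range(n)):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w

def fit_neuron(xa, xb, y):
    """Least-squares weights of one neuron via the normal equations."""
    F = [quad_terms(a, b) for a, b in zip(xa, xb)]
    m = len(F[0])
    A = [[sum(f[p] * f[q] for f in F) for q in range(m)] for p in range(m)]
    rhs = [sum(f[p] * t for f, t in zip(F, y)) for p in range(m)]
    return solve(A, rhs)

def neuron_mse(w, xa, xb, y):
    """Mean squared error of a fitted neuron (assumed selection criterion)."""
    pred = [sum(c * f for c, f in zip(w, quad_terms(a, b))) for a, b in zip(xa, xb)]
    return sum((t - p) ** 2 for t, p in zip(y, pred)) / len(y)

# Synthetic example (assumption): three inputs, target depends on inputs 0 and 1 only.
rng = random.Random(0)
X = [[rng.random() for _ in range(3)] for _ in range(60)]
y = [1.0 + 2.0 * row[0] * row[1] for row in X]
train, valid = range(0, 40), range(40, 60)

def col(j, idx):
    return [X[i][j] for i in idx]

# One candidate neuron per pair of inputs; keep the pair with the lowest held-out error.
scores = {}
for a, b in itertools.combinations(range(3), 2):
    w = fit_neuron(col(a, train), col(b, train), [y[i] for i in train])
    scores[(a, b)] = neuron_mse(w, col(a, valid), col(b, valid), [y[i] for i in valid])

best_pair = min(scores, key=scores.get)
# With nine inputs, the first layer would contain one neuron per input pair.
n_first_layer = len(list(itertools.combinations(range(9), 2)))
print(best_pair, n_first_layer)
```

In this toy case the pair of inputs that actually generates the target is selected, and nine inputs give 36 candidate neurons in the first layer.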

Support vector machine

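Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. In many applications, a nonlinear classifier provides better accuracy (Vapnik 1999). In SVR, the input x is first mapped onto an m-dimensional feature space using some fixed (nonlinear) mapping, and then a linear model is constructed in this feature space. The naive way of making a nonlinear classifier out of a linear classifier is to map the data from the input space X to a feature space F using a nonlinear function $\phi: X \rightarrow F$. In the space F, the discriminant function is:
$$f(x) = w^{T}\phi(x) + b \quad (5)$$
Using mathematical notation, the linear model (in the feature space) f(x, w) is given by
$$f(x, w) = \sum_{j=1}^{m} w_j\, g_j(x) + b \quad (6)$$
where $g_j(x)$ denotes a set of nonlinear transformations and $b$ is the bias term. If the weight vector is expressed as a linear combination of the training examples, then
$$w = \sum_{i=1}^{n} \alpha_i x_i \quad (7)$$
$$f(x) = \sum_{i=1}^{n} \alpha_i\, x_i^{T} x + b \quad (8)$$
In the feature space F, this expression takes the form:
$$f(x) = \sum_{i=1}^{n} \alpha_i\, \phi(x_i)^{T}\phi(x) + b \quad (9)$$
$$K(x, x') = \phi(x)^{T}\phi(x') \quad (10)$$
$$f(x) = \sum_{i=1}^{n} \alpha_i\, K(x, x_i) + b \quad (11)$$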

There are many kernel functions in SVM, so how to select a good kernel function is also a research issue. However, for general purposes, there are some popular kernel functions.

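  • I. Linear kernel: $K(x_i, x_j) = x_i^{T} x_j$

  • II. Polynomial kernel: $K(x_i, x_j) = \left(\gamma\, x_i^{T} x_j + r\right)^{d},\ \gamma > 0$

  • III. Radial basis function (RBF) kernel: $K(x_i, x_j) = \exp\left(-\gamma\, \lVert x_i - x_j \rVert^{2}\right),\ \gamma > 0$

  • IV. Sigmoid kernel: $K(x_i, x_j) = \tanh\left(\gamma\, x_i^{T} x_j + r\right)$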

It is well known that SVM generalization performance (estimation accuracy) depends on a good setting of the meta-parameters C, γ, and r and the kernel parameters. The choice of C, γ, and r controls the complexity of the prediction (regression) model. The problem of optimal parameter selection is further complicated because the complexity of the SVM model (and hence its generalization performance) depends on all three parameters. Kernel functions are used to change the dimensionality of the input space to perform the classification.
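To make the role of the kernel parameters concrete, the four kernels listed above can be written as plain Python functions. This is a minimal sketch: the parameter names `gamma`, `r`, and `degree` follow the common convention for these kernels and are assumptions here, since the source does not give its own notation, and the penalty parameter C does not appear because it enters the training problem rather than the kernel itself.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(u, v):
    return dot(u, v)

def polynomial_kernel(u, v, gamma=1.0, r=0.0, degree=3):
    return (gamma * dot(u, v) + r) ** degree

def rbf_kernel(u, v, gamma=1.0):
    # Depends only on the squared Euclidean distance between the two points.
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(u, v, gamma=1.0, r=0.0):
    return math.tanh(gamma * dot(u, v) + r)

# Two hypothetical input vectors, used only to exercise the functions.
u, v = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(u, v))
print(polynomial_kernel(u, v, gamma=1.0, r=1.0, degree=2))
print(rbf_kernel(u, v, gamma=0.5))
print(sigmoid_kernel(u, v, gamma=0.1, r=0.0))
```

A larger `gamma` makes the RBF kernel decay faster with distance, which is one way in which the kernel parameters control the complexity of the fitted regression model.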

M5 tree model

Mathematical modeling of engineering test results using an empirical relationship can be highly error-prone and does not provide a good view of local variations. Identifying homogeneous regions and providing simple linear relationships for each of these regions increases the accuracy of the model. Therefore, to solve complex problems, it is better to break the problem into many smaller and simpler problems and then combine the answers. This idea is the basic definition of models based on classification and clustering. The M5 tree algorithm also divides the complex space into many categories and tries to model them in a simple way using a simple formula. The M5 tree model or M5 tree algorithm was first introduced to the world of engineering computing by Quinlan (1992). Later this algorithm was modified and improved by Wang & Witten (1996). The M5 tree algorithm divides the problem space into two categories and fits a linear formula (one or more variables) to each of the categories. It should be noted that the linear formula fitted to each of the categories is specific to that category and cannot be generalized to other categories, even if they are in the neighborhood. This algorithm uses the standard deviation parameter of the target variable as the error measure in each node and creates a sub-branch in that node. The M5 tree algorithm forms a data tree by branch expansion, which is known as a decision tree. It should be noted that this tree is upside down so that its root is at the top and its branches are at the bottom. In other words, this tree evolves upside down, from top to bottom. In this way, the variable that causes the greatest reduction in the standard deviation (SDR) is considered the base parameter for creating the branch (Equation (12)).
$$\mathrm{SDR} = \mathrm{sd}(T) - \sum_{i} \frac{|T_i|}{|T|}\, \mathrm{sd}(T_i) \quad (12)$$
where T is the set of samples that have reached the node, T_i is the subset of samples obtained by splitting the node on the selected attribute, and sd is the standard deviation. The M5 tree algorithm examines all possible splits on each attribute and finally selects the one that maximizes the expected error reduction. Once the tree is complete, a multivariate linear regression model is fitted to the samples at each internal node of the tree. Figure 2 shows examples of the M5 tree algorithm.
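The split criterion in Equation (12) can be illustrated with a short Python sketch. It assumes the standard form of the standard deviation reduction, uses the population standard deviation, and searches thresholds on a single attribute only; the data and the function names `sdr` and `best_split` are hypothetical.

```python
import statistics

def sdr(parent, subsets):
    """Standard deviation reduction achieved by splitting `parent` into `subsets`."""
    n = len(parent)
    weighted = sum(len(s) / n * statistics.pstdev(s) for s in subsets)
    return statistics.pstdev(parent) - weighted

def best_split(x, y):
    """Threshold on one attribute that maximizes the SDR of the target y."""
    pairs = sorted(zip(x, y))
    best_threshold, best_gain = None, float("-inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical attribute values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        gain = sdr(y, [left, right])
        if gain > best_gain:
            best_threshold, best_gain = threshold, gain
    return best_threshold, best_gain

# Hypothetical data: the target jumps between x = 3 and x = 4.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
threshold, gain = best_split(x, y)
print(threshold, round(gain, 3))
```

A full M5 implementation would repeat this search over every attribute at every node and then fit a linear model to the samples that reach each node.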
Figure 2

Examples of the M5 tree algorithm (a) dividing the space of input parameters (X1 × X2) into six areas and (b) expression of criteria for dividing the space of input parameters in the form of a tree.

During the construction of the tree model by the M5 tree algorithm, the splitting process at the partition nodes may be repeated many times, resulting in a very large tree. In this case, the accuracy of the model on the training data increases as the number of branches increases, which can lead to overfitting. To overcome this problem, the tree structure is pruned. Sometimes, pruning some of the weaker branches improves the predictive power of the model. The pruning method uses the expected error estimated at each node from the training data. After a series of pruning operations, discontinuities appear between adjacent linear models at the leaves of the pruned tree, which reduces the consistency of the model, especially for models built from small training sets. The smoothing process aims to smooth out these discontinuities. The estimated value at each leaf is passed back along the path from the leaf to the root; at each node on this path, it is combined with the value predicted by the linear model at that node using the following formula.
$$P = \frac{n P' + k q}{n + k} \quad (13)$$

In Equation (13), P is the predicted value passed to the next higher node, P′ is the prediction passed to this node from below, q is the value predicted by the linear model at this node, n is the number of training samples that have reached the node below, and k is the smoothing constant, which is 15 by default.
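The smoothing step can be sketched as a small Python function that applies the formula once per node on the way from a leaf to the root. The numerical values and the function names are hypothetical and serve only to show the arithmetic; the default smoothing constant of 15 is the one stated in the text.

```python
def smooth(p_below, q_node, n_below, k=15.0):
    """Blend the prediction passed up from below with the node's own linear model."""
    return (n_below * p_below + k * q_node) / (n_below + k)

def smooth_to_root(leaf_prediction, path):
    """Apply smoothing along the path from a leaf to the root.

    `path` lists (q_node, n_below) for each ancestor, starting at the leaf's parent.
    """
    p = leaf_prediction
    for q_node, n_below in path:
        p = smooth(p, q_node, n_below)
    return p

# Hypothetical discharge predictions for one sample.
one_step = smooth(0.020, 0.030, n_below=15)
to_root = smooth_to_root(0.020, [(0.030, 15), (0.040, 45)])
print(one_step, to_root)
```

When few training samples reach the lower node, the ancestor's model dominates the blend; when many do, the lower prediction is left almost unchanged.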

Modeling strategies

As shown in Equation (1), nine parameters can be used as input variables to model and predict the flow discharge in compound open channels with nonprismatic floodplains using soft computing models. Therefore, any combination of one to nine parameters can be considered when designing the input pattern. Different approaches can be used to reduce the effort of finding the best input combination. One is the Gamma test, previously applied by Das et al. (2020). Another is to inspect the structure of developed models such as MARS, GEP, and GMDH, which identify the most important parameters and give them more weight during the development of the mathematical formula. In this research, two scenarios are considered, i.e. development based on the most important parameters and development based on all parameters involved. Furthermore, the coefficient of determination (R²) and the root mean square error (RMSE) were used to check the accuracy of the models. To develop the aforementioned models, the collected data must first be divided into two groups: training and testing. The collected dataset contains 196 samples; in this research, 80% of the data were allocated to training and the remaining 20% to testing. The training data are used for calibration and the test data for validation. Since the collected data are not a time series, samples were assigned randomly to the two groups. The range of the allocated data is shown in Table 2.
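The evaluation setup described above can be sketched in Python as follows. The sketch makes several assumptions that are not stated in the source: R² is computed as one minus the ratio of residual to total sum of squares, the random split uses an arbitrary seed, and the training size is obtained by rounding 80% of 196.

```python
import math
import random

def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def r2(obs, pred):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def random_split(n_samples, train_fraction=0.8, seed=42):
    """Randomly assign sample indices to training and testing groups."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = round(train_fraction * n_samples)
    return idx[:n_train], idx[n_train:]

train_idx, test_idx = random_split(196)
# Number of non-empty subsets of nine candidate inputs.
n_input_subsets = 2 ** 9 - 1
print(len(train_idx), len(test_idx), n_input_subsets)
```

The count of non-empty input subsets shows why an exhaustive search over input combinations is usually replaced by a screening method.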

Table 2

Statistical characteristics of data assigned to training soft computing models

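| Stage | Range | ff | Ar | Rr | β | S0 | δ* | α | Xr | θ | Q |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Train | Minimum | 0.38 | 0.93 | 1.70 | 0.11 | 0.00 | 1.41 | 1.33 | 0.00 | −13.38 | 0.01 |
| | Maximum | 0.84 | 22.59 | 17.69 | 0.54 | 0.00 | 6.54 | 3.02 | 1.00 | 11.31 | 0.06 |
| | Average | 0.70 | 4.45 | 3.30 | 0.34 | 0.00 | 4.35 | 2.08 | 0.44 | | 0.02 |
| | Variance | 0.01 | 13.94 | 3.74 | 0.01 | 0.00 | 1.57 | 0.29 | 0.10 | | 0.00 |
| Test | Minimum | 0.31 | 0.94 | 1.72 | 0.15 | 0.00 | 1.44 | 1.33 | 0.00 | −13.38 | 0.01 |
| | Maximum | 0.84 | 20.60 | 35.09 | 0.53 | 0.00 | 6.43 | 3.02 | 1.00 | 11.31 | 0.06 |
| | Average | 0.69 | 4.05 | 4.36 | 0.33 | 0.00 | 4.62 | 2.20 | 0.34 | | 0.02 |
| | Variance | 0.01 | 13.43 | 33.74 | 0.01 | 0.00 | 1.73 | 0.31 | 0.10 | | 0.00 |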

First, the results of the M5 model are presented. This model was developed for both scenarios (introduced in the modeling strategies), and its results are presented in Equations (14) and (15). The M5 results are presented before those of the other models because the development process of the M5 model identifies the most important effective parameters. The same feature can be seen in the GMDH model but not in the NF-GMDH model. Of course, it is possible to implement the fuzzy adaptive model within the conventional GMDH model, which would combine the two features of identifying the most important parameters and adaptivity.

Equation (14) is the mathematical form derived from the M5 algorithm for the first scenario, and Equation (15) is that for the second scenario. Examination of the structure of the model presented in Equation (14) shows that the most important parameters are ff, Dr, Rr, δ*, and θ, which were also identified by the MARS, Gamma test, GMDH, and GEP approaches used by previous researchers. This result shows that the M5 model has been developed correctly. The statistical indices of the M5 model based on the first scenario are R² = 0.979 and RMSE = 0.002 in the training phase and R² = 0.957 and RMSE = 0.002 in the testing phase. For the second scenario, they are R² = 0.955 and RMSE = 0.002 in the training phase and R² = 0.931 and RMSE = 0.002 in the testing phase. Comparing the performance of the M5 model in both scenarios shows that development based on the main effective parameters does not significantly change the error indices in either the training or the testing stage. The M5 tree has 23 branches in the first scenario and 18 branches in the second. This indicates that even in the first scenario the developed model relied mainly on the important parameters, so that only five branches of the initial model were pruned in the second scenario. The results of the M5 model in the training and testing stages are plotted against the observed data in Figures 5 and 6.
Figure 3

The structure of the NF-GMDH model developed to estimate discharge in compound open channels with convergent and divergent floodplains.

Figure 4

The structure of the SVR model developed to estimate discharge in compound open channels with convergent and divergent floodplains.

The performance of the NF-GMDH model was then considered for the same two scenarios. The results of the NF-GMDH model based on the first scenario are shown in Table 3. As can be seen from this table, the statistical indices of this model are R² = 0.927 and RMSE = 0.004 in the training phase and R² = 0.934 and RMSE = 0.003 in the testing phase. The structure of the NF-GMDH model for the second scenario is presented in Figure 3. As shown in this figure, the model has two hidden layers. There are five neurons in each of the first and second hidden layers and one neuron in the output layer, which is the average of the responses of the five neurons in the previous layer. The performance of the NF-GMDH model based on the second scenario in the training and testing phases is shown in Figures 5 and 6. The statistical indices of this model for the second scenario, also presented in Table 3, are R² = 0.931 and RMSE = 0.004 in the training phase and R² = 0.923 and RMSE = 0.004 in the testing phase. The comparison of NF-GMDH with M5 shows that in both scenarios the NF-GMDH model is slightly less accurate than M5.
Table 3

Error statistics of the models developed in the training and testing stages

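| Model | Scenario | Train R² | Train RMSE | Test R² | Test RMSE |
|---|---|---|---|---|---|
| M5 | 1 | 0.979 | 0.002 | 0.957 | 0.002 |
| | 2 | 0.955 | 0.002 | 0.931 | 0.002 |
| NF-GMDH | 1 | 0.927 | 0.004 | 0.934 | 0.003 |
| | 2 | 0.931 | 0.004 | 0.923 | 0.004 |
| SVR | 1 | 0.982 | 0.002 | 0.965 | 0.002 |
| | 2 | 0.971 | 0.002 | 0.941 | 0.002 |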
Figure 5

The results of the models developed in the training stage.

Figure 6

The results of the models developed in the testing stage.


Next, the performance of the SVR model in estimating the flow discharge was checked for both scenarios. The structure of the SVR model developed for the second scenario is shown in Figure 4. To develop the SVR model, different kernel functions (discussed in the Materials and Methods section) were investigated, and the results showed that the radial basis function kernel is more accurate than the others. The statistical indices of the SVR model in the training and testing phases are shown in Table 3. For the second scenario, they are R² = 0.971 and RMSE = 0.002 in the training phase and R² = 0.941 and RMSE = 0.002 in the testing phase. Comparing the SVR model with the NF-GMDH and M5 models shows that in both scenarios the accuracy of the SVR model is slightly higher than that of the NF-GMDH model and almost equal to that of the M5 model.

To better evaluate and compare the performance of the models in the training and testing phases, Taylor diagrams are presented in Figures 7 and 8, respectively. To compare the models used in this research with the main models used in previous research, their results were compared with those of the MARS and MLPNN models. As shown in Figure 7, in the training phase, the performance of all models except the NF-GMDH model is almost equal. In the testing stage, the performance of the MARS and SVR models was almost equal and better than that of the NF-GMDH and M5 models.
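For readers unfamiliar with Taylor diagrams, the three statistics that such a diagram summarizes can be computed as in the following Python sketch. The observed and predicted values are hypothetical, population statistics are assumed, and the function name `taylor_stats` is illustrative; the last line checks the geometric relation that lets the three quantities share one plot.

```python
import math
import statistics

def taylor_stats(obs, pred):
    """Standard deviations, correlation, and centered RMS difference."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    sd_o = statistics.pstdev(obs)
    sd_p = statistics.pstdev(pred)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred)) / n
    corr = cov / (sd_o * sd_p)
    crmsd = math.sqrt(
        sum(((p - mean_p) - (o - mean_o)) ** 2 for o, p in zip(obs, pred)) / n
    )
    return sd_o, sd_p, corr, crmsd

# Hypothetical observed and predicted discharges, for illustration only.
obs = [0.010, 0.015, 0.022, 0.030, 0.041, 0.055]
pred = [0.012, 0.014, 0.020, 0.033, 0.039, 0.052]

sd_obs, sd_pred, corr, crmsd = taylor_stats(obs, pred)
# Law-of-cosines relation underlying the diagram.
residual = crmsd ** 2 - (sd_obs ** 2 + sd_pred ** 2 - 2 * sd_obs * sd_pred * corr)
print(sd_obs, sd_pred, corr, crmsd, residual)
```

A model plots closer to the observation point when its standard deviation matches that of the observations and its correlation with them is high.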
Figure 7

Taylor diagram of performance of developed models in the training stage.

Figure 8

Taylor diagram of performance of developed models in the testing stage.


This section presents the performance of the SVR, NF-GMDH, and M5 tree algorithms and then compares their statistical indices with other models proposed by previous researchers. Das et al. (2020, 2021) developed ANFIS and GEP models to estimate the discharge in compound open channels with divergent and convergent floodplains. They used the Gamma test to determine the main effective parameters. The statistical indices of the ANFIS and GEP models at the test stage are reported in those studies. Yonesi et al. (2022) developed the MARS, GMDH, and MLPNN models to estimate discharge in such waterways. Taylor diagrams were used to compare these models with those used in this study (SVR, NF-GMDH, and M5 algorithms). The study of the structure of the MARS and GMDH models showed that the most important parameters are ff, Dr, Rr, δ*, and θ, which was confirmed by the Gamma test and by the structure obtained with the M5 model.

In this research, the flow discharge in compound open channels with convergent and divergent floodplains was modeled and estimated using soft computing models including the SVR, NF-GMDH, and M5 models. For this purpose, the geometric and hydraulic characteristics of the flow, including relative roughness, relative area, relative hydraulic radius, relative dimensions of the flow aspects, relative width, relative depth, relative longitudinal distance, convergence or divergence angle, and longitudinal bed slope, were used. The performance of the models was then compared with the ANFIS, MARS, MLPNN, and GEP models developed by previous researchers. Two scenarios were considered for modeling and estimating flow in such watercourses. The first scenario developed the models based on all parameters involved, and the second scenario developed them based on the effective parameters. The results showed that the error statistical indices of the NF-GMDH, SVR, and M5 models based on the first scenario at the testing stage are R² = 0.934 and RMSE = 0.003, R² = 0.965 and RMSE = 0.002, and R² = 0.957 and RMSE = 0.002, respectively. Examination of the structure of the MARS, GMDH, and M5 models and of the Gamma test results, from the current and previous research, showed that the most important parameters in the estimation of discharge are the relative roughness, the relative depth, the relative hydraulic radius, the relative dimension of the flow aspects, and the angle of convergence or divergence of the floodplains.

We are grateful to the Research Council of Urmia University.

All relevant data are available from https://cdnsciencepub.com/doi/abs/10.1139/cjce-2018-0038.

The authors declare there is no conflict.

Bousmar D. 2002 Flow Modelling in Compound Open Channels. Unité de Génie Civil et Environnemental.

Bousmar D., Proust S. & Zech Y. 2006 Experiments on the flow in a enlarging compound channel. In: River Flow 2006: Proceedings of the International Conference on Fluvial Hydraulics, 6–8 September 2006, Lisbon, Portugal. Taylor and Francis, Leiden, Netherlands, pp. 323–332.

Das B. S. & Khatua K. K. 2018 Flow resistance in a compound channel with divergent and convergent floodplains. Journal of Hydraulic Engineering 144 (8), 04018051.

Das B. S., Devi K., Khuntia J. R. & Khatua K. K. 2020 Discharge estimation in convergent and divergent compound open channels by using adaptive neuro-fuzzy inference system. Canadian Journal of Civil Engineering 47 (12), 1327–1344.

Das B., Devi K. & Khatua K. 2021 Prediction of discharge in convergent and divergent compound channel by gene expression programming. ISH Journal of Hydraulic Engineering 27 (4), 385–395.

Graf W. H. & Altinakar M. S. 1998 Fluvial Hydraulics. Wiley, New York, NY, USA.

Ivakhnenko A. G. 1971 Polynomial theory of complex systems. IEEE Transactions on Systems, Man, and Cybernetics 1 (4), 364–378.

Mohanta A. 2014 Flow Modelling of a Non Prismatic Compound Channel by Using CFD. Doctoral dissertation, NIT Rourkela, Rourkela, India.

Mohanta A., Patra K. & Pradhan A. 2020 Enhanced channel division method for estimation of discharge in meandering compound channel. Water Resources Management 34 (3), 1047–1073.

Mohanty P., Khatua K. K. & Patra K. C. 2011 Investigation on Shear Layer in Compound Open Channels.

Naik B. & Khatua K. K. 2016 Boundary shear stress distribution for a convergent compound channel. ISH Journal of Hydraulic Engineering 22 (2), 212–219.

Naik B., Khatua K. K., Wright N. G. & Sleigh A. 2017 Stage-discharge prediction for convergent compound open channels with narrow floodplains. Journal of Irrigation and Drainage Engineering 143 (8), 04017017.

Parsaie A., Yonesi H. & Najafian S. 2017 Prediction of flow discharge in compound open channels using adaptive neuro fuzzy inference system method. Flow Measurement and Instrumentation 54, 288–297.

Pradhan A. & Khatua K. K. 2019 Discharge estimation at the apex of compound meandering channels. Water Resources Management 33, 3469–3483.

Quinlan J. R. 1992 Learning with continuous classes. In: 5th Australian Joint Conference on Artificial Intelligence (A. Adams & L. Sterling, eds). World Scientific, Hobart, Tasmania, Australia, pp. 343–348.

Rezaei B. 2006 Overbank Flow in Compound Open Channels with Prismatic and Non-Prismatic Floodplains. University of Birmingham, Birmingham, UK.

Sahu M. 2011 Prediction of Flow and its Resistance in Compound Open Channel. NIT Rourkela, Rourkela, India.

Vapnik V. N. 1999 An overview of statistical learning theory. IEEE Transactions on Neural Networks 10 (5), 988–999.

Wang Y. & Witten I. H. 1996 Induction of Model Trees for Predicting Continuous Classes. University of Waikato, Department of Computer Science, Hamilton, New Zealand.

Yarahmadi M. B., Parsaie A., Shafai-Bejestan M., Heydari M. & Badzanchin M. 2023 Estimation of Manning roughness coefficient in alluvial rivers with bed forms using soft computing models. Water Resources Management.

Yonesi H. A., Omid M. H. & Ayyoubzadeh S. A. 2013 The hydraulics of flow in non-prismatic compound open channels. Journal of Civil Engineering and Urbanism 3 (6), 342–356.

Yonesi H. A., Parsaie A., Arshia A. & Shamsi Z. 2022 Discharge modeling in compound open channels with non-prismatic floodplains using GMDH and MARS models. Water Supply 22 (4), 4400–4421.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).