Abstract
In this research, discharge in compound open channels with convergent and divergent floodplains was estimated using soft computing methods, including the neuro-fuzzy group method of data handling (NF-GMDH), support vector regression (SVR), and the M5 tree algorithm. For this purpose, the geometric and hydraulic characteristics of the flow, including relative roughness (ff), relative area (Ar), relative hydraulic radius (Rr), relative dimension of the flow aspects (δ*), relative width (β), relative flow depth (Dr), relative longitudinal distance (Xr), convergent or divergent angle (θ) of the floodplain, and longitudinal slope (So) of the bed, were used as input variables, and discharge was the target (output) variable. In the testing stage, the statistical indices were RMSE = 0.004 and R2 = 0.923 for NF-GMDH, RMSE = 0.002 and R2 = 0.941 for SVR, and RMSE = 0.002 and R2 = 0.931 for the M5 tree algorithm. Evaluation of the structure of the M5 tree showed that the most effective parameters are ff, Dr, Rr, δ*, and θ, which confirms the important parameters identified by the MARS, GMDH, and GEP algorithms used by previous researchers.
HIGHLIGHTS
Comparison of the NF-GMDH, ANFIS, SVM, GEP, MARS, and M5 algorithms for predicting discharge in compound channels with convergent and divergent floodplains.
NOTATION
- Q
flow discharge
- GMDH
group method of data handling
- NF-GMDH
neuro-fuzzy group method of data handling
- SVM
support vector machine
- SVR
support vector regression
- MLPNN
multilayer perceptron neural networks
- MARS
multivariate adaptive regression splines
- GEP
gene expression programming
- Dr
relative flow depth
- Xr
relative longitudinal distance
- Ar
relative area
- ff
relative roughness
- Rr
relative hydraulic radius
- δ*
relative dimension of the flow aspects
- θ
angle of divergence and convergence of the section
- β
relative width
- So
longitudinal slope
- R2
coefficient of determination
- RMSE
root mean square error
INTRODUCTION
The study of the flow characteristics of rivers has always been one of the most important issues in hydraulic engineering. The cross-section of a river changes continually as it passes through different terrain, both in the mountains and in the plains. Normally, the flow in rivers is steady and nonuniform. However, with the occurrence of floods, the flow conditions become unsteady and nonuniform, which makes the flow hydraulics in rivers more complicated (Graf & Altinakar 1998). In addition, rivers, especially in the plains, twist and turn along their course, adding to these complications. Nowadays, the concept of the compound open channel is used for hydraulic modeling of rivers, as this approach considers both the main channel and the floodplain (Sahu 2011). Usually, the flow velocity in the floodplain is lower than in the main channel; this causes sedimentation and, as a result, the floodplain is rougher than the main channel. Because of the velocity difference between the main channel and the floodplains, a shear stress is created at their interface, which causes the formation of eddies (Mohanta et al. 2020). Several studies have been carried out on the hydraulics of compound open channels (Singh & Tang 2020; Kumar Singh et al. 2022), starting with the flow structure in open channels with prismatic floodplains (Naik et al. 2017), followed by nonprismatic floodplains (Singh et al. 2019a, 2019b), including skewed, convergent, and divergent floodplains; nowadays, meandering compound open channels are of interest. Sellin (1964) showed that the interaction of the flow in the main channel and the floodplains causes eddies at their boundaries, which results in a loss of flow energy and a corresponding decrease in total discharge.
Mohanty et al. (2011) studied the shear stress variations in the compound channel, focusing on the boundary between the main channel and the floodplain; their studies showed that the shear layer depends on the geometric and hydraulic conditions of the flow. They stated that as the relative width ratio (ratio of the width of the floodplain to that of the main channel) increases, the shear stress decreases. Bousmar et al. (2006) studied the flow hydraulics in a compound open channel with convergent floodplains. Their results showed that at high relative depths, lateral mass transfer in the second half of the convergent region is greater than in the first half. Naik et al. (2017) studied the hydraulics of flow in a compound channel with a nonprismatic floodplain. Their results showed that the depth-averaged velocity and boundary shear stress increase along the channel convergence. In addition to laboratory studies, researchers have also considered numerical modeling of flow in prismatic and nonprismatic open channels. Rezaei & Knight (2009) investigated the accuracy of the Shiono and Knight model (SKM) in compound open channels with nonprismatic floodplains. They found that this method was not accurate enough for hydraulic modeling of flow in such sections. They therefore modified the SKM and presented the modified Shiono-Knight model (M-SKM) to estimate flow parameters, including depth-averaged velocity and boundary shear stress, and to determine the stage–discharge relationship.
Nowadays, because of the limited accuracy of numerical models, researchers have used soft computing methods to model and estimate the hydraulic parameters of flow, especially in compound open channels with nonprismatic floodplains (Das & Khatua 2018; Kaushik & Kumar 2022, 2023; Naik et al. 2022; Bijanvand et al. 2023). For example, the flow discharge in compound open channels with prismatic floodplains has been predicted by an artificial neural network (Sahu 2011), an adaptive neuro-fuzzy inference system (ANFIS) (Parsaie et al. 2017; Das et al. 2020), multivariate adaptive regression splines (Parsaie & Haghiabi 2017), and gene expression programming (Das et al. 2021). The discharge in meandering open channels has been predicted with the MARS model by Mohanta et al. (2020) and Pradhan & Khatua (2019). Finally, the discharge in compound channels with convergent and divergent floodplains was estimated using soft computing models by Yonesi et al. (2022).
The literature review shows that the hydraulic study of compound open channels is mainly based on laboratory experiments. Numerical modeling has also been used; however, according to the reports, it is not sufficiently accurate for floodplains with complex geometry. On the other hand, researchers have tried to use soft computing methods to estimate the flow characteristics in such waterways. According to the reports, their accuracy was reasonable in all types of compound open channels. For example, MLPNN, ANFIS, MARS, and GEP models have been used for flow discharge prediction in compound open channels.
An essential point in the development of the neuro-fuzzy model is that fuzzy logic is incorporated into the neural network model to increase its reliability. The neuro-fuzzy and GMDH models have each been used successfully to estimate flow in compound open channels with divergent and convergent floodplains. In addition, tree family algorithms such as MARS have been used successfully. A notable feature of the GMDH model is the simplicity and clarity of its structure. Given that the accuracy of the GMDH model has been confirmed, this research uses the concept of fuzzy logic to increase its reliability. Among the tree models, the M5 model, which produces a simpler structure when modeling complex processes, was also investigated.
Therefore, in this research, the neuro-fuzzy group method of data handling (NF-GMDH), the support vector machine (SVM) model, and the M5 tree algorithm were developed to estimate the discharge in compound open channels with convergent and divergent floodplains. Two scenarios are considered: development of the soft computing models based on the important parameters only, and development based on all involved parameters.
MATERIALS AND METHODS
In this part, the parameters involved in predicting the discharge in the compound open channels with divergent and convergent floodplains are reviewed. Then, the statistical characteristics of the collected data are calculated. The soft computing models used in this research including the NF-GMDH, the SVR, and the M5 algorithm are reviewed. Finally, the strategies considered for the modeling of discharges are presented.
Compound open channels with convergent and divergent floodplains
To develop the mentioned soft computing methods, the data related to the mentioned parameters were collected from Bousmar (2002), Bousmar et al. (2006), Rezaei (2006), Yonesi et al. (2013) and Naik & Khatua (2016) and their statistical characteristics are given in Table 1.
Source | Range | ff | Ar | Rr | D | S0 × 10−3 | δ* | α | xr | θ | Q |
---|---|---|---|---|---|---|---|---|---|---|---|
Rezaei (2006) | Max | 0.830 | 9.760 | 4.590 | 0.522 | 2.003 | 6.540 | 3.020 | 1.000 | −3.81 | 0.040 |
 | Min | 0.070 | 0.873 | 0.869 | 0.114 | | 0.905 | 0.366 | 0.000 | | 0.004 |
 | Std dev | 0.278 | 2.275 | 1.219 | 0.143 | | 1.940 | 0.900 | 0.308 | | 0.009 |
 | Avg | 0.591 | 2.716 | 2.505 | 0.305 | | 4.313 | 2.043 | 0.482 | | 0.018 |
 | Median | 0.719 | 2.240 | 2.610 | 0.348 | | 5.150 | 2.263 | 0.500 | | 0.017 |
Bousmar (2002) | Max | 0.837 | 10.720 | 4.400 | 0.538 | 0.9 | 6.360 | 3.000 | 0.833 | −11.3 | 0.020 |
 | Min | 0.059 | 0.930 | 0.591 | 0.101 | | 0.808 | 0.571 | 0.000 | −3.81 | 0.003 |
 | Std dev | 0.295 | 3.505 | 1.185 | 0.161 | | 1.831 | 0.866 | 0.253 | | 0.005 |
 | Avg | 0.605 | 3.734 | 2.248 | 0.345 | | 3.980 | 1.884 | 0.256 | | 0.012 |
 | Median | 0.747 | 2.514 | 2.392 | 0.416 | | 4.585 | 2.167 | 0.188 | | 0.012 |
Bousmar et al. (2006) | Max | 0.832 | 12.800 | 4.200 | 0.539 | 0.9 | 6.290 | 3.000 | 1.000 | 5.71 | 0.020 |
 | Min | 0.052 | 0.950 | 0.561 | 0.102 | | 0.819 | 0.380 | 0.167 | 3.81 | 0.003 |
 | Std dev | 0.289 | 4.084 | 1.187 | 0.152 | | 1.926 | 0.805 | 0.297 | | 0.006 |
 | Avg | 0.592 | 4.591 | 2.361 | 0.321 | | 4.166 | 1.694 | 0.541 | | 0.013 |
 | Median | 0.722 | 3.016 | 2.663 | 0.347 | | 4.931 | 1.834 | 0.583 | | 0.016 |
Yonesi et al. (2013) | Max | 0.806 | 20.600 | 35.090 | 0.364 | 0.88 | 1.900 | 3.000 | 1.000 | 11.31 | 0.062 |
 | Min | 0.143 | 1.370 | 1.910 | 0.103 | | 0.229 | 0.576 | 0.096 | 3.81 | 0.011 |
 | Std dev | 0.236 | 5.825 | 10.046 | 0.096 | | 0.622 | 0.866 | 0.285 | | 0.018 |
 | Avg | 0.482 | 6.128 | 10.192 | 0.224 | | 1.373 | 1.903 | 0.359 | | 0.043 |
 | Median | 0.552 | 4.224 | 6.600 | 0.252 | | 1.656 | 2.165 | 0.257 | | 0.051 |
Naik & Khatua (2016) | Max | 0.716 | 22.590 | 7.260 | 0.325 | 1.1 | 4.450 | 1.800 | 0.595 | −13.38 | 0.045 |
 | Min | 0.047 | 3.545 | 0.850 | 0.059 | | 0.293 | 0.178 | 0.000 | −5 | 0.003 |
 | Std dev | 0.248 | 6.134 | 1.849 | 0.094 | | 1.489 | 0.622 | 0.190 | | 0.015 |
 | Avg | 0.519 | 8.665 | 3.604 | 0.199 | | 3.138 | 1.353 | 0.212 | | 0.031 |
 | Median | 0.634 | 7.060 | 3.595 | 0.227 | | 3.795 | 1.647 | 0.193 | | 0.037 |
Neuro-fuzzy group method of data handling
Support vector machine
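Many kernel functions are available for SVM, and selecting a suitable one is itself a research issue. For general purposes, however, a few kernel functions are popular. In their commonly used forms, with kernel parameters γ > 0, r, and polynomial degree d, they are:
- I.
Linear kernel: $K(x_i, x_j) = x_i^{T} x_j$
- II.
Polynomial kernel: $K(x_i, x_j) = (\gamma\, x_i^{T} x_j + r)^{d}$
- III.
Radial basis function (RBF) kernel: $K(x_i, x_j) = \exp(-\gamma \lVert x_i - x_j \rVert^{2})$
- IV.
Sigmoid kernel: $K(x_i, x_j) = \tanh(\gamma\, x_i^{T} x_j + r)$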
It is well known that SVM generalization performance (estimation accuracy) depends on a good setting of the meta-parameter C and the kernel parameters γ and r. Together, C, γ, and r control the complexity of the prediction (regression) model. Optimal parameter selection is further complicated because the complexity of the SVM model (and hence its generalization performance) depends on all three parameters. Kernel functions map the input space into a higher-dimensional feature space in which the regression (or classification) is performed.
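As an illustration of the four kernels listed above, the following is a minimal sketch in plain Python, assuming the commonly used kernel forms with parameters γ (`gamma`) and r (`r`). The polynomial degree `d`, the default values, and all function names are illustrative assumptions and are not taken from the models developed in this study.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    # K(x, y) = x . y
    return dot(x, y)


def polynomial_kernel(x, y, gamma=1.0, r=0.0, d=3):
    # K(x, y) = (gamma * x . y + r) ** d ; d is an assumed degree parameter
    return (gamma * dot(x, y) + r) ** d


def rbf_kernel(x, y, gamma=1.0):
    # K(x, y) = exp(-gamma * ||x - y||^2)
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, gamma=1.0, r=0.0):
    # K(x, y) = tanh(gamma * x . y + r)
    return math.tanh(gamma * dot(x, y) + r)
```

In this sketch, a larger `gamma` in the RBF kernel makes the kernel value decay faster with distance between samples, which is one way the kernel parameters influence model complexity.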
M5 tree model
P is the predicted value passed to the higher node, and P′, in Equation (13), is the predicted value passed to this node from below. n is the number of training samples that have reached the node, and k is the smoothing constant, which is 15 by default.
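The smoothing step can be sketched as follows. This is a minimal example that assumes the smoothing rule usually given for the M5 model tree, in which the prediction coming from below is blended with the prediction of the linear model at the current node, weighted by the sample count n and the smoothing constant k. The function and argument names are hypothetical, and the exact form of Equation (13) is assumed rather than reproduced.

```python
def m5_smooth(pred_from_below, pred_at_node, n, k=15.0):
    """Blend two predictions as (n * p + k * q) / (n + k).

    pred_from_below: prediction passed up to this node from the branch below (p)
    pred_at_node:    prediction of the linear model fitted at this node (q)
    n:               number of training samples that reached the node below
    k:               smoothing constant (15 by default, as stated in the text)
    """
    return (n * pred_from_below + k * pred_at_node) / (n + k)


# With n equal to k, both predictions receive the same weight.
smoothed = m5_smooth(pred_from_below=1.0, pred_at_node=2.0, n=15)
```

With many training samples (large n) the prediction from below dominates, while for sparsely populated leaves the result is pulled toward the model of the parent node.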
Modeling strategies
As shown in Equation (1), nine parameters can be used as input variables to model and predict the flow discharge in compound open channels with nonprismatic floodplains using soft computing models. Therefore, any combination of one to nine parameters can be considered when designing the set of input variables. Different approaches can be used to reduce the effort of finding the best input combination. One is the Gamma test, previously applied by Das et al. (2020). Another is to examine the structure of developed models such as MARS, GEP, and GMDH, which identify the most important parameters and give them more weight while deriving the mathematical formula. In this research, two scenarios are considered: development based on the most important parameters and development based on all parameters involved. The coefficient of determination (R2) and the root mean square error (RMSE) were used to check the accuracy of the models. To develop the models, the collected data must first be divided into two sets: training and testing. The collected data comprise 196 samples; 80% were allocated to training and the remaining 20% to testing. The training data are used for calibration and the test data for validation. Since the collected data are not a time series, samples were assigned to the two sets at random. The range of data allocated is shown in Table 2.
Stage | Range | ff | Ar | Rr | β | S0 | δ* | α | xr | θ | Q |
---|---|---|---|---|---|---|---|---|---|---|---|
Train | Minimum | 0.38 | 0.93 | 1.70 | 0.11 | 0.00 | 1.41 | 1.33 | 0.00 | −13.38 | 0.01 |
 | Maximum | 0.84 | 22.59 | 17.69 | 0.54 | 0.00 | 6.54 | 3.02 | 1.00 | 11.31 | 0.06 |
 | Average | 0.70 | 4.45 | 3.30 | 0.34 | 0.00 | 4.35 | 2.08 | 0.44 | | 0.02 |
 | Variance | 0.01 | 13.94 | 3.74 | 0.01 | 0.00 | 1.57 | 0.29 | 0.10 | | 0.00 |
Test | Minimum | 0.31 | 0.94 | 1.72 | 0.15 | 0.00 | 1.44 | 1.33 | 0.00 | −13.38 | 0.01 |
 | Maximum | 0.84 | 20.60 | 35.09 | 0.53 | 0.00 | 6.43 | 3.02 | 1.00 | 11.31 | 0.06 |
 | Average | 0.69 | 4.05 | 4.36 | 0.33 | 0.00 | 4.62 | 2.20 | 0.34 | | 0.02 |
 | Variance | 0.01 | 13.43 | 33.74 | 0.01 | 0.00 | 1.73 | 0.31 | 0.10 | | 0.00 |
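The random 80/20 allocation and the two accuracy indices described in the modeling strategies can be sketched as below. This is a minimal illustration in plain Python: the seed, the function names, and the use of sample indices in place of the real records are assumptions, and R2 is taken here as 1 − SS_res/SS_tot, which may differ from the definition used by the authors.

```python
import math
import random


def train_test_split(samples, train_fraction=0.8, seed=0):
    """Randomly assign samples to training and testing sets."""
    rng = random.Random(seed)  # fixed seed (assumed) so the split is repeatable
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, assumed as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Indices 0..195 stand in for the 196 collected records.
train_idx, test_idx = train_test_split(range(196))
```

Because the records are not a time series, shuffling before splitting is appropriate; for sequential data the split would have to respect the order of the records.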
RESULTS AND DISCUSSION
First, the results of the M5 model are presented. This model was developed for both scenarios (introduced in the modeling strategies), and its results are presented in Equations (14) and (15). The M5 results are presented before those of the other models because the development process of the M5 model identifies the most influential parameters. The same feature can be seen in the GMDH model, but not in the NF-GMDH model. Of course, it is possible to implement the fuzzy adaptive model within the conventional GMDH model, which would combine the two features, identification of the most influential parameters and adaptivity, at the same time.
Model | Scenario | Train R2 | Train RMSE | Test R2 | Test RMSE |
---|---|---|---|---|---|
M5 | 1 | 0.979 | 0.002 | 0.957 | 0.002 |
 | 2 | 0.955 | 0.002 | 0.931 | 0.002 |
NF-GMDH | 1 | 0.927 | 0.004 | 0.934 | 0.003 |
 | 2 | 0.931 | 0.004 | 0.923 | 0.004 |
SVR | 1 | 0.982 | 0.002 | 0.965 | 0.002 |
 | 2 | 0.971 | 0.002 | 0.941 | 0.002 |
Next, the performance of the SVR model in estimating the flow discharge for both scenarios was checked. The structure of the SVR model developed for the second scenario is shown in Figure 4. To develop the SVR model, different kernel functions (discussed in the Materials and Methods section) were investigated, and the results showed that the radial basis function (RBF) kernel gives better accuracy than the others. The statistical indices of the SVR model in the training and testing phases are shown in Table 3. For scenario two, they are R2 = 0.971 and RMSE = 0.002 in the training phase and R2 = 0.941 and RMSE = 0.002 in the testing phase. Comparing the SVR model with the NF-GMDH and M5 models shows that, in both scenarios, the accuracy of the SVR model is slightly higher than that of the NF-GMDH model and almost equal to that of the M5 model.
This section presents the performance of the SVR, NF-GMDH, and M5 tree algorithms and then compares their statistical indices with other models proposed by previous researchers. Das et al. (2020, 2021) developed ANFIS and GEP models to estimate the discharge in compound open channels with divergent and convergent floodplains. They used the Gamma test to determine the main effective parameters. The statistical indices of the ANFIS model at the test stage were and and the statistical indices of the GEP model at the same stage were and . Yonesi et al. (2022) developed the MARS, GMDH, and MLPNN models to estimate discharge in such waterways. Taylor's diagram was used to compare these models with those used in this study (SVR, NF-GMDH, and M5 algorithms). The study of the structure of the MARS and GMDH models showed that the most important parameters are ff, Dr, Rr, δ*, and θ, which was confirmed by the Gamma test and by the structure obtained by the M5 model.
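For readers unfamiliar with Taylor's diagram, the quantities it displays for each model can be sketched as follows: the standard deviations of the observed and predicted series, their correlation coefficient, and the centered root mean square difference. This is a minimal illustration in plain Python; the function name and the use of population (divide-by-n) statistics are assumptions, not the procedure reported in this study.

```python
import math


def taylor_statistics(observed, predicted):
    """Return (std_obs, std_pred, correlation, centered_rmsd)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Anomalies: deviations of each series from its own mean
    anom_o = [o - mean_o for o in observed]
    anom_p = [p - mean_p for p in predicted]
    std_o = math.sqrt(sum(a * a for a in anom_o) / n)
    std_p = math.sqrt(sum(a * a for a in anom_p) / n)
    covariance = sum(a * b for a, b in zip(anom_o, anom_p)) / n
    correlation = covariance / (std_o * std_p)
    centered_rmsd = math.sqrt(
        sum((b - a) ** 2 for a, b in zip(anom_o, anom_p)) / n
    )
    return std_o, std_p, correlation, centered_rmsd
```

These four quantities satisfy centered_rmsd² = std_obs² + std_pred² − 2·std_obs·std_pred·correlation, which is the law-of-cosines relation that allows all of them to be read from a single diagram.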
CONCLUSIONS
In this research, the flow discharge in compound open channels with convergent and divergent floodplains was modeled and estimated using soft computing models including SVR, NF-GMDH, and M5 models. For this purpose, the geometric and hydraulic characteristics of the flow including relative roughness, relative area, relative hydraulic radius, relative dimensions of the flow aspects, relative width, relative depth, relative longitudinal distance, convergence or divergence angle, and longitudinal bed slope were used. The performance of the models was then compared with the ANFIS, MARS, MLPNN, and GEP models (developed by previous researchers). Two scenarios were considered for modeling and estimating flow in such watercourses. The first scenario included the development of the mentioned models based on all involved parameters and the second scenario included the development of the models based on the effective parameters. The error statistical indices (R2 and RMSE) of the NF-GMDH, SVR, and M5 models at the testing stage are given in Table 3 for both scenarios. Examination of the structure of the MARS, GMDH, and M5 models and of the Gamma test results, from the current and previous research, showed that the most important parameters involved in the estimation of discharge are the relative roughness, the relative depth, the relative hydraulic radius, the relative dimension of the flow aspects, and the angle of convergence or divergence of the floodplains.
ACKNOWLEDGEMENTS
We are grateful to the Research Council of Urmia University.
DATA AVAILABILITY STATEMENT
All relevant data are available from https://cdnsciencepub.com/doi/abs/10.1139/cjce-2018-0038.
CONFLICT OF INTEREST
The authors declare there is no conflict.