Abstract
In this study, the potential of soft computing techniques, namely Random Forest (RF), M5P, Multivariate Adaptive Regression Splines (MARS), and Group Method of Data Handling (GMDH), was evaluated for predicting the aeration efficiency (AE20) of Parshall and Modified Venturi flumes. Experiments were conducted on 26 different Modified Venturi flumes and one Parshall flume, yielding a total of 99 observations. The results of the soft computing models were compared with those of regression-based models, i.e., multiple linear regression (MLR) and multiple nonlinear regression (MNLR). The analysis revealed that the MARS model outperformed the other soft computing and regression-based models in predicting AE20 of Parshall and Modified Venturi flumes, with Pearson's correlation coefficient (CC) = 0.9997 and 0.9992 and root mean square error (RMSE) = 0.0015 and 0.0045 during the calibration and validation periods, respectively. A sensitivity analysis was also carried out with the best-performing MARS model to assess the effect of individual input variables on AE20 of both flumes. It indicated that the oxygen deficit ratio (r) was the most effective input variable in predicting the AE20 of Parshall and Modified Venturi flumes.
HIGHLIGHTS
Aeration efficiency of Parshall and Modified Venturi flumes was predicted by using soft computing techniques.
M5P, RF, MARS, and GMDH models were first employed to predict the aeration efficiency.
Outcomes of soft computing models were first compared against regression-based models.
Effectiveness of applied models was evaluated using performance evaluation indicators.
The MARS-based model outperformed other models.
INTRODUCTION
Water pollution has several main sources, such as poisonous substances, detergents, repellers, and the products of mining, organic, and industrial wastes (Schwarzenbach et al. 2010). The quality of water in natural systems such as rivers and in artificial systems such as canals or reservoirs is related to the presence of dissolved oxygen (DO), which is necessary for aquatic life. The ideal value of DO in natural water bodies ranges between 5 and 6 mg/L (Sánchez et al. 2007). The amount of DO should be kept in this range: if it drops below 5 mg/L, water quality is affected, and fish will die within hours if the level of DO drops below 1–2 mg/L (Baylar et al. 2009b). On the other hand, the level of DO should not be greater than 110%; otherwise, it may be harmful to aquatic life. Physical processes that transfer oxygen from the atmosphere to water help to replenish the existing DO. This process is termed aeration, and it is considered a significant process in wastewater treatment plants (Sangeeta & Tiwari 2019).
A hydraulic structure could be considered as the best choice to improve the water quality in a river system by aeration where it helps to raise the amount of DO. Just one hydraulic structure can provide the same amount of DO that may occur along several kilometers of river. This speedy transfer of DO happens due to a large number of bubbles that form because of entrained air with water flow, which further enhances the surface area thereby leading to mass transfer. Different types of spillways such as stepped or overflow spillways can be used for aeration purposes in a river system. On the other hand, these structures have not been considered the best solution in a straight flow canal. Therefore, other types of hydraulic structures such as drop structures and weirs are preferred. For prismatic channels, a small Parshall flume is presented as the best solution to provide the required aeration.
Several studies have reviewed the evaluation of aeration performance of hydraulic structures such as Gulliver et al. (1990), Wilhelms et al. (1993), Chanson (1995), and Ervine (1998). In the last two decades, Baylar & Bagatur (2000, 2006), Baylar et al. (2006, 2008, 2009a, 2009c), Hanbay et al. (2009), and Baylar & Emiroglu (2003) have examined sharp-crested barriers with diverse cross-sectional geometry. Their findings have revealed that the rate of air entrainment and the aeration efficacy of barriers varied depending on the barrier. The Parshall flume was designed and used for the first time by Ralph Parshall. He based his study on work by Cone in 1917, improved in 1928. After that, this cascade was used in several irrigation projects (USBR 2001; Kim et al. 2010; Dursun 2016). Recently, the Parshall flume has been utilized for several projects around the world to measure the discharge in channels and river systems.
In the last few years, Artificial Intelligence (AI) techniques have been widely applied in different fields of water resources as they are capable of resolving intricate problems that did not have a tractable solution (Aghelpour et al. 2019; Jahani & Mohammadi 2019; Sihag et al. 2019b, 2019c; Mohammadi et al. 2020; Thakur et al. 2021). AI-based models/algorithms have been employed extensively in many water resources applications, including hydrology (Banadkooki et al. 2020; Malik et al. 2020b; Tikhamarine et al. 2020c; Ghasempour et al. 2021; Sihag et al. 2021), hydraulics (Parsaie 2016; Parsaie et al. 2016, 2018, 2019; Ebtehaj et al. 2017; Najafzadeh et al. 2017; Sihag et al. 2019a), and water flow/quality (Heddam & Kisi 2017; Parsaie & Haghiabi 2017a, 2019; Haghiabi et al. 2018; Singh et al. 2019; Esmaeilbeiki et al. 2020; Pandhiani et al. 2020). Nevertheless, only a limited number of studies have considered the application of AI to the evaluation of Parshall flume aeration performance. Therefore, in this study, the M5P, Random Forest (RF), Multivariate Adaptive Regression Splines (MARS), and Group Method of Data Handling (GMDH) techniques were used to develop AI-based models for predicting the aeration efficiency of Parshall and Modified Venturi flumes, and these models were compared with regression-based models.
MATERIALS AND METHODS
Mechanism of aeration
Aeration is a dominant factor in wastewater treatment. In aeration, oxygen is introduced into the water to increase the DO level for the survival of aquatic life. Aeration in a Parshall flume differs from aeration in weirs. A Parshall flume consists of three portions: a converging upstream approach portion, followed by a short, sloping throat portion, from which the flow continues to a diverging downstream portion, as shown in Figure 1. The upstream portion is the largest and has no slope, the short narrow throat slopes downward in the downstream direction, and the floor rises again in the downstream portion (Parshall 1926). In weirs, aeration is achieved by creating a hydraulic jump (Baylar & Emiroglu 2003). Figure 2 illustrates the aeration process for various stages of a cascade from a weir (Tsang 1987). In a Parshall flume, by contrast, the contracting sidewalls of the converging portion accelerate the flow. At the throat, the flow passes from subcritical to supercritical because of the contraction and drop; in the diverging section, it changes from fast, supercritical flow back to slow, subcritical flow through a jump at the outlet of the flume, which results in aeration (Figure 1).
Aeration transfer efficiency
Multiple linear regression (MLR)
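In the general MLR form $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n$, $Y$ represents the dependent (output) variable, $\beta_0, \beta_1, \ldots, \beta_n$ denote the regression coefficients, and $X_1, X_2, \ldots, X_n$ are the independent variables.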
Multiple nonlinear regression (MNLR)
Review of multivariate adaptive regression splines (MARS)
Review of group method of data handling (GMDH)
This is the discrete form of the Volterra series as presented by the Kolmogorov-Gabor polynomial (Najafzadeh & Lim 2015; Alfaifi et al. 2020).
Review of M5P tree
Review of Random Forest (RF)
The Random Forest (RF) procedure was first introduced by Breiman (1996). RF is a versatile approach that has been applied to various nonlinear or intricate engineering problems (Mohammadi & Mehdizadeh 2020). In this technique, a large number of trees is created, each grown from a different bootstrap (bagging) sample of the original dataset. The split at each node is performed using a randomly chosen subset of the input variables. The RF algorithm is comparatively insensitive to features of the training set and can achieve high prediction accuracy (Breiman 2001). It requires two user-defined parameters: the number of trees grown (k) and the number of input variables considered at each node (m). A trial-and-error process is used for model development. The WEKA 3.9 software was used to develop the RF-based model in the current investigation.
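The two randomization steps described above, bootstrap sampling of the training rows and random selection of m candidate inputs at a node, can be sketched in plain Python. This is only an illustration under stated assumptions, not the WEKA implementation used in the study: the function names and the values k = 5 and m = 2 are hypothetical, and no trees are actually fitted.

```python
import random

# The six inputs named in the paper.
FEATURES = ["Q", "W", "L", "N", "r", "f"]


def bootstrap_indices(n_rows, rng):
    """Draw n_rows row indices with replacement (one bagging sample)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def candidate_features(features, m, rng):
    """Pick m distinct candidate inputs for a single node split."""
    return rng.sample(features, m)


def plan_forest(n_rows, k, m, seed=0):
    """For each of k trees, return its bootstrap rows and the
    candidate inputs for its first split (hypothetical helper)."""
    rng = random.Random(seed)
    plan = []
    for _ in range(k):
        rows = bootstrap_indices(n_rows, rng)
        feats = candidate_features(FEATURES, m, rng)
        plan.append((rows, feats))
    return plan


# 69 training observations, as in the paper; k and m are illustrative.
forest_plan = plan_forest(n_rows=69, k=5, m=2)
```

In a full implementation, each tree would be grown on its own bootstrap rows, a fresh candidate subset would be drawn at every node, and the forest prediction would be the average of the k tree outputs.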
Performance evaluation indicators
To assess the accuracy of the implemented AI and regression-based models, six performance/statistical indicators were considered: Pearson's correlation coefficient (CC), scatter index (SI), mean absolute error (MAE), root mean square error (RMSE), Bias, and Nash-Sutcliffe efficiency (NSE). They are defined below.
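The six indicators can be computed as in the following minimal sketch. Their defining equations are not reproduced in this copy of the text, so the forms used here are assumptions: the standard definitions of CC, MAE, RMSE, and NSE, with SI taken as RMSE divided by the mean observed value and Bias as the mean of predicted minus observed. These last two choices appear consistent, to within rounding, with the values reported in Tables 1, 2, and 7. The function name `evaluate` and the sample numbers are hypothetical.

```python
import math


def evaluate(observed, predicted):
    """Return CC, RMSE, Bias, SI, MAE, and NSE for paired series."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    errors = [p - o for o, p in zip(observed, predicted)]

    sse = sum(e * e for e in errors)                 # sum of squared errors
    sso = sum((o - mean_o) ** 2 for o in observed)   # spread of observations
    ssp = sum((p - mean_p) ** 2 for p in predicted)  # spread of predictions
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted))

    rmse = math.sqrt(sse / n)
    return {
        "CC": cov / math.sqrt(sso * ssp),
        "RMSE": rmse,
        "Bias": sum(errors) / n,      # assumed: mean(predicted - observed)
        "SI": rmse / mean_o,          # assumed: RMSE / mean(observed)
        "MAE": sum(abs(e) for e in errors) / n,
        "NSE": 1.0 - sse / sso,
    }


# Made-up values for illustration only (not data from the study).
obs = [0.10, 0.20, 0.30, 0.40]
pred = [0.12, 0.18, 0.33, 0.37]
scores = evaluate(obs, pred)
perfect = evaluate(obs, obs)
```

A perfect model gives CC = NSE = 1 and RMSE = MAE = Bias = SI = 0, which is why lower error indicators and higher CC and NSE indicate better performance.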
Dataset
For model development and validation, a total of 99 experimental observations of aeration efficiency at 20 °C with Parshall and Modified Venturi flumes were used. The total dataset was divided into two subsets. The division method was subjective. The training dataset used for model development included 69 observations, while the remaining 30 observations formed the testing dataset used for model validation (see Appendix A: Table A1). Six independent variables were used as inputs: flow rate (Q), throat width (W), throat length (L), sill height (N), oxygen deficit ratio (r), and exponent (f). Aeration efficiency at 20 °C (AE20) was the target for model development and validation. Table 1 outlines the features of the data used for model development (training) and validation (testing).
Statistic | Q | W | L | N | r | f | AE20 | Dataset |
---|---|---|---|---|---|---|---|---|
Mean | 59.0909 | 7.7273 | 15.0000 | 5.0000 | 1.1283 | 0.9170 | 0.1177 | Total |
Mean | 61.5942 | 8.0797 | 14.7101 | 5.0000 | 1.1250 | 0.9171 | 0.1157 | Training |
Mean | 53.3333 | 6.9167 | 15.6667 | 5.0000 | 1.1361 | 0.9169 | 0.1225 | Testing |
Minimum | 25.0000 | 5.0000 | 10.0000 | 2.5000 | 1.0284 | 0.9070 | 0.0300 | Total |
Minimum | 25.0000 | 5.0000 | 10.0000 | 2.5000 | 1.0284 | 0.9070 | 0.0300 | Training |
Minimum | 25.0000 | 5.0000 | 10.0000 | 2.5000 | 1.0371 | 0.9070 | 0.0389 | Testing |
Maximum | 100.0000 | 10.0000 | 20.0000 | 7.5000 | 1.7440 | 0.9213 | 0.4547 | Total |
Maximum | 100.0000 | 10.0000 | 20.0000 | 7.5000 | 1.7440 | 0.9213 | 0.4547 | Training |
Maximum | 100.0000 | 10.0000 | 20.0000 | 7.5000 | 1.6641 | 0.9213 | 0.4261 | Testing |
Standard deviation | 26.8344 | 1.9914 | 4.1033 | 2.0516 | 0.0980 | 0.0030 | 0.0641 | Total |
Standard deviation | 26.6305 | 1.8760 | 4.0114 | 2.1004 | 0.0910 | 0.0030 | 0.0599 | Training |
Standard deviation | 26.8564 | 2.0430 | 4.3018 | 1.9696 | 0.1139 | 0.0031 | 0.0736 | Testing |
Confidence level (95.0%) | 5.3520 | 0.3972 | 0.8184 | 0.4092 | 0.0195 | 0.0006 | 0.0128 | Total |
Confidence level (95.0%) | 6.3974 | 0.4507 | 0.9636 | 0.5046 | 0.0219 | 0.0007 | 0.0144 | Training |
Confidence level (95.0%) | 10.0283 | 0.7629 | 1.6063 | 0.7355 | 0.0425 | 0.0011 | 0.0275 | Testing |
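As a quick consistency check on the 69/30 split, the mean of the total dataset should equal the sample-size-weighted average of the training and testing means. The sketch below applies that arithmetic to the means listed in Table 1; the variable and function names are hypothetical, and the small residual gap is attributed to the four-decimal rounding of the tabulated values.

```python
N_TRAIN, N_TEST = 69, 30  # observations in each subset, as stated in the text

# Means copied from Table 1.
train_mean = {"Q": 61.5942, "W": 8.0797, "L": 14.7101, "N": 5.0000,
              "r": 1.1250, "f": 0.9171, "AE20": 0.1157}
test_mean = {"Q": 53.3333, "W": 6.9167, "L": 15.6667, "N": 5.0000,
             "r": 1.1361, "f": 0.9169, "AE20": 0.1225}
total_mean = {"Q": 59.0909, "W": 7.7273, "L": 15.0000, "N": 5.0000,
              "r": 1.1283, "f": 0.9170, "AE20": 0.1177}


def pooled_mean(m_train, m_test, n_train=N_TRAIN, n_test=N_TEST):
    """Weighted average of two subset means."""
    return (n_train * m_train + n_test * m_test) / (n_train + n_test)


pooled = {k: pooled_mean(train_mean[k], test_mean[k]) for k in total_mean}

# Largest absolute difference between the pooled and tabulated total means.
max_gap = max(abs(pooled[k] - total_mean[k]) for k in total_mean)
```

For example, for Q the pooled value is (69 × 61.5942 + 30 × 53.3333) / 99 ≈ 59.0909, which matches the tabulated total mean.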
RESULTS AND DISCUSSION
For the prediction of the aeration efficiency of Parshall and Modified Venturi flumes, soft computing and regression-based models were used in this investigation. Six standard statistical parameters, CC, MAE, Bias, SI, RMSE, and NSE, were chosen to assess the performance of all implemented models. Lower RMSE, SI, and MAE values, a Bias closer to zero, and higher CC and NSE values indicate higher model accuracy. Model development is a trial-and-error process: the optimum values of the user-defined parameters were obtained after several trials. There are well-defined statistical criteria for selecting and defining user-defined parameters that are unique to the model.
Results of linear and nonlinear regression-based models
In this investigation, linear and nonlinear regression-based models for the prediction of the aeration efficiency of Parshall and Modified Venturi flumes were also developed. XLSTAT software employing the least squares method was used to develop these equations. The performance measurement parameter values for all developed equations are listed in Table 2. The linear and nonlinear equations obtained from the regression-based models are as follows:
Model | CC | RMSE | Bias | SI | MAE | NSE |
---|---|---|---|---|---|---|
Training dataset | | | | | | |
MLR | 0.9813 | 0.0115 | 0.0000 | 0.0991 | 0.0091 | 0.9629 |
MNLR | 0.9302 | 0.0220 | 0.0011 | 0.1903 | 0.0187 | 0.8630 |
M5P_pruned | 0.9960 | 0.0053 | −0.0002 | 0.0461 | 0.0037 | 0.9920 |
M5P_unpruned | 0.9971 | 0.0047 | −0.0003 | 0.0405 | 0.0035 | 0.9938 |
RF | 0.9849 | 0.0134 | −0.0014 | 0.1162 | 0.0037 | 0.9489 |
GMDH | 0.9999 | 0.0009 | 0.0000 | 0.0080 | 0.0005 | 0.9999 |
MARS | 0.9997 | 0.0015 | 0.0000 | 0.0134 | 0.0010 | 0.9993 |
Testing dataset | | | | | | |
MLR | 0.9840 | 0.0131 | −0.0019 | 0.1073 | 0.0111 | 0.9670 |
MNLR | 0.9077 | 0.0341 | −0.0090 | 0.2787 | 0.0278 | 0.7776 |
M5P_pruned | 0.9971 | 0.0060 | −0.0014 | 0.0494 | 0.0041 | 0.9930 |
M5P_unpruned | 0.9983 | 0.0052 | −0.0014 | 0.0426 | 0.0038 | 0.9948 |
RF | 0.9760 | 0.0259 | −0.0067 | 0.2111 | 0.0110 | 0.8725 |
GMDH | 0.9958 | 0.0189 | 0.0029 | 0.1540 | 0.0033 | 0.9649 |
MARS | 0.9992 | 0.0045 | −0.0018 | 0.0368 | 0.0021 | 0.9961 |
The performance of the MLR and MNLR models was compared using six statistical performance measures, viz., CC, RMSE, Bias, SI, MAE, and NSE (Table 2). Given the higher CC and NSE (0.9840 and 0.9670) and the lower RMSE, Bias, SI, and MAE (0.0131, −0.0019, 0.1073, and 0.0111) of the MLR-based model in the testing phase, it was concluded that the MLR-based model performed better than the MNLR-based model. Figure 3 displays agreement plots, for the training and testing datasets separately, between the observed aeration efficiency of Parshall and Modified Venturi flumes and the values predicted by the MLR- and MNLR-based models. As depicted in the graph, the values predicted by the MLR-based model are close to the line of perfect agreement.
Results of RF-based model
Figure 4 provides plots of agreement between the observed and predicted aeration efficiency of Parshall and Modified Venturi flumes for the RF-based model in the training and testing stages. The values predicted by the RF-based model closely follow the observed values. Table 2 summarizes the results for the training and testing datasets, which show that the performance of the RF model was appropriate for the prediction of the aeration efficiency of Parshall and Modified Venturi flumes, with CC, RMSE, Bias, SI, MAE, and NSE values of 0.9849, 0.0134, −0.0014, 0.1162, 0.0037, and 0.9489, respectively, for the training stage and 0.9760, 0.0259, −0.0067, 0.2111, 0.0110, and 0.8725, respectively, for the testing stage.
Results of the M5P tree-based model
The development of an M5P-based model for predicting the aeration efficiency of Parshall and Modified Venturi flumes follows the same trial-and-error process as the development of the RF-based model. The M5 model tree algorithm uses linear regression models to define the input-output relationship, based on splitting the parameter space of the dataset into several sub-spaces. In this study, both pruned and unpruned M5P models were developed, and their structures are shown in Figures 5 and 6. The linear equations developed with the pruned and unpruned M5P-based models are shown in Tables 3 and 4. Figure 7 shows the results obtained from the M5P models for predicting the aeration efficiency of Parshall and Modified Venturi flumes in the training and testing stages. The performance evaluation parameters suggest that the unpruned M5P-based model is more accurate than the pruned M5P-based model, with CC = 0.9971, 0.9983, RMSE = 0.0047, 0.0052, Bias = −0.0003, −0.0014, SI = 0.0405, 0.0426, MAE = 0.0035, 0.0038, and NSE = 0.9938, 0.9948 for the model training and testing periods, respectively. The overall evaluation (Figure 7 and Table 2) suggests that both pruned and unpruned M5P-based models are suitable for predicting the aeration efficiency of Parshall and Modified Venturi flumes.
LM number | Equation |
---|---|
1 | AE20 = 0.0001 * Q + 0.0004 * N + 0.8601 * r − 0.8517 |
2 | AE20 = 0.0001 * Q + 0.0004 * N + 0.829 * r − 0.8182 |
3 | AE20 = 0.0001 * Q + 0.0004 * N + 0.6953 * r − 0.6665 |
4 | AE20 = 0.0001 * Q + 0.0004 * N + 0.6815 * r − 0.6504 |
5 | AE20 = 0.0001 * Q + 0.0004 * N + 0.5635 * r − 0.511 |
LM number | Equation |
---|---|
1 | AE20 = 0.0001 * Q + 0 * L + 0.0004 * N + 0.8293 * r − 0.8198 |
2 | AE20 = 0.0001 * Q + 0 * L + 0.0004 * N + 0.8293 * r − 0.8198 |
3 | AE20 = 0.0001 * Q + 0 * L + 0.0004 * N + 0.8528 * r − 0.8441 |
4 | AE20 = 0.0001 * Q + 0 * L + 0.0004 * N + 0.8551 * r − 0.8464 |
5 | AE20 = 0.0001 * Q + 0.0004 * N + 0.819 * r − 0.8077 |
6 | AE20 = 0.0001 * Q + 0.0004 * N + 0.8364 * r − 0.8261 |
7 | AE20 = 0.0001 * Q + 0.0004 * N + 0.7794 * r − 0.7641 |
8 | AE20 = 0.0001 * Q + 0.0004 * N + 0.7915 * r − 0.777 |
9 | AE20 = 0.0001 * Q + 0.0004 * N + 0.796 * r − 0.7819 |
10 | AE20 = 0.0001 * Q + 0.0004 * N + 0.796 * r − 0.7819 |
11 | AE20 = 0.0001 * Q + 0.0004 * N + 0.6323 * r − 0.5957 |
12 | AE20 = 0.0001 * Q + 0.0004 * N + 0.6161 * r − 0.5765 |
13 | AE20 = 0 * Q + 0.0004 * N + 0.6528 * r − 0.6168 |
14 | AE20 = 0 * Q + 0.0004 * N + 0.6528 * r − 0.6168 |
15 | AE20 = 0.0001 * Q + 0.0004 * N + 0.6769 * r − 0.645 |
16 | AE20 = 0.0001 * Q + 0.0004 * N + 0.6805 * r − 0.6493 |
17 | AE20 = 0.0001 * Q + 0.0004 * N + 0.6805 * r − 0.6493 |
18 | AE20 = 0.0001 * Q + 0.0004 * N + 0.5713 * r − 0.5205 |
19 | AE20 = 0.0001 * Q + 0.0004 * N + 0.5735 * r − 0.5231 |
20 | AE20 = 0.0001 * Q + 0.0004 * N + 0.543 * r − 0.4817 |
Results of the MARS based model
Basis function | Coefficient |
---|---|
BF1 = max(0, r − 1.1776) | +0.678484169449607 |
BF2 = max(0, 1.1776 − r) | −0.99086139783535 |
BF3 = max(0, r − 1.0783) | −0.145667947631348 |
BF4 = BF3 * max(0, W − .5) | +0.0103736565914279 |
BF5 = BF3 * max(0, 7.5 − W) | −0.0113775669860508 |
BF6 = BF3 * max(0, 0.9192 − f) | +7.06266243416642 |
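The basis functions listed above are hinge functions of the form max(0, x − knot) or max(0, knot − x), and BF4 to BF6 are products of BF3 with a second hinge. The sketch below evaluates them and forms the weighted sum of the listed terms. Several points are assumptions rather than statements from the source: the second column of the table is read as the coefficient of each basis function, the intercept of the MARS equation is not given in this text and is therefore left out, and the BF4 knot is taken literally as 0.5 as printed (it may be a truncated 7.5, mirroring BF5, so it is exposed as a parameter). All function names are hypothetical.

```python
def hinge(x):
    """Hinge function max(0, x) used by MARS."""
    return max(0.0, x)


# Coefficients as printed alongside each basis function.
COEFFS = {
    "BF1": 0.678484169449607,
    "BF2": -0.99086139783535,
    "BF3": -0.145667947631348,
    "BF4": 0.0103736565914279,
    "BF5": -0.0113775669860508,
    "BF6": 7.06266243416642,
}


def mars_basis(r, W, f, bf4_knot=0.5):
    """Evaluate BF1..BF6; bf4_knot is an assumption (printed as '.5')."""
    bf3 = hinge(r - 1.0783)
    return {
        "BF1": hinge(r - 1.1776),
        "BF2": hinge(1.1776 - r),
        "BF3": bf3,
        "BF4": bf3 * hinge(W - bf4_knot),
        "BF5": bf3 * hinge(7.5 - W),
        "BF6": bf3 * hinge(0.9192 - f),
    }


def mars_terms_sum(r, W, f, bf4_knot=0.5):
    """Weighted sum of the listed terms, excluding the unknown intercept."""
    basis = mars_basis(r, W, f, bf4_knot)
    return sum(COEFFS[name] * value for name, value in basis.items())


# Illustrative inputs chosen inside the ranges of Table 1.
low_r = mars_basis(r=1.0783, W=5.0, f=0.91)
high_r = mars_basis(r=1.2, W=10.0, f=0.92)
partial = mars_terms_sum(r=1.0783, W=5.0, f=0.91)
```

Every basis function depends on r, either directly or through BF3, which agrees with the sensitivity result that r is the most effective input.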
Results of the GMDH based model
The development of the GMDH model entails a trial-and-error process similar to that used for the RF, M5P, and MARS-based models. Table 2 outlines the outcomes of the developed GMDH model, which includes three hidden layers. It was developed with the same training and testing datasets as the other soft computing-based models. Table 6 lists the constants and coefficients of the transfer functions. Figure 9 illustrates the results of the GMDH model during the training and testing phases. The performance assessment parameters indicate that the GMDH model was appropriate for the prediction of the aeration efficiency of Parshall and Modified Venturi flumes, with CC = 0.9999, 0.9958, RMSE = 0.0009, 0.0189, Bias = 0.0000, 0.0029, SI = 0.0080, 0.1540, MAE = 0.0005, 0.0033, and NSE = 0.9999, 0.9649 for the model development and validation periods, respectively.
Layer | Neuron | a0 | a1 | a2 | a3 | a4 | a5 |
---|---|---|---|---|---|---|---|
1 | 1 | −1.5473 | 0.0024 | 2.0871 | 0.0000 | −0.5345 | −0.0029 |
1 | 2 | 13.6203 | −0.4349 | −29.9403 | −0.5146 | 14.6721 | 2.6730 |
1 | 3 | −1.5211 | 0.0006 | 2.0254 | 0.0000 | −0.5023 | −0.0006 |
1 | 4 | −1.4979 | 0.0007 | 2.0146 | 0.0000 | −0.5132 | −0.0005 |
1 | 5 | −1.5142 | 0.0020 | 2.0188 | 0.0000 | −0.5042 | −0.0015 |
2 | 1 | −0.0002 | 0.3328 | 0.6698 | −7.6643 | −8.7161 | 16.3742 |
2 | 2 | 0.0001 | −0.3447 | 1.3467 | −439.9819 | −446.5554 | 886.5322 |
2 | 3 | −0.0004 | −0.0378 | 1.0426 | 49.8484 | 42.5808 | −92.4401 |
2 | 4 | −0.0004 | 0.0463 | 0.9615 | −835.4749 | −841.8852 | 1677.3433 |
2 | 5 | −0.0001 | 1.1741 | −0.1725 | −186.4012 | −183.7293 | 370.1263 |
3 | 1 | 0.0001 | 0.2093 | 0.7894 | 231.0363 | 223.1737 | −454.2067 |
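Each GMDH neuron is commonly a two-input quadratic polynomial, the building block of the Kolmogorov-Gabor polynomial mentioned earlier. The sketch below evaluates such a neuron with the six coefficients a0 to a5 of one table row. The assignment of coefficients to terms (a3 and a4 on the squared inputs, a5 on the cross product) is an assumption, since the transfer function itself is not reproduced in this text, and the table does not state which inputs feed which neuron. The function name `gmdh_neuron` is hypothetical.

```python
def gmdh_neuron(coeffs, x1, x2):
    """Two-input quadratic transfer function (assumed term order):
    y = a0 + a1*x1 + a2*x2 + a3*x1^2 + a4*x2^2 + a5*x1*x2
    """
    a0, a1, a2, a3, a4, a5 = coeffs
    return a0 + a1 * x1 + a2 * x2 + a3 * x1 ** 2 + a4 * x2 ** 2 + a5 * x1 * x2


# Coefficients of the single layer-3 neuron, copied from Table 6.
LAYER3_NEURON1 = (0.0001, 0.2093, 0.7894, 231.0363, 223.1737, -454.2067)

# If both layer-2 outputs feeding this neuron agree on the same value,
# the neuron returns almost exactly that value.
y_same = gmdh_neuron(LAYER3_NEURON1, 0.1, 0.1)
```

With equal inputs x, the output reduces to a0 + (a1 + a2)·x + (a3 + a4 + a5)·x². For the layer-3 neuron, a1 + a2 = 0.9987 and a3 + a4 + a5 = 0.0033, so the large quadratic coefficients nearly cancel and mainly act on the difference between the two inputs.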
Intercomparison of regression and soft computing-based models
Comparison of the developed models (Table 2 and Figure 10) indicates that the MARS-based model works better than the other regression and soft computing-based models. The potential of the regression and soft computing-based models for predicting the aeration efficiency of Parshall and Modified Venturi flumes was assessed through performance indicators, agreement plots, and error plots, as shown in Figure 10 for the testing stage. The values predicted by the MARS-based model were close to the observed aeration efficiency of Parshall and Modified Venturi flumes, as indicated in the plots, and they follow a pattern analogous to that of the observed values. The unpruned M5P-based model outperforms the pruned M5P-based model. Overall, the MARS-based model was more accurate than the M5P, GMDH, MLR, MNLR, and RF-based models. The MLR-based model is also suitable, and better than MNLR, for predicting the aeration efficiency of Parshall and Modified Venturi flumes using this dataset. A box plot comparing the observed (actual) values with the values predicted by the various applied models for the testing stage is given in Figure 11. Descriptive statistics of the observed values and of the applied models during the testing stage are listed in Table 7. Figure 11 and Table 7 also suggest that the MARS model outperformed the other applied models. The minimum and maximum of the values predicted by the MARS model are very close to those of the actual values, and the widths of the lower and upper quartiles are almost the same in Figure 11.
Statistic | Observed | MLR | MNLR | RF | M5P_pruned | M5P_unpruned | MARS | GMDH |
---|---|---|---|---|---|---|---|---|
Minimum | 0.0389 | 0.0586 | 0.0678 | 0.0470 | 0.0450 | 0.0450 | 0.0388 | 0.0401 |
Maximum | 0.4261 | 0.4461 | 0.3779 | 0.3020 | 0.4300 | 0.4250 | 0.4146 | 0.4997 |
1st Quartile | 0.0701 | 0.0816 | 0.0871 | 0.0753 | 0.0733 | 0.0733 | 0.0705 | 0.0703 |
Median | 0.1070 | 0.1054 | 0.1043 | 0.1065 | 0.1060 | 0.1055 | 0.1065 | 0.1065 |
Mean | 0.1225 | 0.1206 | 0.1136 | 0.1159 | 0.1211 | 0.1212 | 0.1208 | 0.1254 |
3rd Quartile | 0.1450 | 0.1378 | 0.1211 | 0.1475 | 0.1430 | 0.1430 | 0.1459 | 0.1452 |
A Taylor diagram (Figure 12) is a graphical illustration of the performance of the developed models in terms of CC, RMSE, and standard deviation and indicates that the MARS model was the best performing model and the performance of the MNLR model was least successful for the prediction of aeration efficiency of Parshall and Modified Venturi flumes using this dataset.
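The ranking implied by the testing-stage indicators can be reproduced with a few lines of Python. The RMSE and NSE values below are copied from the testing rows of Table 2; ranking by RMSE alone is a simplification chosen for this sketch, not the procedure stated in the paper, and the variable names are hypothetical.

```python
# Testing-stage RMSE and NSE, copied from Table 2.
TESTING = {
    "MLR": {"RMSE": 0.0131, "NSE": 0.9670},
    "MNLR": {"RMSE": 0.0341, "NSE": 0.7776},
    "M5P_pruned": {"RMSE": 0.0060, "NSE": 0.9930},
    "M5P_unpruned": {"RMSE": 0.0052, "NSE": 0.9948},
    "RF": {"RMSE": 0.0259, "NSE": 0.8725},
    "GMDH": {"RMSE": 0.0189, "NSE": 0.9649},
    "MARS": {"RMSE": 0.0045, "NSE": 0.9961},
}

# Lower RMSE is better, so sort ascending.
ranking_by_rmse = sorted(TESTING, key=lambda m: TESTING[m]["RMSE"])

# Higher NSE is better, so sort descending.
ranking_by_nse = sorted(TESTING, key=lambda m: TESTING[m]["NSE"], reverse=True)

best_model = ranking_by_rmse[0]
worst_model = ranking_by_rmse[-1]
```

Both orderings place MARS first and MNLR last for the testing stage, in line with the Taylor diagram.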
Sensitivity analysis
A sensitivity analysis was carried out to find the most influential input variable for the AE20 of Parshall and Modified Venturi flumes. The analysis used the best-performing model (MARS). The model was redeveloped several times, each time with a training dataset from which one input variable had been removed. The results are listed in terms of CC, RMSE, MAE, and NSE in Table 8, which indicates that the oxygen deficit ratio (r) is the most effective input variable in predicting the AE20 of Parshall and Modified Venturi flumes using this dataset.
CONCLUSION
In this research, M5P, Random Forest (RF), Multivariate Adaptive Regression Splines (MARS), and Group Method of Data Handling (GMDH) models were developed to predict the aeration efficiency (AE20) of Parshall and Modified Venturi flumes and were then compared with multiple linear regression (MLR) and multiple nonlinear regression (MNLR) models. The data came from experiments on 26 different Modified Venturi flumes and one Parshall flume. The comparison based on performance evaluation indices shows that the MARS approach outperformed the other models (i.e., M5P, RF, GMDH, MLR, and MNLR) during both the development (training) and validation (testing) periods. Other major outcomes of this study are that the MLR-based model performs better than the MNLR-based model and that, among the M5P-based models, the unpruned model works better than the pruned model. Overall, the M5P-based models outperform the GMDH, MLR, RF, and MNLR-based models. A sensitivity analysis using the best-performing MARS model was carried out to evaluate the effect of each input variable on AE20. Its findings indicate that the oxygen deficit ratio (r) was the most effective input variable for estimating the AE20 of Parshall and Modified Venturi flumes using this dataset.
CONFLICT OF INTEREST
None.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.