## Abstract

In this study, the potential of four soft computing techniques, namely Random Forest (RF), M5P, Multivariate Adaptive Regression Splines (MARS), and Group Method of Data Handling (GMDH), was evaluated for predicting the aeration efficiency (AE_{20}) of Parshall and Modified Venturi flumes. Experiments were conducted on 26 different Modified Venturi flumes and one Parshall flume, yielding a total of 99 observations. The results of the soft computing models were compared with those of regression-based models, i.e., multiple linear regression (MLR) and multiple nonlinear regression (MNLR). The analysis revealed that the MARS model outperformed the other soft computing and regression-based models in predicting AE_{20} of Parshall and Modified Venturi flumes, with Pearson's correlation coefficient (CC) = 0.9997 and 0.9992 and root mean square error (RMSE) = 0.0015 and 0.0045 during the calibration and validation periods, respectively. A sensitivity analysis was also carried out with the best-performing MARS model to assess the effect of individual input variables on AE_{20} of both flumes. The sensitivity analysis indicates that the oxygen deficit ratio (r) was the most effective input variable in predicting the AE_{20} of Parshall and Modified Venturi flumes.

## HIGHLIGHTS

Aeration efficiency of Parshall and Modified Venturi flumes was predicted by using soft computing techniques.

M5P, RF, MARS, and GMDH models were first employed to predict the aeration efficiency.

Outcomes of soft computing models were first compared against regression-based models.

Effectiveness of applied models was evaluated using performance evaluation indicators.

The MARS-based model outperformed other models.

## INTRODUCTION

Water pollution has several main sources, such as poisonous substances, detergents, repellents, mining products, and organic and industrial wastes (Schwarzenbach *et al.* 2010). The quality of water in natural systems such as rivers and in artificial systems such as canals or reservoirs is related to the presence of dissolved oxygen (DO), which is necessary for aquatic life. The ideal value of DO in natural water bodies ranges between 5 and 6 mg/L (Sánchez *et al.* 2007). DO should be kept in this range: water quality is affected if DO drops below 5 mg/L, and fish die within hours if DO drops below 1–2 mg/L (Baylar *et al.* 2009b). On the other hand, the level of DO should not exceed 110%; otherwise, it may be harmful to aquatic life. Physical processes that transfer oxygen from the atmosphere to water help to replenish the DO. This process is termed aeration, and it is a significant process in wastewater treatment plants (Sangeeta & Tiwari 2019).

A hydraulic structure can be considered the best choice for improving water quality in a river system because it raises the amount of DO through aeration. A single hydraulic structure can provide the same amount of DO as would be transferred along several kilometers of river. This rapid transfer of DO occurs because air entrained in the flow forms a large number of bubbles, which increases the surface area available for mass transfer. Different types of spillways, such as stepped or overflow spillways, can be used for aeration in a river system. However, these structures are not considered the best solution in a straight flow canal, where other hydraulic structures such as drop structures and weirs are preferred. For prismatic channels, a small Parshall flume is presented as the best solution to provide the required aeration.

Several studies, such as Gulliver *et al.* (1990), Wilhelms *et al.* (1993), Chanson (1995), and Ervine (1998), have evaluated the aeration performance of hydraulic structures. In the last two decades, Baylar & Bagatur (2000, 2006), Baylar *et al.* (2006, 2008, 2009a, 2009c), Hanbay *et al.* (2009), and Baylar & Emiroglu (2003) have examined sharp-crested barriers with diverse cross-sectional geometry. Their findings revealed that the rate of air entrainment and the aeration efficiency varied with the type of barrier. The Parshall flume was designed and first used by Ralph Parshall, who based his study on work by Cone in 1917 and improved it in 1928. The flume was subsequently used in several irrigation projects (USBR 2001; Kim *et al.* 2010; Dursun 2016). Recently, the Parshall flume has been used in several projects around the world to measure discharge in channels and river systems.

In the last few years, Artificial Intelligence (AI) techniques have been widely applied in different fields of water resources because they can resolve intricate problems that previously had no tractable solution (Aghelpour *et al.* 2019; Jahani & Mohammadi 2019; Sihag *et al.* 2019b, 2019c; Mohammadi *et al.* 2020; Thakur *et al.* 2021). AI-based models/algorithms have been employed extensively in many water resources applications, including hydrology (Banadkooki *et al.* 2020; Malik *et al.* 2020b; Tikhamarine *et al.* 2020c; Ghasempour *et al.* 2021; Sihag *et al.* 2021), hydraulics (Parsaie 2016; Parsaie *et al.* 2016, 2018, 2019; Ebtehaj *et al.* 2017; Najafzadeh *et al.* 2017; Sihag *et al.* 2019a), and water flow/quality (Heddam & Kisi 2017; Parsaie & Haghiabi 2017a, 2019; Haghiabi *et al.* 2018; Singh *et al.* 2019; Esmaeilbeiki *et al.* 2020; Pandhiani *et al.* 2020). Nevertheless, only a limited number of studies have considered the application of AI to the evaluation of Parshall flume aeration performance. Therefore, in this study, the M5P, Random Forest (RF), Multivariate Adaptive Regression Splines (MARS), and Group Method of Data Handling (GMDH) techniques were used to develop AI-based models for predicting the aeration efficiency of Parshall and Modified Venturi flumes, and these models were compared with regression-based models.

## MATERIALS AND METHODS

### Mechanism of aeration

Aeration is a dominant factor in wastewater treatment. In aeration, oxygen is introduced into the water to increase the DO level for the survival of aquatic life. Aeration in a Parshall flume differs from aeration in weirs. A Parshall flume consists of three portions: a converging upstream approach portion, a short sloping throat portion, and a diverging downstream portion, as shown in Figure 1. The upstream portion is the largest and has a level floor, the short narrow throat has a floor that slopes downward, and the floor rises again in the downstream portion (Parshall 1926). In weirs, aeration is achieved by creating a hydraulic jump (Baylar & Emiroglu 2003). Figure 2 illustrates the aeration process for the various stages of a cascade from a weir (Tsang 1987). In a Parshall flume, by contrast, the contracting sidewalls of the converging portion accelerate the flow. At the throat, the flow passes from subcritical to supercritical because of the contraction and drop; in the diverging section, the flow returns from fast, supercritical to slow, subcritical through a hydraulic jump at the outlet of the flume, which produces aeration (Figure 1).

### Aeration transfer efficiency

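The rate of change of oxygen concentration in water during aeration can be written as:

$$\frac{dC}{dt} = K_L \frac{A}{V}\,(C_s - C)$$

where *C* is the concentration of dissolved oxygen, $K_L$ is the coefficient of the liquid layer for oxygen, *A* is the surface area, *V* is the volume over which oxygen transfer occurs, $C_s$ represents the saturation concentration, and *t* indicates time.

Assuming that $C_s$ is constant with respect to time, Gulliver *et al.* (1990) expressed the oxygen aeration efficiency (AE) as:

$$AE = \frac{C_D - C_U}{C_s - C_U} = 1 - \frac{1}{r}$$

where the subscripts *U* and *D* represent upstream and downstream locations, while *r* represents the oxygen deficit ratio, $r = (C_s - C_U)/(C_s - C_D)$.

Because aeration efficiency depends on water temperature, it is normalized to a standard temperature of 20 °C, as given by Gulliver *et al.* (1990) as:

$$1 - AE_{20} = (1 - AE)^{1/f}$$

where $AE_{20}$ is the aeration efficiency at 20 °C and *f* is the exponent $f = 1.0 + 0.02103\,(T - 20) + 8.261 \times 10^{-5}\,(T - 20)^2$, with *T* the water temperature in °C.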
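As a worked illustration of the relations in this section, the following Python sketch computes the oxygen deficit ratio, the aeration efficiency, and its 20 °C equivalent. It assumes the standard forms r = (C_s − C_U)/(C_s − C_D), AE = 1 − 1/r, and 1 − AE_20 = (1 − AE)^(1/f) with f = 1.0 + 0.02103 (T − 20) + 8.261 × 10⁻⁵ (T − 20)², which are commonly attributed to Gulliver *et al.* (1990). The function names and the concentrations in the example are hypothetical and are not taken from the experiments.

```python
def oxygen_deficit_ratio(c_up, c_down, c_sat):
    """r = (C_s - C_U) / (C_s - C_D), all concentrations in the same unit."""
    return (c_sat - c_up) / (c_sat - c_down)


def aeration_efficiency(r):
    """AE = 1 - 1/r, equivalent to (C_D - C_U) / (C_s - C_U)."""
    return 1.0 - 1.0 / r


def temperature_exponent(temp_c):
    """Assumed exponent f = 1.0 + 0.02103 (T - 20) + 8.261e-5 (T - 20)^2."""
    d = temp_c - 20.0
    return 1.0 + 0.02103 * d + 8.261e-5 * d * d


def efficiency_at_20c(ae, temp_c):
    """Normalize AE measured at temp_c to 20 degC: 1 - AE_20 = (1 - AE)^(1/f)."""
    return 1.0 - (1.0 - ae) ** (1.0 / temperature_exponent(temp_c))


# Hypothetical DO concentrations in mg/L (illustration only).
c_up, c_down, c_sat = 4.0, 5.0, 10.0
r = oxygen_deficit_ratio(c_up, c_down, c_sat)
ae = aeration_efficiency(r)
f_16 = temperature_exponent(16.0)
ae20 = efficiency_at_20c(ae, 16.0)
print(f"r = {r:.4f}, AE = {ae:.4f}, f = {f_16:.4f}, AE_20 = {ae20:.4f}")
```

At 16 °C the exponent evaluates to about 0.917, which lies within the range of f reported for the dataset in Table 1 (0.9070 to 0.9213). Because f is below 1 for water colder than 20 °C, AE_20 is larger than the measured AE in that case.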

### Multiple linear regression (MLR)

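The general form of the MLR model is (… *et al.* 2021a):

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n$$

where $Y$ represents the dependent (output) variable, $\beta_0, \beta_1, \dots, \beta_n$ denote the regression coefficients, and $X_1, X_2, \dots, X_n$ are the independent variables.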
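To make the least-squares idea behind MLR concrete, the sketch below fits the regression coefficients by solving the normal equations in plain Python. It is only a stand-in for the statistical software used in the study. The helper names, the two-input setup (Q and r), and the synthetic data generated from made-up coefficients are all assumptions for illustration.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, n):
            factor = m[k][col] / m[col][col]
            for c in range(col, n + 1):
                m[k][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_mlr(rows, y):
    """Return [b0, b1, ..., bn] minimizing the sum of squared residuals."""
    x = [[1.0] + list(row) for row in rows]  # prepend the intercept column
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(x, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict_mlr(beta, row):
    """Y = b0 + b1 X1 + ... + bn Xn."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


# Synthetic (Q, r) samples generated from hypothetical coefficients.
rows = [(25, 1.05), (50, 1.10), (75, 1.08), (100, 1.30), (40, 1.20)]
y = [-0.8 + 0.0001 * q + 0.8 * r for q, r in rows]
beta = fit_mlr(rows, y)
print([round(b, 6) for b in beta])
```

Because the synthetic targets are exactly linear in the inputs, the fit recovers the generating coefficients up to floating-point error. Real measurements would leave nonzero residuals.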

### Multiple nonlinear regression (MNLR)

*et al.*2017):

### Review of multivariate adaptive regression splines (MARS)

MARS builds its model from piecewise-linear basis functions (BFs) of the form $\max(0, x - k)$ or $\max(0, k - x)$, where *x* denotes an independent variable and *k* denotes a boundary (knot) value. The general form of the MARS model for the desired phenomenon is:

$$y = f(x) = \beta_0 + \sum_{i=1}^{n} \beta_i \, BF_i(x)$$

where *y* denotes the dependent parameter predicted by the function $f(x)$, $\beta_0$ denotes a constant, *n* denotes the number of BFs, $\beta_i$ symbolizes the coefficient multiplying the *i*-th BF, and $BF_i(x)$ indicates the *i*-th BF. The MARS process builds the model in two phases. The first is the forward (growing) phase, in which the input space is split into several subdomains and BFs are added. The second is the pruning phase, in which BFs created in the first phase that have no major impact on the accuracy of the model are pruned based on a criterion called Generalized Cross-Validation (GCV). The final structure of the MARS model is therefore the one selected by GCV, which is defined as:

$$GCV = \frac{\frac{1}{N}\sum_{i=1}^{N}\left[y_i - f(x_i)\right]^2}{\left[1 - \frac{C(B)}{N}\right]^2}, \qquad C(B) = (B + 1) + dB$$

where *N* signifies the number of data points and $C(B)$ is the penalty for complexity, which increases with the number of BFs; *d* is the penalty for each BF and *B* is the number of BFs obtained from the MARS process (Parsaie & Haghiabi 2017b, 2017c; Sihag *et al.* 2019d).
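The two building blocks described here, hinge-type basis functions and the GCV criterion, can be sketched in a few lines of Python. The sketch assumes mirrored hinges max(0, x − k) and max(0, k − x) and the common GCV form MSE / (1 − C(B)/N)² with C(B) = (B + 1) + dB. The knot 1.1776 is borrowed from the basis functions reported later in the paper, while the intercept, the rounded coefficients, the penalty d = 3, and all function names are hypothetical.

```python
def hinge_pair(x, knot):
    """Return the mirrored basis functions max(0, x - k) and max(0, k - x)."""
    return max(0.0, x - knot), max(0.0, knot - x)


def mars_predict(intercept, terms, x):
    """y = beta_0 + sum_i beta_i * BF_i(x); terms is a list of (beta_i, bf_i)."""
    return intercept + sum(beta * bf(x) for beta, bf in terms)


def gcv(observed, predicted, n_basis, penalty=3.0):
    """GCV = MSE / (1 - C(B)/N)^2 with C(B) = (B + 1) + d * B (assumed form)."""
    n = len(observed)
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    complexity = (n_basis + 1) + penalty * n_basis
    return mse / (1.0 - complexity / n) ** 2


# Two-term toy model in the oxygen deficit ratio r (hypothetical values).
KNOT = 1.1776
toy_terms = [
    (0.68, lambda r: hinge_pair(r, KNOT)[0]),   # active when r > knot
    (-0.99, lambda r: hinge_pair(r, KNOT)[1]),  # active when r < knot
]
toy_intercept = 0.15
print(mars_predict(toy_intercept, toy_terms, 1.30))
print(gcv([0.0] * 20, [0.01] * 20, n_basis=2))
```

For a fixed mean squared error, adding BFs raises C(B) and therefore the GCV score, which is why a BF survives pruning only when it reduces the error enough to offset the penalty.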

### Review of group method of data handling (GMDH)

This is the discrete form of the Volterra series as presented by the Kolmogorov-Gabor polynomial (Najafzadeh & Lim 2015; Alfaifi *et al.* 2020).
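A GMDH network is usually assembled from two-input quadratic neurons, each a truncated Kolmogorov-Gabor polynomial. The sketch below evaluates such neurons layer by layer. It assumes the six-coefficient form y = a0 + a1 x_i + a2 x_j + a3 x_i² + a4 x_j² + a5 x_i x_j; the ordering of the terms, the wiring format, and the toy coefficients are assumptions and do not reproduce the network developed in this study.

```python
def gmdh_neuron(coeffs, xi, xj):
    """Quadratic partial description of two inputs (assumed term order)."""
    a0, a1, a2, a3, a4, a5 = coeffs
    return a0 + a1 * xi + a2 * xj + a3 * xi ** 2 + a4 * xj ** 2 + a5 * xi * xj


def gmdh_forward(layers, inputs):
    """Propagate inputs through the layers.

    Each layer is a list of (i, j, coeffs): the neuron reads signals i and j
    of the previous layer (or of the raw inputs for the first layer).
    """
    signals = list(inputs)
    for layer in layers:
        signals = [gmdh_neuron(c, signals[i], signals[j]) for i, j, c in layer]
    return signals


# Toy two-layer network with hypothetical wiring and coefficients.
toy_layers = [
    [(0, 1, (0.0, 1.0, 1.0, 0.0, 0.0, 1.0)),   # xi + xj + xi * xj
     (0, 1, (1.0, 0.0, 0.0, 1.0, 0.0, 0.0))],  # 1 + xi^2
    [(0, 1, (0.0, 0.5, 0.5, 0.0, 0.0, 0.0))],  # mean of the two signals
]
output = gmdh_forward(toy_layers, [2.0, 3.0])
print(output)
```

In a full GMDH procedure the coefficients of every candidate neuron are fitted by least squares and only the best-performing neurons of each layer are kept; the sketch covers the forward evaluation only.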

### Review of M5P tree

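The M5P tree chooses the split at each node by maximizing the standard deviation reduction (SDR):

$$SDR = sd(Z) - \sum_{i} \frac{|Z_i|}{|Z|}\, sd(Z_i)$$

where *Z* is the set of examples that reach the node, $Z_i$ is the subset of examples having the *i*-th outcome of the potential test, and sd is the standard deviation. This technique generates a sizeable tree structure with high prediction accuracy.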
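The splitting criterion of a model tree can be illustrated with a short Python sketch. It assumes the usual standard deviation reduction, SDR = sd(Z) − Σ (|Z_i| / |Z|) sd(Z_i), with the population standard deviation, and searches the best threshold for a single attribute. The function names and the six (r, AE_20) pairs are hypothetical.

```python
from statistics import pstdev


def sdr(parent, subsets):
    """Standard deviation reduction achieved by splitting parent into subsets."""
    n = len(parent)
    return pstdev(parent) - sum(len(s) / n * pstdev(s) for s in subsets)


def best_split(xs, ys):
    """Return (threshold, sdr) of the best binary split on one attribute."""
    pairs = sorted(zip(xs, ys))
    best_thr, best_gain = None, float("-inf")
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold can separate identical attribute values
        thr = (pairs[k - 1][0] + pairs[k][0]) / 2
        left = [t for _, t in pairs[:k]]
        right = [t for _, t in pairs[k:]]
        gain = sdr(ys, [left, right])
        if gain > best_gain:
            best_thr, best_gain = thr, gain
    return best_thr, best_gain


# Hypothetical oxygen deficit ratios and aeration efficiencies.
r_values = [1.03, 1.05, 1.08, 1.30, 1.40, 1.50]
ae_values = [0.03, 0.05, 0.07, 0.23, 0.29, 0.33]
threshold, gain = best_split(r_values, ae_values)
print(threshold, gain)
```

For this toy sample the best threshold falls midway between 1.08 and 1.30, where the targets separate into a low group and a high group. A model tree would then fit a linear model in each resulting leaf.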

### Review of Random Forest (RF)

The Random Forest (RF) procedure was primarily introduced by Breiman (1996). RF is a versatile approach that has been chosen to solve various nonlinear or intricate engineering issues (Mohammadi & Mehdizadeh 2020). In this technique, a large number of trees is grown, each starting at its root node from a different bootstrap (bagging) sample of the original dataset. The split at each node is made using a randomly chosen subset of the input variables. The RF algorithm is comparatively insensitive to features of the training set and can achieve high prediction accuracy (Breiman 2001). It requires two user-defined parameters: the number of trees grown (*k*) and the number of input features (*m*). These parameters are selected through a trial-and-error process during model development. The WEKA 3.9 software was used to develop the RF-based model in this investigation.
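The two ingredients named above, bootstrap samples and random feature subsets, can be shown with a deliberately reduced Python sketch in which every tree is a single-split stump. This is not the WEKA implementation used in the study: the depth-one trees, the function names, the seed, and the toy (Q, r) data are assumptions made to keep the example short.

```python
import random


def fit_stump(rows, y, features):
    """Best single-threshold split (least squared error) over the given features."""
    mean = sum(y) / len(y)
    # Fallback: constant prediction when no feature value can be split.
    best = (float("inf"), features[0], float("inf"), mean, mean)
    for f in features:
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [t for row, t in zip(rows, y) if row[f] <= thr]
            right = [t for row, t in zip(rows, y) if row[f] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if sse < best[0]:
                best = (sse, f, thr, lm, rm)
    return best[1:]  # (feature, threshold, left_mean, right_mean)


def fit_forest(rows, y, n_trees=25, m_features=1, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        feats = rng.sample(range(p), m_features)    # random subset of inputs
        forest.append(fit_stump([rows[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Average the predictions of all stumps."""
    preds = [lm if row[f] <= thr else rm for f, thr, lm, rm in forest]
    return sum(preds) / len(preds)


# Hypothetical (Q, r) inputs and aeration efficiencies.
toy_rows = [[25, 1.03], [50, 1.08], [75, 1.12], [100, 1.20],
            [40, 1.30], [60, 1.45], [90, 1.60], [30, 1.70]]
toy_y = [0.03, 0.07, 0.11, 0.17, 0.23, 0.31, 0.38, 0.41]
forest = fit_forest(toy_rows, toy_y, n_trees=25, m_features=1, seed=0)
prediction = predict_forest(forest, [55, 1.25])
print(prediction)
```

Here `n_trees` plays the role of *k* and `m_features` the role of *m*. Because each stump predicts a mean of training targets, the averaged prediction always stays within the range of the training data, which is one reason tree ensembles extrapolate poorly.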

### Performance evaluation indicators

To assess the accuracy of the implemented AI and regression-based models, six performance/statistical indicators, namely Pearson's correlation coefficient (CC), scatter index (SI), mean absolute error (MAE), root mean square error (RMSE), Bias, and Nash-Sutcliffe efficiency (NSE), were considered and are defined below.

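The CC is computed as (… *et al.* 2019b, 2020a):

$$CC = \frac{\sum_{i=1}^{N}(O_i - \bar{O})(P_i - \bar{P})}{\sqrt{\sum_{i=1}^{N}(O_i - \bar{O})^2 \, \sum_{i=1}^{N}(P_i - \bar{P})^2}}$$

The SI is computed as (… *et al.* 2020; Malik *et al.* 2021c):

$$SI = \frac{RMSE}{\bar{O}}$$

The MAE is computed as (… *et al.* 2018; Malik *et al.* 2019a):

$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|P_i - O_i\right|$$

The RMSE is computed as (… *et al.* 2017):

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(P_i - O_i)^2}$$

The Bias (… *et al.* 2020a; Malik *et al.* 2021b; Mohammadi *et al.* 2021) is given by:

$$Bias = \frac{1}{N}\sum_{i=1}^{N}(P_i - O_i)$$

The NSE is computed as (… *et al.* 2020b):

$$NSE = 1 - \frac{\sum_{i=1}^{N}(O_i - P_i)^2}{\sum_{i=1}^{N}(O_i - \bar{O})^2}$$

where $O_i$ and $P_i$ are the observed and predicted values of AE_{20}, $\bar{O}$ and $\bar{P}$ are their means, and *N* is the number of observations.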
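The six indicators can be computed with the short Python sketch below. It assumes their most common definitions, in particular SI = RMSE divided by the mean observed value and Bias = mean of (predicted − observed). These two choices are consistent with the values reported later in the paper: for MLR on the testing set, 0.0131 / 0.1225 ≈ 0.107 against a tabulated SI of 0.1073, and the difference between the predicted and observed means, 0.1206 − 0.1225, equals the tabulated Bias of −0.0019. The function name and the four sample pairs are hypothetical.

```python
import math


def evaluate(observed, predicted):
    """Return CC, RMSE, Bias, SI, MAE, and NSE for paired observations."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    s_op = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    sse = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    rmse = math.sqrt(sse / n)
    return {
        "CC": s_op / math.sqrt(s_oo * s_pp),
        "RMSE": rmse,
        "Bias": mean_p - mean_o,  # positive means over-prediction
        "SI": rmse / mean_o,      # RMSE relative to the observed mean
        "MAE": sum(abs(p - o) for o, p in zip(observed, predicted)) / n,
        "NSE": 1.0 - sse / s_oo,
    }


# Hypothetical observed and predicted aeration efficiencies.
obs = [0.10, 0.20, 0.30, 0.40]
pred = [0.12, 0.18, 0.33, 0.37]
scores = evaluate(obs, pred)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

A Bias near zero alongside a nonzero MAE, as in this toy sample, shows why the two are reported together: positive and negative errors cancel in the Bias but not in the MAE or RMSE.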

### Dataset

A total of 99 experimental observations of aeration efficiency at 20 °C for Parshall and Modified Venturi flumes were used for model development and validation. The total dataset was divided into two subsets, and the division was made subjectively. The training (model development) dataset included 69 observations, while the remaining 30 observations formed the testing (model validation) dataset (see Appendix A: Table A1). Six independent variables were considered as inputs: flow rate (Q), throat width (W), throat length (L), sill height (N), oxygen deficit ratio (r), and exponent (f). Aeration efficiency at 20 °C (AE_{20}) was the target for model development and validation. Table 1 outlines the features of the data used for model development (training) and validation.

Table 1. Features of the total, training, and testing datasets

| Statistic | Dataset | Q | W | L | N | r | f | AE_{20} |
|---|---|---|---|---|---|---|---|---|
| Mean | Total | 59.0909 | 7.7273 | 15.0000 | 5.0000 | 1.1283 | 0.9170 | 0.1177 |
| | Training | 61.5942 | 8.0797 | 14.7101 | 5.0000 | 1.1250 | 0.9171 | 0.1157 |
| | Testing | 53.3333 | 6.9167 | 15.6667 | 5.0000 | 1.1361 | 0.9169 | 0.1225 |
| Minimum | Total | 25.0000 | 5.0000 | 10.0000 | 2.5000 | 1.0284 | 0.9070 | 0.0300 |
| | Training | 25.0000 | 5.0000 | 10.0000 | 2.5000 | 1.0284 | 0.9070 | 0.0300 |
| | Testing | 25.0000 | 5.0000 | 10.0000 | 2.5000 | 1.0371 | 0.9070 | 0.0389 |
| Maximum | Total | 100.0000 | 10.0000 | 20.0000 | 7.5000 | 1.7440 | 0.9213 | 0.4547 |
| | Training | 100.0000 | 10.0000 | 20.0000 | 7.5000 | 1.7440 | 0.9213 | 0.4547 |
| | Testing | 100.0000 | 10.0000 | 20.0000 | 7.5000 | 1.6641 | 0.9213 | 0.4261 |
| Standard deviation | Total | 26.8344 | 1.9914 | 4.1033 | 2.0516 | 0.0980 | 0.0030 | 0.0641 |
| | Training | 26.6305 | 1.8760 | 4.0114 | 2.1004 | 0.0910 | 0.0030 | 0.0599 |
| | Testing | 26.8564 | 2.0430 | 4.3018 | 1.9696 | 0.1139 | 0.0031 | 0.0736 |
| Confidence level (95.0%) | Total | 5.3520 | 0.3972 | 0.8184 | 0.4092 | 0.0195 | 0.0006 | 0.0128 |
| | Training | 6.3974 | 0.4507 | 0.9636 | 0.5046 | 0.0219 | 0.0007 | 0.0144 |
| | Testing | 10.0283 | 0.7629 | 1.6063 | 0.7355 | 0.0425 | 0.0011 | 0.0275 |


## RESULTS AND DISCUSSION

Soft computing and regression-based models were used in this investigation to predict the aeration efficiency of Parshall and Modified Venturi flumes. Six standard statistical parameters, CC, MAE, Bias, SI, RMSE, and NSE, were chosen to assess the performance of all implemented models. Lower RMSE, SI, and MAE values, a Bias closer to zero, and higher CC and NSE values indicate higher model accuracy. Model development is a trial-and-error process, and the optimum values of the user-defined parameters were obtained after several trials. There are well-defined statistical criteria for selecting and defining user-defined parameters that are unique to the model.

### Results of linear and nonlinear regression-based models

In this investigation, linear and nonlinear regression-based designs for the prediction of aeration efficiency of Parshall and Modified Venturi flumes have also been developed. XLSTAT software employing the least square method was used to develop these equations. For all developed equations, the performance measurement parameter values are listed in Table 2. Linear and nonlinear equations based on regression-based models are as follows:

Table 2. Performance evaluation of the regression and soft computing-based models

| Models | CC | RMSE | Bias | SI | MAE | NSE |
|---|---|---|---|---|---|---|
| Training dataset | | | | | | |
| MLR | 0.9813 | 0.0115 | 0.0000 | 0.0991 | 0.0091 | 0.9629 |
| MNLR | 0.9302 | 0.0220 | 0.0011 | 0.1903 | 0.0187 | 0.8630 |
| M5P_pruned | 0.9960 | 0.0053 | −0.0002 | 0.0461 | 0.0037 | 0.9920 |
| M5P_unpruned | 0.9971 | 0.0047 | −0.0003 | 0.0405 | 0.0035 | 0.9938 |
| RF | 0.9849 | 0.0134 | −0.0014 | 0.1162 | 0.0037 | 0.9489 |
| GMDH | 0.9999 | 0.0009 | 0.0000 | 0.0080 | 0.0005 | 0.9999 |
| MARS | 0.9997 | 0.0015 | 0.0000 | 0.0134 | 0.0010 | 0.9993 |
| Testing dataset | | | | | | |
| MLR | 0.9840 | 0.0131 | −0.0019 | 0.1073 | 0.0111 | 0.9670 |
| MNLR | 0.9077 | 0.0341 | −0.0090 | 0.2787 | 0.0278 | 0.7776 |
| M5P_pruned | 0.9971 | 0.0060 | −0.0014 | 0.0494 | 0.0041 | 0.9930 |
| M5P_unpruned | 0.9983 | 0.0052 | −0.0014 | 0.0426 | 0.0038 | 0.9948 |
| RF | 0.9760 | 0.0259 | −0.0067 | 0.2111 | 0.0110 | 0.8725 |
| GMDH | 0.9958 | 0.0189 | 0.0029 | 0.1540 | 0.0033 | 0.9649 |
| MARS | 0.9992 | 0.0045 | −0.0018 | 0.0368 | 0.0021 | 0.9961 |


Comparison of the performance of the MLR and MNLR models was made based on six statistical performance measures viz., CC, RMSE, Bias, SI, MAE, and NSE (Table 2). As per Table 2, given the higher CC, NSE (0.9840 and 0.9670), and lower RMSE, Bias, SI, and MAE (0.0131, −0.0019, 0.1073, and 0.0111) results in the testing phase of the MLR-based model, it was concluded that the MLR-based model had better performance compared to the MNLR-based model. Figure 3 displays agreement plots using training and testing datasets, separately, between observed and predicted aeration efficiency of Parshall and Modified Venturi flumes by the MLR and MNLR based models. As depicted in the graph, values predicted using the MLR based model are close to the line of perfect agreement.

### Results of RF-based model

Figure 4 provides agreement plots between the observed and predicted aeration efficiency of Parshall and Modified Venturi flumes for the RF-based model during the training and testing stages. Predicted values from the RF-based model closely follow the observed values. Table 2 summarizes the results for the training and testing datasets, which show that the performance of the RF model was appropriate for the prediction of aeration efficiency of Parshall and Modified Venturi flumes, with CC, RMSE, Bias, SI, MAE, and NSE values of 0.9849, 0.0134, −0.0014, 0.1162, 0.0037, and 0.9489, respectively, for the training stage and 0.9760, 0.0259, −0.0067, 0.2111, 0.0110, and 0.8725, respectively, for the testing stage.

### Results of the M5P tree-based model

The development of an M5P-based model for predicting the aeration efficiency of Parshall and Modified Venturi flumes follows the same process as the development of the RF-based model. The M5 model tree algorithm splits the parameter space of the dataset into several sub-spaces and uses a linear regression model to define the input-output relationship in each of them. In this study, both pruned and unpruned M5P models were developed, and their structures are shown in Figures 5 and 6. The linear equations developed with the pruned and unpruned M5P-based models are shown in Tables 3 and 4. Figure 7 shows the results obtained from the M5P models for predicting the aeration efficiency of Parshall and Modified Venturi flumes during the training and testing stages. The performance evaluation parameters suggest that the unpruned M5P-based model is more accurate than the pruned M5P-based model for predicting the aeration efficiency of Parshall and Modified Venturi flumes, with CC = 0.9971, 0.9983, RMSE = 0.0047, 0.0052, Bias = −0.0003, −0.0014, SI = 0.0405, 0.0426, MAE = 0.0035, 0.0038, and NSE = 0.9938, 0.9948 for the model training and testing periods, respectively. The overall evaluation (Figure 7 and Table 2) suggests that both pruned and unpruned M5P-based models are suitable for predicting the aeration efficiency of Parshall and Modified Venturi flumes.

Table 3. Linear models (LM) of the pruned M5P-based model

| LM num | Equation |
|---|---|
| 1 | AE_{20} = 0.0001 * Q + 0.0004 * N + 0.8601 * r − 0.8517 |
| 2 | AE_{20} = 0.0001 * Q + 0.0004 * N + 0.829 * r − 0.8182 |
| 3 | AE_{20} = 0.0001 * Q + 0.0004 * N + 0.6953 * r − 0.6665 |
| 4 | AE_{20} = 0.0001 * Q + 0.0004 * N + 0.6815 * r − 0.6504 |
| 5 | AE_{20} = 0.0001 * Q + 0.0004 * N + 0.5635 * r − 0.511 |


Table 4. Linear models (LM) of the unpruned M5P-based model

| LM num | Equation |
|---|---|
| 1 | AE_{20} = 0.0001 * Q + 0 * L + 0.0004 * N + 0.8293 * r − 0.8198 |
| 2 | AE_{20} = 0.0001 * Q + 0 * L + 0.0004 * N + 0.8293 * r − 0.8198 |
| 3 | AE_{20} = 0.0001 * Q + 0 * L + 0.0004 * N + 0.8528 * r − 0.8441 |
| 4 | AE_{20} = 0.0001 * Q + 0 * L + 0.0004 * N + 0.8551 * r − 0.8464 |
| 5 | AE_{20} = 0.0001 * Q + 0.0004 * N + 0.819 * r − 0.8077 |
| 6 | AE_{20} = 0.0001 * Q + 0.0004 * N + 0.8364 * r − 0.8261 |
| 7 | AE_{20} = 0.0001 * Q + 0.0004 * N + 0.7794 * r − 0.7641 |
| 8 | AE_{20} = 0.0001 * Q + 0.0004 * N + 0.7915 * r − 0.777 |
| 9 | AE_{20} = 0.0001 * Q + 0.0004 * N + 0.796 * r − 0.7819 |
| 10 | AE_{20} = 0.0001 * Q + 0.0004 * N + 0.796 * r − 0.7819 |
| 11 | AE_{20} = 0.0001 * Q + 0.0004 * N + 0.6323 * r − 0.5957 |
| 12 | AE_{20} = 0.0001 * Q + 0.0004 * N + 0.6161 * r − 0.5765 |
| 13 | AE_{20} = 0 * Q + 0.0004 * N + 0.6528 * r − 0.6168 |
| 14 | AE_{20} = 0 * Q + 0.0004 * N + 0.6528 * r − 0.6168 |
| 15 | AE_{20} = 0.0001 * Q + 0.0004 * N + 0.6769 * r − 0.645 |
| 16 | AE_{20} = 0.0001 * Q + 0.0004 * N + 0.6805 * r − 0.6493 |
| 17 | AE_{20} = 0.0001 * Q + 0.0004 * N + 0.6805 * r − 0.6493 |
| 18 | AE_{20} = 0.0001 * Q + 0.0004 * N + 0.5713 * r − 0.5205 |
| 19 | AE_{20} = 0.0001 * Q + 0.0004 * N + 0.5735 * r − 0.5231 |
| 20 | AE_{20} = 0.0001 * Q + 0.0004 * N + 0.543 * r − 0.4817 |


### Results of the MARS based model

Table 5. Basis functions (BFs) of the MARS-based model and their coefficients

| Basis function | Coefficient |
|---|---|
| BF1 = max(0, r − 1.1776) | +0.678484169449607 |
| BF2 = max(0, 1.1776 − r) | −0.99086139783535 |
| BF3 = max(0, r − 1.0783) | −0.145667947631348 |
| BF4 = BF3 * max(0, W − .5) | +0.0103736565914279 |
| BF5 = BF3 * max(0, 7.5 − W) | −0.0113775669860508 |
| BF6 = BF3 * max(0, 0.9192 − f) | +7.06266243416642 |


### Results of the GMDH based model

The development of the GMDH model entails a trial-and-error process similar to that of the RF, M5P, and MARS-based models. Table 2 outlines the outcomes of the developed GMDH model, which includes three hidden layers. The same training and testing datasets were used as for the other soft computing-based models. Table 6 lists the constants and coefficients of the transfer functions. Figure 9 illustrates the results of the GMDH model during the training and testing phases. The performance assessment parameters indicate that the GMDH model was appropriate for the prediction of aeration efficiency of Parshall and Modified Venturi flumes, with CC = 0.9999, 0.9958, RMSE = 0.0009, 0.0189, Bias = 0.0000, 0.0029, SI = 0.0080, 0.1540, MAE = 0.0005, 0.0033, and NSE = 0.9999, 0.9649 for the model development and validation periods, respectively.

Table 6. Constants and coefficients of the transfer functions of the GMDH model

| Layer | Neuron | a0 | a1 | a2 | a3 | a4 | a5 |
|---|---|---|---|---|---|---|---|
| 1 | 1 | −1.5473 | 0.0024 | 2.0871 | 0.0000 | −0.5345 | −0.0029 |
| 1 | 2 | 13.6203 | −0.4349 | −29.9403 | −0.5146 | 14.6721 | 2.6730 |
| 1 | 3 | −1.5211 | 0.0006 | 2.0254 | 0.0000 | −0.5023 | −0.0006 |
| 1 | 4 | −1.4979 | 0.0007 | 2.0146 | 0.0000 | −0.5132 | −0.0005 |
| 1 | 5 | −1.5142 | 0.0020 | 2.0188 | 0.0000 | −0.5042 | −0.0015 |
| 2 | 1 | −0.0002 | 0.3328 | 0.6698 | −7.6643 | −8.7161 | 16.3742 |
| 2 | 2 | 0.0001 | −0.3447 | 1.3467 | −439.9819 | −446.5554 | 886.5322 |
| 2 | 3 | −0.0004 | −0.0378 | 1.0426 | 49.8484 | 42.5808 | −92.4401 |
| 2 | 4 | −0.0004 | 0.0463 | 0.9615 | −835.4749 | −841.8852 | 1677.3433 |
| 2 | 5 | −0.0001 | 1.1741 | −0.1725 | −186.4012 | −183.7293 | 370.1263 |
| 3 | 1 | 0.0001 | 0.2093 | 0.7894 | 231.0363 | 223.1737 | −454.2067 |


### Intercomparison of regression and soft computing-based models

A comparison of the regression and soft computing-based models (Table 2 and Figure 10) indicates that the MARS-based model works better than the other regression and soft computing-based models. The potential of the regression and soft computing-based models for predicting the aeration efficiency of Parshall and Modified Venturi flumes was assessed through performance indicators, agreement plots, and error plots, as shown in Figure 10 for the testing stage. The values predicted by the MARS-based model were close to the observed aeration efficiency of Parshall and Modified Venturi flumes, as indicated in the plots, and they follow the same pattern as the observed values. The unpruned M5P-based model outperforms the pruned M5P-based model. Overall, the MARS-based model was more accurate than the M5P, GMDH, MLR, MNLR, and RF-based models. The MLR-based model is also suitable and performs better than MNLR for predicting the aeration efficiency of Parshall and Modified Venturi flumes using this dataset. Figure 11 shows box plots comparing the observed (actual) values and the values predicted by the applied models for the testing stage. Descriptive statistics of the observed values and the model predictions during the testing stage are listed in Table 7. Figure 11 and Table 7 also suggest that the MARS model outperformed the other applied models. The minimum and maximum of the values predicted by the MARS model (0.0388 and 0.4146) are very close to those of the observed values (0.0389 and 0.4261). The widths of the lower and upper quartiles are almost the same in Figure 11.

Table 7. Descriptive statistics of observed and predicted values during the testing stage

| Statistic | Observed | MLR | MNLR | RF | M5P_pruned | M5P_unpruned | MARS | GMDH |
|---|---|---|---|---|---|---|---|---|
| Minimum | 0.0389 | 0.0586 | 0.0678 | 0.0470 | 0.0450 | 0.0450 | 0.0388 | 0.0401 |
| Maximum | 0.4261 | 0.4461 | 0.3779 | 0.3020 | 0.4300 | 0.4250 | 0.4146 | 0.4997 |
| 1^{st} Quartile | 0.0701 | 0.0816 | 0.0871 | 0.0753 | 0.0733 | 0.0733 | 0.0705 | 0.0703 |
| Median | 0.1070 | 0.1054 | 0.1043 | 0.1065 | 0.1060 | 0.1055 | 0.1065 | 0.1065 |
| Mean | 0.1225 | 0.1206 | 0.1136 | 0.1159 | 0.1211 | 0.1212 | 0.1208 | 0.1254 |
| 3^{rd} Quartile | 0.1450 | 0.1378 | 0.1211 | 0.1475 | 0.1430 | 0.1430 | 0.1459 | 0.1452 |


A Taylor diagram (Figure 12) is a graphical illustration of the performance of the developed models in terms of CC, RMSE, and standard deviation and indicates that the MARS model was the best performing model and the performance of the MNLR model was least successful for the prediction of aeration efficiency of Parshall and Modified Venturi flumes using this dataset.

### Sensitivity analysis

A sensitivity analysis was carried out to find the most influential input variable for AE_{20} of Parshall and Modified Venturi flumes. The analysis used the best-performing model (MARS). The model was redeveloped on training datasets from which one input variable was eliminated at a time. The results are listed in terms of CC, RMSE, MAE, and NSE in Table 8, which indicates that the oxygen deficit ratio (r) is the most effective input variable in predicting the AE_{20} of Parshall and Modified Venturi flumes using this dataset.
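The leave-one-input-out procedure can be sketched as follows. To keep the example self-contained, an ordinary least-squares model stands in for MARS, so the numbers it produces say nothing about the study's results. The function names, the two inputs (Q and r), and the synthetic data generated from made-up coefficients are assumptions for illustration.

```python
import math


def fit_linear(rows, y):
    """Least-squares fit with intercept; returns [b0, b1, ..., bp]."""
    x = [[1.0] + list(r) for r in rows]
    p = len(x[0])
    # Augmented normal equations [X^T X | X^T y].
    m = [
        [sum(r[i] * r[j] for r in x) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(x, y))]
        for i in range(p)
    ]
    for col in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(col, p), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for k in range(col + 1, p):
            fac = m[k][col] / m[col][col]
            m[k] = [a - fac * b for a, b in zip(m[k], m[col])]
    beta = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        tail = sum(m[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (m[i][p] - tail) / m[i][i]
    return beta


def rmse(rows, y, beta):
    """Root mean square error of the linear model on (rows, y)."""
    errs = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) - t
            for r, t in zip(rows, y)]
    return math.sqrt(sum(e * e for e in errs) / len(errs))


def drop_column(rows, j):
    """Copy of rows without input variable j."""
    return [[v for k, v in enumerate(r) if k != j] for r in rows]


def leave_one_input_out(train_x, train_y, test_x, test_y, names):
    """Testing RMSE of the model refitted without each input in turn."""
    result = {}
    for j, name in enumerate(names):
        beta = fit_linear(drop_column(train_x, j), train_y)
        result[name] = rmse(drop_column(test_x, j), test_y, beta)
    return result


# Synthetic data: the target depends strongly on r and weakly on Q.
names = ["Q", "r"]
data_x = [[25 + (37 * i) % 76, 1.03 + ((17 * i) % 40) / 100] for i in range(30)]
data_y = [0.0001 * q + 0.8 * (r - 1.0) for q, r in data_x]
loss_without = leave_one_input_out(data_x[:20], data_y[:20],
                                   data_x[20:], data_y[20:], names)
most_influential = max(loss_without, key=loss_without.get)
print(loss_without, most_influential)
```

The input whose removal degrades the testing error the most is read as the most influential one. In this synthetic setup that is r by construction.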

## CONCLUSION

In this research, M5P, Random Forest (RF), Multivariate Adaptive Regression Splines (MARS), and Group Method of Data Handling (GMDH) models were developed to predict the aeration efficiency (AE_{20}) of Parshall and Modified Venturi flumes and were then compared with multiple linear regression (MLR) and multiple nonlinear regression (MNLR) models. The data were obtained from experiments on 26 different Modified Venturi flumes and one Parshall flume. The comparison based on performance evaluation indices concludes that the MARS approach outperformed the other models (i.e., M5P, RF, GMDH, MLR, and MNLR) during both the development (training) and validation (testing) periods. Other major outcomes of this study are that the MLR-based model performs better than the MNLR-based model and that, among the M5P-based models, the unpruned model works better than the pruned model. Overall, the M5P-based models outperform the GMDH, MLR, RF, and MNLR-based models. A sensitivity analysis was carried out with the best-performing MARS model to evaluate the effect of each input variable on AE_{20}. Its findings indicate that the oxygen deficit ratio (r) was the most effective input variable for estimating AE_{20} of Parshall and Modified Venturi flumes using this dataset.

## CONFLICT OF INTEREST

None.

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.
