## Abstract

Conventionally, impermeable weirs are employed for retaining, measuring, and regulating water in rivers. Today, gabion weirs, which are built from locally available materials, are increasingly preferred because their porous nature better serves ecological needs. Dissolved oxygen (*D.O.*) is one of the significant determinants for assessing the quality of water bodies. This study focuses on improving the estimation of the gabion weir oxygen transfer efficiency (*OTE*_{20}). The backpropagation neural network (*BPNN*), adaptive neuro-fuzzy inference system (*ANFIS*), and multivariate linear and nonlinear regression (*MVLR* and *MVNLR*) models are developed with experimental data to estimate the *OTE*_{20}, and their results are compared. In terms of statistical metrics, the *BPNN* proved to be the best-performing model, while the triangular membership function (mf)-based *ANFIS* is the second-best. The other mf-based *ANFIS*, *MVLR*, and *MVNLR* models give comparable performance. The discharge per unit width (*q*) is the most crucial input parameter in the computation of the *OTE*_{20}, followed by the gabion mean size (*d*_{50}). The major challenges lie in computing the porosity of the gabion materials and the optimal parameters of the proposed data mining techniques.

## HIGHLIGHTS

Gabion weir oxygen transfer efficiency is studied in the laboratory.

Data-driven and conventional models are developed.

The *BPNN* is found to be the best model; nevertheless, the other proposed models give comparable results.

The discharge per unit width is the most sensitive input variable.

### Graphical Abstract

## INTRODUCTION

Oxygen is indispensable for the sustenance of life on the planet, since both the animal and plant kingdoms need it for survival. Dissolved oxygen (*D.O.*) is one of the determinants of water quality, and the solubility of oxygen in water increases with pressure and decreases with temperature (Baylar *et al.* 2010). Water gains oxygen in two ways: by aeration and by photosynthesis. The *D.O.* level indicates the degree of contamination of the water and how well the water can support aquatic life (LeFevre *et al.* 2015). A higher *D.O.* level indicates better water quality, and a deficiency of *D.O.* in a watercourse severely affects the ecological order. To sustain ecological stability, the *D.O.* level needed in water bodies must be ensured, and to meet this requirement, atmospheric oxygen must be transferred to the water. The physical process of oxygen transfer from air to water is called aeration, and it requires close contact between air and water. The efficacy of oxygen transfer depends on the surface contact between water and air, which is ultimately governed by the sizes of the water drops or air bubbles. Oxygen transfer in rivers, ponds, lakes, and oceans is vital to water quality and to the survival of aquatic life. If water jets fall freely from drop structures into a receiving pool, the entrained air bubbles add to the aeration. Hydraulic structures enhance the amount of *D.O.* in a stream by self-aeration, although the water is in contact with the structure for only a short time (Baylar *et al.* 2010). The aeration that would normally take place over several miles of a river can occur at a single hydraulic structure.

Oxygen transfer at classical weirs was first investigated by Gameson (1957), followed by many researchers (Van der Kroon & Schram 1969; Avery & Novak 1978; Nakasone 1987) who studied the aeration process at hydraulic structures and developed models relating aeration efficiency to discharge and weir geometry. Wilhems *et al.* (1993) reviewed these models and tested them for many locations, where Avery & Novak's (1978) model proved the best. Gulliver & Rindels (1993) analyzed data and recorded the errors inherent in the measurement tools for prototype aeration. Peyras *et al.* (1992) studied gabion spillways and explored different forms of flow over the stepped surface. Wuthrich & Chanson (2015) tested gabion weirs with and without capping and observed significant interactions between the seepage and the overflow under regulated low-flow conditions. However, lower performance in energy loss and oxygen transfer efficiency (OTE) was noticed for large discharges. Reeve *et al.* (2019) investigated the hydraulic performance of gabion-stepped spillways with a numerical model and concluded that the standard gabion step configuration performed well with respect to energy loss and inception point location.

Furthermore, two classical models were developed with regression methods to predict the water depth and the inception point location. Mohamed (2010) performed numerous tests on the flow characteristics of the gabion weir and found that its hydraulic characteristics differ from those of the impervious weir: for the same flow rate, the head over the pervious gabion weir is lower than that over the impervious weir. Al-Fawzy *et al.* (2020) investigated the impact of hydraulic jumps on the energy loss in gabion weirs. Mohamed Al-Mohammed & Hassan Mohammed (2015) conducted laboratory experiments on a gabion weir to examine the flow characteristics and found a relationship between upstream water depth and discharge for the various regimes. Shariq *et al.* (2020) developed a gabion weir discharge model under through-flow conditions.

Soft computing methods have also been applied widely to related problems. Kouadri *et al.* (2021) used eight soft computing methods to predict a water quality index (*WQI*). Hadjisolomou *et al.* (2021) created an artificial neural network (ANN) model for the chlorophyll-a levels of a shallow eutrophic lake (Mikri Prespa) located in northern Greece. Forecasting the infiltration rate (*IR*) of treated wastewater (*TWW*) is essential in regulating clogging problems, yet most investigators who calculate the *IR* with neural network models consider the characteristic variables of the soil without considering those of the *TWW*. Abdalrahman *et al.* (2022) therefore generated a model for forecasting the *IR* from various combinations of *TWW* characteristic parameters, using two ANN architectures, the multilayer perceptron (*MLP*) and the Elman neural network (*ENN*), to develop the optimal model. Computational fluid dynamics (CFD) is considered a robust tool for predicting the discharge coefficient; to bypass its computational cost, Chen *et al.* (2022) proposed data-driven modeling techniques as an alternative to CFD simulation for predicting the discharge coefficient from an experimental dataset. Zheng *et al.* (2021) examined the effects of influential dimensionless factors on estimating the discharge capacity, one of the critical hydraulic characteristics of inflatable dams. The parameters used were the ratio of total upstream head to dam height (*H*_{1}/*D*_{h}), the ratio of overflowing head to dam height (*h*/*D*_{h}), the ratio of discharge per unit width to its maximum value (*q*/*q*_{max}), the ratio of the internal pressure of the tube to its maximum value (*p*/*p*_{max}), and the ratio of the longitudinal coordinate of each element to *x*_{max}. A hybrid model based on particle swarm optimization (PSO) and the genetic algorithm (GA), PSO–GA, was proposed to improve the accuracy of the estimation by combining the advantages of both algorithms.

Hu *et al.* (2021) used different soft computing models to estimate the overflow capacity of a curved labyrinth weir. A total of 355 empirical data points for six different labyrinth overflow models were extracted from the results of a laboratory study. The upstream water head-to-overflow ratio, the lateral wall angle, and the curvature angle were used to estimate the discharge coefficient of curved labyrinth overflows. Based on various statistical evaluation indicators, the results show that these input parameters can be relied upon to predict the discharge coefficient. Specifically, the least-squares support vector machine–bat algorithm (LSSVM–BA) model showed the best prediction accuracy during the training and test phases. Such a low-cost prediction model may have remarkable practical implications, as it could be an economical alternative to laboratory testing, which is costly and time-consuming.

### Basic aeration mechanism at weirs

Gameson (1957) identified three phases of aeration for a water jet falling freely into a pool: first in the jet itself during the fall, second through the surface of the pool, and third in the biphasic water–air flow. Gameson (1957) also noticed that most of the aeration occurred during the last phase. In hydraulic drop structures, viz. spillways, weirs, canal falls, etc., oxygenation occurs because of the jump triggered by the falling water. The mechanism of oxygen transfer from air to water by a free-falling jet over and through the weir is highly intricate and stochastic; it depends on the jet velocity, the shape and roughness of the weir, and the geometry of the receiving pool. The pool geometry controls the turbulent mixing, the path of the bubbles, and their residence time. Figure 1(a) and 1(b) depict the weir aeration mechanisms.

### The present state of knowledge, novelty, and objective of the study

It is generally recognized that numerous parameters govern the aeration process. Eggers & Villermaux (2008) reviewed the physics of water jets thoroughly. Several engineering and environmental processes involve oxygen transfer through the air bubbles generated when a liquid of the same or different properties strikes a liquid surface, e.g., a water jet plunging into a pool or a breaking wave falling into a water body. Polluted water affects everyday life. The *D.O.* level in water bodies is depleted primarily by organic contaminants. Silt and sediment, composed of organic and inorganic materials, offer an ideal habitat for the growth of bacteria and other microorganisms, which break down the organic matter in the silt using the *D.O.* available in the river. The *D.O.* level in the water body therefore falls with time, and this shortfall is offset by aeration, i.e., the transfer of oxygen from air to water (Gulliver & Rindels 1993). Recently, Luxmi *et al.* (2022) used non-dimensional inputs to estimate gabion weir aeration, while Srinivas & Tiwari (2022) studied the aeration efficiency of gabion spillways.

The preceding review indicates that very few works have been conducted on the gabion weir oxygen transfer efficiency (*OTE*_{20}), and few or no models have yet been developed. The accurate estimation of the *OTE*_{20} therefore remains unresolved. Furthermore, ordinary regression models cannot estimate the *OTE*_{20} precisely because of the nonlinear and complicated mechanism of flow in the gabion weir. Data mining models could therefore be a more suitable choice owing to their flexibility in tuning parameters. Recently, data mining models have been employed considerably in water resources and environmental engineering, where they require neither an understanding of the mechanisms nor a laboratory/field model study (Baylar *et al.* 2007; Gerger *et al.* 2017; Sattar *et al.* 2019; Kumar *et al.* 2020; Tiwari & Sihag 2020; Tiwari 2021; Tiwari *et al.* 2022). Data mining models are highly efficient at mapping the actual mechanism of the simulated oxygen transfer applications. Among the various data mining models, the proposed backpropagation neural network (*BPNN*) and adaptive neuro-fuzzy inference system (*ANFIS*) models are among the most reliable and robust predictive models.

The major difficulties lie in computing the porosity and the mean size of the gabion particles, which are resolved by the volumetric displacement method and the sieve analysis technique, respectively. A further difficulty is ensuring that the water jet through the gabion weir strikes the middle of the aeration tank so that aeration is uniform. Finding optimal values for the tuning parameters of the proposed data mining methods is equally challenging; these values are obtained by trial and error.

The novelty of the current study has several facets. First, it estimates the *OTE*_{20} through laboratory tests with varying discharge per unit width, drop height, porosity, and mean size of gabion particles. Second, the *OTE*_{20} is estimated and compared using the data mining models *BPNN* and *ANFIS* with the observed data. Third, these estimates of the *OTE*_{20} are compared with the developed multivariate linear relation (*MVLR*) and multivariate nonlinear relation (*MVNLR*) models. Finally, a sensitivity investigation is carried out to establish the comparative significance of the input variables for the *OTE*_{20}. The performance of these models is evaluated in terms of statistical metrics.

## MATERIALS AND METHODS

This section covers the basic background, the experimental methodology, and the proposed data mining techniques.

### Basic backgrounds

*OTE* is oxygen transfer efficiency at any water temperature and the *OTE _{20}* is oxygen transfer efficiency at 20 °C.
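The two quantities can be computed as in the following minimal sketch. It assumes the deficit-ratio definition of *OTE* commonly used in weir aeration studies and the temperature correction usually attributed to Gulliver *et al.* (1990); neither relation is spelled out in this section, and the function names and sample readings are hypothetical.

```python
def ote(c_upst, c_downst, c_saturated):
    """Oxygen transfer efficiency from upstream, downstream and saturation D.O. (mg/l).

    Assumed definition: fraction of the upstream oxygen deficit that is recovered.
    """
    if c_saturated <= c_upst:
        raise ValueError("saturation concentration must exceed the upstream D.O.")
    return (c_downst - c_upst) / (c_saturated - c_upst)


def ote_20(ote_value, temp_c):
    """Normalize an OTE measured at temp_c (deg C) to 20 deg C.

    Assumed correction: 1 - OTE_20 = (1 - OTE) ** (1 / f), with
    f = 1 + 0.02103 (T - 20) + 8.261e-5 (T - 20)^2.
    """
    dt = temp_c - 20.0
    f = 1.0 + 0.02103 * dt + 8.261e-5 * dt ** 2
    return 1.0 - (1.0 - ote_value) ** (1.0 / f)


# Hypothetical readings, not values from the experiments.
e = ote(c_upst=1.5, c_downst=5.5, c_saturated=8.5)
e20 = ote_20(e, temp_c=25.0)
```

Under this correction, an efficiency measured in water warmer than 20 °C is lowered slightly, and a measurement taken at exactly 20 °C is left unchanged.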

### Experimental setup and methodology

The tank is filled with a fixed quantity of tap water to find the oxygen transfer efficiency (*OTE*_{20}) of water in the aeration tank for a specific gabion weir. A measured quantity of Na_{2}SO_{3} with CoCl_{2} is mixed in to drop the *D.O.* level of the water in the tank to around 1.5 mg/l (Kumar *et al.* 2021; Tiwari 2021), and some water is withdrawn at different levels of the tank for measuring the initial dissolved oxygen (*C*_{upst}). The test is then started for a set period, and the device is run for a duration such that the *D.O.* level of the aeration tank water stays below the full saturation level (*C*_{saturated}) at the experiment temperature *T* (°C). The final dissolved oxygen level (*C*_{downst}) is then determined with the azide modification technique. The water temperature in the aeration tank during the test is recorded with a thermometer. The method is repeated for every run using one traditional weir and five gabion weirs, and the value of *OTE*_{20} is then assessed. The test setup uses six exchangeable weirs with varying discharge per meter width (*q*), mean size of gabion material (*d*_{50}), and porosity (*n*). In total, 60 observations of gabion weir OTE are taken, and to cross-check the reliability of the dataset, some observations are repeated two or three times. The matrix of test observations is shown in Table 1.

| Weirs | d_{50} (mm) | n (%) | q (m^{2}/s) | h (m) |
|---|---|---|---|---|
| Gabion-1 | 18.07 | 46.8 | 0.0052 | 0.902, 0.922 |
| | | | 0.0088 | 0.902, 0.922 |
| | | | 0.0132 | 0.902, 0.922 |
| | | | 0.0176 | 0.905, 0.925 |
| | | | 0.0196 | 0.905, 0.925 |
| Gabion-2 | 14.95 | 49.1 | 0.0052 | 0.902, 0.922 |
| | | | 0.0088 | 0.909, 0.929 |
| | | | 0.0132 | 0.909, 0.929 |
| | | | 0.0176 | 0.909, 0.929 |
| | | | 0.0196 | 0.909, 0.929 |
| Gabion-3 | 16.23 | 40.23 | 0.0052 | 0.902, 0.922 |
| | | | 0.0088 | 0.902, 0.922 |
| | | | 0.0132 | 0.903, 0.923 |
| | | | 0.0176 | 0.905, 0.925 |
| | | | 0.0196 | 0.905, 0.925 |
| Gabion-4 | 14.66 | 41.52 | 0.0052 | 0.903, 0.923 |
| | | | 0.0088 | 0.909, 0.929 |
| | | | 0.0132 | 0.909, 0.929 |
| | | | 0.0176 | 0.909, 0.929 |
| | | | 0.0196 | 0.909, 0.929 |
| Gabion-5 | 18.32 | 30.1 | 0.0052 | 0.904, 0.924 |
| | | | 0.0088 | 0.915, 0.935 |
| | | | 0.0132 | 0.915, 0.935 |
| | | | 0.0176 | 0.915, 0.935 |
| | | | 0.0196 | 0.92, 0.94 |
| Solid weir | 0 | 0 | 0.0052 | 0.925, 0.945 |
| | | | 0.0088 | 0.93, 0.955 |
| | | | 0.0132 | 0.935, 0.955 |
| | | | 0.0176 | 0.935, 0.955 |
| | | | 0.0196 | 0.935, 0.955 |
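As a quick check on the size of the test matrix, the combinations in Table 1 can be enumerated. This sketch assumes that every weir was tested at each listed discharge and at two drop heights, as the table suggests; the labels `lower` and `upper` are placeholders for the two tabulated heights of each row.

```python
from itertools import product

weirs = ["Gabion-1", "Gabion-2", "Gabion-3", "Gabion-4", "Gabion-5", "Solid weir"]
q_values = [0.0052, 0.0088, 0.0132, 0.0176, 0.0196]  # m^2/s, as listed in Table 1
height_levels = ["lower", "upper"]  # placeholders for the two drop heights per row

# One run per (weir, discharge, drop height) combination.
runs = list(product(weirs, q_values, height_levels))
print(len(runs))  # 60
```

The count of 6 × 5 × 2 = 60 agrees with the number of observations reported for the dataset.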


### Data mining modeling techniques

The proposed data mining methods are discussed in this section.

#### Backpropagation neural network

The *BPNN* is a soft computing method extensively applied in civil engineering (Tiwari & Sihag 2020). It is inspired by the structure of the human brain and is organized into input, intermediate (hidden), and output layers. These layers are connected by weights, and each layer has a different number of neurons (nodes) that operate together to resolve even a complex problem.

Each unit *j* sums its inputs *x*_{i}, weighted by *R*_{ij}, and passes the sum through an activation function to yield the output (*Q*_{j}) of the unit, where *R*_{ij} is the interconnecting weight from *i* to *j*, *x*_{i} is the value of the input in the input layer, and *Q*_{j} is the output obtained from the activation function for unit *j*. One may consult Haykin & Network (2004) for a comprehensive discussion. The most significant capability of *BPNN*s is that they can recognize intricate relations in input-output datasets through consecutive calibration and adjust themselves to the dataset. Additional details of the *BPNN* can be found in Hassoun (1995). Numerous research works on regression tasks have established that *BPNN*s are more accurate than classical regression analysis. Although *BPNN*s are used for an extensive array of applications with satisfactory efficacy, it is broadly reported that they are sensitive to numerous factors, for example, the volume and quality of the dataset, the network structure, overfitting, and calibration factors (Yuan *et al.* 2008).

During the last two decades, numerous neural network water quality modeling works have been carried out with very good predictive results (Hadjisolomou *et al.* 2021). The advantages of the *BPNN* are that it can learn and model nonlinear and complex relationships between inputs and outputs, which are rarely simple. Unlike other prediction techniques, the *BPNN* also places no restriction on the input variables.
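The layer-by-layer computation described in this subsection can be illustrated with a short forward pass. This is a sketch under stated assumptions: a logistic activation in the hidden layer, a linear output unit, random untrained weights, and illustrative layer sizes of four inputs and nine hidden nodes; none of the numbers are fitted values from the study.

```python
import numpy as np


def sigmoid(z):
    """Logistic activation, assumed here for the hidden units."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a single-hidden-layer network with a linear output."""
    hidden = sigmoid(x @ w_hidden + b_hidden)  # weighted sum, then activation
    return hidden @ w_out + b_out


rng = np.random.default_rng(0)
n_inputs, n_hidden = 4, 9  # illustrative sizes: four inputs, nine hidden nodes
w_hidden = rng.normal(scale=0.5, size=(n_inputs, n_hidden))
b_hidden = np.zeros(n_hidden)
w_out = rng.normal(scale=0.5, size=(n_hidden, 1))
b_out = np.zeros(1)

x = np.array([[0.3, 0.8, 0.6, 0.1]])  # one hypothetical, already scaled sample
y_hat = forward(x, w_hidden, b_hidden, w_out, b_out)
```

Training then consists of adjusting the weight arrays so that `y_hat` approaches the measured output over the calibration data.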

#### Adaptive neuro-fuzzy inference system

*ANFIS*, the adaptive neuro-fuzzy inference system, is a hybrid soft computing method that combines neural networks (NNs) and fuzzy inference systems (FISs) to gain the benefits of both; it was first introduced by Jang (1993). An *ANFIS* technique uses NNs to resolve nonlinear and complex cases and to identify relationships between different variables. It also uses the FIS to rationalize complex scenarios with principles drawn from human decision-making (Tiwari & Sihag 2020). Because it overcomes the weaknesses of NNs and FISs, *ANFIS* has already been used successfully as a reliable estimating tool for OTE, sediment trapping efficiency, the discharge correction factor of the Parshall flume, plunging hollow jet penetration depth, and many other problems in water resources and environmental engineering (Tiwari *et al.* 2020; Saran & Tiwari 2020; Sharafati *et al.* 2021; Tiwari *et al.* 2022). The *ANFIS* technique creates an association between input and output variables by applying linguistic If-Then rules, which have a large capacity to deal with nonlinear, stochastic, or dynamic problems. In fuzzy logic terms, each input (*x* and *y*) is expressed as a fuzzy set (*P*_{i} and *Q*_{i}) with one output *f*_{i} (Jang 1993). The rules are utilized as

Rule 1: If *x* is *P*_{1} and *y* is *Q*_{1}, then *f*_{1} = *s*_{1}*x* + *t*_{1}*y* + *r*_{1};

Rule 2: If *x* is *P*_{2} and *y* is *Q*_{2}, then *f*_{2} = *s*_{2}*x* + *t*_{2}*y* + *r*_{2};

where *s*, *t*, and *r* are the design variables computed during calibration. Figure 3 depicts the general structure of *ANFIS*, and for a complete discussion of the technique, one can refer to Jang (1993).

The *ANFIS* model has the advantage of having both numerical and linguistic knowledge. *ANFIS* also uses the *BPNN*'s ability to classify data and identify patterns. Compared to the *BPNN*, the *ANFIS* model is more transparent to the user and causes fewer memorization errors.
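A first-order Sugeno inference with two If-Then rules, the scheme on which *ANFIS* is built, can be sketched as follows. The Gaussian premise functions, the product used for the fuzzy AND, and every parameter value are assumptions chosen for illustration; in an actual model they would be obtained during calibration.

```python
import numpy as np


def gauss_mf(v, center, sigma):
    """Gaussian membership grade of v (assumed premise shape)."""
    return np.exp(-0.5 * ((v - center) / sigma) ** 2)


def sugeno_output(x, y, rules):
    """Weighted average of linear rule outputs f = s*x + t*y + r."""
    strengths, outputs = [], []
    for rule in rules:
        # Firing strength: product of the two premise membership grades.
        strengths.append(gauss_mf(x, *rule["P"]) * gauss_mf(y, *rule["Q"]))
        outputs.append(rule["s"] * x + rule["t"] * y + rule["r"])
    strengths = np.array(strengths)
    return float(np.dot(strengths, outputs) / strengths.sum())


# Two hypothetical rules; "P" and "Q" hold (center, sigma) of each fuzzy set.
rules = [
    {"P": (0.0, 1.0), "Q": (0.0, 1.0), "s": 1.0, "t": 1.0, "r": 0.0},
    {"P": (5.0, 1.0), "Q": (5.0, 1.0), "s": 2.0, "t": 0.0, "r": 1.0},
]
near_rule_1 = sugeno_output(0.0, 0.0, rules)
near_rule_2 = sugeno_output(5.0, 5.0, rules)
```

Close to the center of a rule's fuzzy sets, the output is dominated by that rule's linear consequent; between the centers, the two consequents are blended smoothly.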

#### Multivariate linear relation

Multivariate linear regression is a well-understood and popular statistical algorithm. It has been reported to be among the better-performing methods for the prediction of water quality (Chou *et al.* 2018; Luxmi *et al.* 2022).

Its most important advantage is that it helps in understanding the relationships among the variables in the dataset, and in particular the correlation between the dependent and independent variables.
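A multivariate linear fit can be sketched with ordinary least squares. The data below are synthetic and noise-free, and the helper names `fit_mvlr` and `predict_mvlr` are hypothetical; the coefficients bear no relation to those of the model developed in this study.

```python
import numpy as np


def fit_mvlr(X, y):
    """Ordinary least squares for y = b0 + b1*x1 + ... + bk*xk."""
    design = np.column_stack([np.ones(len(X)), X])  # leading column for b0
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_mvlr(coef, X):
    """Evaluate the fitted linear relation on new inputs."""
    return coef[0] + X @ coef[1:]


rng = np.random.default_rng(1)
X = rng.uniform(size=(42, 4))                       # synthetic predictors
true_coef = np.array([0.5, 0.2, -0.1, 0.3, 0.05])   # made-up coefficients
y = true_coef[0] + X @ true_coef[1:]
coef = fit_mvlr(X, y)
```

Each fitted coefficient is the change in the response per unit change in one predictor with the others held fixed, which is what makes the linear model easy to interpret.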

#### Multivariate nonlinear relation

The *MVNLR* model takes the *OTE*_{20} as the dependent parameter, and *d*_{50}, *n*, *q*, and *h* are the independent variables. A nonlinear predictive model is created with the training dataset. The *OTE*_{20} is stated as a power function of the independent variables, in which *A* is the multiplying constant, and *a*, *b*, *c*, and *d* are the function's power variables obtained by minimizing the sum of squared errors.

Nonlinear regression fits an equation to data with a generated line, typically a curve. The sum of squares is used to determine the fitness of a regression model and is computed from the differences between the fitted function and every data point.
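One common way to fit a relation with a multiplying constant and power exponents is to take logarithms and solve a linear least-squares problem. The sketch below assumes the product form y = A·x1^a·x2^b·x3^c·x4^d and synthetic, strictly positive data; the pairing of exponents with variables, the helper name `fit_power_law`, and the fitting route are assumptions rather than the procedure used in the study.

```python
import numpy as np


def fit_power_law(X, y):
    """Fit y = A * prod(x_k ** p_k) by least squares on log-transformed data."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(X <= 0) or np.any(y <= 0):
        raise ValueError("log-linearization needs strictly positive data")
    design = np.column_stack([np.ones(len(X)), np.log(X)])
    coef, *_ = np.linalg.lstsq(design, np.log(y), rcond=None)
    return np.exp(coef[0]), coef[1:]  # multiplier A and the exponents


rng = np.random.default_rng(2)
X = rng.uniform(0.5, 2.0, size=(42, 4))          # synthetic positive inputs
true_A, true_p = 0.7, np.array([0.1, -0.2, 0.3, 0.5])
y = true_A * np.prod(X ** true_p, axis=1)
A_hat, p_hat = fit_power_law(X, y)
```

Two caveats apply. Least squares on logarithms minimizes the squared error of log(y), which is not the same as minimizing the squared error of y itself. In addition, rows with a zero input, such as the solid weir with *d*_{50} = 0 and *n* = 0, cannot be log-transformed and would need separate treatment or an iterative nonlinear solver.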

## RESULTS AND DISCUSSION

### Dataset

To develop the data mining models, a total of 60 observations of *OTE*_{20} are determined using the weirs with different discharge per unit width (*q*), porosity (*n*), mean size (*d*_{50}), and drop height (*h*). The observations are divided randomly into two groups, one for training (70% of the observations) and one for testing (30%). The statistical characteristics of both groups are described in Table 2.

| Variables | q (m^{2}/s) | d_{50} (mm) | h (m) | n (%) | OTE_{20} |
|---|---|---|---|---|---|
| **Training dataset** | | | | | |
| Mean | 0.01 | 13.71 | 0.92 | 34.63 | 0.61 |
| Median | 0.01 | 15.59 | 0.92 | 40.88 | 0.62 |
| Std. deviation | 0.01 | 6.36 | 0.02 | 16.82 | 0.11 |
| Variance | 0.00 | 40.47 | 0.00 | 282.75 | 0.01 |
| Kurtosis | −1.61 | 1.13 | −0.51 | 0.49 | 0.08 |
| Skewness | −0.11 | −1.65 | 0.55 | −1.37 | −0.35 |
| Minimum | 0.01 | 0.00 | 0.90 | 0.00 | 0.34 |
| Maximum | 0.02 | 18.32 | 0.96 | 49.10 | 0.86 |
| Count | 42 | 42 | 42 | 42 | 42 |
| **Test dataset** | | | | | |
| Mean | 0.012 | 13.71 | 0.92 | 34.63 | 0.60 |
| Median | 0.01 | 15.59 | 0.92 | 40.88 | 0.58 |
| Std. deviation | 0.01 | 6.47 | 0.01 | 17.10 | 0.08 |
| Variance | 0.00 | 41.83 | 0.00 | 292.25 | 0.01 |
| Kurtosis | −1.43 | 1.58 | 1.32 | 0.82 | −0.97 |
| Skewness | −0.41 | −1.74 | 0.44 | −1.44 | 0.00 |
| Minimum | 0.01 | 0.00 | 0.90 | 0.00 | 0.45 |
| Maximum | 0.02 | 18.32 | 0.96 | 49.10 | 0.72 |
| Count | 18 | 18 | 18 | 18 | 18 |
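A random 70/30 partition of 60 observations can be sketched as below. The seed and the helper name `random_split` are arbitrary choices for illustration, so the indices produced here will not match the partition actually used in the study.

```python
import numpy as np


def random_split(n_obs, train_fraction=0.7, seed=0):
    """Return shuffled index arrays for the training and testing groups."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_obs)
    n_train = int(round(train_fraction * n_obs))
    return order[:n_train], order[n_train:]


train_idx, test_idx = random_split(60)
print(len(train_idx), len(test_idx))  # 42 18
```

The resulting group sizes of 42 and 18 match the counts reported in Table 2.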


### Accuracy and statistical error metrics

To evaluate the models in estimating the *OTE*_{20}, two performance metrics, the coefficient of correlation (*cc*) and the root mean square error (*rmse*), are calculated for the datasets. An analysis of variance (*ANOVA*) test has also been conducted to check the numerical variation between the observed and estimated values of the various approaches. In the metric definitions, *N* is the number of observations.

The coefficient of correlation (*cc*) represents the goodness of fit. It indicates whether a relationship exists between two variables, i.e., how one varies with the other. The best possible value is 1. Negative values are also possible; they mean that the two variables change in opposite directions, which signifies a poorly performing model. The *rmse* is the most widely used metric for assessing how well a model predicts a quantitative dataset. It gives a relatively high weight to large errors, which makes it most useful when large errors are particularly undesirable. The *rmse* can range from zero to infinity.
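Both metrics can be computed in a few lines. The sketch assumes the Pearson form of the correlation coefficient, and the observed and predicted values are hypothetical.

```python
import numpy as np


def cc(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    do, dp = o - o.mean(), p - p.mean()
    return float(np.sum(do * dp) / np.sqrt(np.sum(do ** 2) * np.sum(dp ** 2)))


def rmse(observed, predicted):
    """Root mean square error over N observations."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((o - p) ** 2)))


obs = [0.50, 0.60, 0.70]    # hypothetical observed efficiencies
pred = [0.52, 0.58, 0.73]   # hypothetical model estimates
print(round(cc(obs, pred), 3), round(rmse(obs, pred), 4))
```

Because the errors are squared before averaging, the single 0.03 miss in this example contributes more to the *rmse* than the two 0.02 misses combined would suggest from their absolute sizes.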

### Results of *MVLR*

The *MVLR* model was developed using XLSTAT software. The same input variables as for the other models are used as the estimators, while the *OTE*_{20} is used as the response parameter. The form of the *MVLR* model generated from the training data is given in Equation (4), and the fitness and suitability of this form are checked with the test data. The outcomes for the training and testing datasets are depicted as an agreement diagram in Figure 4, and the estimation error and accuracy are assessed with the performance metrics (Table 3). Figure 4 presents a scatter plot of the OTE data points at the gabion weir. The bulk of the training data points lie near the zero-error (perfect agreement) line, while the testing data points lie slightly away from it, which implies that the generated predictive model is not very precise. This is further supported by Figure 5, where the computed data are plotted against the experimental values of the *OTE*_{20} for the test and training datasets. Table 3 gives *cc* = 0.844 and *rmse* = 0.048 for the test dataset, against *cc* = 0.902 and *rmse* = 0.047 for the training dataset, so the correlation is moderately good and the error is similar in both phases. This suggests that the model performs reasonably well in the training and testing phases.

| Approaches | Training cc | Training rmse | Testing cc | Testing rmse |
|---|---|---|---|---|
| BPNN | 0.956 | 0.033 | 0.900 | 0.041 |
| MVLR | 0.902 | 0.047 | 0.844 | 0.048 |
| MVNLR | 0.930 | 0.024 | 0.883 | 0.265 |
| ANFIS triangular mf (ANFIS_TRI) | 0.976 | 0.024 | 0.846 | 0.051 |
| ANFIS trapezoidal mf (ANFIS_TRAP) | 0.933 | 0.039 | 0.812 | 0.060 |
| ANFIS gbell mf (ANFIS_GBELL) | 0.968 | 0.028 | 0.676 | 0.082 |
| ANFIS gauss mf (ANFIS_GAUSS) | 0.976 | 0.024 | 0.682 | 0.080 |


### Results of *MVNLR*

With *OTE*_{20} as the target and the flow parameter (*q*), gabion material parameters (*d*_{50} and *n*), and drop height (*h*) as input variables, model Equation (5) is formulated, and the fitness and suitability of this relation are checked on the remaining testing data. The outcomes for the training and testing datasets are shown as an agreement diagram in Figure 6. A careful examination of Figure 6 shows that the data points computed by the *MVNLR* for training lie near the perfect line, whereas the predicted values for the testing data lie somewhat away from it, which suggests that the model performs poorly in the testing stage. The outcomes of *OTE*_{20} for the training and testing datasets are shown as the variation of the computed data against the experimental values in Figure 7, which reinforces this observation. Besides, Table 3 gives *cc* = 0.883 and *rmse* = 0.265 for the test dataset, which means the correlation is reasonably good but the error is high, compared with *cc* = 0.930 and *rmse* = 0.0247 for the training dataset.

### Results of the *BPNN*

The *BPNN* method is adopted for estimating the *OTE*_{20}. The data mining method in the present study is a multilayer *BPNN*, composed of multiple layers, each with many neurons, and the layers are interlinked with weighted coefficients. Generally, three kinds of layers occur in the *BPNN* model: the first layer holds the inputs, the second (hidden) layer computes the weighted inputs, and the third layer is the output layer. The development of the *BPNN* involves three stages: preparing the data for training, trying various network architectures to find the optimal one, and testing. The number of hidden layers and neurons is selected by trial and error, and the best network topology is taken to be the one whose results are closest to the desired values; after the training error is computed, it is propagated back toward the input layer to adjust the weights. The weighted connection (Tiwari 2006) of the input constituents is expressed in terms of the output variable, the input variable, the number of nodes (neurons) linked to a given node, a bias term, and a weighted coefficient. The *BPNN* modeling is executed with the open-source *WEKA* software. The optimal topology of the *BPNN* model is shown in Figure 8, and the optimum values of its tuning parameters are given in Table 4.

| BPNN topology | Number of hidden layers | Momentum | Learning rate | Iterations |
|---|---|---|---|---|
| 4-9-1 | 1 | 0.2 | 0.3 | 1,500 |
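To make the tuning parameters above concrete, the following is a minimal NumPy sketch of a 4-9-1 network trained by backpropagation with a learning rate of 0.3, a momentum of 0.2, and 1,500 iterations. It is not the *WEKA* implementation used in the study. The synthetic data, the sigmoid hidden layer, the linear output node, the weight initialization, and the full-batch updates are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: 42 samples with 4 inputs scaled to [0, 1].
# These are NOT the study's measurements of q, d50, n, and h.
X = rng.random((42, 4))
y = (0.6 * X[:, 0] + 0.3 * X[:, 1] ** 2 - 0.2 * X[:, 2] * X[:, 3]).reshape(-1, 1)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# 4-9-1 topology: 4 inputs, 9 hidden neurons, 1 output.
W1 = rng.normal(0.0, 0.5, (4, 9))
b1 = np.zeros((1, 9))
W2 = rng.normal(0.0, 0.5, (9, 1))
b2 = np.zeros((1, 1))
params = [W1, b1, W2, b2]
velocity = [np.zeros_like(p) for p in params]

learning_rate, momentum, iterations = 0.3, 0.2, 1500


def forward(inputs):
    """Return hidden activations and the linear network output."""
    hidden = sigmoid(inputs @ W1 + b1)
    return hidden, hidden @ W2 + b2


def mse(pred, target):
    return float(np.mean((pred - target) ** 2))


initial_mse = mse(forward(X)[1], y)

for _ in range(iterations):
    hidden, output = forward(X)
    error = output - y  # gradient of 0.5 * squared error w.r.t. the output

    # Backpropagate the error from the output layer toward the input layer.
    grad_W2 = hidden.T @ error / len(X)
    grad_b2 = error.mean(axis=0, keepdims=True)
    delta_hidden = (error @ W2.T) * hidden * (1.0 - hidden)
    grad_W1 = X.T @ delta_hidden / len(X)
    grad_b1 = delta_hidden.mean(axis=0, keepdims=True)

    # Momentum update: new step = momentum * previous step - learning_rate * gradient.
    for p, v, g in zip(params, velocity, [grad_W1, grad_b1, grad_W2, grad_b2]):
        v *= momentum
        v -= learning_rate * g
        p += v  # in-place, so W1, b1, W2, b2 are updated

final_mse = mse(forward(X)[1], y)
print("MSE before training:", initial_mse)
print("MSE after training:", final_mse)
```

With this topology the network has (4 × 9 + 9) + (9 × 1 + 1) = 55 adjustable weights and biases, which is small relative to most neural networks but not negligible against a dataset of 60 observations.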

The computed *OTE*_{20} of the training and testing data points is plotted against the corresponding experimental data points in Figure 9. The performance metrics for the training and testing steps of the *BPNN* model are shown in Table 3. Values of *cc* = 0.956 and *rmse* = 0.033 are achieved in model generation (training), while in the testing step the values of *cc* and *rmse* are 0.900 and 0.041, respectively (Table 3). The low error (*rmse*) and high *cc* of the *BPNN* model in the testing stage indicate its good estimation potential. Figure 10 describes the variation of the computed data with the experimental values of the *OTE*_{20} for training and testing.

### Results of *ANFIS*

This work uses *ANFIS* to model the relationship between the input and output variables. The model utilizes MATLAB fuzzy rules based upon the membership function (mf). No definite rule exists for generating the *ANFIS* model, and selecting its tuning parameters requires trial and error. *ANFIS* model generation is somewhat similar to that of the *BPNN* model and involves hidden layers, neurons in the hidden layer, mfs, and optimization processes.

The number of mfs assigned to each input parameter is increased one at a time, and models with four different mf shapes, i.e., triangular (*Tri*), trapezoidal (*Trap*), generalized bell-shaped (*Gbell*), and Gaussian (*Gauss*), are trained and tested. Each *ANFIS* model is then checked for accuracy and error on the testing dataset using the performance metrics. Based on numerous training runs and the corresponding testing of the models, (2-3-2-2) is the combination of input mf numbers utilized in the present study. Model-specific parameters chosen in the current work are listed in Table 5.

| Input mf number | Input mf shape | ANFIS | Optimization | Output mf type | Epochs |
|---|---|---|---|---|---|
| 2-3-2-2 | Tri, Trap, Gbell, Gauss | Sugeno | Backpropagation | Linear | 6 |
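The four mf shapes compared here can be sketched with their standard textbook definitions. The Python function names below mirror the common MATLAB naming but are plain illustrative helpers, not the toolbox functions themselves, and the parameter values in the usage lines are hypothetical. The rule count assumes grid partitioning, in which every combination of input mfs forms one fuzzy rule.

```python
import math


def trimf(x, a, b, c):
    """Triangular mf with feet at a and c and peak at b (a < b < c)."""
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)


def trapmf(x, a, b, c, d):
    """Trapezoidal mf with feet at a and d and shoulders at b and c (a < b <= c < d)."""
    return max(min((x - a) / (b - a), 1.0, (d - x) / (d - c)), 0.0)


def gbellmf(x, a, b, c):
    """Generalized bell-shaped mf with width a, slope b, and center c."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


def gaussmf(x, sigma, c):
    """Gaussian mf with standard deviation sigma and center c."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


# Membership grades of a normalized input value 0.25 (hypothetical parameters).
print(trimf(0.25, 0.0, 0.5, 1.0))
print(trapmf(0.25, 0.0, 0.25, 0.75, 1.0))
print(gbellmf(0.25, 0.25, 2, 0.5))
print(gaussmf(0.25, 0.2, 0.5))

# Assumed grid partitioning: the 2-3-2-2 input mf combination yields 24 rules.
mf_counts = (2, 3, 2, 2)
n_rules = math.prod(mf_counts)
print(n_rules)  # 24
```

Under this assumption, the 2-3-2-2 combination keeps the rule base at 24 rules, which limits the number of consequent parameters to be fitted from the small dataset.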

The performance of the *ANFIS* models with different mf shapes is shown in Figure 11. From this, it is evident that the triangular-shaped mf outperforms the other mf shapes in the *ANFIS* model. This is further corroborated by Figure 12, where the data points estimated with the triangular-shaped mf are nearest to the experimental values. Table 3 also supports this, as its *cc* is the highest and its *rmse* the lowest among the proposed *ANFIS* models. However, the other *ANFIS* models also perform well.

### Comparison of results

The *BPNN* model has been found to be the best-performing model compared with the other applied models, and the second-best performing model is *MVNLR*. However, the *BPNN*, *ANFIS*, *MVLR*, and *MVNLR* models all depict good accuracy, as most of the computed values of *OTE*_{20} lie inside the ±30% error lines, barring a few points in the upper *OTE*_{20} range. Owing to its higher *cc*, computation with the *BPNN* is better than with *ANFIS*. The models obtained from multi-variant linear regression (*MVLR*) and multi-variant nonlinear regression (*MVNLR*) give comparable results (Table 3).
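The ±30% error-line check mentioned above can be sketched as below. The sketch assumes that the error lines are drawn relative to the experimental value, i.e., a computed point lies inside the band when it differs from the experimental value by at most 30% of that value. The function name and the sample values are hypothetical.

```python
def fraction_within_band(experimental, computed, tolerance=0.30):
    """Fraction of computed values lying within ±tolerance of the experimental values."""
    inside = sum(
        abs(c - e) <= tolerance * abs(e)
        for e, c in zip(experimental, computed)
    )
    return inside / len(experimental)


# Hypothetical values for illustration only (not the study's data).
experimental = [1.0, 1.0, 1.0, 1.0]
computed = [1.2, 0.75, 1.4, 0.6]
print(fraction_within_band(experimental, computed))  # 0.5
```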

The *BPNN* results are very near to the experimental values. Besides, the outcomes of a single-factor ANOVA (Table 6) suggest that the differences between observed and computed values are insignificant for all considered models. Therefore, the overall comparison of the outcomes suggests that the *BPNN* is the most effective tool for computing the *OTE* of the gabion weir. This may be attributed to the flexibility offered by its tuning parameters, such as the number of hidden layers, the number of neurons in the hidden layer, momentum, learning rate, and epochs; at the same time, the *BPNN* has fewer tuning parameters than the other proposed computing models, especially the ML-based *ANFIS* models. The optimal values of these tuning parameters can therefore be achieved easily, giving results closer to the actual (experimental) values. Besides, the *BPNN* model has the capacity to consider all the complex and nonlinear variables responsible for oxygen transfer in gabion weir flow, whereas the other proposed models do not have this ability.

| Model | F | P-value | F-crit | Variation in experimental and computed values |
|---|---|---|---|---|
| BPNN | 0.004 | 0.95 | 4.13 | Insignificant |
| MVLR | 0.030 | 0.86 | 4.13 | Insignificant |
| MVNLR | 0.121 | 0.73 | 4.16 | Insignificant |
| ANFIS_TRI | 0.04 | 0.844 | 4.13 | Insignificant |
| ANFIS_TRAP | 0.59 | 0.45 | 4.13 | Insignificant |
| ANFIS_GBELL | 0.024 | 0.88 | 4.13 | Insignificant |
| ANFIS_GAUSS | 0.02 | 0.89 | 4.13 | Insignificant |
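The *F* statistic of a single-factor ANOVA such as the one tabulated above can be computed as in the following sketch, which treats the experimental and computed values as two groups. The function name and the sample values are hypothetical; the decision rule is the usual one, where *F* below *F-crit* means the difference between group means is not significant.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a single-factor ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation between group means and variation inside each group.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical experimental and computed values (not the study's data).
experimental = [1.0, 2.0, 3.0]
computed = [2.0, 3.0, 4.0]
f_stat, df_b, df_w = one_way_anova_f(experimental, computed)
print(f_stat, df_b, df_w)  # 1.5 1 4
```

For the tabulated models, the same comparison reads directly from the table: for example, the *BPNN* value of *F* = 0.004 is far below *F-crit* = 4.13, so the difference is reported as insignificant.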

### Sensitivity investigation

To identify the most influential input parameters in the computation of the OTE at the gabion weir, a sensitivity investigation was performed with the *BPNN*, as this model depicted the highest accuracy for this dataset. Models utilizing different input combinations are built by eliminating each input parameter one after the other, and the corresponding values of *cc* and *rmse* during training and testing are computed (Table 7). The largest change in *cc* and *rmse* is noted when the discharge per unit width (*q*) is removed from the input combination in testing, which implies the critical importance of *q* in the computation of *OTE*_{20}, followed by the gabion particle size (*d*_{50}).

| Input combination | Input parameter removed | Training cc | Training rmse | Testing cc | Testing rmse |
|---|---|---|---|---|---|
| q, d_{50}, n, h | – | 0.956 | 0.033 | 0.900 | 0.041 |
| q, d_{50}, n | h | 0.8713 | 0.0468 | 0.85 | 0.047 |
| q, d_{50}, h | n | 0.87 | 0.048 | 0.87 | 0.043 |
| d_{50}, n, h | q | 0.85 | 0.05 | 0.15 | 0.095 |
| q, n, h | d_{50} | 0.86 | 0.047 | 0.80 | 0.05 |
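The ranking implied by the sensitivity table can be reproduced with a short calculation on the tabulated testing-stage values. The sketch below ranks the inputs by the drop in testing *cc* when each one is removed; the dictionary layout and function name are illustrative choices, while the numbers are those reported in the table.

```python
# Testing-stage metrics from the sensitivity table, keyed by the input removed.
baseline = {"cc": 0.900, "rmse": 0.041}
ablation = {
    "h": {"cc": 0.85, "rmse": 0.047},
    "n": {"cc": 0.87, "rmse": 0.043},
    "q": {"cc": 0.15, "rmse": 0.095},
    "d50": {"cc": 0.80, "rmse": 0.050},
}


def rank_by_cc_drop(base, runs):
    """Order the removed inputs from the largest to the smallest loss in testing cc."""
    drops = {name: base["cc"] - metrics["cc"] for name, metrics in runs.items()}
    return sorted(drops, key=drops.get, reverse=True)


ranking = rank_by_cc_drop(baseline, ablation)
print(ranking)  # ['q', 'd50', 'h', 'n']
```

Ranking by the increase in testing *rmse* gives the same order, with *q* first and *d*_{50} second.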

## CONCLUSIONS

A gabion weir is considered more ecologically responsive than a conventional one, as its perviousness permits materials and aquatic animals to move through it. Fluidic devices have a rejuvenating effect on *D.O.* levels in a water body: they produce turbulence in which air bubbles are carried along with the flow and, in turn, enhance *D.O.* concentrations, even though the water is in contact with the device for only a brief period. The present study used *MVLR*, *MVNLR*, *BPNN*, and *ANFIS* with different mfs to compute the oxygen transfer efficiency. Four key input parameters, *q*, *n*, *d*_{50}, and *h*, are considered, and 60 observations were collected from experimental tests. The following key takeaways can be drawn from this investigation:

The *BPNN* is found to be the best-performing model. The number of neurons in the hidden layer is its most sensitive tuning parameter, with an optimal value of nine; the optimized values of its other significant parameters, namely learning rate, momentum, and number of epochs, are 0.3, 0.2, and 1,500, respectively.

The triangular mf-based *ANFIS*, *MVNLR*, and *MVLR* give comparable results. However, the least performing model is the generalized bell-shaped mf-based *ANFIS*.

Outcomes of a single-factor *ANOVA* suggest that the differences between experimental and computed values are insignificant for all considered models.

Results of the sensitivity investigation show that discharge per unit width (*q*) is the most sensitive variable in the computation of the oxygen transfer efficiency (*OTE*_{20}), while the second most sensitive variable is the mean size (*d*_{50}) of the gabion material.

Owing to the constraints of the study, only a limited dataset was collected; more data, from the same source or from others, are required to reach definite inferences. Nevertheless, these models may assist researchers and project engineers in computing the *OTE*_{20}. Furthermore, the considered models may also be compared with other data mining algorithms and classical models.

## AUTHOR CONTRIBUTIONS

N.K.T., K.L., and S.R. conceptualized the study; N.K.T. and K.L. did formal analysis and investigation; N.K.T. wrote and prepared the original draft; N.K.T. and S.R. wrote, reviewed, edited, and supervised the article.

## DATA AVAILABILITY STATEMENT

Data cannot be made publicly available; readers should contact the corresponding author for details.

## CONFLICT OF INTEREST

The authors declare there is no conflict.

## REFERENCES

*Optimal Design of Silt Ejector*