Empirical modeling of turbidity removal in a dissolved air flotation system: application of artificial neural networks

Dissolved air flotation (DAF) is a physical separation process that uses air microbubbles to remove suspended material dispersed in a liquid phase. Even though DAF is considered a well-established unit operation, modeling it is difficult due to the complexity of the phenomena involved, resulting in conceptual models with no practical application. Therefore, the objective of this work was to evaluate the efficiency of empirical modeling in predicting the turbidity removal dynamics using artificial neural networks applied to a DAF prototype. For the study of the neural network input variables, a two-level, full-factorial design was utilized to verify the statistical significance of the saturation pressure and the saturated water flow rate in relation to the turbidity removal. Using a time-delay recurrent neural network architecture, two empirical models were proposed to simulate the dynamic behavior of the turbidity removal promoted by the DAF prototype. The real-time model provided good predictions with R = 0.9717 and MSE = 1.0482, and the simulation model was also able to predict the process behavior, with performance criteria equal to R = 0.9475 and MSE = 1.8640.


INTRODUCTION
Dissolved air flotation (DAF) is a unit operation capable of removing solid or liquid contaminant particles present in a liquid phase. At the beginning of its use (the early 1900s), DAF was widely used in the ore processing industry as a method of separating mineral ores (Edzwald ). Sixty years later, water and wastewater treatment plants began applying DAF to remove color, natural organic matter, and suspended particles using small air bubbles produced in a saturation vessel.
When the DAF process is applied to treating water for public supply purposes, the technique is used in the clarification stage of the raw water treatment to remove turbidity.
In this case, the flotation process usually takes place in rectangular tanks divided into a contact zone and a separation zone. In the contact zone, air microbubbles collide with coagulated impurity flocs to form particle-bubble agglomerates. These agglomerates have a lower density than water, so they rise to the separator surface where they form a floating layer that can be collected and removed in the separation zone (Edzwald ). The microbubbles are formed from the depressurization of the water flow that is saturated with compressed air at pressures ranging from four to six bar (Edzwald & Haarhoff ).
Therefore, the raw water feed flow rate, the saturation vessel pressure, and the air bubble flow rate injected into the flotation tank are the physical parameters related to the process efficiency. These parameters must be considered not only in the project phase but also during DAF unit operation. The coagulant and flocculant doses, the pH values, and the ambient temperature are another group of important parameters to guarantee the success of water treatment. However, these parameters are mainly related to the preliminary treatment stages of raw water. This paper focuses on modeling and simulating the relation between turbidity removal and the physical parameters directly related to the flotation stage efficiency.
Over the years, process simulation has been used to describe the behavior of industrial processes using mathematical models capable of representing the real process operation. Once implemented, the simulation serves as an auxiliary tool for making decisions about operational changes in the process. It allows the system response to be predicted under different scenarios without applying disturbances directly to the real process. The absence of papers in this area can be explained by the fact that simulating the processes involved in DAF using first-principles models derived from mass, energy, and momentum balances is an arduous task (Haarhoff ). This difficulty is due to the complexity of the phenomena involved, especially in the coagulation/flocculation steps and in the collision and attachment processes among flocs and microbubbles. Besides that, although DAF is considered a well-established unit operation, in some countries (like Brazil) sedimentation is the most-used clarification process in water treatment stations due to its simplicity and low power consumption. However, despite being the conventionally applied clarification process, sedimentation presents several disadvantages when compared with DAF (e.g., the sludge produced is less concentrated than the float layer formed in DAF tanks, larger areas are required for the installation of the sedimentation tanks, and the residence time of the raw water being treated is higher; Zabel ).

).
In the last five years, artificial intelligence (AI) has again become a source of enthusiasm in academic and industrial environments. According to Venkatasubramanian (), after going through two periods of great stagnation, also known as 'AI winters', the application of AI techniques, such as artificial neural networks (ANNs), has been very promising in solving several complex problems. ANNs are structures that mirror the functioning of the human brain, and their basic processing units are known as neurons. Several ANN architectures have already been proposed and studied by AI researchers in recent decades (Leijnen & van Veen ). Among these architectures, recurrent neural networks are especially useful when a temporal dependence is observed in the process data being analyzed (e.g., industrial processes, like the dissolved air flotation unit operation).
Given that a water crisis is a current reality in Brazil and other countries, it is extremely important to understand how to increase the efficiency of the available water treatment technologies, for example by developing a better understanding of the dynamic behavior of the phenomena observed in a DAF unit. It is equally useful for industrial process operators to have at their disposal a support tool that enables quick and safe testing of different operating conditions. This can be achieved with a reliable DAF simulation model. Thus, the aim of this work is to develop an empirical ANN model to simulate the dynamic behavior of a DAF prototype. Because turbidity is one of the main water potability assessment parameters, and its monitoring is therefore an essential task in water treatment stations, turbidity removal was chosen as the output variable, and a time-delay recurrent neural network architecture was employed.

DAF prototype
To collect the data needed to construct an ANN model, tests were carried out in a DAF prototype built and automated by  The clarified water flows out from the bottom of the separation zone, where a fraction is directed to a sand filter (4) that removes the nonfloating particles. The other fraction of the clarified water is directed to an online turbidimeter (6) before it is purged. The filtered water is stored in a buffer tank (5) and part of it is used to feed the saturator vessel.

Experimental runs and factorial design
To simulate the characteristics of superficial water collected in rivers and lakes, a raw synthetic water was prepared using red clay, a typical soil found in the state of São Paulo, Brazil.
The synthetic water was obtained by mixing tap water and the red clay; the amount of red clay required was proportional to the initial desired turbidity value. In this study, a constant value of 20 NTU was used for initial turbidity. Sodium aluminate 2% v/v (NaAlO2) and Tanfloc SG®  the maximum saturated water flow rate measured by the  The saturated water flow rate can also be defined in terms of the recycle ratio. The recycle ratio, calculated as the ratio between the saturated water and the raw water feed flow rates, is an estimated measure of the microbubble quantity injected into the flotation tank contact zone and can be calculated using Equation (1). Turbidity removal (Equation (2)) was the response variable chosen for analysis since its dynamic behavior represents the flotation unit efficiency and is the output variable used at the empirical modeling stage.
$$RR = \frac{Q_{SAT}}{Q_F} \times 100 \qquad (1)$$

where $Q_{SAT}$ = saturated water flow rate (L min⁻¹); $Q_F$ = raw water feed flow rate (L min⁻¹); and $RR$ = recycle ratio (%).
The second part of the experimental procedure consisted of running tests in which the physical variables related to the DAF operation (saturation pressure and recycle ratio) were perturbed. This strategy was adopted to ensure that the dynamic behavior of the turbidity removal could be observed, thus allowing us to obtain a time series database. Four experiments were performed, and the operational conditions applied are presented in Table 3.
Step perturbations were used.
The authors are aware that the disturbance ranges

Development of ANN models
Once all the experiments were performed, the four databanks generated were unified to form a single database composed of the time series that represent the dynamic behavior of the chosen input and output signals.
A time-delay recurrent neural network (TDRNN) architecture was used to model the turbidity removal behavior over time. In the TDRNN architecture, the output prediction is not only based on the current value of the interest variable but also considers the past-calculated output. Therefore, the model's calculated output values are fed back into the feed layer and used like input signals as well as the other exogenous input variables (Haykin ). This architecture was chosen due to its capability of modeling dynamic and nonlinear processes. The TDRNN architecture along with the variables studied in this work is represented in Figure 3.
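The feedback structure described above can be illustrated with a minimal sketch of how delayed inputs and delayed outputs might be assembled into one regressor vector per time step. The function name `build_lagged_samples`, the feature ordering, and the toy numbers are assumptions made for illustration; they are not taken from the authors' MATLAB implementation.

```python
def build_lagged_samples(inputs, outputs, n_delays):
    """Pair each target y[t] with the previous n_delays inputs and outputs.

    inputs  : list of per-step exogenous vectors
              (assumed order: pressure, saturated flow, raw turbidity)
    outputs : list of per-step output values (turbidity removal, %)
    """
    if len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must have the same length")
    samples = []
    for t in range(n_delays, len(outputs)):
        features = []
        for lag in range(1, n_delays + 1):
            features.extend(inputs[t - lag])   # delayed exogenous inputs
            features.append(outputs[t - lag])  # delayed output fed back as an input
        samples.append((features, outputs[t]))
    return samples


# Hypothetical toy series, not measured data.
u = [[6.0, 0.38, 20.0], [6.0, 0.38, 20.0], [7.0, 0.38, 20.0],
     [7.0, 0.44, 20.0], [7.0, 0.44, 20.0]]
y = [80.0, 81.0, 82.5, 84.0, 85.0]
samples = build_lagged_samples(u, y, n_delays=3)
```

With three delays and three exogenous variables, each regressor vector in this sketch has 12 entries: three delayed copies of the inputs plus three delayed outputs.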
In this work, two empirical models were proposed: a real-time model and a simulation model. The real-time  Both a training stage and a validation stage were performed using an algorithm built in MATLAB (MathWorks, Inc., Natick, MA). The training and validation sets corresponded to 80% and 20% of the total points collected, respectively. The split into training and validation sets respected the temporal dependency of the data; shuffling was therefore not applied, in order to maintain the chronological connection between the data samples. A sequential split was adopted (i.e., the first 80% of the data points were used to train the ANN and the last 20% were used for validation).
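The sequential 80/20 split can be sketched as follows. This is a minimal illustration in Python rather than the authors' MATLAB code, and the helper name `sequential_split` is hypothetical.

```python
def sequential_split(series, train_fraction=0.8):
    """Split a time-ordered series without shuffling.

    The first part is used for training and the remainder for validation,
    so the chronological order of the samples is preserved.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


# Ten placeholder samples standing in for the time-ordered database.
data = list(range(10))
train, validation = sequential_split(data)
```

Because the cut point is a single index, every validation sample is later in time than every training sample.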
The Levenberg-Marquardt backpropagation method was used to perform the supervised training step, the hidden layer(s) activation function was a hyperbolic tangent sigmoid transfer function, and the output layer activation function was a linear transfer function.
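As a rough illustration of the activation functions named above, the sketch below evaluates a single-hidden-layer network with hyperbolic tangent hidden units and a linear output unit. The weights are arbitrary placeholders, the function name `forward` is hypothetical, and the Levenberg-Marquardt training step itself is not shown.

```python
import math


def forward(features, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: tanh hidden layer followed by a linear output neuron."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    # Linear transfer function: a weighted sum of hidden activations plus a bias.
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Placeholder weights for a network with two inputs and three hidden neurons.
prediction = forward(
    features=[0.5, -1.0],
    hidden_weights=[[0.1, 0.2], [-0.3, 0.4], [0.0, 0.5]],
    hidden_biases=[0.0, 0.1, -0.1],
    output_weights=[1.0, -2.0, 0.5],
    output_bias=0.2,
)
```

The linear output layer leaves the prediction unbounded, which suits a regression target such as turbidity removal, while the bounded tanh units supply the nonlinearity.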
To avoid overfitting during the learning stage, the early stopping technique was applied using the validation set.
Early stopping is a method that interrupts the training pro-

Two-level, full-factorial design
To analyze the effects of the input signals chosen to feed the empirical neural model, a two-level, full-factorial design was first carried out. The Pareto chart presented in Figure 4 indicates that both saturation pressure and saturated water flow rate are statistically significant at the 95% confidence level. Since replicates were performed, a pure error of 0.97 was calculated. Table 4 shows the effect values and the significance of the factors analyzed.
Between the two factors, the saturated water flow rate had the highest influence on turbidity removal. The positive effect indicates that TBD_R rises on average by 2.70% ± 0.69% when the saturated water flow rate increases from 0.38 to 0.44 L/min.
The saturation pressure also presented a positive effect: TBD_R increases by about 2.12% ± 0.69% when the pressure is raised from six to seven bar.
These results are in qualitative agreement with theory described in the specialized literature (Edzwald ; Edzwald ; Edzwald & Haarhoff ). Therefore, it has been verified that these physical variables (saturation pressure and saturated water flow rate) affect the turbidity removal promoted by the DAF unit and are adequate to model its dynamic behavior.
According to Table 4, the interaction effect between the factors evaluated is negligible since it shows a p-value higher than 0.05. Therefore, increasing or decreasing the pressure does not influence the effect of the saturated water flow rate on the turbidity removal. Likewise, changes in the saturated water flow rate do not influence the effect of the pressure on the turbidity removal.
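For readers unfamiliar with how main and interaction effects are obtained in a two-level, two-factor design, the following sketch computes them as differences between mean responses at the high and low coded levels. The response values are invented for illustration only and are not the measurements behind Table 4; the function name `factorial_effects` is hypothetical.

```python
def factorial_effects(runs):
    """Estimate effects in a 2^2 factorial design.

    runs: list of (a_level, b_level, response) with coded levels -1 or +1.
    Each effect is the mean response where the contrast sign is +1
    minus the mean response where it is -1.
    """
    def contrast(sign_of):
        high = [y for a, b, y in runs if sign_of(a, b) > 0]
        low = [y for a, b, y in runs if sign_of(a, b) < 0]
        return sum(high) / len(high) - sum(low) / len(low)

    return {
        "A": contrast(lambda a, b: a),        # main effect of factor A
        "B": contrast(lambda a, b: b),        # main effect of factor B
        "AB": contrast(lambda a, b: a * b),   # two-factor interaction
    }


# Made-up responses (A = saturation pressure, B = saturated water flow rate).
runs = [(-1, -1, 80.0), (+1, -1, 82.0), (-1, +1, 83.0), (+1, +1, 85.0)]
effects = factorial_effects(runs)
```

In this invented data set the two factors act additively, so the interaction contrast is zero, which is the situation the text describes as a negligible interaction.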

ANN models
Once the statistical significance of the variables chosen as the ANNs' input signals was verified, tests with disturbances in these variables were performed (Table 3).  The best network for the real-time model presented three time-delays in the input layer and three neurons in the hidden layer. Figure 6 shows the regression plots for the training and validation stages. The high regression coefficient and the low MSE value calculated (0.9717 and 1.0482, respectively) indicate that the ANN was able to map the relationships between inputs and outputs provided during training. In addition, the regression plots show good agreement between the targets and the predicted values.
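The two performance criteria can be computed as sketched below. The sketch assumes that the regression coefficient R is the linear correlation coefficient between targets and predictions, as is usual in regression plots; the numbers are placeholders, not values from the experiments.

```python
import math


def mean_squared_error(targets, predictions):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


def correlation_coefficient(targets, predictions):
    """Pearson correlation between targets and predictions."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_p = sum(predictions) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(targets, predictions))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_p = sum((p - mean_p) ** 2 for p in predictions)
    return cov / math.sqrt(var_t * var_p)


# Placeholder turbidity removal values (%), for illustration only.
targets = [80.0, 82.0, 84.0, 86.0]
predictions = [81.0, 82.0, 83.0, 86.0]
mse = mean_squared_error(targets, predictions)
r = correlation_coefficient(targets, predictions)
```

R measures how well the predictions follow the trend of the targets, while MSE measures the size of the deviations, so the two are best read together.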
The comparison between the real dynamic behavior of TBD_R and the one predicted by the ANN is shown in Figure 7(a). The real-time model's prediction was very close to the expected TBD_R values. This means that the real-time model correctly assimilated the influence that the input variables (i.e., the saturation pressure, the saturated water flow rate, and the raw water turbidity) have on the calculated output variable (i.e., the turbidity removal at the flotation tank's exit). Another important highlight is that the ANN model   The prediction errors presented low amplitude, remaining in the range of −2% to 2% during the entire simulation.
Given the current configuration of the DAF prototype, the magnitude of the calculated errors is acceptable, since this variation is usually observed in all tests performed. The location of the in-line turbidimeter measurement outlet at  According to Table 5, the best ANN topology for the simulation model consisted of two hidden layers, with five and three neurons, respectively, and also three temporal delays in the input layer. Figure   On the other hand, the behavior presented by the simulation model indicates no overfitting occurred during the training step, as the neural network did not excessively memorize the information provided, but it was able to generalize and learn the global behavior of the temporal series that constituted the database used.
This is an important outcome, since it is very common for recurrent neural network architectures to merely repeat the pattern of the dependent variable (i.e., the target) shifted in time by the delay used, which indicates poor learning. Figure 9 shows clearly that the simulation model is capable of generalization. The prediction errors presented amplitudes around 5%, which are higher than those observed for the real-time model, but they did not significantly affect the quality of the calculated outputs.
As presented in Although the R_VALIDATION was around 0.7, the validation regression plot (Figure 8(b)) shows that the predicted and the real TBD_R values are aligned and well described by the regression line. It is important to emphasize that the validation dataset is used to allow the application of the early stopping technique during the learning step. Therefore, validation is a support stage used to enhance the training performance and to avoid model overfitting. Besides that, considering the process modeled, the deviations observed in the validation regression plot are acceptable since they do not represent errors larger than 5% in the predicted output variable (i.e., the turbidity at the flotation tank's exit). The same observations apply to the real-time model.
Therefore, the simulation model provides a prediction of the turbidity removal promoted by the DAF unit, once given the saturation pressure, the saturated water flow rate, and the raw water turbidity, without the need to perform an experimental run. This becomes quite useful in exploratory studies involving DAF and when a need exists to save the reagents used in the chemical pretreatment. In addition, the simulation model is extremely useful in investigating the process's operational conditions on an industrial scale, since a plant does not need to be disturbed or placed in unsafe conditions to obtain the desired behaviors.
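One way to picture a prediction made without an experimental run is a closed-loop rollout, in which the model's own earlier predictions replace the measured outputs in the delayed regressor. The sketch below assumes this interpretation of the simulation model; `simulate` and `toy_predict` are hypothetical names, and `toy_predict` is only a stand-in for a trained network.

```python
def simulate(predict, inputs, initial_outputs, n_delays):
    """Closed-loop rollout: predictions are fed back as delayed outputs.

    predict         : callable mapping a feature list to one output value
    inputs          : per-step exogenous vectors for the whole horizon
    initial_outputs : at least n_delays starting output values
    """
    if len(initial_outputs) < n_delays:
        raise ValueError("need at least n_delays initial outputs")
    outputs = list(initial_outputs[:n_delays])
    for t in range(n_delays, len(inputs)):
        features = []
        for lag in range(1, n_delays + 1):
            features.extend(inputs[t - lag])   # delayed exogenous inputs
            features.append(outputs[t - lag])  # the model's own earlier prediction
        outputs.append(predict(features))
    return outputs


def toy_predict(features):
    """Stand-in for a trained network: previous output plus one (illustrative)."""
    return features[3] + 1.0  # index 3 holds the lag-1 output (3 inputs precede it)


# Hypothetical constant operating conditions over six time steps.
u = [[7.0, 0.44, 20.0]] * 6
trajectory = simulate(toy_predict, u, initial_outputs=[80.0, 80.0, 80.0], n_delays=3)
```

In such a rollout, prediction errors can accumulate over time because no measured output corrects them, which is one plausible reason for a simulation model to show larger errors than a model fed with sensor data.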

CONCLUSIONS
The novel application of ANNs to model the dynamic behavior of a dissolved air flotation prototype treating water for public supply purposes proved to be an effective and adequate methodology. Using a two-level, full-factorial design, it was found that the DAF physical variables (saturation pressure and saturated water flow rate) can be used as input signals to the neural models, since these variables are statistically significant and influence the turbidity removal promoted by the process in the range of operational conditions tested. TBD_R rises on average by 2.70% ± 0.69% and 2.12% ± 0.69% when the saturated water flow rate and the saturation pressure increase, respectively.
Both models provided predictions consistent with the targets used in the training stage. The time-delay recurrent neural networks were able to deal with the nonlinearity of the process and the temporal dependence of the input and output variables. The real-time model, fed with data directly measured by the prototype sensors during an experimental run, presented a high regression coefficient of 0.9717 and MSE of 1.0482. The simulation model, which predicted