ABSTRACT
In this research, a parallel hybrid model is presented for the simulation of nitrogen removal by submerged biofiltration of a very large-size wastewater treatment plant. This hybrid model combines a mechanistic and a machine learning model to produce accurate predictions of water quality variables. The models are calibrated and validated using detailed and quality-controlled operational data collected over a period of 3.5 months in 2020. The mechanistic model is a modified activated sludge model that describes the biological, physical and chemical processes taking place in a biofilm reactor based on the domain knowledge of these processes. A three-layer feed-forward artificial neural network with a rectified linear activation function that aims to reduce the mechanistic model's residual error and then correct its output. The results show how the hybrid model outperforms and significantly reduces the size of the mechanistic model's prediction errors of the effluent nitrate concentration from a relative mean error of 12% (mechanistic model) to 2% (hybrid model) during training. The error on nitrate simulations increases to 8% during hybrid model testing, still significantly lower than the error of the mechanistic model. These results support future operational applications of hybrid biofilm models, such as in digital twins.
HIGHLIGHTS
First hybrid model for a biofilm wastewater system for nitrogen removal.
Feed-forward neural networks came out as the best of different data-based modelling approaches.
Validation error (15 min interval) of the hybrid model is three times smaller than the error of the mechanistic-only model.
The hybrid model is designed so that it can be used in digital twin applications.
ABBREVIATIONS
- ANN
Artificial neural network
- ASM
Activated sludge model
- COD
Chemical oxygen demand
- CODs
Soluble fraction of the chemical oxygen demand
- CSTR
Continuous stirred tank reactor
- DDM
Data-driven model
- HM
Hybrid model
- MAE
Mean absolute error
- ME
Mean error
- ML
Machine learning
- MM
Mechanistic model
- MSE
Mean squared error
- NIT
Nitrification
- PostDN
Postdenitrification
- PreDN
Predenitrification
- R2
Coefficient of determination
- RMSE
Root mean square error
- SVM
Support vector machines
- TKN
Total Kjeldahl nitrogen
- TSS
Total suspended solids
- WRRF
Water resource recovery facility
INTRODUCTION
Improving the performance and efficiency of water resource recovery facilities (WRRFs) involves strengthening the control of the processes that take place there. Objectives for the wastewater treatment sector nowadays include optimization of energy performance, reduction of the use of reagents, of greenhouse gas emissions and of the overall environmental impact of operations, improvement of the recovery of resources and the stabilization of effluent quality. An important field of research aims to study the potential of the application of artificial intelligence techniques to achieve this aspired improvement in the performance of WRRFs (Schneider et al. 2022).
Wastewater treatment includes several successive physical, biological and chemical processes, that are non-linear and non-stationary, which complicates the optimization of their management (Henze et al. 2008). To facilitate and improve treatment and ensure the achievement of performance in accordance with regulatory obligations, in recent decades, the instrumentation, control and automation of WRRFs have developed significantly (Ingildsen & Olsson 2016; Corominas et al. 2018) and so is the development of mathematical models for simulation, scenario analysis and support of process control (Garrido-Baserba et al. 2020).
Since the 1990s, mechanistic dynamic models describing the suspended growth process of biomass in activated sludge systems, such as the activated sludge model 1 (ASM1) and ASM2 models (Henze et al. 2000) have acquired a solid and valid basis within the scientific community. These models contain many parameters and may require significant calibration when used outside typical conditions (Rieger et al. 2012). In the case of treatment by biofiltration, in which a dense ‘fixed-culture’ biomass is exploited, the available models that also need to describe biofilm-specific processes (e.g., diffusion, competition for space among several types of biomasses and regular backwashing to prevent filter clogging) are considered more complex (Eberl et al. 2006).
A literature review of mechanistic modelling studies for nitrogen removal by biofiltration reveals that over the last 25 years, the models have indeed become increasingly complex to include more processes taking place in both the biofilm and bulk liquid compartment, the gaseous compartment, and the transfer between these compartments. Behrendt (1999) developed a two-step nitrification biofilter model; however, it failed to account for the evolution of biomass growth and decay or the transport of particulates (Samie et al. 2010). Falkentoft et al. (2000) developed a model based on ASM2d describing denitrification and phosphorus removal in a biofilter. It introduced the concept of an initial biofilm thickness to account for the impact of filter washing. The model of Viotti et al. (2002) simulates concentration profiles of the chemical oxygen demand (COD) and NH4-N inside the biofilm and along the filter bed. It also models the accumulation of biomass interfering with filtration efficiency and affecting head loss due to clogging. Hidaka & Tsuno (2004) added NO3-N concentration profiles over the filter bed represented by five continuously stirred tank reactors (CSTRs). However, it does not account for diffusion, assuming complete substrate penetration into a uniform biofilm consisting of one layer.
Samie et al. (2010) adequately developed a biofiltration model with a three-layered biofilm for carbon and nitrogen (nitrification followed by denitrification) removal calibrated with daily laboratory samples from a large WRRF placing the modelled stages in series. Moreover, this model estimated greenhouse gas emissions (Samie et al. 2011). Vigne et al. (2010) successfully applied a model with six CSTRs, each containing a five-layered biofilm with a fixed-depth boundary layer and bulk liquid compartment to a nitrification biofilter with operational data from a large WRRF. During washing, partial mixing of the biofilter media along the filter bed is modelled, reducing the vertical gradient of biofilm properties.
Following that work, Bernier et al. (2014) modelled a biofilter with seven CSTRs but with a simplified two-layered biofilm. It computes head loss as a function of the thickness of the biofilm layers. The model was successfully calibrated with operational data from a large WRRF. Fiat et al. (2019) provided a model extension to include the main nitrous oxide (N2O) biological pathways during nitrification and denitrification.
Finally, Zhu (2020) enhanced the biofiltration model of Bernier et al. (2014) by returning to a five-layer biofilm but making the boundary layer variable thickness depending on hydraulic conditions. It not only considers media mixing during washing but adds less intense mixing during normal operation. The biological production and removal of N2O and a more detailed gas–liquid transfer process were added. Energy consumption could be calculated as well with this model. Placed in series, it is a biofilter model integrating three stages of nitrogen removal (predenitrification–nitrification–postdenitrification).
The increased complexity of these mechanistic biofilter models is making their parameterization, calibration, and validation much more laborious, requiring qualified experts and highly frequent observational data of good quality. This can become a bottleneck to the application of mechanistic models in an operational context (Vanrolleghem et al. 2005). In addition to the complexities mentioned in the review above, biofiltration models ‘suffer’ from a lack of widely accepted guidelines available to the modelling community (Rittmann et al. 2018). This increased complexity and inherent numerical challenges also require more computing power, which makes them less suitable for real-time applications where model updates are needed in real time with a short time horizon of action (Schneider et al. 2022).
Conversely, data-driven models that use artificial intelligence techniques to find patterns in data are computationally very fast, making them very interesting for real-time applications (Duarte et al. 2023). These models describe the system only based on information extracted from the provided data and have strong interpolation features, but they are less reliable for making predictions outside of training conditions (Newhart et al. 2019). Data-driven models typically require larger datasets than mechanistic models. Related to the inherent characteristics of wastewater treatment processes, which are nonlinear and show a large time dependency, frequent retraining of a data-driven model (DDM) remains necessary (Torfs et al. 2022).
Hybrid modelling combines the use of mechanistic models with data-based models and can take advantage of each approach (Von Stosch et al. 2014a): a mechanistic model that incorporates relevant knowledge about the processes, and a DDM that increases the predictive power of the model by including information on lesser-known sub-processes at reduced computing power. The training and testing of machine learning (ML) models is much faster than the calibration and validation of mechanistic models (Sundui et al. 2021). In recent literature, hybrid models (HMs) describing activated sludge systems are reported to improve model performance and support process monitoring and control (Lee et al. 2005; Quaghebeur et al. 2022; Sparks et al. 2022; Torfs et al. 2022). The main advantage of an HM is a higher benefit/cost ratio for solving complex problems, which is a key factor for process systems engineering (Schneider et al. 2022).
Serial and parallel structures are hybrid model architectures in which the mechanistic and DDM components work together. In a parallel structure, the outputs of both models are combined (usually by addition) to improve output. In a cooperative parallel structure, the DDM is trained to learn the mechanistic model residual error with respect to observational data (Lee et al. 2005) or to learn the residuals in the dynamics (Quaghebeur et al. 2022). With a competitive parallel hybrid model, the mechanistic and data-driven models independently make the predictions for the same variable, which are then weighted and combined into a final output (Dors et al. 1995; Peres et al. 2001; Galvanauskas et al. 2004; Ghosh et al. 2019).
The serial structure allows the output of a first model (commonly a DDM) to be input to the second (mechanistic) model with the objective of improving the second model's performance. This is useful when a specific subprocess is not sufficiently well described mechanistically. The data-driven component then represents these missed dynamics (Psichogios & Ungar 1992). With the inverse approach, the output of a mechanistic model is given to a DDM with the objective of supporting its features with domain knowledge (Tsen et al. 1996; Li et al. 2019; Hannaford et al. 2023).
The hybrid concept has been the subject of research for a decade in other industrial sectors; for example, in the biopharmaceutical industry (e.g., Von Stosch et al. 2014b) and the chemical industry (e.g., Schuppert & Mrziglod 2018). The wastewater sector is experiencing a lag in the development of HMs probably due to the complexity of the processes to be modelled, the high level of investments (that disagree with the low budgets available for wastewater infrastructures) and the long-life expectancy of the infrastructures which makes changes slow to be adopted. Additionally, wastewater treatment systems have peculiarities such as their nonlinear and nonstationary nature, making it a challenge to develop accurate mechanistic models (Corominas et al. 2018). On the other hand, great potential is expected in applying AI and its ML methodologies thanks to the increasing availability of ‘big data’ (Garrido-Baserba et al. 2020).
In this study, a hybrid model that is built to form an integral part of a ‘digital twin’ (e.g., for use in the HM predictive control application in Serrao et al. 2023) is developed by combining a mechanistic dynamic biofilm reactor model (MM) with a DDM capable of running in near real-time in a python environment controlling both the MM and the DDM. A key novel aspect of this study is that high-frequent operational data from a very large-size operational municipal water resource recovery facility is used for the training and testing of the HM instead of data from a pilot or otherwise controlled environment. The mechanistic biofilm reactor model and ML models are trained for longer periods (3.5 months) than in any other hybrid biofilm modelling study known to the authors. It is also the first hybrid model of a biofilter system.
A clear outcome of this study is that the HM improves the accuracy of the wastewater treatment model by reducing the size of prediction errors by a factor of 3 under challenging validation conditions. This increases the confidence in the model's performance and thus allows the application of hybrid biofiltration models as decision-making tools for managers, operators and process engineers in the wastewater industry and as an online ingredient for model-based predictive control (Serrao et al. 2023).
This modelling study culminated in the demonstration study of Serrao et al. (2023) in which the conventional feed-forward controller of methanol dosing in use today at the plant under study is compared to a model predictive controller of the methanol dosing to a biofilm reactor that takes advantage of the developed hybrid model in the digital twin central to this new controller.
MATERIALS AND METHODS
Case study site
The research project was carried out with observational data collected in the Seine-aval WRRF managed by SIAAP1 located in the Greater Paris Region, France. This very large-size plant receives wastewater generated by nearly 6,000,000 population equivalents every day, i.e., a volumetric flow of approximately 1,250,000 m3/day in dry weather (14.5 m3/s) (SIAAP 2018).
The average upflow velocity through the Biostyr filters is 10 m/h for an average NO3-N removal of 70% (preDN) and 7.2 m/h to remove 90% of the NH4-N (NIT). The average upflow velocity of the Biofor filters (postDN) is 19 m/h with a NO3-N reduction of 85% over the period studied for dry and rainy weather periods combined. More details can be found in Serrao et al. (2023).
The biological filters are backwashed daily on a fixed schedule by reversing the flow using treated water stored on site and injection of large air bubbles. During washing, which lasts 30 min per day per filter, the trapped particles and part of the biomass are removed in order to restore the filtration capacity. Additional short backwashing sessions of 10–15 min can be applied to a filter when the pressure loss reaches a determined value. More details can be found in Serrao (2023).
Hybrid model approach
In this configuration presented in Figure 2, the MM first simulates the effluent water quality variables, sludge concentrations, greenhouse gas emission and flow rates, as well as the intermediate state variables of the model. The selection of features for the DDM is based on data availability followed by an analysis of importance using an F-test method that allows reducing the dimensionality of the data. The selected simulated variables are then passed to the DDM (blue coloured arrow in Figure 2). Next to these MM simulation data, the data-driven component also receives measurement data of the influent quality and flow rates, as well as information on the principal operational settings, such as the effluent set points for nitrates, as specified in the MM (black arrow). After training, the DDM uses this information to calculate the residual errors of the mechanistic model simulations Δ (green arrow). Finally, the two results are added together to obtain corrected values of the effluent variable (light blue highlighted arrow). In this approach, a ML model is trained for each of the output variables, where for each variable, a corresponding subset of process input variables or features is used.
During the training phase of the DDM, observations of the water quality variables to be modelled are used to calculate the difference between observed and simulated variables. This residual error of the variables as simulated by the mechanistic model is then used by a Bayesian optimization algorithm to train the ML model so that it learns to calculate the proper errors. It serves as labelled responses in a supervised ML training environment.
Mechanistic model structure
The mechanistic model applied here is designed as a one-dimensional dynamic biofilm reactor model describing a ‘fixed-culture’ submerged upward flow biological activated filter. The model was previously developed (Zhu et al. 2018; Zhu 2020) and combines sub-models for the hydraulic representation of the filter media, the transfer and transport of soluble and particulate substrates to/from and within the biofilm, kinetics of bacterial growth and decay, and backwashing to avoid clogging of the filter. The model simulates the biological and physico-chemical conversion of carbon, nitrogen and phosphorus. In total, 22 biological conversions are accounted for by describing the evolution of substrates by mass balances.
The physical processes considered in the model (the horizontal arrows in Figure 3) in each reactor are the transfer to and transport within the biofilm of soluble components by diffusion, the transfer and transport of particulate components by filtration, attachment and detachment, as well as a flux of four gaseous components (O2, N2, NO, N2O), based on Pocquet et al. (2016). These gazes are considered as soluble components in the biofilm layers and are transported by diffusion (Zhu 2020). In addition, the model redistributes the particulate components between the successive mixed reactors during treatment and especially during backwashing by representing continuous media mixing at different intensities for normal operation and backwashing. Mixing occurs between the corresponding biofilm layers of adjacent bioreactors. The model allows for estimating energy consumption related to aeration and pumping, as well as greenhouse gas emissions (Zhu et al. 2018).
DDM structure
In the proposed hybrid model structure, different ML methods were tested: artificial neural networks (ANN), support vector machines (SVM), nonlinear regression and decision tree-based methods (Lee et al. 2002; Li & Vanrolleghem 2022; Shirkoohi et al. 2022). In this study, supervised learning was applied for training the ML models where input data are provided to the model along with the output. The selection of the optimal ML model in this study includes the exploration of data features, validation schemes and model performance.
The k-fold cross-validation method was selected for training so that the models could be compared with the same validation scheme. Cross-validation is a resampling procedure used to evaluate ML models that use smaller subsets of the original dataset to avoid an overfit or underfit of the model during training. This method makes efficient use of all the data and still gives a good estimate of the predictive accuracy of the trained model, although it can take a longer time to execute because the model is trained repeatedly on the subsets (Nti et al. 2021).
Observational data
The datasets used in this study cover December 2019 to October 2020 and consist of 15-min interval sensor data on flow rates, organic matter, total suspended solids (TSS) and nitrogen (NH4-N and NOx-N), supported by daily flow-proportional composite samples for laboratory analysis collected at key points in the system (see Figure 1). Details can be found in Serrao et al. (2023).
The calibration period covers December 2019 until 15 March 2020. The validation period runs from June to September 2020, a summer period characterized by low flows during the holiday month of August, conditions very different from the calibration period, making for an ambitious validation challenge.
Operational data collected in raw format were first cleaned to remove outliers and erroneous data using the method described by Alferes & Vanrolleghem (2016). Datasets for model input require gap filling of missing data, for which the moving median method was applied. Furthermore, data input for the ML models were normalized using the z-score, with a mean ŷ and a standard deviation σ, and scaled in the range of 0–1 to provide uniform data across all features that still have the original shape properties. The normalized variables (features) included the type of day (weekday or weekend), type of weather (dry or rainy), temperature, as well as operational parameters such as the effluent NO3-N set point and the methanol dosing rate.
Model performance criteria
The evaluation criteria used to evaluate the model performance are the mean error (ME) between observed and simulated data, the mean absolute error (MAE) and the root mean square error (RMSE), as well as their relative values. The ME indicates an uncertainty in a measurement or the difference between the measured value and modelled values. The MAE gives the average size of the error in a collection of predictions, without taking their direction into account. In an ideal case, the errors should be close to 0, but values up to 15% are considered acceptable. Model calibration for the mechanistic and HMs was based on the RMSE, which provides a quantitative measure of how well the model fits the data average, although it is more sensitive to outliers or large errors. The ME and MAE were supplementary indicators used for indication of a model's bias and of the error's size on average. The Janus coefficient, that evaluates the ratio of RMSE's of validation and calibration has an optimal value of 1 and indicates successful validation for values up to 2 (Hauduc et al. 2015).
Mechanistic model calibration
The mechanistic biofilter model developed by Zhu (2020) was first calibrated for the large-size municipal wastewater treatment plant (Seine-aval, SIAAP) using operational measurement data at the entry and exit of the predenitrification (preDN) stage. High-frequency data (interval of 15 min) were available for nitrate, nitrite, ammonia, total COD, soluble COD (CODs) and TSS concentrations, supported by daily composite laboratory results. The calibration/validation of this model is described in detail in Serrao et al. (2023).
The mechanistic modelling of the biofiltration system was performed in the WEST (DHI, Hørsholm, Denmark) environment.
DDM training
The selection of the optimal DDM type included training with linear regression models, regression decision tree models with a fine, medium and coarse structure; SVM and neural networks (NN) with one, two and three hidden layers. Since NN with one hidden layer performed equally good or better than models with multiple hidden layers that can be considered to be more ‘complex’; those other models weren't selected for further training. To select each model's hyperparameters (number of layers, type of activation function, etc.), a five-fold cross-validation was applied that splits the data randomly into five subsets (folds) of equal size. One randomly chosen subset was first used for model validation, while the other four were used for training. This process was repeated five times so that each subset was used once for validation. The average error across all five folds is then reported as the average validation error, allowing for the selection of the optimal DDM structure. During the training phase of the ML models, the residual error of the mechanistic model simulations with respect to the measured data was transmitted to allow supervised training. This was done for each simulated univariate output.
Given the time-dependent behaviour of the system, the data cannot be divided randomly into a training and testing set but split chronologically where roughly the first 80% of the records in the time series are marked for training and the remainder 20% are set aside for testing.
The performance of each model was evaluated on the training set, in particular in terms of the RMSE criterion using a five-fold cross-validation scheme. The selected model was further tested with the testing dataset.
The raw data processing, the development and the training of the DDM were all performed in the MATLAB R2022a (MathWorks, Natick, MA, USA) environment using the Machine Learning and Deep Network Designer toolboxes.
Optimization of the selected DDM structure
Selection of the number of features and the model's hyperparameters was done to avoid under-fitting or over-fitting conditions over both training and testing periods. This was achieved by selecting the simplest model that still has a low RMSE score after removing features that are not contributing to the performance of the DDM (see Section 2.2.7.1) and by changing the values of the hyperparameters of the model, such as the number of neurons in the hidden layer (see Section 2.2.7.2).
NO3-N DDM feature selection
Not all features contribute equally to the response variable; some can actually disturb the model output and decrease performance. Model improvement is achieved by removing features that have a low predictive power. The selection of which features to retain is based on a feature-ranking algorithm.
Here, the F-test was applied to identify the variables that allowed to best fit the data (Helsel & Hirsch 1992). The F-test uses a filter-type feature selection algorithm that measures feature importance based on characteristics, such as feature variance and feature relevance to the response. It examines the importance of each feature individually and allows defining the likelihood that an observed improvement of a fit to the data is worth the use of the feature. For each validation fold, the features that had a zero or near zero F-test score were removed, such that only the highest ranked features with a significant contribution (F-test above a value of 175, which corresponds to the half-value of the feature with the highest finite F-test score), were retained.
The variables NOx_IN, which is the measured NOx-N concentration [g/m3] in the preDN influent, and Q_REC (the measured flow rate of the recirculation of nitrified water [m3/day]), are both identified as less influential. Their intrinsic information content is actually included in variables that have been classified as influential, namely the NOX_REC [g/m3], which accounts for the significant source of nitrates and nitrites in the recycled water from the nitrification tank, after combination with the influent; and the Q_SW [m3/day] (the settled water flow rate). Note that the flow rate of recirculated nitrified water Q_REC [m3/day] is adjusted according to Q_SW to maintain a constant hydraulic loading of the biofilter; if Q_SW [m3/day] increases, then the Q_REC [m3/day] decreases.
NO3-N DDM hyperparameters & model selection
Another important step in DDM model development is the identification of the DDM hyperparameters. A hyperparameter is a parameter that describes a configuration that is external to the model itself, like the learning rate, the number of estimators and the type of regularization. Hyperparameter tuning was performed using Bayesian optimization with MSE as a loss function. An acquisition function determines the next set of hyperparameter values to try.
The hyperparameters evaluated, are (1) the number of hidden layers with a range [1:3]; (2) the first layer size with a range [1:300]; (3) the second layer size with a range [1:300]; (4) the third layer size with range [1:300]; (5) the activation function, a search among the rectified linear unit, hyperbolic tangent, none, and sigmoid; (6) the regularization strength (Lambda). The optimization algorithm searches among log-scaled real values in the range [1e-5/n, 1e5/n], where n is the number of observations. For more information, refer to Serrao (2023).
Table 1 provides an overview of the 10 best scoring DDMs for the estimation of the residual errors between the mechanistic model simulation results and the observations of nitrate concentrations in the effluent of the preDN system. As can be seen, NNs provide good performance with the lowest RMSE score for both testing and training data of NO3-N. The optimal network configuration was determined by simulations with the smallest configuration of an input, hidden and output layer and by varying the number of nodes in the hidden layer. The model with the lowest RMSE value during testing is a three-layer neural network with 12, 25 and 1 neurons, respectively, for the input, middle and output layers. The second to fifth positions are taken by decision tree models, indicating that models with a simpler structure could have a similar performance. However, considering the training RMSE values, where ANN models are listed at first, second and third places, it becomes clear that ANN models have a stronger performance. This is confirmed by the testing performance, where the lowest RMSE is found for a NN model. Overall, the neural network models consistently outperform other model types, such as Decision Trees and Supported Vector Machines.
Model type . | Hyperparameters . | Selected features . | RMSE training . | RMSE testing . |
---|---|---|---|---|
Neural network | Neurons: 12:25:1 | 12 | 0.058 | 0.170 |
Decision tree | Minimum leaf size: 12 Surrogate splits: off | 9 | 0.063 | 0.191 |
Decision tree | Minimum leaf size: 4 Surrogate splits: off | 9 | 0.059 | 0.193 |
Decision tree | Minimum leaf size: 4 Surrogate splits: off | 12 | 0.059 | 0.194 |
Decision tree | Minimum leaf size: 4 Surrogate splits: off | 10 | 0.059 | 0.194 |
SVM kernel | Regularization strength: auto | 12 | 0.062 | 0.246 |
SVM kernel | Regularization strength: auto | 10 | 0.062 | 0.274 |
SVM kernel | Regularization strength: auto | 9 | 0.059 | 0.295 |
Neural network | Neurons: 9:25:1 | 9 | 0.059 | 0.297 |
Neural network | Neurons: 10:25:1 | 10 | 0.058 | 0.365 |
Model type . | Hyperparameters . | Selected features . | RMSE training . | RMSE testing . |
---|---|---|---|---|
Neural network | Neurons: 12:25:1 | 12 | 0.058 | 0.170 |
Decision tree | Minimum leaf size: 12 Surrogate splits: off | 9 | 0.063 | 0.191 |
Decision tree | Minimum leaf size: 4 Surrogate splits: off | 9 | 0.059 | 0.193 |
Decision tree | Minimum leaf size: 4 Surrogate splits: off | 12 | 0.059 | 0.194 |
Decision tree | Minimum leaf size: 4 Surrogate splits: off | 10 | 0.059 | 0.194 |
SVM kernel | Regularization strength: auto | 12 | 0.062 | 0.246 |
SVM kernel | Regularization strength: auto | 10 | 0.062 | 0.274 |
SVM kernel | Regularization strength: auto | 9 | 0.059 | 0.295 |
Neural network | Neurons: 9:25:1 | 9 | 0.059 | 0.297 |
Neural network | Neurons: 10:25:1 | 10 | 0.058 | 0.365 |
As indicated in Table 1, during training the ANN model that applies 12 features has a score similar to the ANN model using 10 features; however, during testing the 12 features model has a lower RMSE of 0.17 compared to the 10 features model (RMSE: 0.37). It was therefore decided to select the three-layer feed-forward neural network with 12 neurons in the input layer, 25 neurons in the middle layer and one neuron in the output layer with a rectified linear unit activation function.
The selected ANN for NO3-N removal was specifically chosen for use in the HM predictive control application presented in Serrao et al. (2023) as an integral part of a digital twin, where it is combined with the mechanistic dynamic biofilm reactor model described in paragraph 2.2.1. This digital twin runs in a Python environment using the WEST application programming interface, with timely provision of input data collected from the supervisory control and data acquisition (SCADA) system and allows controlling in near real-time the methanol dosing in a preDN biofilter.
An overview of the 10 best scoring data-driven models for the estimation of the total COD and TSS residual error is presented in Supplementary Information. For these two output variables too, the selected model is a three-layer feed-forward artificial neural network. For the total COD, the same 12:25:1 configuration was found. For TSS, the ANN model contains 10 neurons in the hidden middle layer, which makes it different from the NNs selected for the nitrate and total COD model. This relates to differences in the removal of carbon as compared to nitrogen. Both for the training and testing datasets, decision tree models take up prominent places in the RMSE ranking. These model types are easier to interpret and more transparent to explain output, but also prone to overfitting as indicated by the RMSE values for the test data that are higher than those of the selected ANNs (Bramer 2007).
RESULTS AND DISCUSSION
The configuration of the hybrid model implies an interaction between a mechanistic model and a DDM, with the mechanistic model output being used by the DDM to be trained using the calculated residual error. For that reason, this paragraph commences with a quick overview of the mechanistic model's performance.
Mechanistic model performance
Training and testing of hybrid model
Detailed results with the training and testing data for effluent nitrate are now presented. Details on the TSS and total COD simulations with the HM are presented in the Supplementary Information.
Effluent NO3-N with high-resolution data
Interesting to notice are the model simulated values lower than 1.0 mgNO3-N/L (e.g., between January 20th and 30th). In fact, here the cleaned sensor data never dropped below 1 mg NO3-N/L, a result of the lower precision for inline sensors at low concentrations. Conversely, the simulations seem to correspond better with the laboratory analysis results for composite samples that occasionally indicate daily averages below 1 mg NO3-N/L.
Figure 6(b) shows that the hybrid model simulates well the trends and variability of the nitrate concentration over the testing period and performs much better than the MM. Towards the end of the testing period, however, the relative improvement of the HM seems to be lower. Starting from 8 March, a period occurs in which the HM has difficulties to compensate for the residual error of the MM simulations, this is especially in those cases when the MM predicts low NO3-N concentrations of 1 mgN/L or less. The daily composite laboratory samples however show average values between 1.0 and 2.0 mgNO3-N/L, which conceal the high variability in the 15-min values that can temporarily reach extremely high and low values.
This reduced performance could be due to different reasons, such as a lack of good quality measurements (due to the lower precision of the nitrate sensor below 1 mg NO3-N/L) or less reliable simulations by the MM model in the lower range of nitrate and nitrite concentrations. A lower performance with the test data could indicate that the data from the training period does not contain a sufficiently wide range of information to train the DDM. Still, the Janus coefficient of 1.0 has the optimal value and does not indicate an error of over- or under-fitting.
The statistical performance indicators are listed in Table 2 for the simulations of the effluent variables at the preDN outlet with the hybrid model and are compared to the MM. In this table, the performance of the models is relative to the observed data from the calibration/training and validation/testing periods; MM-calibration (1 December 2019 to 15 March 2020); MM-validation (01 July 2020 30 September 2020); HM-training (1 December 2019 to 29 February 2020) and HM-testing (01 March 2020 to 15 March 2020). The MM model validation period runs from July to September 2020, a summer period characterized by low flows during the holiday month of August, conditions very different from the calibration period, making for an ambitious validation challenge. For this reason, the DDM was not trained or tested with the summer dataset.
. | . | . | Calibration (MM) or training (HM) . | Validation (MM) or testing (HM) . | . | ||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Variable . | Model . | Unit . | N . | Mean obs (ӯ) . | ME/ӯ . | MAE/ӯ . | RMSE/ӯ . | N . | Mean obs (ӯ) . | ME/ӯ . | MAE/ӯ . | RMSE/ӯ . | Janus coef. . |
NO3-N | MM | gN/m3 | 10,656 | 2.89 | −47% | 67% | 86% | 8,015 | 2.62 | −123% | 129% | 146% | 1.5 |
NO3-N | HM | gN/m3 | 8,736 | 2.94 | −11% | 25% | 34% | 1,440 | 3.23 | 8% | 24% | 32% | 1.0 |
. | . | . | Calibration (MM) or training (HM) . | Validation (MM) or testing (HM) . | . | ||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Variable . | Model . | Unit . | N . | Mean obs (ӯ) . | ME/ӯ . | MAE/ӯ . | RMSE/ӯ . | N . | Mean obs (ӯ) . | ME/ӯ . | MAE/ӯ . | RMSE/ӯ . | Janus coef. . |
NO3-N | MM | gN/m3 | 10,656 | 2.89 | −47% | 67% | 86% | 8,015 | 2.62 | −123% | 129% | 146% | 1.5 |
NO3-N | HM | gN/m3 | 8,736 | 2.94 | −11% | 25% | 34% | 1,440 | 3.23 | 8% | 24% | 32% | 1.0 |
Shown are the relative errors compared to the observed mean. N, number of observations; ӯ is the observed mean.
For the effluent nitrate, the HM outperforms the MM with relative mean errors that are much smaller during training and testing for the HM compared to the calibration and validation of the MM, indicating a much smaller bias for the HM. The other performance criteria (MAE and RMSE) demonstrate a better approximation of the dynamics by the HM. The Janus coefficient, with an optimal value of 1.0, indicates that the RMSE values are nearly identical for training and testing, such that the model can be considered valid for nitrate simulations. Overall, even though the relative MAE and relative RMSE range from 24 to 32%, the scores remain globally acceptable given the highly dynamic conditions of the process and considering the sensor measurement limitations.
Effluent NO3-N with daily values
In contrast to high-resolution data with a time interval of 15 min, which can capture short-term fluctuations in wastewater conditions and are therefore more appropriate for real-time control operations, daily average values may be more appropriate for process analysis purposes because they provide a noise-reduction and provide a better indication of overall performance instead of being determined by short-term extreme values.
The daily laboratory analysis results follow in general the daily average values of the sensor data, with a few notable exceptions (e.g., a short period from 19 to 22 of January, 1, 12 and 28 February). The HM is here not able to compensate since the DDM was trained on high-resolution measurements and is thus unaware of the discrepancy between laboratory and sensor observations.
The daily averages of the simulations with the mechanistic model follow the trend of the operational data, but beginning on 8 March they start to show an overestimation. This is confirmed in the parity plot in Figure 8(b), where the mechanistic model only has an R2 value of 0.21 for the same data period.
CONCLUSIONS
A parallel hybrid model was developed in which a calibrated/validated mechanistic biofilm model of a submerged biofilter for nitrogen removal is corrected by a data-driven neural network model to improve the accuracy of water quality simulations. For the first time, a mechanistic model and a DDM were calibrated and trained on high-frequent operational data collected at a very large municipal WRRF by inline sensors and supported by daily laboratory sampling analyses over a long period of 3.5 months.
The simulations obtained over the training and testing periods well establish the validity of the hybrid model built by the addition of a feed-forward neural network: the trained DDM can pick-up ‘hidden’ dynamic information that corrects the mechanistic model's residual error. The validation error (15 min interval) of the hybrid model is three times smaller than the error of the mechanistic-only model. This excellent performance is confirmed by the strong Janus coefficients, which evaluate the capability of the model to perform under unseen conditions.
Considering the highly dynamic conditions observed in the treatment plant being studied, the considerable measurement errors observed in the monitoring data, and the sensitivity of the model calculations to outliers and faults in the input data, it is fair to conclude that the modelling results could further benefit from more intense (and automated) data preparation and reconciliation processes, as well as more data from longer training and testing periods.
Altogether, the results of this study with the developed hybrid model indicate that the data-driven component captures sufficient residual information to compensate for the inaccuracy of the mechanistic model. This shows that mechanistic models, in particular those developed for biofiltration, require assumptions to be made that fail to include all dynamics, some of which can be ‘recovered’ by the hybrid model. The hybrid model reduces the size of residual errors and makes predictions of future states more reliable, thus improving the confidence in the model's output.
ACKNOWLEDGEMENTS
The authors would like to thank the operators of the Seine-aval WRRF (SIAAP) for access to the facility and collection of the data.
Syndicat interdépartemental pour l'assainissement de l'agglomération parisienne
FUNDING
This work was supported by SIAAP (Paris, France) as part of the Mocopée Research Program (SIAAP 2018) that includes partnerships with model EAU – Université Laval, LEESU and W-SMART (innEAUvation.fr). The Natural Sciences and Engineering Research Council of Canada awarded a Discovery grant to model EAU entitled: ‘Towards digital twin based control of water resource recovery facilities – Methods supporting the use of adaptive hybrid digital twins’.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.