This study explores various approaches to formulating a parallel hybrid model (HM) for Water and Resource Recovery Facilities (WRRFs) merging a mechanistic and a data-driven model. In the study, the HM is constructed by training a neural network (NN) on the residual of the mechanistic model for effluent nitrate. In an initial experiment using the Benchmark Simulation Model no. 1, a parallel HM effectively addressed limitations in the mechanistic model's representation of autotrophic bacteria growth and the data-driven model's incapability to extrapolate. Next, different versions of a parallel HM of a large pilot-scale WRRF are constructed, using different calibration/training datasets and different versions of the mechanistic model to investigate the balance between the calibration effort for the mechanistic model and the compensation by the NN component. The HM can improve predictions compared to the mechanistic model. Training the NN on an independent validation dataset produced better results than on the calibration dataset. Interestingly, the best performance is achieved for the HM based on a mechanistic model using default (uncalibrated) parameters. Both long short-term memory (LSTM) and convolutional neural network (CNN) are tested as data-driven components, with a CNN HM (root-mean-squared error (RMSE) = 1.58 mg NO3-N/L) outperforming an LSTM HM (RMSE = 4.17 mg NO3-N/L).

  • In a parallel hybrid model (HM), a data-driven component compensates for structural gaps in a mechanistic model.

  • A data-driven component of an HM should be trained on an independent dataset.

  • Integrating an uncalibrated mechanistic model in a parallel HM may lead to better results than applying an overly calibrated mechanistic model.

  • A convolutional neural network outperforms a long short-term memory network as the data-driven component of an HM.

The wastewater industry encounters increased challenges as a result of stringent regulations, exemplified by the European Union's Water Framework Directive (European Commission 2020). These regulations necessitate a focused effort to meet stringent water quality standards and transition towards sustainable, energy-efficient, and circular technologies. In this context, modelling is a powerful tool to support Water Resource Recovery Facility (WRRF) operators and engineers.

In recent decades, engineers have favoured mechanistic approaches for modelling wastewater treatment processes. In WRRF operations, a transparent methodology is one of the keys to adoption, leading researchers to favour simpler but interpretable methodologies (Newhart et al. 2019). Mechanistic models are based on fundamental engineering and scientific knowledge about the physical, chemical, and biological mechanisms that affect a system, with the relationships themselves defined by modellers and assumed to be known. A generally accepted mechanistic model for the biokinetic processes in the biological reactor of wastewater treatment plants is the family of activated sludge models (ASMs), of which Activated Sludge Model No. 1 (ASM1) is the most commonly used (Henze et al. 2000; Sin & Al 2021). The ASM1 was primarily developed to model the removal of organic carbon and nitrogen, but it also aims to accurately describe sludge production and oxygen consumption (Gernaey et al. 2004). The model uses a chemical oxygen demand (COD)-based modelling technique where all organic matter is expressed as equivalent amounts of COD, as it provides a link between electron equivalents in the organic substrate, the biomass, and the oxygen used.

Mechanistic models require relatively limited data input, have high interpretability, and can extrapolate the process performance to a wide variety of process operating conditions. However, these models often simplify the complex processes in the WRRF, such as aeration, mixing, or aggregation of particulates (Schneider et al. 2022), and their application can be limited by knowledge gaps or the (over)simplification of complex processes. Some relevant processes are still not understood clearly enough to be included in a simulation model, such as the formation of nitrous oxide and the conversions within the biological phosphorus removal process (due to the variability of the metabolism of phosphate-accumulating organisms) (Regmi et al. 2019). This results in only a partial description of the process or system, which can be valuable but may not be robust to significant changes (Schneider et al. 2022). In addition, mechanistic models require extensive and time-consuming parameterization, calibration, and validation. Parameter estimation based on numerical optimization algorithms may also lead to non-unique parameter estimates and many local optima (Dochain & Vanrolleghem 2001; Hvala & Kocijan 2020), which in turn limits the predictive power of the model.

Data-driven approaches are gaining attention due to improved sensor technologies since these types of models are particularly successful when dealing with problems involving large and high-quality datasets (Cheng et al. 2020; Therrien et al. 2022; Khalil et al. 2023). Modern data-driven models are typically developed using (machine learning) algorithms that do not consider fundamental mass, charge, and energy balances. In contrast to mechanistic models, the structure of data-driven models is directly derived from data and does not require extensive knowledge about the system. Depending on the chosen complexity, data-driven models can theoretically approximate any underlying dynamics from the data. However, data-driven models have low knowledge-based interpretability and need high-quality datasets with sufficient representative dynamics, which can be a challenge due to harsh sensor conditions in WRRFs. They are generally also limited in extrapolation power (Schneider et al. 2022).

Hybrid modelling offers a way to combine the advantages of mechanistic and data-driven models. Hybrid models (HMs) integrate both modelling techniques and thereby benefit from the strengths of each (Von Stosch et al. 2014). Mechanistic models conserve critical process knowledge and maintain extrapolation capabilities, while data-driven approaches can discover hidden relationships and patterns that are not sufficiently captured by mechanistic models. The application of hybrid modelling in the domain of WRRFs has the potential to foster automation (Rodriguez-Roda et al. 2002), increase efficiency, and increase the predictive power of models (Quaghebeur et al. 2022; Serrao et al. 2023).

Three different architectures of HMs can be distinguished: serial, parallel, and surrogate models. In a serial HM, the output of the one model is used as input to the other model (Psichogios & Ungar 1992). Generally, the output of the data-driven model is used as the input to the mechanistic model since the data-driven component is capable of predicting the dynamics of certain subprocesses that are not accurately modelled by the mechanistic component, such as poorly defined reaction kinetics (Lee et al. 2002; Lee et al. 2005). In cases where the overall structure of the mechanistic model is inaccurate or the origin of the mechanistic model gap is unknown, the parallel arrangement can partially compensate for any structural mismatch in the mechanistic model (Oliveira 2004). In a parallel architecture, the HM aims to optimize predictive accuracy without necessarily deepening the understanding of underlying processes. This shift towards prioritizing predictive accuracy over in-depth process understanding is particularly beneficial in applications where immediate decision-making or control actions are critical, for example, in model predictive control. In a parallel HM, a data-driven component is trained to predict the residuals between the mechanistic model outputs and the measurement data. The predicted residuals are then fused with the mechanistic model outputs, usually by addition. Finally, surrogate models are data-driven models that are trained on the output of a mechanistic model to create a computationally more efficient model, allowing rapid and/or large-scale simulations (Forrester et al. 2006). The importance of surrogate models is expected to rise with the growing use of digital twins for real-time simulations, as they can be a valuable tool in this context. 
Neural differential equations have recently emerged as a novel approach in hybrid modelling for wastewater treatment, learning the residuals of dynamics rather than the residuals of the state (Quaghebeur et al. 2022).

HMs have proven useful in a number of applications within the field of wastewater treatment. One example is the creation of influent generator models, which have been developed using different data-driven methods to predict the incoming flow rate and pollutant loads to WRRFs (Flores-Alsina et al. 2014; Zhu et al. 2015; Li & Vanrolleghem 2022). Another is the development of soft-sensing models. Soft-sensing models, often data-driven, can predict variables that serve as input information to controllers, which are used in conjunction with ASM models (Wang et al. 2022). The implementation and advancement of HMs can also encourage and enhance the effective deployment of digital twin technology in WRRFs (Torfs et al. 2022) and of real-time model predictive control strategies (Serrao 2023).

Currently, the predominant architecture of HMs for wastewater treatment processes is the parallel HM (Piuleac et al. 2012; Cheng et al. 2021; Serrao et al. 2023), although instances of the serial architecture and the combination of parallel and serial models can also be found (Karama et al. 2001; Hvala & Kocijan 2020). The principal focus of the current research is predicting effluent quality (Piuleac et al. 2012; Serrao et al. 2023). Studies on HMs for WRRFs often provide little detailed documentation regarding the specific training and calibration efforts for both model components. Despite the application of different types of models as data-driven components of an HM in various studies, a consensus on this matter remains elusive. Although applications of HMs in WRRF modelling are increasing, research specifically addressing the most effective approach to developing such models is underdeveloped.

Schneider et al. (2022) listed various critical challenges in the field of hybrid modelling of WRRFs that need corresponding development efforts. The availability of Good Modelling Practice guidelines for the development of mechanistic models for ASMs is well-established (Rieger et al. 2012). While there are as of yet no official guidelines specifically tailored for data-driven models for WRRFs, several well-established documents provide best practices for setting up data-driven models in a general context (Hastie et al. 2009; Chollet 2021). The integration of hybrid modelling paradigms into existing frameworks for WRRF modelling raises important considerations regarding the extension of existing protocols and their transferability to hybrid modelling studies. Another challenge for HMs is the fact that uncertainty arises from both the mechanistic and data-driven components, and studies often lack the quantification of uncertainty, limiting the assessment of trust in HMs. Also, the task of balancing complexity between the mechanistic and data-driven components requires more research to determine the acceptable level of error that can be compensated by the data-driven model. One aspect to address regarding this challenge is the establishment of a calibration protocol for HMs with guidelines on the effort required for and the order of calibration of the mechanistic model compared to the data-driven model as well as the data requirements for these calibration efforts. Selecting a suitable architecture for constructing HMs presents another challenge in the development of HMs. It is important to ascertain the specific purposes for employing parallel models and sequential models. Identifying the most effective data-driven models for distinct purposes is also crucial in achieving desired outcomes.

This research aims to investigate how various factors impact the development of parallel hybrid modelling architectures to improve the predictive power of wastewater treatment plant models. The study is performed on a unique, continuously monitored pilot-scale wastewater treatment plant, called pilEAUte (Québec, QC, Canada). For this pilot-scale plant, a mechanistic model was developed in previous research (Kirim 2022) and is used in this work. However, accurate effluent quality prediction remains a challenge for this model. The first objective of this study is to explore the feasibility of developing a parallel HM for enhancing effluent quality prediction in the wastewater treatment plant. In addition, the optimal model structure of the data-driven component is explored by comparing the performance of a parallel HM with a long short-term memory recurrent neural network (LSTM-RNN) as the data-driven component with that of a parallel HM with a convolutional neural network (CNN) as the data-driven component. Finally, best practice in setting up parallel HMs is explored with respect to the division of calibration and training efforts between the mechanistic and the data-driven components.

Case study and mechanistic model

An HM is developed for a pilot-scale WRRF installed at Université Laval (Québec, Canada), called pilEAUte. The plant continuously receives domestic wastewater from a student residence and a kindergarten and rain runoff from the parking lot in the university campus. The WRRF treats 12 m3 of wastewater per day, and the configuration consists of a pumping station, a storage tank, a primary settler, and a biological reactor with five basins upstream of the secondary settling tank. For a more detailed description of the pilEAUte setup, refer to Kirim (2022). The plant is capable of carbon and nitrogen removal using a predenitrification configuration. The aeration flow rate in each basin is controlled using mass flow controllers based on the oxygen concentration in basin 4, with a setpoint set at 3 mg/L during the study. Each basin is equipped with individual airflow lines, which are adjusted using ratio controllers. The biological reactors have an internal recycle system that circulates water from the last basin to the first basin and settled sludge is recycled from the secondary clarifier to the first basin. The pilEAUte is monitored with sensors at the outlet of the primary clarifier, the biological reactors, the recycle flows, and the outlet of the secondary clarifiers (Appendix A).

The study uses a pre-existing mechanistic model by Kirim (2022), implemented in the WEST 2017 software platform (DHI 2017, Hørsholm, Denmark). The mechanistic model was initialized, calibrated, and validated based on 2 months of pilEAUte operational data with 1 min frequency that reflect normal operational conditions in terms of flow patterns (Figure 1). The raw monitoring data were subjected to univariate data validation developed by Alferes & Vanrolleghem (2016) and further processed using outlier detection and data smoothing filters before modelling (Alferes & Vanrolleghem 2016; Philippe 2018). The mechanistic model was calibrated following the Good Modelling Practice guidelines for activated sludge models (Rieger et al. 2012). The final calibrated parameter values are presented in Table 1.
Table 1

Calibrated biokinetic model parameter values (Kirim 2022)

| Parameter | Description | Default value | Calibrated value | Unit | Deviation from default (%)ᵃ |
| --- | --- | --- | --- | --- | --- |
| b_H | Decay coefficient for heterotrophic biomass | 0.62 | 0.70 | 1/d | 12 |
| f_XI | Fraction of biomass converted to particulate inert matter | 0.1 | 0.13 | – | 26 |
| k_h | Maximum specific hydrolysis rate | | 3.20 | gCOD/(gCOD*d) | |
| b_NH | Decay coefficient for NH4 oxidizing autotrophic biomass | 0.05 | 0.06 | 1/d | 18 |
| b_NO | Decay coefficient for NO oxidizing autotrophic biomass | 0.033 | 0.04 | 1/d | 19 |
| K_HNO2_NO | Nitrous acid half-saturation coefficient for NO oxidizing autotrophic biomass | 0.000872 | 0.000061 | gCOD/m3 | 174 |
| K_NH3_NH | Ammonia half-saturation coefficient for NH4 oxidizing autotrophic biomass | 0.75 | 0.0057 | gNH3-N/m3 | 197 |
| K_SH | Substrate half-saturation coefficient for heterotrophic biomass | 20 | 8.74 | gCOD/m3 | 78 |
| mu_H | Maximum specific growth rate for heterotrophic biomass | | 4.77 | 1/d | 23 |
| mu_NH | Maximum specific growth rate for NH4 oxidizing autotrophic biomass | 0.8 | 0.71 | 1/d | 12 |
| mu_NO | Maximum specific growth rate for NO oxidizing autotrophic biomass | 0.79 | 0.95 | 1/d | 18 |
| K_NO2_H | Nitrite half-saturation coefficient for denitrifying heterotrophic biomass | | 3.31 | gCOD/m3 | 107 |
| K_O_NH | Oxygen half-saturation coefficient for NH4 oxidizing autotrophic biomass | 0.6 | 0.25 | gO2/m3 | 82 |
| K_O_NO | Oxygen half-saturation coefficient for NO oxidizing autotrophic biomass | 1.5 | 0.27 | gO2/m3 | 139 |
| K_OH | Oxygen half-saturation coefficient for heterotrophic biomass | 0.2 | 0.10 | gO2/m3 | 67 |
| n_NO2 | Correction factor for anoxic growth of heterotrophs on nitrite | 0.6 | 0.92 | – | 42 |
| n_NO3 | Correction factor for anoxic growth of heterotrophs on nitrate | 0.6 | 0.49 | – | 20 |

ᵃ The deviation from default is calculated by $\frac{|A - B|}{(A + B)/2} \times 100$, with A being the default parameter value and B being the calibrated parameter value.
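The tabulated deviations can be checked with a few lines of Python. The sketch below assumes that the deviation is the absolute difference between the two values relative to their mean, a form that reproduces the percentages reported in Table 1; the function name `deviation_from_default` is illustrative and not taken from the original study.

```python
def deviation_from_default(default: float, calibrated: float) -> float:
    """Symmetric percentage deviation between default (A) and calibrated (B).

    Assumed form: |A - B| / ((A + B) / 2) * 100.
    """
    mean_value = (default + calibrated) / 2.0
    return abs(default - calibrated) / mean_value * 100.0


# Rows taken from Table 1: (default, calibrated, reported deviation in %)
rows = {
    "b_H": (0.62, 0.70, 12),
    "K_SH": (20.0, 8.74, 78),
    "K_O_NO": (1.5, 0.27, 139),
    "n_NO2": (0.6, 0.92, 42),
}

for name, (default, calibrated, reported) in rows.items():
    computed = deviation_from_default(default, calibrated)
    print(f"{name}: computed {computed:.1f} %, reported {reported} %")
```

Because the measure is symmetric in A and B, it stays bounded at 200% even when a calibrated value is orders of magnitude smaller than its default, as for K_NH3_NH.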

Figure 1

Timeline with the time periods used for the different mechanistic modelling purposes (Kirim 2022).

The calibrated model results for nitrate during the validation period are shown in Figure 2. These results indicate that the model lacks predictive power for effluent nitrate. The predictions are of the same order of magnitude as the measurements for most of the validation period, but significant differences are observed on a few days. A more detailed description of the mechanistic model can be found in the PhD thesis by Kirim (2022).
Figure 2

Calibrated mechanistic model results and measurements for effluent nitrate during the validation period (Kirim 2022).

Because it is not yet clear which structural component is missing from the mechanistic model and causes its suboptimal prediction of effluent nitrate, a parallel HM is implemented to address these discrepancies. A parallel HM is particularly advantageous in this context, as it can compensate for inaccuracies in the overall structure of the mechanistic model or fill gaps of unidentifiable origin.

Parallel hybrid model setup

The HM is constructed by training a data-driven component to predict the residuals between the outputs of the mechanistic model and the measured data. The predicted residuals are then added to the mechanistic model outputs (Figure 3). In this research, neural networks (NNs) are chosen as the data-driven component of the HM. NNs are frequently used in wastewater treatment modelling for their ability to handle complex, nonlinear dynamics and capture functional relationships among the water quality data, even when the underlying relationships are challenging to define (Bahramian et al. 2023). They outperform traditional methods such as multiple linear regression and auto-regressive integrated moving average by requiring fewer assumptions to achieve a higher accuracy (Chen et al. 2020). In addition, NNs exhibit great fault tolerance since the corruption of one or more network cells does not prevent it from generating output, and NNs can work effectively with incomplete or missing data (Mijwel 2021).
Figure 3

Structure of a parallel hybrid model.
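The parallel structure can be illustrated with a minimal Python sketch. It is not the implementation used in this study: a one-variable least-squares fit stands in for the NN, the residual is assumed to be defined as measurement minus mechanistic output, and all data values and function names are invented for illustration.

```python
from statistics import mean


def fit_linear(x, y):
    """Ordinary least squares for y = slope * x + intercept (stand-in for the NN)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def rmse(y_true, y_pred):
    """Root-mean-squared error between two equally long sequences."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


# Hypothetical toy data: one input feature, the measurement, and a biased
# mechanistic prediction that misses the trend in the measurements.
feature = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
measured = [10.0, 10.5, 11.2, 11.4, 12.1, 12.4]
mechanistic = [11.0, 11.0, 11.0, 11.0, 11.0, 11.0]

# 1) Residuals between the measurements and the mechanistic model output.
residuals = [m - s for m, s in zip(measured, mechanistic)]

# 2) Train the data-driven component to predict the residuals.
slope, intercept = fit_linear(feature, residuals)
predicted_residuals = [slope * f + intercept for f in feature]

# 3) Fuse both components by addition.
hybrid = [s + r for s, r in zip(mechanistic, predicted_residuals)]

print(f"RMSE mechanistic: {rmse(measured, mechanistic):.3f}")
print(f"RMSE hybrid:      {rmse(measured, hybrid):.3f}")
```

In this toy case the errors are evaluated on the training data only; a realistic assessment would use a separate test period, as done in the experiments reported here.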

Recurrent neural networks (RNNs) are models that have at least one feedback loop, which allows them to take some temporal context into account in their decision function. Hence, RNNs are able to ‘remember’ information through time, which makes them a useful tool for time series forecasting (Goodfellow et al. 2016). The basic RNN cell encounters difficulties with long sequences because of vanishing or exploding gradients. During backpropagation, gradients can either become too small (vanish), resulting in inadequate weight updates, or too large (explode), causing unstable weight matrices. Both effects limit the RNN's capacity to capture long-term dependencies effectively (Hewamalage et al. 2021). LSTM-RNNs address these issues as they are capable of learning long-term dependencies. Long short-term memory (LSTM) cells have an internal recurrence (a self-loop), in addition to the outer loop of the RNN. The cells have the same inputs and outputs as an ordinary recurrent network, yet each cell features additional parameters and a set of gating units that manage the information flow. The gates enable these networks to reject irrelevant information from the past and remember information in the current state (Goodfellow et al. 2016). LSTM networks have the ability to remember more than 1,000 time steps, depending on the complexity of the network's architecture (Hochreiter & Schmidhuber 1997). CNNs are often used for processing multidimensional inputs. CNNs use convolutional layers containing filters to identify important features in the input data. In addition, pooling layers are employed to summarize these features and extract the most prominent ones within a given locality (Goodfellow et al. 2016). Recently, CNNs have also gained attention as a valuable tool for time series forecasting (Wang et al. 2017; Durairaj & Mohan 2022). 
CNNs train filters that represent recurrent patterns in the series and use them to predict future values (Koprinska et al. 2018). They can also be effective for handling noisy time series by removing noise at each layer and extracting only the meaningful patterns (Borovykh et al. 2017).
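The two CNN building blocks mentioned above, a convolutional filter and a pooling step, can be sketched for a one-dimensional series in plain Python. This is a simplified illustration: in a trained CNN the filter weights are learned from data, whereas the filter below is hand-picked, and the function names are hypothetical.

```python
def conv1d_valid(series, kernel):
    """Slide the kernel over the series (no padding) and return the dot products."""
    k = len(kernel)
    return [
        sum(series[i + j] * kernel[j] for j in range(k))
        for i in range(len(series) - k + 1)
    ]


def max_pool1d(values, pool_size):
    """Keep the maximum of each non-overlapping window of length pool_size."""
    return [
        max(values[i:i + pool_size])
        for i in range(0, len(values) - pool_size + 1, pool_size)
    ]


# Toy series with a step up and a step down.
series = [1.0, 1.0, 1.0, 4.0, 4.0, 4.0, 1.0, 1.0, 1.0]
edge_filter = [-1.0, 0.0, 1.0]   # hand-picked filter that responds to rising values

feature_map = conv1d_valid(series, edge_filter)
pooled = max_pool1d(feature_map, pool_size=2)

print(feature_map)  # [0.0, 3.0, 3.0, 0.0, -3.0, -3.0, 0.0]
print(pooled)       # [3.0, 3.0, -3.0]
```

The feature map is positive where the series rises and negative where it falls, and the pooling step keeps the strongest response in each local window.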

The NNs are developed with Keras (version 2.11.0, Chollet 2015), a high-level interface to the TensorFlow platform (version 2.11.0, Abadi et al. 2016). To integrate the Python-developed NN models with the mechanistic model, the WEST Tornado kernel is called through the command line interface. A Windows batch file is created to set the path to the Tornado kernel library and execute the XML file linked to the WEST model. The input variables for the NNs are preprocessed as described by Kirim (2022). Moreover, the input data are standardized (centring and scaling) and subsampled from the original 1-min interval to a 10-min interval to reduce the computational burden. The input data are then reshaped into a 3D tensor (samples, time steps, features), the format required by LSTM-RNNs and one-dimensional CNNs. The mean absolute error (MAE) is used as the training loss and is minimized using the adaptive moment estimation (Adam) optimizer, and the models are trained using a batch size of 64. Early stopping is used to determine the optimal number of training epochs. The number of layers is manually tuned first (between 1 and 5). The number of LSTM hidden units, the CNN kernel size and number of filters, the activation function (ReLU or tanh), and the learning rate of the Adam optimizer are then optimized using a random search tuner from the KerasTuner framework (O'Malley et al. 2019). As a regularization procedure, a drop-out layer with a drop-out rate of 0.1 is inserted between consecutive layers of the NNs. All hyperparameters of the NNs are tuned using a separate validation period. The evaluation metrics include the coefficient of determination (R²), which estimates the proportion of variance explained by the model. In addition, the MAE and the root-mean-squared error (RMSE) are used to provide information about the magnitude of the error.
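The preprocessing steps described above (standardization, subsampling, and reshaping into a three-dimensional input) can be sketched in plain Python. This is an illustration under stated assumptions rather than the code used in the study: the data are synthetic, the window length of six time steps is hypothetical, standardizing before subsampling is an assumed order, and nested lists stand in for the tensor that Keras would receive.

```python
from statistics import mean, pstdev


def standardize(column):
    """Centre to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma for v in column]


def subsample(rows, step=10):
    """Keep every `step`-th row, e.g. 1-min data -> 10-min data."""
    return rows[::step]


def make_windows(rows, targets, window):
    """Build (samples, time steps, features) inputs and one target per window."""
    x, y = [], []
    for end in range(window, len(rows) + 1):
        x.append(rows[end - window:end])
        y.append(targets[end - 1])
    return x, y


# Hypothetical 1-min data: two features and one target, 120 minutes long.
minutes = range(120)
feature_a = [float(t % 30) for t in minutes]
feature_b = [0.5 * t for t in minutes]
target = [0.1 * t for t in minutes]

columns = [standardize(feature_a), standardize(feature_b)]
rows = [list(r) for r in zip(*columns)]          # one row per time step

rows_10min = subsample(rows)
target_10min = subsample(target)

x, y = make_windows(rows_10min, target_10min, window=6)
print(len(x), len(x[0]), len(x[0][0]))           # samples, time steps, features
```

In practice, the scaling statistics would be computed on the training period only and reused for the validation and test periods, so that no information leaks from the evaluation data into training.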

Hybrid model setup with benchmark simulation model

In an initial experiment, a proof of concept is performed based on the well-established Benchmark Simulation Model no. 1 (BSM1) (Gernaey et al. 2014). The objective is to illustrate the capabilities of a parallel HM to capture missing process dynamics in the context of wastewater treatment as well as to assess the data requirements for training such a parallel HM. This experiment is important given the data constraints observed in the pilEAUte case study, where only a 2-month dataset was available for HM development. In the BSM1 experiment, which uses a dataset spanning a 2-week period, the goal is to assess the feasibility and performance of constructing a functional HM with data over a constrained time horizon. If a parallel HM can be successfully implemented for BSM1 using 2 weeks of high-frequency data, then developing a parallel HM based on 2 months of data with a similar frequency would be reasonable. The BSM1 serves as a test model for control strategies within a wastewater treatment setup featuring two anoxic tanks, three aerated tanks, and one settler. The ASM1 has been selected to describe the biological phenomena occurring in the biological reactor tanks. Realistic influent dynamics are available for the benchmark model, e.g. dry weather, or rainy weather (a combination of dry weather and a long rain period). With a configuration similar to the pilEAUte, this plant proves highly beneficial for initial explorations of an HM. The primary focus centres on predicting the NO3-N concentration in the effluent as this is also the variable of interest in the pilEAUte case study. First, synthetic data of the behaviour of a wastewater treatment plant using the full BSM1 are generated in WEST 2023 (DHI, Hørsholm, Denmark). The model simulated 150 days with constant influent to obtain a steady state. Subsequently, simulations were conducted for 2 weeks using the BSM1 dry weather influent, and an additional 2 weeks were simulated using the BSM1 rain weather influent. 
The data are sampled every 15 min, and Gaussian noise with mean μ = 0 and standard deviation σ = 0.25 is added to the effluent NO3-N data.
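As an illustration of this sampling and noise step, the sketch below generates a 2-week series at 15-min resolution and perturbs it with Gaussian noise (μ = 0, σ = 0.25). The sinusoidal signal is a hypothetical stand-in for the simulated BSM1 effluent nitrate, and the random seed is arbitrary.

```python
import math
import random

SAMPLES_PER_DAY = 24 * 60 // 15          # 15-min sampling -> 96 samples per day
N_DAYS = 14                              # 2-week simulation period
SIGMA = 0.25                             # standard deviation of the added noise

rng = random.Random(42)                  # fixed seed so the example is repeatable

# Hypothetical noise-free nitrate signal with a daily cycle (stand-in for BSM1 output).
clean = [
    10.0 + 2.0 * math.sin(2.0 * math.pi * i / SAMPLES_PER_DAY)
    for i in range(N_DAYS * SAMPLES_PER_DAY)
]

# Additive Gaussian noise with mean 0 and standard deviation 0.25.
noisy = [value + rng.gauss(0.0, SIGMA) for value in clean]

# Check that the empirical noise level is close to the requested one.
noise = [n - c for n, c in zip(noisy, clean)]
empirical_sigma = math.sqrt(sum(e * e for e in noise) / len(noise))
print(len(noisy), round(empirical_sigma, 2))
```

At this resolution each 2-week influent scenario yields 1,344 samples, which gives a sense of the dataset size available for training, validation, and testing in the proof of concept.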

Next, three models were fitted to the simulated dataset: a mechanistic model using only conventional differential equations with a modification to the autotrophic growth process, a purely data-driven model based on a CNN and an HM combining both. In the context of this proof of concept, only a CNN was used, as the comparison with LSTM would extend the section beyond the intended scope. To explore the impact of imperfect domain knowledge within the mechanistic model, the value of a calibrated parameter was intentionally modified to another value. Specifically, a version of the BSM1 was created where the growth of autotrophs was inaccurately accounted for. Quaghebeur et al. (2022) conducted a comparable experiment with BSM1 to demonstrate the superior performance of a hybrid approach incorporating neural differential equations in contrast to relying solely on either a mechanistic or data-driven model. In the notation of the BSM1 model, the growth process of autotrophs, denoted as ρ3, is governed by the following equation:
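$$\rho_3 = \mu_A \left(\frac{S_{NH}}{K_{NH} + S_{NH}}\right) \left(\frac{S_O}{K_{O,A} + S_O}\right) X_{B,A} \qquad (1)$$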
where μA is the growth rate of the autotrophs, SNH is the concentration of soluble ammonium nitrogen, KNH is the ammonia half-saturation coefficient for autotrophic biomass, SO is the concentration of soluble oxygen, KO,A is the oxygen half-saturation coefficient for autotrophic biomass, and XB,A is the concentration of active autotrophic biomass. By adjusting μA from 0.5 d−1 in the calibrated model to 1 d−1, the model incorrectly represents the growth dynamics of autotrophs. This scenario exemplifies a mechanistic model that inadequately accounts for autotroph growth. In this comparative analysis, the data-driven CNN model predicts results using the same input variables as the mechanistic model. In the parallel HM, a CNN is trained to predict the residuals between the mechanistic model output and the measurement data. The predicted residuals are then added to the mechanistic model output. The dry weather influent case is used to train (10 days) and validate (4 days) the models, and to test the models, the rainy weather influent is used. During days 8–10 of the rainy influent dataset, a rain event occurs that doubles the hydraulic load and reduces the contaminant concentrations by half. This period significantly deviates from the dry weather influent used for training. It serves as a unique opportunity to evaluate the models' capacity to extrapolate into unfamiliar operational situations.
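The effect of the parameter modification on the autotrophic growth rate can be illustrated numerically. The sketch below assumes the standard ASM1 form of the rate, i.e. the product of μA, Monod terms for ammonium and oxygen, and the autotrophic biomass concentration. The reactor state is hypothetical, and KO,A = 0.4 g O2/m3 is assumed here as the usual ASM1 default rather than taken from this text.

```python
def autotroph_growth_rate(mu_a, s_nh, k_nh, s_o, k_oa, x_ba):
    """Aerobic growth rate of autotrophs (rho_3): Monod terms for ammonium and oxygen."""
    return mu_a * (s_nh / (k_nh + s_nh)) * (s_o / (k_oa + s_o)) * x_ba


# Hypothetical reactor state (illustrative values only).
state = dict(s_nh=2.0, s_o=2.0, x_ba=100.0)

reference = autotroph_growth_rate(mu_a=0.5, k_nh=1.0, k_oa=0.4, **state)
faster = autotroph_growth_rate(mu_a=1.0, k_nh=1.0, k_oa=0.4, **state)
limited = autotroph_growth_rate(mu_a=0.5, k_nh=5.0, k_oa=0.4, **state)

print(f"reference (mu_A = 0.5, K_NH = 1): {reference:.2f}")
print(f"mu_A doubled to 1:                {faster:.2f}")
print(f"K_NH raised to 5:                 {limited:.2f}")
```

Because the rate is linear in μA, doubling μA doubles the nitrification rate for any given state, which is consistent with a modified model that overpredicts effluent nitrate. Raising KNH instead slows growth, most strongly at low ammonium concentrations.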

Hybrid model setup with pilEAUte model

For the pilEAUte model, HMs are configured in different ways to determine the most efficient approach, with the specific configurations detailed in the results and discussion section. The input data used to train the NNs consist of the mechanistic model output of effluent nitrate, the measured air flowrate in basin 4, the measured pH in the influent, and the measured, non-fractionated variables that are also fed to the mechanistic model: temperature in basin 4 and influent flowrate, CODt, CODs, total suspended solids (TSS), and NH4-N (Figure 4). The air flowrates in basins 3 and 5 are adjusted using ratio controllers by scaling the aeration flowrate in basin 4 by a factor based on the tank volumes. As these flowrates do not offer additional information to the data-driven component, they are not included as input features. Initially, the data-driven component was trained without incorporating the mechanistic model output for nitrate as an input feature, resulting in suboptimal performance; its inclusion was subsequently deemed necessary. The sludge waste flowrate was not included as input for the NN since this only varies in the long term, so its effect is expected to be primarily seen over a longer period.
Figure 4

Overview of the studied parallel hybrid model developed to predict effluent nitrate.
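The reasoning for omitting the air flow rates of basins 3 and 5 can be made concrete with a short sketch. A ratio-controlled flow is a fixed multiple of the basin 4 flow, so the two signals are perfectly correlated and the second one adds no information for the NN. The feature names, the ratio factor, and the flow values below are all hypothetical.

```python
# Illustrative names for the nine NN input features listed in the text.
INPUT_FEATURES = [
    "mechanistic_effluent_NO3", "airflow_basin4", "influent_pH",
    "temperature_basin4", "influent_flowrate", "CODt", "CODs", "TSS", "NH4_N",
]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical air flow rates in basin 4 and an assumed ratio factor.
airflow_basin4 = [1.2, 1.5, 1.1, 1.8, 2.0, 1.6]
RATIO_BASIN3 = 0.8                       # illustrative value, not from the study
airflow_basin3 = [RATIO_BASIN3 * q for q in airflow_basin4]

print(len(INPUT_FEATURES))
print(round(pearson(airflow_basin4, airflow_basin3), 6))   # perfectly correlated
```

After standardization the two flow signals would be identical, so including both would only duplicate an input column.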

Parallel hybrid model proof of concept on benchmark model

The results of the proof-of-concept experiment, based on the BSM1, are presented in this section. The results for effluent nitrate for the training (dry weather) and test period (rainy weather) are shown in Figure 5, and the quantitative metrics are given in Table 4 in Appendix C. The mechanistic model is able to capture some of the general dynamics, both during dry and rainy weather. However, as the parameter that accounts for the autotrophic growth is modified, the mechanistic model overpredicts the nitrate and is not able to accurately reproduce the dynamics in either period, leading to an RMSE value of 2.43 mg NO3-N/L in the test period. This shows a clear mismatch between the mechanistic model's representation of autotrophic bacteria growth and the system. The purely data-driven model performs satisfactorily during the training period, capturing the appropriate dynamics. However, when simulating the behaviour during rainy weather, the model fails to make accurate predictions, especially during the rain event. This type of event is not represented in the training dataset, so the model is in effect extrapolating, leading to poor performance with an RMSE value of 2.25 mg NO3-N/L. This shows the limits of a purely data-driven model, as the ability to extrapolate is crucial for useful modelling of wastewater treatment processes. The HM shows similar performance to the data-driven model during dry weather. However, it captures the dynamics during rainy weather more accurately than both the data-driven and the mechanistic model. This is clearly reflected in the lower RMSE value of 0.64 mg NO3-N/L during the test period. During the rain event, the HM benefits from insights of both the mechanistic and data-driven models, leading to a reduction in RMSE.
Figure 5

Results of the BSM1 experiment: hybrid model output and data-driven model output compared to mechanistic model output with wrong dynamics and the dataset for effluent nitrate for the training (dry weather) and test (rainy weather) period. The rain event during the wet weather conditions is indicated by the grey box.

In an additional experiment, μA is restored to its calibrated value of 0.5 d−1. Simultaneously, the ASM mechanistic model is modified by adjusting the value of the KNH parameter from 1 to 5 mg NH4-N/L, thereby introducing an alternative variation in the system's dynamics. Subsequently, an HM is constructed using this modified mechanistic model. The outcomes of this experiment are shown in Appendix B. This experiment yielded results similar to those observed in the first experiment. Although the HM exhibits somewhat inferior performance during the rain event within the test period compared to the HM in the first experiment, it still outperforms both the modified mechanistic model and the data-driven model.
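To give a feel for why changing KNH alters the nitrification dynamics, the following minimal sketch evaluates an ASM1-style Monod expression for the aerobic growth of autotrophs with KNH = 1 and KNH = 5 mg NH4-N/L. The state values (ammonium, oxygen, biomass) and the oxygen half-saturation constant are illustrative assumptions and are not taken from the study; only μA = 0.5 d−1 and the two KNH values come from the text.

```python
def monod(s, k):
    """Monod saturation term s / (k + s), dimensionless, between 0 and 1."""
    return s / (k + s)


def autotroph_growth_rate(mu_a, s_nh, k_nh, s_o, k_oa, x_ba):
    """ASM1-style aerobic growth rate of autotrophs (mg COD/L/d)."""
    return mu_a * monod(s_nh, k_nh) * monod(s_o, k_oa) * x_ba


# Illustrative state values (assumptions, not taken from the study).
s_nh = 2.0    # ammonium, mg NH4-N/L
s_o = 2.0     # dissolved oxygen, mg O2/L
k_oa = 0.4    # oxygen half-saturation constant for autotrophs, mg O2/L
x_ba = 100.0  # autotrophic biomass, mg COD/L

# mu_A = 0.5 1/d in both cases; only K_NH differs (1 versus 5 mg NH4-N/L).
reference = autotroph_growth_rate(0.5, s_nh, 1.0, s_o, k_oa, x_ba)
modified = autotroph_growth_rate(0.5, s_nh, 5.0, s_o, k_oa, x_ba)

ratio = modified / reference
print(f"growth rate with K_NH = 5 relative to K_NH = 1: {ratio:.2f}")
```

At the assumed ammonium concentration, the higher half-saturation constant lowers the computed growth rate to well under half of the reference value. The size of this effect depends on the ammonium level, so the mismatch introduced in the mechanistic model varies over time rather than acting as a constant offset.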

In this part of the study, it was demonstrated that the application of a parallel HM allows for a significant reduction in RMSE, even when both components of the HM are given the same input data. Furthermore, it was shown that with a training period of only 10 days of high-frequency data, the HM achieved a good performance, evidenced by an RMSE of 0.64 mg NO3-N/L during the testing phase.
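The parallel structure used in this proof of concept can be sketched in a few lines. The example below is a hedged illustration on synthetic data, not the authors' implementation: the signals are invented, and a linear least-squares fit stands in for the NN that learns the residual. In the study, the NN also receives the mechanistic model output as an input, which is omitted here to keep the placeholder regression well-conditioned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions): one input signal and a "measured" nitrate series.
t = np.arange(0, 14, 1 / 96)                 # 14 days at 15-min resolution
u = 1.0 + 0.5 * np.sin(2 * np.pi * t)        # diurnal input signal
measured = 8.0 + 3.0 * u + rng.normal(0, 0.1, t.size)

# Mechanistic model with a structural error: it underestimates the effect of u.
mech = 8.0 + 1.0 * u

# Target of the data-driven component: residual of the mechanistic model.
residual = measured - mech
X = np.column_stack([np.ones_like(u), u])    # same inputs as the mechanistic model

train = t < 10                               # first 10 days for training
test = ~train

# Linear least squares as a simple placeholder for the NN residual model.
coef, *_ = np.linalg.lstsq(X[train], residual[train], rcond=None)

# Parallel HM output: mechanistic prediction plus predicted residual.
hybrid = mech + X @ coef


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


rmse_mech = rmse(measured[test], mech[test])
rmse_hybrid = rmse(measured[test], hybrid[test])
print(f"test RMSE mechanistic: {rmse_mech:.2f}, hybrid: {rmse_hybrid:.2f}")
```

The sketch only shows the mechanics of the approach (fit the residual on a training window, add the predicted residual to the mechanistic output on unseen data). It does not reproduce the extrapolation behaviour observed in the BSM1 experiment, since the synthetic test window resembles the training window.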

pilEAUte case study

The work of Kirim (2022) revealed that, despite an extensive and thorough calibration procedure, the mechanistic model of the pilEAUte plant still exhibits limited predictive power for effluent nitrate. This limitation was the driving force behind the investigation undertaken in this study, aiming to explore the potential and identify an optimal methodology for developing a parallel HM for a WRRF case study. The purpose of this HM is to compensate for the missing information in the mechanistic model by incorporating additional information and ultimately enhancing the overall predictive power of the model.

Data-driven models

The first part of the study involves a comparative analysis between purely data-driven models using LSTM and CNN architectures and a mechanistic model to assess whether a parallel HM configuration is necessary. The NNs have the same input data as the data-driven component of the HM, except for the mechanistic model output for nitrate. Training and validation of the NNs are done using the calibration period of the mechanistic model, and the test period is the validation period of the mechanistic model (Figure 6).
Figure 6

Timeline with the time periods used for the different mechanistic modelling (Kirim 2022) and data-driven modelling purposes. The data-driven model is trained and validated using the calibration period of the mechanistic model and tested using the validation period of the mechanistic model.

Figure 7 presents the output of the models during the training and test phases of the NNs, and the quantitative metrics are given in Table 5 in Appendix C. During the test period, the mechanistic model fails to predict crucial events, and it predicts a peak in NO3 between days 49 and 52 that deviates from the observed measurements. The LSTM data-driven model performs unsatisfactorily during the training phase: its RMSE of 2.95 mg NO3-N/L is higher than the 2.30 mg NO3-N/L of the mechanistic model, which suggests difficulty in learning meaningful information from the input data. During the testing phase, although its RMSE of 2.15 mg NO3-N/L is lower than that of the mechanistic model (4.38 mg NO3-N/L), the LSTM model does not accurately capture the dynamics essential for improved predictions. The CNN model performs better during the training phase than the LSTM and the mechanistic model. This results in a more accurate representation of certain dynamics during the test period when compared with both the LSTM data-driven model and the mechanistic model. For instance, between days 54 and 56, the CNN model captures the dynamics more accurately. Overall, however, none of the models adequately captures the measurement dynamics in the testing phase, indicating that building fully data-driven models to capture and predict complex biological processes in WRRFs is difficult. An HM is a promising way to improve modelling performance, since it incorporates a backbone of known process dynamics.
Figure 7

LSTM-RNN data-driven model output and CNN data-driven model output compared to mechanistic model output and measurements for effluent nitrate for the training and test period. The NNs are trained on the calibration dataset of the mechanistic model.

Hybrid models using calibration dataset

In the next part of the research, the performances of HMs using LSTM and CNN are explored. In these HMs, the NNs are trained to predict the residual between the calibrated mechanistic model output and the measurements, and the predicted residual is then added to the mechanistic model output. This approach is motivated by the potential presence of valuable information on uncaptured dynamics within the residuals, and the NNs are expected to uncover this information by learning patterns in the residuals. The NN components are again trained and validated using the calibration period of the mechanistic model, and testing is done using the validation period of the mechanistic model, similar to the data-driven models in the previous experiment (Figure 6). The HMs show similar behaviour across both periods (Figure 8). The quantitative metrics of this experiment are shown in Table 6 in Appendix C. Compared to the mechanistic model, the LSTM HM and CNN HM show improved performance during the training phase, characterized by a decrease in RMSE to 1.27 and 0.99 mg NO3-N/L, respectively. However, during the test period, both HMs show poor predictive accuracy, as neither shows a significant improvement over the mechanistic model. This shortcoming is also evident in the corresponding RMSE values: for the CNN HM, for example, the test RMSE is 5.37 mg NO3-N/L, compared to 4.38 mg NO3-N/L for the mechanistic model (Table 2).
Figure 8

LSTM-RNN hybrid model output and CNN hybrid model output compared to mechanistic model output and measurements for effluent nitrate for the training and test period. The NNs are trained on the calibration dataset of the mechanistic model.

The results of this experiment indicate that the data-driven components of the HM were not able to learn relevant missing dynamics from the remaining error of the mechanistic model in the calibration period. It should be noted that the mechanistic model itself has already been extensively calibrated to this data period. This raises the question of whether sufficient relevant structural information remains in the residual of this calibration period for the data-driven models to capture. In addition, the question arises to what extent the missing dynamics and knowledge in the output of the mechanistic model have already been corrected during the calibration itself by shifting certain parameters away from their ‘true’ values. In such a case, the data-driven component in the HM faces a challenging task. To accurately predict unseen periods, it must meet two objectives: (1) recognize and correct for the parameter compensation of the mechanistic model, and (2) incorporate the missing dynamics. These requirements are demanding, and there is a realistic possibility that the data-driven component fails to predict accurately in this case.

Hybrid model with validation dataset

In the next experiment, the goal is to investigate the potential advantages of training the NN component of the HM on an independent validation dataset. The aim is to understand the difference in the information contained within the residuals of the calibration dataset and independent validation dataset and to explore what the best approach is to calibrate parallel HMs. So, an HM is created in which a CNN is trained on the residual of the dataset used to validate the mechanistic model. The exclusive use of CNN as the data-driven component in this part of the study is based on its superior performance demonstrated in prior experiments compared to LSTM. Further insights into the different performances of the CNN and LSTM will be detailed in the final section of the discussion. It should be noted that the training period in this approach is shorter than in the previous approach (11 days compared to 18 days) due to the limited length of the validation dataset of the mechanistic model. The validation for the CNN in this approach is performed on the calibration dataset of the mechanistic model, and the testing period corresponds to the initialization period of the mechanistic model (Figure 9).
Figure 9

Timeline with the time periods used for the different mechanistic modelling (Kirim 2022) and the hybrid modelling purposes. In this specific experiment, the CNN component of the hybrid model is trained on an independent dataset (the validation period of the mechanistic model), validated on the calibration period of the mechanistic model, and tested on the initialization period of the mechanistic model.

Figure 10 shows the results of this experiment, and the quantitative metrics are given in Table 7 in Appendix C. During the training period, the CNN HM output (RMSE = 1.19 mg NO3-N/L) shows improvement compared to the mechanistic model output (RMSE = 4.56 mg NO3-N/L). During the test period, the HM also captures the dynamics better than the mechanistic model. It nevertheless struggles to capture the observed dynamics in certain periods; for instance, during days 16–19, there are noticeable deviations between predictions and measurements, which increase the RMSE. Even so, the CNN HM trained on the validation dataset performs better during the test period (RMSE = 3.25 mg NO3-N/L) than the CNN HM trained on the calibration dataset of the mechanistic model (RMSE = 5.37 mg NO3-N/L). It should be noted that these results may have been influenced by the nature of the data. The test data in this approach, i.e., the initialization period of the mechanistic model, exhibit a diurnal pattern. Such a pattern may make the test period relatively easier to predict and may thus lead to favourable results during the testing phase.
Figure 10

CNN hybrid model output compared to mechanistic model output and measurements for effluent nitrate for the training and test period. The CNN component is trained on an independent validation dataset.

The findings of these experiments suggest that training an NN in a parallel HM configuration on the same dataset as used for calibration of the mechanistic model may not yield effective results, as the remaining error during this period contains limited structural information for the model to learn. Training the data-driven component of an HM on an independent dataset appears to provide more relevant information, allowing the NN component to learn some of the structurally missing information. However, going through an extensive calibration of the mechanistic model prior to considering a parallel HM still creates a risk of wrongly fitting the mechanistic model due to structural deficiencies, which hampers the final overall HM performance.

Analysis of mechanistic model structure deficiencies

For the mechanistic model of Kirim (2022), potential calibration issues can be observed in two different submodels: the hydraulic and the biokinetic models. The hydraulic model was established using data obtained from two tracer tests, which led to the determination of various backflows between the basins that were incorporated into the final model configuration. The results of the hydraulic model can be found in Appendix D. They demonstrate that the hydraulic model prediction is still suboptimal. In the first tracer test dataset, the model consistently overestimates the observed peak in each tank. For the second dataset, the shape of the predicted peak deviates from the actual peak in each tank. A suboptimal hydraulic model will influence the subsequent calibration of the biokinetic model parameters, as they may be used to compensate for missing dynamics in the hydraulic model. The calibrated parameters of the biokinetic model are presented in Table 1. It can be observed that the calibrated values of the model parameters related to the biokinetics of the two-step nitrification process are significantly lower than their default values (factor 10–100 difference). The same holds for the oxygen half-saturation concentrations (KO,NH and KO,NO) (factor 3–5 difference). This could be an indication that the biokinetic model parameters are suboptimally calibrated to compensate for missing phenomena in the model structure, resulting in an overall negative impact on the model's predictive power.

In the next part of the study, the potential impact of wrongly fitting the mechanistic component in HMs is explored. The goal is to identify the optimal balance between the calibration effort for the mechanistic model and acceptable compensation by the NN component. This investigation involves comparing the training of the NN component on the residual of the calibrated mechanistic model with that of an uncalibrated mechanistic model. To accomplish this, the mechanistic model's biokinetic parameters were set to their default values (Table 1), and the backflows included in the hydraulic model were removed. The model was re-executed, generating new mechanistic model predictions for effluent nitrate. The removal of the backflows had no impact on the model predictions. However, resetting the biokinetic parameters to their default values did result in a significant difference in the output of the mechanistic model. The uncalibrated model performs worse than the calibrated one, as expected.

A CNN is then trained and validated on the residual of the uncalibrated model. The NN is thus trained to predict the residual between the effluent nitrate output of the mechanistic model with default parameters and the corresponding measurements. The predicted residuals are then added to the output of the uncalibrated mechanistic model. The results are compared with those from the previous experiment, in which the CNN of the HM was trained on the residual between the output of the calibrated mechanistic model and the measurements. This experiment is performed twice: once with the NN component trained on the calibration dataset of the mechanistic model and once with it trained on the validation dataset of the mechanistic model.

The results of these experiments are shown in Figure 11 (with quantitative metrics in Table 2) and Figure 12 (with quantitative metrics in Table 8 in Appendix C). In these experiments, both HMs show similar behaviour in the training period and demonstrate improved accuracy and prediction of dynamics compared to the mechanistic model. In Figure 11, during the testing phase, the CNN HM trained on the residual of the uncalibrated mechanistic model outperforms the other two models, reflected in a significantly lower RMSE value of 1.58 mg NO3-N/L. In Figure 12, the CNN HM trained on the residual of the uncalibrated mechanistic model outperforms the CNN HM trained on the residual of the calibrated mechanistic model based on the performance metrics. However, the dynamics are less accurately predicted by this model due to the reduced prominence of the peaks in effluent nitrate.
Table 2

Accuracy metrics for comparison of mechanistic model (mech.), CNN hybrid model based on the calibrated mechanistic model (HM cal.), and CNN hybrid model based on the uncalibrated mechanistic model (HM uncal.) for training on the calibration dataset.

| Period | Model | RMSE (mg NO3-N/L) | MAE (mg NO3-N/L) | R2 |
| --- | --- | --- | --- | --- |
| Train | Mech. | 2.30 | 1.87 | 0.17 |
| Train | HM cal. | 0.99 | 0.77 | 0.84 |
| Train | HM uncal. | 1.39 | 1.05 | 0.70 |
| Test | Mech. | 4.38 | 3.66 | <0 |
| Test | HM cal. | 5.37 | 4.46 | <0 |
| Test | HM uncal. | 1.58 | 1.29 | 0.23 |

The metrics correspond to the results in Figure 11.
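For readers who want to reproduce this type of metric table, the following sketch computes RMSE, MAE and R2 with NumPy. It assumes the standard definitions, with R2 taken as the coefficient of determination (one minus the ratio of residual to total sum of squares); the exact formulas used in the study are not stated in this section. The toy arrays are invented for illustration.

```python
import numpy as np


def rmse(y, y_hat):
    """Root-mean-squared error, in the units of y."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def mae(y, y_hat):
    """Mean absolute error, in the units of y."""
    return float(np.mean(np.abs(y - y_hat)))


def r2(y, y_hat):
    """Coefficient of determination; negative when worse than predicting the mean."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Toy data (assumptions): measurements and two sets of predictions.
y = np.array([4.0, 5.0, 6.0, 5.0])
good = np.array([4.2, 4.9, 5.8, 5.1])   # close to the measurements
poor = np.array([8.0, 9.0, 10.0, 9.0])  # right shape, large offset

for name, pred in [("good", good), ("poor", poor)]:
    print(name, round(rmse(y, pred), 2), round(mae(y, pred), 2), round(r2(y, pred), 2))
```

Under this definition, a negative R2 such as the "<0" entries in Table 2 means that the model predicts the test period less accurately than a constant equal to the mean of the measurements would.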

Figure 11

CNN hybrid model output trained on the residual of a calibrated mechanistic model and CNN hybrid model output trained on the residual of an uncalibrated mechanistic model compared to mechanistic model output and measurements for effluent nitrate for the training and test period. The CNNs are trained on the calibration dataset of the mechanistic model.

Figure 12

CNN hybrid model output trained on the residual of a calibrated mechanistic model and CNN hybrid model output trained on the residual of an uncalibrated mechanistic model compared to mechanistic model output and measurements for effluent nitrate for the training and the test period. The CNNs are trained on an independent validation dataset.

These experiments thus show that an HM trained on the uncalibrated mechanistic model can improve the accuracy of the predictions and capture the right dynamics, whereas the HM trained on the calibrated model fails to compensate for the missing dynamics of the mechanistic model. Moreover, the predictions of the HM based on the calibrated model remain rather close to those of the mechanistic model.

The results of this experiment suggest that the residual of the uncalibrated mechanistic model contains valuable information and patterns that are not present in the residual of the calibrated model. By incorporating this residual information into the training process, the NN is able to improve its predictive capabilities, resulting in superior performance during the test period. The HM trained on the uncalibrated mechanistic model (Figure 11) performs better than both the CNN and LSTM purely data-driven models (Figure 7) in terms of prediction accuracy and capturing the dynamics. This suggests that the mechanistic model still offers valuable information to the HM, even though the performance of the stand-alone mechanistic model is far from optimal. The results of these experiments thus support the hypothesis that the mechanistic model may have been wrongly fitted. This is always a risk when a mechanistic model is unable to describe some dynamics. Detailed calibration of the mechanistic model with several unidentifiable parameters may lead to overfitting, making it more difficult for a parallel NN component to learn the missing model dynamics. The present work shows that a parallel HM approach can compensate for non-calibrated phenomena. This could potentially save time in the calibration effort of mechanistic models as well. Hvala & Kocijan (2020) set up an HM based on a default mechanistic model and pre-set input wastewater characterization. This model showed a prediction accuracy comparable to the prediction accuracy of an HM with tuned mechanistic model parameters, indicating that no prior complex tuning of the mechanistic model is needed for good HM performance.

Hybrid model with CNN and LSTM trained on uncalibrated mechanistic model residual

To further investigate the findings of the first experiments, where the performance of an HM with CNN was compared with an HM with LSTM, an additional HM is developed with an LSTM as the data-driven component. In this experiment, the LSTM is also trained on the residual of the uncalibrated mechanistic model. The goal of this experiment is to confirm that the inferior performance of the LSTM HM compared to the CNN HM can be attributed to the nature of the data-driven component in the HM, rather than the specific characteristics of the calibrated mechanistic model.

The results of this experiment are shown in Figure 13, and the quantitative metrics are given in Table 9 in Appendix C. During the training period, the LSTM HM performs similarly to the CNN HM, and both HMs are more accurate than the mechanistic model, as reflected in their lower RMSE. However, the correct dynamics are absent from the output of the LSTM HM during the test period. The LSTM HM predicts several peaks that do not occur in the measurement data, for example, on days 52–54 and 56–57. This reaffirms the earlier findings, where the LSTM HM also performed worse than the CNN HM.
Figure 13

CNN and LSTM hybrid model output trained on the residual of an uncalibrated mechanistic model compared to mechanistic model output and measurements for effluent nitrate for the training and the test period.

LSTM-RNNs are typically used to predict time series due to their ability to capture long-term dependencies in sequential data. However, in this study, the LSTM-RNN fails to learn meaningful patterns and dependencies from the input time series during the training phase. As a result, when faced with the test data, the model lacks the necessary knowledge and context to accurately recognize and predict patterns in the residuals. This observation is noteworthy because the existing literature suggests that LSTM-RNNs perform well as data-driven components in HMs, as demonstrated by Jia et al. (2021) and Dong et al. (2023). The precise cause of the failure of the LSTM remains undetermined. It is possible that the dataset in this study is not large and diverse enough for the LSTM-RNN to learn something useful. In contrast, the HM with CNN as the data-driven component is able to learn important dynamics in the residuals and to reproduce them during the test period. The CNN can concentrate on important time series features and extract these features through its learning process. The convolution filters in a CNN can be useful for identifying patterns at different timescales or detecting specific events in the time series. This was also confirmed by Wang et al. (2019). However, additional research is needed to draw definitive conclusions regarding the optimal data-driven component for an HM. A holistic approach using comprehensive evaluation criteria is essential to compare the performance of data-driven models within this context.
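The idea that a convolution filter can pick out a specific event in a time series can be illustrated with a single hand-written filter. The sketch below is an assumption-laden toy example: the residual series and the filter weights are invented, whereas in a CNN the filter weights are learned during training and many filters are combined across layers.

```python
import numpy as np

# Synthetic residual series (assumption): flat signal with one short peak.
residual = np.zeros(48)
residual[20:24] = [1.0, 2.0, 2.0, 1.0]

# Hand-picked filter shaped like the event; a CNN would learn such weights.
kernel = np.array([1.0, 2.0, 2.0, 1.0])

# Slide the filter over the series without padding (cross-correlation).
response = np.correlate(residual, kernel, mode="valid")

# The filter responds most strongly where the series matches its shape.
detected_at = int(np.argmax(response))
print(f"strongest filter response starts at index {detected_at}")
```

Filters of different lengths respond to patterns at different timescales, which is one plausible reason why a convolutional component can be well suited to residuals dominated by recurring, event-like deviations.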

The use of a relatively short dataset (2 months) may be seen as a limitation in this study. The dataset was constrained to the spring season due to frequent sensor errors and is thus not representative of a full year of data with seasonal influences. However, it should be stressed that data requirements for (hybrid) models can vary significantly based on the modelling objective and particularly the length of the prediction horizon that is envisioned. For applications requiring short-term predictions (such as model predictive control or short-term operational optimization in digital twins), a shorter training period could provide adequate results as long as sufficient (and relevant) dynamics are present in the training data. As in most modelling studies, retraining and/or recalibration of the model will become necessary at some point. The length and information content of the training dataset may influence the frequency at which recalibration or retraining is needed. For the current study, both the BSM1 experiment using 2 weeks of data, and the pilEAUte case study using 2 months of data indicate that parallel HMs can successfully be set up with datasets over a limited time horizon. Further testing on a longer dataset (for example, a full year) could be beneficial to evaluate the parallel HM's performance across different conditions, thereby ensuring its robustness and determining the need and frequency of retraining.

In this study, parallel HMs are developed in which the data-driven component is trained to predict the residual between the mechanistic model output and the measurements for effluent nitrate. A first theoretical experiment based on BSM1 demonstrated the predictive capability of a parallel HM, which addressed both the limitations in the mechanistic model's representation of autotrophic bacteria growth and the data-driven model's inability to extrapolate. Constructing an HM of the pilEAUte plant showed that a parallel HM requires separate datasets for calibration of the mechanistic model and training of the data-driven model. Further improvement in HM performance was achieved when the CNN component of a parallel HM was combined with an uncalibrated mechanistic model. This indicates that HMs may be more difficult to construct when the mechanistic component is wrongly calibrated with uncertain or unidentifiable parameters. It also indicates that balancing calibration efforts between the mechanistic and data-driven components of an HM is crucial for optimal final model performance. An iterative approach is recommended over a sequential calibration procedure.

Finally, different choices for the data-driven component were tested for their ability to reveal the hidden dynamics from WRRF time series. Comparing the LSTM-RNN and CNN HMs, the CNN HM outperformed the LSTM HM in learning meaningful patterns and predicting accurate dynamics in wastewater treatment system time series with recurrent phenomena. This indicates that CNN is more suitable as a data-driven component for parallel HMs in this context than LSTM-RNN. When developing a model for wastewater treatment processes, thoughtful allocation of efforts is essential, potentially giving preference to refining HMs, as they may provide a more effective path for improvement than excessive calibration efforts in mechanistic models.

Further research is required to reach the full potential of hybrid modelling. Implementing a parallel HM for a full-scale treatment plant and examining various strategies for its setup could offer additional valuable insights into the optimal construction of an HM.

Currently, limited research is available on determining the most effective data-driven components in parallel HM structures. This study indicates the potential of CNN as a data-driven component of a hybrid modelling approach, but further exploration of alternative options may lead to even better methods. In recent years, CNN-LSTM models have emerged as a valuable data-driven approach for time series prediction. Such a model combines the strengths of both architectures: the feature extraction capabilities of the CNN and the ability of the LSTM to capture long-term dependencies in input time series data.

This study aimed to identify the most effective methods for constructing a parallel HM. However, a general framework for integrating mechanistic and data-driven models still needs to be developed. It should address key considerations such as data compatibility, model selection, parameter estimation, model validation, and interpretation of the HM results. By developing this framework, researchers and practitioners can ensure a systematic and reliable integration of data-driven and mechanistic models, leading to improved accuracy and understanding of complex systems.

Jan Verwaeren received funding from the ‘Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen’. The work of Peter Vanrolleghem was supported by the Natural Sciences and Engineering Research Council of Canada Discovery Grant RGPIN-2021-04347 ‘Towards digital twin-based control of Water Resource Recovery Facilities – Methods supporting the use of adaptive hybrid digital twins’.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Abadi M., Barham P., Chen J., Chen Z., Davis A., Dean J., Devin M., Ghemawat S., Irving G., Isard M., Kudlur M., Levenberg J., Monga R., Moore S., Murray D. G., Steiner B., Tucker P., Vasudevan V., Warden P., Wicke M., Yu Y. & Zheng X. 2016 Tensorflow: A system for large-scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265–283.
Bahramian M., Dereli R. K., Zhao W., Giberti M. & Casey E. 2023 Data to intelligence: The role of data-driven models in wastewater treatment. Expert Systems With Applications 217, 119453.
Borovykh A., Bohte S. & Oosterlee C. W. 2017 Conditional time series forecasting with convolutional neural networks. arXiv preprint arXiv:1703.04691.
Chen Y., Song L., Liu Y., Yang L. & Li D. 2020 A review of the artificial neural network models for water quality prediction. Applied Sciences 10 (17), 5776.
Cheng T., Harrou F., Kadri F., Sun Y. & Leiknes T. 2020 Forecasting of wastewater treatment plant key features using deep learning-based models: A case study. IEEE Access 8, 184475–184485.
Cheng X., Guo Z., Shen Y., Yu K. & Gao X. 2021 Knowledge and data-driven hybrid system for modeling fuzzy wastewater treatment process. Neural Computing and Applications 35, 7185–7206.
Chollet F. 2021 Deep Learning With Python, 2nd edn. Manning Publications, New York, NY, USA.
Chollet F. 2015 Keras. Available from: https://keras.io.
DHI 2017 MIKE Powered by DHI. Available from: https://www.mikepoweredbydhi.com/products/west.
Dochain D. & Vanrolleghem P. A. 2001 Dynamical Modelling and Estimation in Wastewater Treatment Processes. IWA Publishing, London, UK.
Durairaj
D. M.
&
Mohan
B. K.
2022
A convolutional neural network based approach to financial time series prediction
.
Neural Computing and Applications
34
(
16
),
13319
13337
.
European Commission
.
2020
Regulation (EU) 2020/741 of the European Parliament and of the Council of 25 May 2020 on minimum requirements for water reuse
.
Official Journal of the European Union
L 177
,
32
55
.
Flores-Alsina
X.
,
Saagi
R.
,
Lindblom
E.
,
Thirsing
C.
,
Thornberg
D.
,
Gernaey
K. V.
&
Jeppsson
U.
2014
Calibration and validation of a phenomenological influent pollutant disturbance scenario generator using full-scale data
.
Water Research
51
,
172
185
.
Forrester
A. I.
,
Bressloff
N. W.
&
Keane
A. J.
2006
Optimization using surrogate models and partially converged computational fluid dynamics simulations
.
Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences
462
(
2071
),
2177
2204
.
Gernaey
K. V.
,
Van Loosdrecht
M. C.
,
Henze
M.
,
Lind
M.
&
Jørgensen
S. B.
2004
Activated sludge wastewater treatment plant modelling and simulation: State of the art
.
Environmental Modelling & Software
19
(
9
),
763
783
.
Gernaey
K. V.
,
Jeppsson
U.
,
Vanrolleghem
P. A.
&
Copp
J. B.
2014
Benchmarking of Control Strategies for Wastewater Treatment Plants
.
IWA Scientific and Technical Report No. 23
.
IWA Publishing
,
London, UK
.
Goodfellow
I.
,
Bengio
Y.
&
Courville
A.
2016
Deep Learning
.
MIT Press
,
Cambridge, MA, USA
.
Hastie
T.
,
Tibshirani
R.
,
Friedman
J. H.
&
Friedman
J. H.
2009
The Elements of Statistical Learning: Data Mining, Inference, and Prediction
, Vol.
2
.
Springer
,
New York, NY, USA
.
Henze
M.
,
Gujer
W.
,
Mino
T.
&
van Loosdrecht
M. C.
2000
Activated Sludge Models ASM1, ASM2, ASM2d and ASM3
.
IWA Publishing
,
London, UK
.
Hewamalage
H.
,
Bergmeir
C.
&
Bandara
K.
2021
Recurrent neural networks for time series forecasting: Current status and future directions
.
International Journal of Forecasting
37
(
1
),
388
427
.
Hochreiter
S.
&
Schmidhuber
J.
1997
Long short-term memory
.
Neural Computation
9
(
8
),
1735
1780
.
Jia
X.
,
Willard
J.
,
Karpatne
A.
,
Read
J. S.
,
Zwart
J. A.
,
Steinbach
M.
&
Kumar
V.
2021
Physics-guided machine learning for scientific discovery: An application in simulating lake temperature profiles
.
ACM/IMS Transactions on Data Science
2
(
3
),
1
26
.
Karama
A.
,
Bernard
O.
,
Gouzé
J. L.
,
Benhammou
A.
&
Dochain
D.
2001
Hybrid neural modelling of an anaerobic digester with respect to biological constraints
.
Water Science and Technology
43
(
7
),
1
8
.
Kirim
G.
2022
Modelling and Model-Based Optimization of N-Removal WRRFs: Reactive Settling, Conventional & Short-Cut N-Removal Processes
.
PhD thesis
,
Université Laval
,
Québec, QC, Canada
.
Koprinska
I.
,
Wu
D.
&
Wang
Z.
2018
Convolutional neural networks for energy time series forecasting
. In:
2018 International Joint Conference on Neural Networks (IJCNN)
.
IEEE
, pp.
1
8
.
Lee
D. S.
,
Jeon
C. O.
,
Park
J. M.
&
Chang
K. S.
2002
Hybrid neural network modeling of a full-scale industrial wastewater treatment process
.
Biotechnology and Bioengineering
78
(
6
),
670
682
.
Lee
D. S.
,
Vanrolleghem
P. A.
&
Park
J. M.
2005
Parallel hybrid modeling methods for a full-scale cokes wastewater treatment plant
.
Journal of Biotechnology
115
(
3
),
317
328
.
Li
F.
&
Vanrolleghem
P. A.
2022
Including snowmelt in influent generation for cold climate WRRFs: Comparison of data-driven and phenomenological approaches
.
Environmental Science: Water Research & Technology
8
(
10
),
2087
2098
.
Mijwel
M. M.
2021
Artificial neural networks advantages and disadvantages
.
Mesopotamian Journal of Big Data
2021
,
29
31
.
Newhart
K. B.
,
Holloway
R. W.
,
Hering
A. S.
&
Cath
T. Y.
2019
Data-driven performance analyses of wastewater treatment plants: A review
.
Water Research
157
,
498
513
.
O'Malley
T.
,
Bursztein
E.
,
Long
J.
,
Chollet
F.
,
Jin
H.
&
Invernizzi
L.
2019
Keras-tuner
.
Philippe
R.
2018
Automatic Data Quality Assessment Tools for Continuous Monitoring of Wastewater Quality
.
Master's thesis
,
Université Laval
,
Québec, QC, Canada
.
Piuleac
C. G.
,
Sáez
C.
,
Cañizares
P.
,
Curteanu
S.
&
Rodrigo
M. A.
2012
Hybrid model of a wastewater-treatment electrolytic process
.
International Journal of Electrochemical Science
7
,
6289
6301
.
Psichogios
D. C.
&
Ungar
L. H.
1992
A hybrid neural network-first principles approach to process modeling
.
AIChE Journal
38
(
10
),
1499
1511
.
Regmi
P.
,
Stewart
H.
,
Amerlinck
Y.
,
Arnell
M.
,
García
P. J.
,
Johnson
B.
,
Maere
T.
,
Miletić
I.
,
Miller
M.
,
Rieger
L.
,
Samstag
R.
,
Santoro
D.
,
Schraa
O.
,
Snowling
S.
,
Takacs
I.
,
Torfs
E.
,
van Loosdrecht
M.
,
Vanrolleghem
P.
,
Villez
K.
,
Volcke
E.
,
Weijers
S.
,
Grau
P.
,
Jimenez
J.
&
Rosso
D.
2019
The future of WRRF modelling–Outlook and challenges
.
Water Science and Technology
79
(
1
),
3
14
.
Rieger
L.
,
Gillot
S.
,
Langergraber
G.
,
Ohtsuki
T.
,
Shaw
A.
,
Takács
I.
&
Winkler
S.
2012
Guidelines for Using Activated Sludge Models
.
IWA Publishing
,
London, UK
.
Rodriguez-Roda
I.
,
Sanchez-Marre
M.
,
Comas
J.
,
Baeza
J.
,
Colprim
J.
,
Lafuente
J.
,
Cortes
U.
&
Poch
M.
2002
A hybrid supervisory system to support WWTP operation: Implementation and validation
.
Water Science and Technology
45
(
4–5
),
289
297
.
Schneider
M. Y.
,
Quaghebeur
W.
,
Borzooei
S.
,
Froemelt
A.
,
Li
F.
,
Saagi
R.
,
Wade
M. J.
,
Zhu
J.-J.
&
Torfs
E.
2022
Hybrid modelling of water resource recovery facilities: Status and opportunities
.
Water Science and Technology
85
(
9
),
2503
2524
.
Serrao
M.
2023
Towards Intelligent Process Control of Municipal Wastewater Treatment: The Development of a Hybrid Model That Aims to Improve Simulation Performance and Process Optimization
.
PhD Thesis
,
École National des Ponts et Chaussées – ParisTech
,
Paris, France
.
Serrao
M.
,
Jauzein
V.
,
Azimi
S.
,
Rocher
V.
,
Tassin
B.
&
Vanrolleghem
P. A.
2023
Hybridizing a first-principles biofilm model with a data-based model to improve model accuracy for model predictive control of a 6 million PE WRRF
. In:
Proceedings WEF/IWA Innovations in Process Engineering Conference
,
June 6–9, 2023
,
Portland, OR, USA
.
Therrien
J. D.
,
Maere
T.
,
Halle
S.
,
Dallaire
P.
&
Vanrolleghem
P. A.
2022
Using the right wastewater characteristics for early COVID-19 pandemic warning and forecast using deep machine-learning
. In
12th Urban Drainage Modeling Conference
,
January 10–12, 2022
,
Costa Mesa, CA, USA
.
Torfs
E.
,
Nicolaï
N.
,
Daneshgar
S.
,
Copp
J. B.
,
Haimi
H.
,
Ikumi
D.
,
Johnson
B.
,
Plosz
B. B.
,
Snowling
S.
,
Townley
L. R.
,
Valverde-Pérez
B.
,
Vanrolleghem
P. A.
,
Vezzaro
L.
&
Nopens
I.
2022
The transition of WRRF models to digital twin applications
.
Water Science and Technology
85
(
10
),
2840
2853
.
Von Stosch
M.
,
Oliveira
R.
,
Peres
J.
&
de Azevedo
S. F.
2014
Hybrid semiparametric modeling in process systems engineering: Past, present and future
.
Computers & Chemical Engineering
60
,
86
101
.
Wang
Z.
,
Yan
W.
&
Oates
T.
2017
Time series classification from scratch with deep neural networks: A strong baseline
. In:
2017 International Joint Conference on Neural Networks (IJCNN)
.
IEEE
, pp.
1578
1585
.
Wang
K.
,
Li
K.
,
Zhou
L.
,
Hu
Y.
,
Cheng
Z.
,
Liu
J.
&
Chen
C.
2019
Multiple convolutional neural networks for multivariate time series prediction
.
Neurocomputing
360
,
107
119
.
Wang
G.
,
Jia
Q.-S.
,
Zhou
M.
,
Bi
J.
,
Qiao
J.
&
Abusorrah
A.
2022
Artificial neural networks for water quality soft-sensing in wastewater treatment: A review
.
Artificial Intelligence Review
55
(
1
),
565
587
.
Zhu
J.-J.
,
Segovia
J.
&
Anderson
P. R.
2015
Defining influent scenarios: Application of cluster analysis to a water reclamation plant
.
Journal of Environmental Engineering
141
(
7
),
04015005
.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).
