The wastewater industry is currently facing dramatic changes, shifting away from energy-intensive wastewater treatment towards low-energy, sustainable technologies capable of achieving energy-positive operation and resource recovery. The latter will shift the focus of the wastewater industry to how one could manage and extract resources from the wastewater, as opposed to the conventional paradigm of treatment. Debatable questions arise: can the more complex models be calibrated, or will additional unknowns be introduced? After almost 30 years using well-known International Water Association (IWA) models, should the community move to other components, processes, or model structures like ‘black box’ models, computational fluid dynamics techniques, etc.? Can new data sources – e.g. on-line sensor data, chemical and molecular analyses, new analytical techniques, off-gas analysis – keep up with the increasing process complexity? Are different methods for data management, data reconciliation, and fault detection mature enough to cope with such a large amount of information? Are the available calibration techniques able to cope with such complex models? This paper describes the thoughts and opinions collected during the closing session of the 6th IWA/WEF Water Resource Recovery Modelling Seminar 2018. It presents a concerted and collective effort by individuals from many different sectors of the wastewater industry to offer past and present insights, as well as an outlook into the future of wastewater modelling.
THE NEED FOR QUESTIONING THE STATUS QUO
The wastewater industry is currently facing dramatic changes, shifting away from energy-intensive wastewater treatment towards low-energy, sustainable technologies capable of achieving energy-positive operation and resource recovery. The latter will shift the focus of the wastewater industry to the extraction of resources from the wastewater, as opposed to the conventional paradigm of treatment. Thanks to the pioneering developments of the past few decades, process models were established in the wastewater industry for designing, upgrading, and optimizing wastewater treatment plants. However, due to the ever expanding and ambitious objectives of wastewater management, the scope and structure of the process models of the next generation need to be re-defined to address new challenges. The new and wider vision for water resource recovery facilities (WRRFs) includes water sanitation, protection of water sources and the environment, energy reduction and production, and resource recovery. Conventional process models must be extended with new approaches such as thermodynamic, hydraulic, or economic models, just to name a few.
During the past few years, different approaches in the wastewater modelling field have been proposed, presenting new solutions to fulfil the new requirements for WRRFs. Approaches describing physicochemical models (Batstone et al. 2012), energy and economic cost models (Rahman et al. 2016), greenhouse gas models (Mannina et al. 2016) or methods about how to integrate all these aspects in a plant-wide context (Solon et al. 2017) have been recently proposed.
Moreover, hydrodynamics and mass transport have become a crucial point for the optimum design and operation of WRRFs or novel technologies dedicated to resource recovery. In addition to simulating hydraulic phenomena occurring in WRRFs (Samstag et al. 2016), computational fluid dynamics (CFD) models show a great potential in physicochemical processes where gaseous, solid, and aqueous phases interact and can be a very valuable tool for, for example, the optimization of the aeration process or the recovery of valuable products from waste streams by crystallization.
However, simplicity (given by the well-known and generally-accepted International Water Association (IWA) Activated Sludge Model (ASM), and Anaerobic Digestion Model (ADM)) vs complexity (proposed in new models with greater numbers of components, processes, and parameters) is a question for which there is no clear agreement in the modelling profession (Lizarralde et al. 2018). How complex should the models be? Can the more complex models be calibrated (i.e. fitted to data and process observations), or will additional unknowns be introduced? After almost 30 years using well-known IWA models, should the community move to other components, processes, or model structures like black box models, CFD techniques, etc.? Can new data sources – e.g. on-line sensor data, chemical and molecular analyses, new analytical techniques, off-gas analysis – keep up with the increasing process complexity and a growing need for process understanding? Are different methods for data management, data reconciliation, and fault detection mature enough to deal with such a large amount of data? Are the available calibration techniques able to cope with such complex models?
This paper represents a concerted and collective effort by individuals from many different sectors of the wastewater industry to offer insights, as well as an outlook into the future of wastewater modelling.
THE LIMITATIONS AND USEFULNESS OF ASMS
A useful process model has the level of complexity that is required to mimic the process aspects that are of importance to the investigation at hand. The complexity of the model is the result of several things, such as the temporal and spatial scales involved in the simulation. For example, oxygen dynamics take minutes while microbial population dynamics take weeks; micro-scale processes inside a floc are important but are frequently neglected (Picioreanu et al. 2007). The most significant impact on model complexity is the number of state variables that define relevant biological, chemical, and physical conversion processes. ASMs were developed as a tool for the design and operation of biological wastewater treatment (Henze et al. 2000). They are an effective process evaluation tool for determining the effluent composition as well as process requirements for a facility, such as aeration demand, recycle pump flow requirement and sludge production, as a function of time-varying influent characteristics. The inherent flexibility of the matrix model structure of ASMs facilitates the incorporation of additional microbial or chemical processes such as deammonification (Dapena-Mora et al. 2004), methane oxidation (Daelman et al. 2014), sulfide conversion (Lu et al. 2012), and cellulose conversion, which were not originally included in the ASM formulations. Three decades of applied dynamic-mechanistic models have produced a set of values for model parameters (specific growth rates, decay rates, yield coefficients, etc.). Most of these parameters are not site-specific, which allows for the application of the models to situations where no performance data exist (e.g. modelling treatment plants that are not yet built).
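The matrix structure mentioned above can be illustrated with a minimal sketch. The three components, the two processes and every parameter value below are illustrative assumptions, not a published ASM formulation or parameter set. The point is only that the net conversion rate of each component is the stoichiometry-weighted sum of the process rates, so that adding a process amounts to adding a row to the matrix.

```python
# Minimal matrix-model sketch. Components, processes and all parameter
# values are illustrative assumptions, not a published ASM parameter set.
COMPONENTS = ["S_S", "X_H", "S_O"]  # substrate, heterotrophs, dissolved oxygen

Y_H = 0.67     # assumed yield (g COD biomass per g COD substrate)
MU_MAX = 6.0   # assumed maximum specific growth rate (1/d)
K_S = 10.0     # assumed half-saturation constant, substrate (g COD/m3)
K_O = 0.2      # assumed half-saturation constant, oxygen (g O2/m3)
B_H = 0.6      # assumed endogenous respiration rate (1/d)

# Stoichiometric matrix: one row per process, one column per component.
# Endogenous respiration is simplified here to complete oxidation of biomass.
STOICHIOMETRY = {
    "aerobic_growth": [-1.0 / Y_H, 1.0, -(1.0 - Y_H) / Y_H],
    "endogenous_respiration": [0.0, -1.0, -1.0],
}


def process_rates(state):
    """Process rate vector (g COD/m3/d) for the current concentrations."""
    s_s, x_h, s_o = (state[name] for name in COMPONENTS)
    return {
        "aerobic_growth": MU_MAX * s_s / (K_S + s_s) * s_o / (K_O + s_o) * x_h,
        "endogenous_respiration": B_H * x_h,
    }


def conversion_rates(state):
    """Net rate per component: sum over processes of stoichiometry * rate."""
    rho = process_rates(state)
    return {
        name: sum(STOICHIOMETRY[p][j] * rho[p] for p in STOICHIOMETRY)
        for j, name in enumerate(COMPONENTS)
    }


state = {"S_S": 10.0, "X_H": 1000.0, "S_O": 0.2}  # hypothetical concentrations
rates = conversion_rates(state)
print(rates)
```

Because oxygen counts as negative COD, the substrate and biomass rates minus the oxygen rate should sum to zero for this system; such a continuity check is a convenient way to catch stoichiometric errors when new rows are added.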
Proponents of ASMs recognize the models' limitations due to knowledge gaps or the simplification of complex processes. For example, some relevant processes, such as the formation of nitrous oxide (a potent greenhouse gas (GHG)) and the conversions within the biological phosphorus removal process (owing to the variability in the metabolism of phosphate accumulating organisms (Gebremariam et al. 2011)), are still not understood clearly enough to be put into a model for use in simulations.
Two limitations of ASMs, as clearly outlined in the literature, are (1) a lack of ability to properly account for solids retention times (SRT) and (2) an inability to predict sludge settleability. The recommended useful range of the models is SRTs from 3 to about 30 days (Henze et al. 1987). The mechanism of bioflocculation is not well-understood and this precludes an accurate prediction of effluent suspended solids (Jimenez et al. 2005, 2007). This is true at any SRT but is particularly relevant for high-rate systems running at SRTs less than 3 days (Smitshuijzen et al. 2016). There has been a growing interest in high-rate processes with the intent of maximizing carbon capture by bioflocculation while minimizing carbon oxidation. Although sludge settleability is relatively simple to measure, this parameter cannot be easily predicted because there is still a gap in the fundamental understanding of floc formation. Settling models have been coupled with ASMs, but they lack theoretical descriptions of observed settling behaviour. Moreover, they are oriented to predict return sludge concentrations but not effluent concentrations.
A further limitation of ASMs is that only one microbial group (one state variable) is considered for a single process (e.g. nitrification). At best, extended ASMs are used in which a distinction is made between ammonium oxidizing and nitrite oxidizing bacteria (Wyffels et al. 2004), while a wide range of species are able to carry out nitrification, all of which are reduced to a single organism behaviour that is reflected in model parameter settings (Vannecke & Volcke 2015). Explicitly considering this variation, however, is generally not required to achieve useful predictions of the macroscopic reactor behaviour (e.g. representation of effluent quality). Furthermore, the half-saturation values used in ASMs do not truly represent intrinsic affinity constants but rather lump the effect of diffusion and spatial gradients of local environments (Arnaldos et al. 2015; Baeten et al. 2018). Substrate diffusion is largely dependent on floc properties which are a function of the local shear conditions, cohesion forces related to the exopolymeric substances characteristics, and turbulence intensity (Chu et al. 2003). This aspect of ASMs limits the model's ability to predict the outcomes of processes like simultaneous nitrification-denitrification as well as the conversion of micropollutants in biological wastewater treatment systems.
Despite these limitations, with appropriate calibration, ASMs can provide accurate results based on the data entered into the model. The actual conditions at a facility, however, will rarely match the exact values used in the modelling effort. Thus, it is important that informed judgment is used regarding the applicability and accuracy of the model results. Fortunately, the relative behaviour of modelled states obtained from ASM-based simulations is typically quite accurate even if the numerical values themselves may not be. Understanding the qualitative response of the process to changes in facility operation or potential designs is often sufficient. This means that the extent and rigour of a model calibration exercise will depend on the model application.
Historically, uncalibrated ASMs have proven very useful as well. Models used for understanding mechanisms and for studying influencing factors require little or no calibration. For example, N2O emission models may not be suitable for exact prediction but are useful in identifying N2O formation mechanisms, which in turn are crucial for evaluating greenhouse gas mitigation strategies. Such models are used not only for research but also for teaching purposes. The value of models for knowledge transfer cannot be overstated. A process model that is shared and valued by multiple stakeholders facilitates effective communication and generation of insight. In addition, ASMs are used for knowledge continuity (e.g. when key staff members enter or leave a project team). For many novice engineers, ASMs have served as an instruction and training manual for this field. As these engineers develop their skills, they, in turn, modify and expand existing ASMs, or develop competing model structures altogether. In this way, new knowledge is shared and strengthened over time.
It is almost always the misuse of a model that results in some users deciding that the model is not useful. This misuse results primarily from three sources: (1) a lack of understanding of the model structure and under which conditions it is valid, (2) improper calibration, normally from using default wastewater fractionation parameters, or (3) believing the model results are a perfect representation of the true system behaviour. The first and second items are reasonably well-recognized by most of the model-use community; however, the third item is often ignored or forgotten, even among frequent users of simulators. It is always essential to keep in mind the original modelling objective and the modelling assumptions defining the boundaries of applicability.
THE IMPACT OF DATA ABUNDANCE ON WRRF MODELLING
The impact of an increased abundance of data on WRRF modelling is twofold: more data and non-traditional data sources are included in modelling, and various modelling technologies are combined into a tool set that is both more broadly based and more unified than at present. Users and developers of models will have more opportunity to use much more (on-line) data of various types (e.g. images from cameras, operational log books, spectra from analysers, outputs from acoustical sensors, etc.) that are either directly or indirectly related to components of interest, and to employ many modelling methods to solve engineering and operational problems related to WRRF operation.
Because of this, the lines between technologies that are traditionally seen as separate and unique are blurring, which will bring about the development of hybrid models that use both traditional and new forms and sources of data. This suggests a change in thinking about how data are used in model development, and in current ideas about whether models are strictly data-driven or are based on mathematical statements about fundamental principles of conservation of mass, charge, and energy.
Currently, there is much heated debate regarding modelling methodology, starting with terminology conventions. These conventions imply that so-called ‘black box’ models are those that employ modelling methods that are in some fashion not directly accessible (i.e. are in some sense opaque), or that are not easily interpreted by the model developer and user. In contrast to this, the terminology ‘white box’ is used to describe models that are thought to describe fundamental principles based on an in-depth understanding of the underlying processes. The latter model types are often touted as more open and readily interpretable by the user (i.e. they are deemed to be more transparent in some way).
As part of this ongoing discussion, the term ‘black box’ is often considered synonymous with models that are data-driven and that are developed using algorithms and other methods that do not reference fundamental mass, charge, and energy balances. In contrast, the ‘white box’ terminology is deemed equivalent to thermodynamic fundamentals or first-principles physical laws. While this terminology may be useful for certain purposes, it is misleading to consider black box models as purely data-driven and white box models as purely based on first principles. For example, activated sludge models (e.g. ASM1/2d/3), while considered first-principles models, include a variety of Monod-type switching functions which are mathematically tractable but are not supported by theory. Data are used to determine their kinetic parameters, and these switching functions are also used to fit models that do not use biological or chemical concepts directly, or at all, such as hydrolysis models. Similarly, settling velocity functions are not based on theoretical first-principles constructs, but their mathematical forms and parameters can be deduced from properly designed experiments. Thus, this labelling of models as data-driven or first-principles-based is largely irrelevant. In contrast, the interpretability of a model remains an important factor in choosing a useful model, as discussed further below.
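The role that data play in a nominally first-principles model can be made concrete with a small sketch of a Monod-type switching function whose parameters are estimated from measurements. The "measurements" here are synthetic and noise-free, the true parameter values are assumptions chosen for illustration, and the simple scan over candidate half-saturation values is only one of many possible fitting strategies.

```python
def monod(s, k):
    """Monod-type switching function: close to 0 for s << k, close to 1 for s >> k."""
    return s / (k + s)


def fit_monod(concentrations, rates, k_candidates):
    """Least-squares fit of rate = mu_max * monod(s, k) by scanning candidate k values."""
    best = None
    for k in k_candidates:
        f = [monod(s, k) for s in concentrations]
        # For a fixed k the model is linear in mu_max, so mu_max has a closed form.
        mu_max = sum(r * fi for r, fi in zip(rates, f)) / sum(fi * fi for fi in f)
        sse = sum((r - mu_max * fi) ** 2 for r, fi in zip(rates, f))
        if best is None or sse < best[0]:
            best = (sse, mu_max, k)
    return best[1], best[2]


# Synthetic, noise-free "measurements" generated with assumed true values.
TRUE_MU_MAX, TRUE_K = 4.0, 2.0
substrate = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
observed = [TRUE_MU_MAX * monod(s, TRUE_K) for s in substrate]

k_grid = [0.1 * i for i in range(1, 101)]  # candidate K values from 0.1 to 10.0
mu_fit, k_fit = fit_monod(substrate, observed, k_grid)
print(mu_fit, k_fit)
```

With real, noisy data the two parameters are typically strongly correlated, which is one reason why the fitted half-saturation value is better read as an empirical parameter than as an intrinsic constant.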
Choosing a suitable modelling technology to capture useful information from data is largely a function of utility. George E. P. Box, a prominent data analyst and statistician, famously wrote roughly 40 years ago: ‘All models are wrong, some are useful’ (Box 1979). Today, this statement is especially compelling given the ever-increasing amount of data that are available to engineers for model-building and testing, and the ease of use of software tools. In examining the work of Box and his colleagues, it should be obvious that all useful models are derived from data of sufficient quality, that these data are collected through carefully designed and performed experiments, plant trials, and/or database queries of one kind or another, and that all model-building is inherently data-driven, with data acting as the principal conduit of information (Box et al. 2005). This information is ultimately sequestered in the form of a model and is used for various practical or theoretical applications. The use of data in model-making is clear in models such as the ASM family, as much as it is in developing models that are based on neural networks or multivariate statistical analysis. Data are used to determine kinetic parameters for biological processes and data are also used to fit models that do not use biological or chemical concepts directly or at all.
Ultimately, models are judged on their ability to predict events or process outcomes in a given application such as closed-loop control, plant design, etc. (which is the main practical application after achieving a better understanding of a process or system) (Box et al. 2015). User preference, familiarity, and ease of model use in practical applications are also key differentiating factors for engineering work as much as what the mathematical form of the model may be.
Well-constructed models should be able to provide clearly interpretable results for the model developer and user no matter what the model type is. In the case of activated sludge models, this comes in the form of mass, charge, and energy conservation equations, and the relationship to the biological process that is being studied. However, a process expert is needed to analyse the results. In other model types, for example, in the application of multivariate statistics, the interpretability can be achieved using contribution functions and plots that reveal how variables are combined to provide a certain model prediction or output (Miller et al. 1998). If these tools are properly set up, the user can have direct diagnostics as part of the model results. An example is the application of various types of multivariate statistical methods (Spearman's rank correlation analysis, hierarchical k-means clustering and principal component analysis) to relate N2O emissions from biological nitrogen removal systems to operating conditions (Vasilaki et al. 2018). With careful examination of modelling methods and supporting diagnostic tools, it should be evident that models of any type can be interpreted and used properly.
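As an illustration of the contribution idea described above, the following sketch computes the contribution of each variable to the first principal component score of a new observation. It is a simplified reading of the concept rather than the specific procedure of the cited works: the training data are hypothetical and already mean-centred, only one component is retained, and the loading vector is obtained by plain power iteration to keep the example free of external libraries.

```python
import math

# Hypothetical, mean-centred training data: columns are three process variables.
# Variables 0 and 1 move together; variable 2 shows small, unrelated variation.
TRAINING = [
    [-2.0, -2.0, 0.1],
    [-1.0, -1.0, -0.1],
    [0.0, 0.0, 0.0],
    [1.0, 1.0, 0.1],
    [2.0, 2.0, -0.1],
]


def covariance(data):
    """Sample covariance matrix of data that are already mean-centred."""
    n, m = len(data), len(data[0])
    return [[sum(row[i] * row[j] for row in data) / (n - 1) for j in range(m)]
            for i in range(m)]


def first_loading(cov, iterations=200):
    """Loading vector of the first principal component by power iteration."""
    v = [1.0] * len(cov)
    for _ in range(iterations):
        w = [sum(c * vj for c, vj in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def score_contributions(observation, loading):
    """Per-variable contributions to the score; their sum is the score itself."""
    return [p * x for p, x in zip(loading, observation)]


loading = first_loading(covariance(TRAINING))
new_observation = [3.0, 0.5, 0.0]  # hypothetical centred sample to diagnose
contributions = score_contributions(new_observation, loading)
score = sum(contributions)
print(contributions, score)
```

Plotting the contributions as a bar chart gives the kind of diagnostic referred to in the text: the variable with the largest bar is the one that pushes the observation away from the centre of the training data along this component.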
While the successful and meaningful application of tools for model interpretation is certainly important to the model developer, they should not be a barrier to using models of any kind nor should they be used to classify modelling technology in an unnecessary way. Instead, the focus should be given to fostering approaches to combine modelling methods and various data sources as well as the development of tools to help visualize, interpret, and interact with the calculated model results, such as using principal component analysis to assess membrane bioreactor fouling (Maere et al. 2012). A modern process simulator is an example of visualization and interaction between data, models, and model users. The increasing effort to create more user-friendly software points to the value of software development techniques in examining and interpreting data and model outputs. However, this development requires the combination of data-centred methods with process knowledge. Moreover, as data become more and more abundant, it is imperative that students are trained in advanced data methods to truly engage in multi-disciplinary approaches that remove barriers to success and bring results to WRRFs. This emphasis on education will likely pay dividends in the long run as students become employees and create demand for more education and training by identifying new and creative ways of solving problems.
Model interpretation should be the key discussion point in a multidisciplinary forum. One example of successfully combining modelling, numerical methods, and data analysis is the use of multivariate methods in biological flux balance explorations. Here, principal components methods can be used in combining linear programming and flux balance modelling to deduce the distribution of glucose and ammonia in an Escherichia coli system (Sarıyar et al. 2006). Other examples that require knowledge of multiple fields to achieve practical results can be found in image analysis where cameras provide images that are used as data in closed-loop feedback control and in chemometric analysis (Prats-Montalbán et al. 2011).
In an increasingly data-rich environment, it is not likely that ‘black box’ or ‘white box’ models will overtake one another as more data are used in model development. Instead, combinations of techniques that can be applied to solve real-world problems (hybrid models that combine black and white box models in parallel or in series (Lee et al. 2005)) will emerge as dominant, as will better methods and ways to interpret and communicate model results. Model developers and users will be more able to use and combine models of many types in their quest to solve interesting and practical problems. As part of this, the development of models will increasingly involve mathematicians, computer scientists, systems engineers, and software developers as well as chemical, environmental and civil engineers, biochemists and biologists.
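A parallel hybrid structure of the kind referred to above can be sketched as a mechanistic prediction plus a data-driven correction fitted to the mechanistic residuals. Everything in the example is a hypothetical stand-in: the "mechanistic" relation, the synthetic plant data, and the choice of a straight line in temperature as the data-driven part. It is meant to show the structure only, not the approach of the cited work.

```python
def mechanistic_model(load):
    """Stand-in 'white box' part: effluent concentration proportional to load."""
    return 0.1 * load


def fit_line(x, y):
    """Ordinary least-squares fit y = a + b * x (the 'black box' part)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b


# Synthetic plant data (load, temperature, measured effluent). The assumed
# "true" plant has a temperature effect that the mechanistic model ignores.
data = [(load, temp, 0.1 * load + 0.5 * (15.0 - temp))
        for load, temp in [(100.0, 10.0), (120.0, 12.0), (90.0, 15.0),
                           (110.0, 18.0), (130.0, 20.0)]]

# Parallel hybrid: fit the data-driven model to the mechanistic residuals.
temps = [t for _, t, _ in data]
residuals = [y - mechanistic_model(load) for load, _, y in data]
a, b = fit_line(temps, residuals)


def hybrid_model(load, temp):
    """Mechanistic prediction plus data-driven residual correction."""
    return mechanistic_model(load) + a + b * temp


mech_error = max(abs(y - mechanistic_model(load)) for load, _, y in data)
hybrid_error = max(abs(y - hybrid_model(load, t)) for load, t, y in data)
print(mech_error, hybrid_error)
```

In a serial arrangement, by contrast, the data-driven model would supply an input or parameter of the mechanistic model (for example a rate coefficient) instead of correcting its output.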
THE STATE OF DATA QUALITY
The improved computational power offered by new instrumentation hardware, pre-packaged algorithms, and models promises a future in which ubiquitous use of old and new instruments and data collection systems will continue to lead to improved management and operation of water infrastructures. In this paradigm, the data that are generated must be of such quality that the information needed for manual and automated decision-making can be easily extracted and used. Data should be collected because they are direct inputs to (1) model calibration and adjustment procedures for mass and energy balance models, (2) the development of data-driven and empirical models (soft-sensors), (3) closed-loop controls, and (4) operational decision support systems that are used both on-line and off-line. These applications that consume data can, in turn, produce other data that are used in various combinations and at various frequencies. This may include, for example, fault and event detection systems that exploit correlations and relationships among measured variables to detect a faulty control system, a bad sensor, an unexpected process problem, or a flawed laboratory procedure.
To define an appropriate level of data quality, it is typical to rely on measurements such as accuracy, precision, and timeliness of the produced data. This will likely remain so in the future with a strong focus on the extraction of reliable information from available data that are taken for a clearly defined purpose (Hotelling 1947). However, the use of data-augmentation algorithms could help increase the information contained in the available data (De Mulder et al. 2018). Measures of data quality can only be reasonably assessed within the context of the end-use of the data. It should be clear that the notion of data quality is not limited to one sensor or device taken individually. Discussions of data quality apply to networks of measurement devices and various measurement principles. This implies that high-quality data are fit for purpose, i.e. they express the information needed by the decision-maker (or an automated control system), and that low-quality data are not capable of this. An experimental design procedure balancing data collection cost and accuracy, ensuring that measured variables lead to the identification of key variables defined by the end-user, was proposed by Le et al. (2018).
Quite frequently, data are deemed to be of low quality when the information they carry is obscured. Indeed, typical wastewater process data (i.e. from automated sensors) and laboratory data are noisy, may contain inconsistent values or biases and are often not available for periods of time or at the required sampling frequency. This can make both model identification protocols and control systems ineffective, leading to poor decisions, e.g. increasing energy use unnecessarily. Although the knowledge of process experts can be used to design and implement fail-safes and other safeguards to overcome some deficiencies in sensor data, implementing such fail-safes can be cumbersome and cannot guarantee a fail-free operation. This suggests that there is an opportunity to improve the designs of fail-safes, fall-backs and perhaps entire WRRF processes in general to include data handling and data processing systems. This could improve the dynamic performance of plants from the outset, starting with plant designs that incorporate data management and control system considerations explicitly (Marlin 2000).
Currently, improvements to process and laboratory data can be achieved through effective actions that are performed as part of plant maintenance activities. This is not likely to change in the future, but these activities will probably expand in scope, for both sensor and laboratory data. To ensure overall data quality (i.e. fitness to purpose), these activities should target all aspects in the chain of data acquisition, transmission, storage, and end-use. Regular maintenance such as laboratory tests using standards, sensor cleaning and calibration should be combined with reviews of database structures and settings to ensure that useful information that is encapsulated in stored data is accessible when needed.
For example, improperly set data historian or SCADA system compression and logging frequencies can break correlation patterns in data that may be useful for extracting information on process faults and upsets. If these settings are not properly chosen, data may only reflect the effect of compression settings and not information on the underlying process at all.
An interesting discussion on the negative aspects of poorly-applied data compression schemes can be found in Kourti (2003). In her paper, Kourti describes how typical algorithms that are used to compress data in historians and databases can create artificial trends in data to such a degree that measured values on variables that are not related to one another in a process are identified as very highly correlated during data analysis. The artificial correlation that is caused by compression settings, including sampling rates, can be easily changed in software to ensure that a database or a historian provides data that are fit for use. These and other system design settings (e.g. connections between sensors and control loop inputs) should be reviewed periodically to ensure that the data management system can meet its purpose in supporting control systems and other goals by providing a reliable flow of information that is taken from the data.
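The effect described above can be reproduced with a deliberately extreme sketch. Two unrelated random signals are stored with a simple deadband rule and later reconstructed by linear interpolation between the stored points. Both the deadband rule and the interpolation step are assumptions standing in for the historian algorithms discussed in the text, and the deadband is chosen wider than the whole signal range so that only the first and last samples survive.

```python
import random


def deadband_compress(values, deadband):
    """Keep the first and last samples, plus any sample that differs from the
    last stored value by more than `deadband`."""
    stored = [(0, values[0])]
    for i in range(1, len(values) - 1):
        if abs(values[i] - stored[-1][1]) > deadband:
            stored.append((i, values[i]))
    stored.append((len(values) - 1, values[-1]))
    return stored


def reconstruct(stored):
    """Linearly interpolate between stored samples on the original time grid."""
    out = []
    for (i0, v0), (i1, v1) in zip(stored, stored[1:]):
        for i in range(i0, i1):
            out.append(v0 + (v1 - v0) * (i - i0) / (i1 - i0))
    out.append(stored[-1][1])
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)  # fixed seed so that the sketch is repeatable
n = 500
a = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # two unrelated "sensor" signals
b = [rng.uniform(-1.0, 1.0) for _ in range(n)]

a_rec = reconstruct(deadband_compress(a, 3.0))  # deadband wider than signal range
b_rec = reconstruct(deadband_compress(b, 3.0))

r_raw = pearson(a, b)
r_compressed = pearson(a_rec, b_rec)
print(r_raw, r_compressed)
```

The raw signals are essentially uncorrelated, whereas the reconstructed ones are two straight lines and therefore almost perfectly (positively or negatively) correlated. Realistic settings are less extreme, but the mechanism by which interpolation between sparse stored points creates artificial trends is the same.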
Regular maintenance of data management systems should be coupled with data visualization and the use of models to improve data quality. When they are part of an SOP (standard operating procedure), consolidating data for plotting and visualization can provide valuable insight into the state of data and how reasonable data values may be. This can include summaries of data that are directly related to WRRF energy and effluent permit performance such as averages and key process indicators but it should also include raw values collected and stored from sensors (i.e. soft and physical sensors) and labs (Thomann et al. 2002). Models can be used as part of an automated methodology to identify, correct, and replace poor or missing data. Such systems can include automated mass balance calculations or soft-sensor-based diagnostic checks for outliers and other unusual conditions. Data-driven models and fundamental mass and energy balance models can be created and geared to specifically deal with data quality so that the information the data carry can be used effectively. The combination of these calculations will improve the performance of critical systems that rely on data as inputs, especially if they are performed in an automated manner with little or no operator involvement.
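An automated mass balance check of the kind mentioned above can be as simple as testing whether measured flows close around a node. The sketch below assumes a hypothetical record layout (`q_in`, `q_effluent`, `q_waste`, all in the same flow units) and an arbitrary 5% tolerance; in practice the tolerance would have to reflect the accuracy of the flow meters involved.

```python
def flow_balance_flags(records, tolerance=0.05):
    """Flag records whose relative flow-balance closure error exceeds `tolerance`.

    Each record is assumed to hold an inflow and the two outflows of a plant
    (effluent and waste sludge); q_in is assumed to be strictly positive.
    """
    flags = []
    for rec in records:
        q_out = rec["q_effluent"] + rec["q_waste"]
        closure_error = (rec["q_in"] - q_out) / rec["q_in"]
        flags.append(abs(closure_error) > tolerance)
    return flags


# Hypothetical daily flow records (m3/d).
records = [
    {"q_in": 1000.0, "q_effluent": 980.0, "q_waste": 20.0},   # closes exactly
    {"q_in": 1000.0, "q_effluent": 800.0, "q_waste": 20.0},   # 18% unaccounted for
    {"q_in": 1200.0, "q_effluent": 1150.0, "q_waste": 25.0},  # about 2% error
]
flags = flow_balance_flags(records)
print(flags)
```

A flagged record does not identify which measurement is wrong; it only indicates that the set of measurements is inconsistent and that further diagnosis, for example with additional balances or soft-sensor estimates, is needed.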
For certain deviations from acceptable quality such as outliers or spikes, there are many algorithms and data treatment and filtering methods available to avoid labour-intensive data-cleaning procedures. While these methods can be beneficial in identifying and removing some causes of poor data, challenges still exist. For example, the impact of sensor drift on sensor measurements is typically much smaller than the impact of process variability and changes in the measured environment. As a result, drift is often difficult to detect algorithmically. In addition, hardware redundancy can be of limited value in dealing with drift (and other sensor faults) as all redundant sensors that measure the same variable may exhibit the same drift as a problematic sensor.
This suggests that relying on redundancy that is based on many diverse measurements (on different but related variables) simultaneously taken in a multivariate approach may be a better option for dealing with poor or missing data (Miletic et al. 2004). Since building this kind of redundancy may be cost-prohibitive in some cases due to the need for multiple sensors of different kinds, on-site inspection and reference measurement checks are often the only available option for maintaining data quality. This provides an opportunity for research and software development focused on improving data availability and quality (Rieger et al. 2010; Villez et al. 2016) or extracting valuable information from low-cost or poorly maintained sensors (Wani et al. 2017; Thürlimann et al. 2018).
THE ERA OF CHEAP AND FAST CFD MODELS
In 1972, Octave Levenspiel wrote in a widely used textbook on chemical process engineering: ‘If we know what is happening within the vessel, then we are able to predict the behaviour of the vessel as a reactor. Though fine in principle, the attendant complexities make it impractical to use this approach.’
In his textbook, Levenspiel went on to describe the tanks in series (TIS) and axial dispersion models that he rightly felt were the best that could be used in his earlier era. The development of CFD methods since the 1970s changed this outlook significantly. CFD is the set of numerical schemes and analyses to solve momentum and continuity equations for fluid mechanics. These numerical methods are necessary because the partial differential equations describing the fluid mechanics of process tanks typically have no analytical solution and, hence, the fluid domain is generally discretized into a grid or mesh scheme. Instead of assuming homogeneity or symmetry in multiple dimensions, as is the case for one dimensional (1D) formulations, e.g. TIS (Levenspiel 1998) and 1D-settler models (Takács et al. 1991; Bürger et al. 2011), CFD includes more detail in the dimensions (Samstag et al. 2016).
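For comparison with the CFD approach, the TIS description can be written down in a few lines. The sketch below evaluates the standard exit-age (residence time) distribution for N equal, ideally mixed tanks in series, E(t) = t^(N-1) exp(-t/τ_i) / ((N-1)! τ_i^N) with τ_i = τ/N, and checks its moments numerically. The number of tanks and the mean residence time are arbitrary illustrative values, and the trapezoidal integration is a convenience, not part of the model.

```python
import math


def tis_rtd(t, n_tanks, tau):
    """Exit-age distribution E(t) for n equal, ideally mixed tanks in series
    with total mean residence time tau."""
    tau_i = tau / n_tanks  # mean residence time of a single tank
    return (t ** (n_tanks - 1) * math.exp(-t / tau_i)
            / (math.factorial(n_tanks - 1) * tau_i ** n_tanks))


def rtd_moments(n_tanks, tau, steps=20000, t_max_factor=20.0):
    """Area, mean and variance of E(t) by trapezoidal integration up to
    t_max_factor * tau."""
    dt = t_max_factor * tau / steps
    area = first = second = 0.0
    for k in range(steps + 1):
        t = k * dt
        weight = 0.5 if k in (0, steps) else 1.0
        e = tis_rtd(t, n_tanks, tau) * weight * dt
        area += e
        first += t * e
        second += t * t * e
    mean = first / area
    return area, mean, second / area - mean ** 2


N_TANKS, TAU = 4, 2.0  # illustrative values: four tanks, tau = 2 h
area, mean, variance = rtd_moments(N_TANKS, TAU)
print(area, mean, variance)
```

The variance of the distribution equals τ²/N, so a single parameter, the number of tanks, sets the degree of back-mixing in the model. This is the one-dimensional simplification that a CFD model replaces with a resolved velocity field.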
Using CFD, one can now compute two- or three-dimensional (2D or 3D) velocity fields and follow interactions of reactants and products throughout a tank. This information can be used to optimize tank geometry and to improve designs and operation. TIS models have provided the computational base for biokinetic models like the IWA activated sludge models (ASM – Henze et al. 2000) for over 30 years. Today, by using CFD confirmed by field testing, it is demonstrated that the distribution of reactants and products within reactor tanks can vary widely across commonly-used reactor types. This work shows that CFD can provide a much more accurate description of these processes than was possible in an earlier era.
The wastewater modelling community remains computationally limited today when using CFD in combination with biokinetic modelling in biological wastewater treatment. Currently, it is known that CFD can be used to help predict the effectiveness of tank mixing and biological transformation in different geometries, to locate sensors so that control can be optimized, and to calibrate simpler TIS and other models to improve their accuracy (Karpinska & Bridgeman 2016; Samstag et al. 2016). However, to provide more detail and realism in the CFD models, extensions are required for multiphase flows, integrated kinetics, and distributions described by population balance models for phenomena such as bubbles (aeration and gas stripping), granular sludge, and flocculation. The need for these additional features is clearly a limiting factor, as they increase the simulation time considerably.
Regardless of the computational burden, CFD provides vital information for the design, upgrade, optimization, and operation of WRRFs. Beyond the obvious advantages for hydraulic design and capacity assessment, it also highlights the impact of concentration variations in the reactor (Gresch et al. 2011; Rehman et al. 2017), which have been shown to have a significant impact in experimental work (Amaral et al. 2018) and measurement campaigns (Bellandi et al. 2018). Rapid returns on investment from using CFD have also been documented for water treatment applications, specifically in the area of ultraviolet treatment (Santoro et al. 2010), where strong gradients and coupled optics, chemistry, and hydraulics dictate photoreactor performance. Furthermore, CFD can help to unravel the ambiguous, as well as arbitrary, lumping of kinetic parameters such as half-saturation indices (Arnaldos et al. 2015, 2018) and improve the predictability of biokinetic models under varying operational conditions.
At the start, the CFD model is initialized with (dynamic) inputs and boundary conditions. Preferably, those inputs and boundary conditions are derived from measured data. In the end, the model is confronted with reality for validation, i.e. does it live up to the expectations and observations? This reality check is clearly based on data and is essential in order to make CFD more than just a way to produce colourful images that are not easily integrated and combined with ASM or other models.
While the general applicability of Moore's Law (a conjecture that suggests that the speed of computers doubles every 18 months) is uncertain, it is clear that a CFD model of a million cells that would have taken weeks to complete 10 years ago can now be completed overnight. As computer speeds have increased, modellers have tended to use the gain to increase the number of cells and obtain a finer mesh, rather than to reduce computation times dramatically while keeping a coarser mesh. If recent trends of acceptance of CFD models for WRRF process design are any indication of things to follow, CFD will become a much more widely-used technique in the analysis of biological treatment than was possibly imagined in the early days of ASM modelling.
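The order of magnitude behind this statement can be checked with a short calculation. The sketch below simply takes the quoted doubling time at face value; the 18-month period is the conjecture cited in the text, and the three-week starting run time is a hypothetical value chosen for illustration.

```python
def speedup(years, doubling_months=18.0):
    """Speed-up factor if computing speed doubles every doubling_months."""
    return 2.0 ** (years * 12.0 / doubling_months)


if __name__ == "__main__":
    factor = speedup(10)        # ten years of doublings
    run_then_h = 3 * 7 * 24     # hypothetical three-week run, in hours
    run_now_h = run_then_h / factor
    print(f"speed-up over 10 years: about {factor:.0f}x")
    print(f"three-week run today: about {run_now_h:.1f} h")
```

Under this assumption, ten years corresponds to roughly a hundred-fold speed-up, which is consistent with a multi-week simulation shrinking to a matter of hours on an unchanged mesh.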
FIT FOR USE
The widespread adoption of modelling and simulation in the wastewater industry demonstrates the benefits of ASM-type models used in the last decades (Brdjanovic et al. 2015). In a wastewater system project, models can be used in all project phases, e.g. WWT management options, configuration, design, commissioning and operation (Daigger 2011). Despite the many purposes of modelling, the main objective in the industry is to assist designers, utility managers, and operators in decision making. Therefore, the benefit of any model does not increase with its complexity but rather with its ability to supply an adequate basis for decisions. While mass-balancing might be sufficient at an early project stage, detailed dynamic models with high accuracy are needed for optimizing operations and controller design, e.g. for aeration (Schraa et al. 2017).
The balance between model complexity and ease of use depends on the set of questions being addressed. Consequently, the trade-off between complexity and ease of use of a model is not set once (i.e. prior to the investigation) and then held constant. Rather, it evolves together with knowledge and depends on the stage of the project or scientific investigation and on the level of understanding of the phenomena being modelled.
A generally-accepted approach in searching for the optimum among complexity, robustness, and accuracy of a model relies on the principle of building-block-based model development. This approach calls for models where a building-block also represents the unit of complexity: as a result, simpler models (with fewer building-blocks) are typically preferred to complex ones. It should be stressed that models with more building-blocks do not always yield more accurate results. This is the case if the blocks added to the model are of marginal or no benefit to explaining the underlying physical processes, or potentially even interfere with the identifiability of the parameters utilized in the overall model. As such, it is good practice to conduct a statistical analysis to confirm whether added model complexity can indeed adequately explain the desired phenomenon with the increased number of model parameters. Failing to pass such a test would imply either that the additional parameters produce a model that is over-fitted in a certain context (i.e. they model only noise) or that the additional parameters are not numerically tractable, resulting in poor estimates that appear statistically insignificant (Draper & Smith 1998; Box et al. 2005). This can, therefore, affect the usefulness of the overall model, leading to the apparent paradox that the more complex model is less predictive than the simpler one.
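One common form of such a statistical analysis is the extra-sum-of-squares F-test for nested least-squares models. The sketch below illustrates it on a deliberately simple case (a constant versus a straight line); it is not a procedure prescribed by the cited references, the function names and data are hypothetical, and the resulting statistic would still have to be compared with a tabulated critical value of the F distribution.

```python
def sse_constant(y):
    """Residual sum of squares of the one-parameter model y = a."""
    mean_y = sum(y) / len(y)
    return sum((yi - mean_y) ** 2 for yi in y)


def sse_linear(x, y):
    """Residual sum of squares of the two-parameter model y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def extra_ss_f(sse_simple, sse_complex, p_simple, p_complex, n_obs):
    """F statistic for the parameters added when going from simple to complex."""
    gain = (sse_simple - sse_complex) / (p_complex - p_simple)
    noise = sse_complex / (n_obs - p_complex)
    return gain / noise


if __name__ == "__main__":
    x = [0, 1, 2, 3, 4, 5]
    y_trend = [0.1, 0.9, 2.1, 2.9, 4.1, 4.9]  # hypothetical data with a clear trend
    y_noise = [1, -1, 1, -1, 1, -1]           # hypothetical data without a trend
    for y in (y_trend, y_noise):
        f = extra_ss_f(sse_constant(y), sse_linear(x, y), 1, 2, len(y))
        print(round(f, 2))
```

A large F value indicates that the added parameter explains far more variance than would be expected from noise alone, whereas a value near or below 1 suggests that the extra parameter is merely fitting noise.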
A model, whether in its conceptual stage or translated into its mathematical form, is a tool for facilitating the deployment of the scientific method in research. Such a process, which is cyclic in nature, involves steps such as hypothesizing, predicting, testing and questioning. Therefore, a good model is one that supports the investigator in refining hypotheses, designing experiments, sharpening data analyses and interpreting results. For practitioners, improved wastewater process models are continually evolving in research and in practice. Important applications such as novel treatment technologies, stricter effluent requirements, GHG emissions, and other sustainability indicators require a better understanding of the processes and more powerful models for decision support. This holds both for modelling new treatment processes and operational boundaries, as well as any output variables that are of interest. With respect to treatment process modelling, conventional model formulations fall short of capturing new processes such as aerobic granular sludge, anammox, and algae-based systems (Daigger 2011). More model complexity may be needed to describe these phenomena. Although a relatively straightforward ASM1-based model was considered sufficient for design and process analysis of granular systems (Volcke et al. 2012), complex models including granule formation and growth were considered necessary for obtaining improved process understanding. Another important area of recent research is plant-wide and system-wide modelling. The expansion of the models outside the plant allows for integrated evaluation and control of the whole system (Rauch et al. 2002). Furthermore, integrating life cycle analysis in the models makes it possible to assess the off-plant environmental impact, e.g. for changes in use and recovery of resources (Arnell et al. 2017).
As urbanization forces many wastewater treatment plants to operate closer to their design capacity while facing stricter effluent standards, economic and operating margins are reduced. Several of the recent research models go towards fundamental biological and chemical modelling (Ni et al. 2014; Kazadi Mbamba et al. 2016; Vaneeckhaute et al. 2018), replacing empirical parameters specific to each plant with fundamental constants. The large number of state variables and parameters in these models is potentially a problem. The identifiability of parameters and the possibility to directly measure them are often limited. However, initial studies show that the fundamental parameters are robust and require little or no calibration, provided the proper model components are included (Kazadi Mbamba et al. 2016; Vaneeckhaute et al. 2018). Still, the data requirements for calibration and validation of more complex models are an issue, as historical data for many of the needed states do not exist and large measurement campaigns are costly. Increasing the number of model equations might also affect simulation times. Depending on the project, this might be an issue for the modeller. However, the ever-increasing computer speed, or even access to cluster computers, makes it manageable in many cases.
In developing such models, an important yet often neglected aspect is the model verification step (i.e. the confirmation that the mathematical formulation used to describe the conceptual model is correctly implemented). For example, an unchecked model, i.e. a model that perhaps contains a subtle mathematical error such as a wrong conversion factor, could lead to unintended consequences if calibration is used to reconcile predicted and observed data with such a model. Such a model error could affect parameter estimates in a way that impairs the ability of the model to predict process outcomes faithfully outside its calibration range, despite the apparent good agreement with data within the calibration region. It is stressed here that adding model complexity does not always mean a more difficult model application. Appropriately-chosen sub-models for mixing and aeration of reaction networks may facilitate better model calibration and application, even if underlying models are more complicated.
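One inexpensive verification aid is a continuity check on the stoichiometric matrix, which can expose a wrong coefficient or conversion factor before any calibration is attempted. The sketch below is a minimal illustration assuming ASM1-style aerobic heterotrophic growth expressed in COD units, with dissolved oxygen counted as negative COD; the yield value, dictionary layout and function name are assumptions made for the example.

```python
def cod_continuity(stoich, cod_content):
    """Sum of stoichiometric coefficients weighted by COD content.

    For a process that conserves COD, the result should be zero.
    """
    return sum(nu * cod_content[name] for name, nu in stoich.items())


Y_H = 0.67  # assumed heterotrophic yield, g biomass COD per g substrate COD

# COD content per unit of each component; oxygen counts as negative COD.
COD_CONTENT = {"S_S": 1.0, "X_BH": 1.0, "S_O": -1.0}

# Aerobic growth of heterotrophs, normalized to 1 unit of biomass formed.
growth_ok = {"S_S": -1.0 / Y_H, "X_BH": 1.0, "S_O": -(1.0 - Y_H) / Y_H}

# Same process with a deliberately wrong oxygen coefficient (missing 1/Y_H).
growth_bad = {"S_S": -1.0 / Y_H, "X_BH": 1.0, "S_O": -(1.0 - Y_H)}

if __name__ == "__main__":
    print(cod_continuity(growth_ok, COD_CONTENT))
    print(cod_continuity(growth_bad, COD_CONTENT))
```

The correctly specified process closes to zero within floating-point precision, while the faulty one leaves a residual; without such a check, calibration could hide the discrepancy by distorting parameter values.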
Future model development will likely put emphasis on resource recovery (water, nutrients, organics, energy) rather than wastewater treatment. The practice of design, operation and control of resource recovery technology will need models that consider stringent objectives related to water-product quality, process performance stability, and operating costs. As models lead to a better understanding of processes, this may also lead to new and innovative resource recovery solutions.
For resource recovery to thrive, it will have to be considered from a broader perspective than technical feasibility. Unit process models for resource recovery will likely be integrated within broader frameworks (e.g. automated dynamic process control, sustainability, etc.) and at various scales (e.g. sewershed, watershed) to target combined social, economic, and environmental goals. Effective cost and price models will need to be developed for the different parts of the WRRF value chain in order to provide input to economic assessments. Life-cycle analysis will help decision-makers make environmentally sound choices on the most cost-effective process design and best process operation. These decisions can only be taken if the analysis comprehensively accounts for environmental aspects, such as resilience assessments and broader environmental impact studies, as well as plant-level control and optimization efforts. No matter the modelling application or scope, better experimental designs that result in improved measurement campaigns for gathering key data will be paramount.
Whatever the future may hold for model development, increased data availability in combination with improved computational capacity will continue to shape the structure of future modelling frameworks. The newly-developing synergy between first principles and data-driven models has the potential to create very powerful tools for further innovation, development and decision support. However, balancing the efforts for model development and complexity, data collection, data quality assurance, and integration of different frameworks will be challenging and will require diverse technical skills.
Contributed equally to this paper.