New framework for standardized notation in wastewater treatment modelling

Ll. Corominas, L. Rieger, H. Hauduc, P. A. Vanrolleghem
modelEAU, Université Laval, Pavillon Pouliot, 1065 av. de la Médecine, Quebec G1V 0A6, QC, Canada
E-mail: lluis.corominas@gci.ulaval.ca; rieger@envirosim.com; peter.vanrolleghem@gci.ulaval.ca

I. Takács
EnviroSim Associates Ltd, 15 Impasse Fauré, Bordeaux 33000, France
E-mail: imre@envirosim.com

G. Ekama
Water Research Group, Department of Civil Engineering, University of Cape Town, Rondebosch, 7701 Cape Town, South Africa
E-mail: george.ekama@uct.ac.za

H. Hauduc
Cemagref, EPURE, Parc de Tourvoie, BP44, F-92163 Antony Cedex, France
E-mail: helene.hauduc@cemagref.fr

A. Oehmen
CQFB/REQUIMTE, Department of Chemistry, FCT, Universidade Nova de Lisboa, P-2829-516 Caparica, Portugal
E-mail: adriano@dq.fct.unl.pt

K. V. Gernaey
Department of Chemical and Biochemical Engineering, Technical University of Denmark, Building 229, DK-2800 Kgs. Lyngby, Denmark
E-mail: kvg@kt.dtu.dk

M. C. M. van Loosdrecht
Department of Biotechnology, Delft University of Technology, Julianalaan 67, 2628 BC Delft, The Netherlands
E-mail: M.C.M.vanLoosdrecht@tudelft.nl

Y. Comeau
Department of Civil, Geological and Mining Engineering, Ecole Polytechnique, P.O. box 6079, Station Centre-ville, Montreal H3C 3A7, QC, Canada
E-mail: yves.comeau@polymtl.ca

Many unit process models are available in the field of wastewater treatment. All of these models use their own notation, causing problems for documentation, implementation and connection of different models (using different sets of state variables). The main goal of this paper is to propose a new notational framework which allows unique and systematic naming of state variables and parameters of biokinetic models in the wastewater treatment field. The symbols are based on one main letter that gives a general description of the state variable or parameter and several subscript levels that provide greater specification.
Only those levels that make the name unique within the model context are needed in creating the symbol. The paper describes specific problems encountered with the currently used notation, presents the proposed framework and provides additional practical examples. The overall result is a framework that can be used in whole plant modelling, which consists of different fields such as activated sludge, anaerobic digestion, sidestream treatment, membrane bioreactors, metabolic approaches, fate of micropollutants and biofilm processes. The main objective of this consensus building paper is to establish a consistent set of rules that can be applied to existing and most importantly, future models. Applying the proposed notation should make it easier for everyone active in the wastewater treatment field to read, write and review documents describing modelling projects.


INTRODUCTION
Mathematical modelling of wastewater treatment (WWT) processes has become a widely accepted tool in the past decade, and is used for research, plant design, optimization, training, and model-based development and testing of process control. Starting with the activated sludge system and now moving into whole plant modelling, the modelling community has produced a significant number of models describing the processes occurring in wastewater treatment plants (WWTPs). New models and model extensions are constantly being developed in response to changing requirements, e.g. stricter effluent limits, or new processes such as side-stream treatment.
One of the milestones in dynamic modelling of WWTPs was the research carried out by the University of Cape Town (Ekama & Marais 1977; Dold et al. 1980). With this research a specific notation was introduced (further referred to as the 'UCT system') and several research groups are still using this naming system (e.g. Barker & Dold 1997; Lee et al. 2006). [...] vector and extra information as units and names) and with a new and standardized notation (in this paper referred to as the 'IWA system'). The latter notation had its roots in the work of another IAWPRC/IUPAC task group, led by Prof. Grau (Grau et al. 1982a,b, 1987). The need to widen the model boundaries and to include other process units led to the development of several other models such as ADM1 for anaerobic treatment (Batstone et al. 2002), fixed biomass (Rittmann & McCarty 1980; Wanner & Gujer 1986; Horn et al. 2003) and membrane bioreactors (MBRs; Lu et al. 2001; Jiang et al. 2008). Nitrite as an intermediate compound is included in several models (Sin et al. 2008). Increased microbiological and biochemical insights led to the development of so-called metabolic models (e.g. Smolders et al. 1995; Murnleitner et al. 1997; Lavallée et al. 2009; Lopez-Vazquez et al. 2009).
An emerging field is the modelling of the fate of micropollutants, where a number of models were proposed by several researchers (e.g. Joss et al. 2006; Schönerklee et al. 2009). All of these models are published with their own notation, sometimes using different names for the same compound or parameter, or the same name for different compounds/parameters.

Motivation
The need for a common international notation standard in biological wastewater treatment was already highlighted in Henze et al. (1982), where examples were given of abuse of notation (e.g. double notation, double meaning, misdirection, etc.). It was concluded that notation is a common cause of confusion due to the absence of a universally agreed system of terminology. At the same time a proposal for unifying the notation used in the description of biological wastewater treatment processes was presented by Grau et al. (1982a,b, 1987). This proposal was presented by a Working Group set up by the IAWPRC and the Commission on Water Quality of the International Union of Pure and Applied Chemistry (IUPAC). In this report, several symbols are listed together with their description, dimensions and some specifications as footnotes. This notation standard has been followed for many years.
However, the complexity of WWT models has significantly increased over the last 25 years (Gujer 2006) and new modelling concepts have been introduced. Moreover, in the work of Grau et al. (1982a,b, 1987) [...]
(ii) Several specific pitfalls prevail in the existing notations (e.g. colloidal matter, see next section).
(iii) No internationally accepted framework is available to name new state variables and parameters.
(iv) Model documentation (including notation) is time consuming and can lead to implementation errors.
(v) Model exchange is a problematic issue especially for complex models (Gernaey et al. 2006).
(vi) Coupling different models is becoming common, e.g. for plant-wide modelling (Grau et al. 2009), which makes the use of one notation indispensable.
(vii) Different notations in reporting and coding can cause implementation errors and make double-checking difficult.
Given all of the above, it appears that a new and extendable notational framework is needed, that should: [...]
To present the new framework, this paper is organized as follows. First, the general objectives for the framework are laid out and general notation rules are introduced. Then, separate sections for state variables and parameters are presented. They include a discussion on problems currently encountered and a description of the proposed new framework with some examples. Finally, the contributions of the new framework and the conclusions are described.

GENERAL FRAMEWORK
The proposed notation should be valid for the different subfields of WWT modelling, and is mainly focused on biokinetic models. Therefore, the new notation has been developed considering models for activated sludge, anaerobic digestion, sidestream treatment, membrane bioreactors, micropollutant fate and biofilm processes, etc. In addition, the notation also considers metabolic modelling approaches. The main objective of this consensus building paper is, first to create a consistent set of notation rules that can be applied to existing and more importantly, future models and second, to promote the establishment of a consensus on variable/parameter names.

Naming system established for the new notation
The main goal of the new notation is to provide a framework which allows unique naming of state variables (the compounds or components used in the model's mass balances) and parameters. The resulting name is kept as short and mnemonic as possible and previously accepted notation is applied whenever feasible. An important element of the new notation is that the symbols are consistently defined as a main symbol with different subscript levels, which accounts for the increasing complexity [...]

STATE VARIABLES
While analysing current models, the most obvious problems were encountered with respect to the naming of state variables. New models, model extensions, the connection of models using different sets of state variables (e.g. in whole WWTP and other fields of integrated modelling) were driving forces to develop the new notational framework.
The new notation should provide the information required in the context of the model used (e.g. on the physical, chemical and biological properties of the compounds).

Different naming systems
Looking at the most common models one can observe that there is no real consensus with respect to the use of standardized symbols (Table 1).
• Main letter: In the IWA system the main letter is used to differentiate between the particulate ('X') variables that will settle out of the bulk liquid and soluble ('S') variables that will remain dissolved. The former UCT system uses the main letter to differentiate between units of measurement where 'S' represents substrate, 'Z' volatile solids in COD units, 'X' volatile solids in VSS units and 'N' nitrogen (e.g. S_{bs,c}, Z_{BH}, N_{obs}).
• Subscripts:
  * Degradability: In the UCT system 'B' stands for biodegradable and 'U' for unbiodegradable (e.g. S_{US}). In the IWA notation they are given as 'S' (substrate) and 'I' (inert) (e.g. S_I). Conversion processes that do not depend on biodegradation, such as precipitation, acid-base reactions or adsorption, lack a clear notational framework to deal with these 'abiotic' (non-biological) reactions.
[...] ASM models use g COD m^{-3}, g N m^{-3} and mol HCO3^- m^{-3} for alkalinity.
Another example is S_{ND} in ASM1 and N_{OS} in GenASDM, which both represent soluble biodegradable organic nitrogen. Both symbols use 'S' for soluble and 'N' for nitrogen, but they are combined in a different way.

Different names used
Ammonia, nitrate, oxygen, volatile fatty acids and other compounds have different symbols or abbreviations in different models (see Table 1). Moreover, biomass names are abbreviated differently (e.g. nitrifying organisms in Table 1).

Framework
In the proposed notational framework, the main symbol is related to the particle size and should always be given. In the subscript, four levels can be provided, each referring to different information: 1. Degradability 2. Organic/inorganic compound 3. Name of compound or organism 4. Additional specifications.
The main symbol is in upper case and italics, the different elements of the subscript are in upper case (or combined with lower case if needed to make the name clearer, e.g. AcCoA) and not italicised, as defined in Table A1 of the Appendix. Figure 1 shows the proposed framework and some examples that illustrate the notational procedure.
In most cases, one or several of the subscript levels are not required (as illustrated in Figure 1), and therefore, are not included in the symbol. Generally, if the name of the compound is provided (e.g. Volatile Fatty Acids, abbreviated as VFA), it is not necessary to write the preceding levels (i.e. degradability or organic/inorganic). Finally, depending on the model or on the context for describing the model, it may be necessary to add specifications, as the final elements of the subscript. An example is X_I in the ASM1 model, which becomes X_{U,Inf} when applying the proposed notation, with the subscript 'Inf' referring to the fact that this fraction originates from the influent of the WWTP.

Notational procedure
Particle size
The first upper case letter of the notation is related to the particle size. It is proposed to differentiate between soluble (S), particulate (X) and colloidal (C) matter. The novelty here is that the colloidal fraction is included explicitly, as was already proposed by Melcer et al. (2003). The filter size to distinguish between soluble, particulate and colloidal compounds cannot be specified at this stage, considering that MBR researchers need to adapt it according to the membrane pore size used. Therefore, the particle size used in a particular model (or study) should be specified and documented. Care should be taken not to confuse the use of 'C' for colloidal and for total material concentration (as defined in Grau et al. 1987). It is proposed to use the symbol 'Tot' for total material concentration.

Degradability
This is one of the most important aspects of WWT models. It is proposed to distinguish between undegradable (U), biodegradable (B) and abiotically convertible (A) compounds. The last symbol was already used in Howard et al. (1991) and refers to compounds that can be involved in conversion processes that are not related to the metabolism of an organism (e.g. photolysis, chemical reactions, adsorption, etc.).

Organic/inorganic
This differentiation is useful, notably, to distinguish between autotrophic and heterotrophic metabolism, where the carbon is obtained from inorganic (Ig) or organic (Org) compounds.
[...] S_{NHx} [g N m^{-3}] for total ammonia consisting of NH3 and NH4+ (the x is used to lump both) or S_{Ac} [g COD m^{-3}] for the sum of acetate and acetic acid).
For example, the description of total ammonia in the system is frequently required (e.g. in ASM1, S_{NHx}, as substrate for autotrophic nitrifying organisms, ANOs). Other times the model needs to consider one of the ionic species (e.g. inhibition by ammonia, S_{[NH3]}).

Specifications
In certain cases it is necessary to include extra information in the name of the variable (fourth and subsequent levels). The following cases are considered.
• Structured biomass compounds appear in the symbol next to the name of the organism, separated by a comma. With cell-internal storage products, different levels of detail can be considered. For example, X_{PAO,PHA} would be preferred when glycogen is included in the model as another state variable (i.e. modelling more than one storage polymer), while X_{PAO,Stor} would be fine in cases where glycogen is not considered (i.e. only one organic storage polymer is modelled).
• The origin of the products can be specified to indicate whether the compound originates from endogenous processes (E) or from the influent (Inf) (e.g. X_{U,E} or X_{U,Inf} to describe the ASM1 state variables X_P and X_I, respectively).
• For some models it is important to specify the compartment. For instance, in the case of biofilm or anaerobic digestion models, different compounds are in equilibrium between different compartments/phases. The symbols considered for the compartments are the following (Morgenroth 2008): L for liquid, G for gas, F for the inner biofilm, LF for the biofilm surface (e.g. S_{CO2,L} or S_{CO2,G}). If all variables of the model belong to the same compartment, there may be no need to specify the compartment.
• If required, the valence of an ion can be specified, e.g. when S_{Fe,2} and S_{Fe,3} are considered in the same model.
• If required, the units can be defined as an additional subscript. They should be written as shown in Grau et al. (1987), indicating the power (which can be negative or positive) in the superscript (e.g. g COD m^{-3}).
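The notational procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `StateVariable`, its field names and the `X_{U,Inf}` text rendering are hypothetical choices made here, not part of the proposed framework. The letter codes (S, C, X; U, B, A; Org, Ig) are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Codes taken from the text; the container names are illustrative assumptions.
PARTICLE_SIZES = {"S", "C", "X"}   # soluble, colloidal, particulate
DEGRADABILITY = {"U", "B", "A"}    # undegradable, biodegradable, abiotically convertible
ORGANIC = {"Org", "Ig"}            # organic, inorganic


@dataclass(frozen=True)
class StateVariable:
    """Hypothetical helper: main symbol plus up to four subscript levels."""
    main: str                            # particle size letter(s), always given
    degradability: Optional[str] = None  # level 1
    organic: Optional[str] = None        # level 2
    name: Optional[str] = None           # level 3: compound or organism
    specs: Tuple[str, ...] = ()          # level 4 and beyond: specifications

    def symbol(self) -> str:
        if not self.main or any(c not in PARTICLE_SIZES for c in self.main):
            raise ValueError(f"unknown main symbol: {self.main!r}")
        if self.degradability is not None and self.degradability not in DEGRADABILITY:
            raise ValueError(f"unknown degradability: {self.degradability!r}")
        if self.organic is not None and self.organic not in ORGANIC:
            raise ValueError(f"unknown organic/inorganic code: {self.organic!r}")
        # Only the levels that were supplied end up in the subscript.
        levels = [self.degradability, self.organic, self.name, *self.specs]
        subscript = ",".join(level for level in levels if level)
        return f"{self.main}_{{{subscript}}}" if subscript else self.main


print(StateVariable("X", degradability="U", specs=("Inf",)).symbol())
print(StateVariable("S", name="VFA").symbol())
print(StateVariable("S", name="CO2", specs=("L",)).symbol())
```

Leaving a level as `None` mirrors the rule that only the levels needed to make the name unique within the model context are written.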
Naming lumped variables. A lumped variable is the single variable obtained after grouping several variables. The first two levels proposed in the new framework allow the grouping of variables according to the degradability and the organic-inorganic properties (e.g. see X_{U,Org} and X_{B,Org} in Figure 2). It is also possible within this framework to lump variables according to their particle size. In this case, the main symbol will contain the different particle size letters, following the sequence X → C → S (for example, X_S in ASM1 is XC_B according to the new notation). For some of the lumped variables, the specific name is normally provided (e.g. 'Stor' for storage products or 'Bio' for total biomass).
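The ordering rule for lumped main symbols can be illustrated with a minimal sketch. The function name `lumped_main_symbol` is a hypothetical helper introduced here; only the X, C, S sequence comes from the text.

```python
SIZE_ORDER = "XCS"  # sequence X -> C -> S, as stated in the text


def lumped_main_symbol(sizes):
    """Combine particle size letters into one main symbol in X, C, S order."""
    requested = set(sizes)
    if not requested:
        raise ValueError("at least one particle size letter is required")
    unknown = requested - set(SIZE_ORDER)
    if unknown:
        raise ValueError(f"unknown particle size letters: {sorted(unknown)}")
    # Emit the letters in the fixed order, whatever order they were given in.
    return "".join(letter for letter in SIZE_ORDER if letter in requested)


# Particulate plus colloidal matter lumped together, as in XC_B.
print(lumped_main_symbol(["C", "X"]))
```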
Composite variables (calculated from multiple state variables, facilitating the comparison of model results with experimental measurements) are not discussed in this paper.

Example: ASM2d using the new framework
Figure 2 shows an example for the use of the new state variable notational framework for ASM2d (Henze et al. 2000). The variables are organized according to particle size, organic/inorganic properties and degradability. It can be seen that the main symbols are kept identical (except for the former X_S, which becomes XC_B) in the proposed framework and that some modifications are introduced in the subscripts. For simple variables describing specific molecules, the chemical formula is used in both notational systems (e.g. S_{N2}, S_{O2}, S_{PO4}). For total ammonia an 'x' is added at the end of the subscript with the new notation (the 'x' combines NH4+ and NH3); the same applies for S_{NOx}, where the 'x' combines NO2- and NO3-. Regarding volatile fatty acids, the subscript 'VFA' is used in the new notation instead of the abbreviation 'A' used previously.
For variables that do not have a specific name or formula, the degradability is specified in the subscript (e.g. X_U, XC_B).
Organism variable symbols have the main symbol 'X' and the subscript finishes with an 'O' (e.g. X_{ANO} for autotrophic nitrifying organisms, X_{OHO} for ordinary heterotrophic organisms). Internal cell compound symbols are linked to the organism (e.g. X_{PAO,Stor}).
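When an existing implementation is converted to the new names, a simple lookup table is one way to keep the translation auditable. The sketch below is an assumption-laden illustration: the flat spellings (`X_S`, `XC_B`) and the function `rename` are hypothetical, and `S_NH4` and `S_NO3` are taken to be the original ASM2d names for total ammonia and for nitrate plus nitrite, which the text above does not spell out.

```python
# Old ASM2d name -> name under the proposed notation (flat spelling assumed).
ASM2D_TO_NEW = {
    "X_S": "XC_B",     # former X_S becomes XC_B
    "S_A": "S_VFA",    # 'VFA' replaces the abbreviation 'A'
    "S_NH4": "S_NHx",  # assumed old name; 'x' lumps NH4+ and NH3
    "S_NO3": "S_NOx",  # assumed old name; 'x' lumps NO2- and NO3-
}


def rename(symbols, mapping=ASM2D_TO_NEW):
    """Translate a list of symbols and refuse translations that create clashes."""
    renamed = [mapping.get(symbol, symbol) for symbol in symbols]
    if len(set(renamed)) != len(set(symbols)):
        raise ValueError("renaming would make two different variables share a name")
    return renamed


print(rename(["X_S", "S_A", "S_O2"]))
```

The clash check reflects the stated goal of unique names within the model context; symbols that are identical in both systems (such as the one for oxygen) pass through unchanged.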

MODEL PARAMETERS
It is an insurmountable task to define a framework that covers the naming of every parameter used in all present and future biokinetic models. Therefore, the authors' goal was to provide a framework for standard, frequently used parameters or for cases where problems were encountered in current practice. The comparison of the parameter symbols used in different models (see Table 3) reveals some challenges that the new notation faces (e.g. avoiding the use of different main symbols and subscripts for the same parameter).
This section describes the stoichiometric and kinetic parameters separately, in accordance with the setup of the Gujer matrix.

Yield
In the proposed notation a 'yield' represents a stoichiometric parameter describing the amount of a specified product that is obtained from specified amounts of reactants. [...]
Specific problems encountered.
• For the biomass growth yield coefficients, there is no standardization to specify the substrate source (not considered in the evaluated models) and the environmental conditions (e.g. for aerobic conditions, O is used in TU Delft-P and the subindex 1 in UCTPHO+).
• Naming yields, such as for cell-internal storage (e.g. Y_{PO4} in ASM2, which represents the requirement of X_{PP} per X_{PHA} storage during P-release), is not straightforward and does not allow a clear understanding of the parameter on the basis of the symbol only.
Framework. The main symbol for yield is Y (upper case letter and italics). Subscripts start with the reactant (or substrate source) and, through an underscore, describe the product (e.g. the cell-internally stored compounds). They continue with the name of the organism followed by the environmental conditions, which allows differentiating yields depending on the availability of oxygen and nitrate/nitrite (Ox: oxic; Ax: anoxic; Ax2: anoxilic, nitrite present; Ax3: anoxalic, nitrate present; and An: anaerobic). The 'reactant_product' subscript with the underscore between the two compounds for the yield is used in other fields as well. For instance in Roels (1983), Y_{SX} represents the yield of biomass on substrate and Y_{SP} the yield of product on substrate. Figure 3 shows the proposed framework and some examples that illustrate the notational procedure. In the cases when only one substrate is consumed for direct growth the 'reactant_product' pair is not required (e.g. Y_{OHO}).
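A minimal sketch of the yield naming rule follows. The function `yield_symbol`, the comma used between subscript levels and the `Y_{...}` rendering are assumptions made for illustration (Figure 3 is not reproduced here); the order of the levels and the condition codes come from the text.

```python
CONDITIONS = {"Ox", "Ax", "Ax2", "Ax3", "An"}  # codes listed in the text


def yield_symbol(organism, reactant=None, product=None, condition=None):
    """Build a yield symbol: optional reactant_product, organism, optional condition."""
    if (reactant is None) != (product is None):
        raise ValueError("reactant and product must be given together")
    if condition is not None and condition not in CONDITIONS:
        raise ValueError(f"unknown environmental condition: {condition!r}")
    levels = []
    if reactant is not None:
        levels.append(f"{reactant}_{product}")  # underscore joins the pair
    levels.append(organism)
    if condition is not None:
        levels.append(condition)
    return "Y_{" + ",".join(levels) + "}"


print(yield_symbol("OHO"))                        # pair omitted: one growth substrate
print(yield_symbol("PAO", "PP", "Stor", "Ax"))    # hypothetical storage yield
```

The second call uses made-up arguments purely to show where each level lands; it is not a parameter defined in the paper.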

Composition and fractionation coefficients
In the proposed framework, composition coefficients refer to the conversion factors used in the continuity equations. Within this context they are defined as a part of a larger entity to explain the composition of a compound. For instance, composition factors are used to specify the content of an element (N, P), charge or any other part (e.g. COD, TSS) of a compound or organism (e.g. nitrogen content of ordinary heterotrophic organisms).
Fractionation coefficients are used to indicate the portion of a state variable that is transformed via a specific process (e.g. f_P in ASM1 describes the fraction of biomass leading to unbiodegradable particulate decay products).
Specific problems encountered.
• Need for clarification of the different use of fractions (composition vs fractionation).
Framework. The main symbol defines the meaning of the stoichiometric coefficient used. The letter 'i' is used for composition coefficients and 'f' for fractionation coefficients. When using 'i', the first subscript represents the smaller portion (e.g. nitrogen content) and the second subscript represents the main compound or organism (larger entity). When using 'f', the same order of subscripts is used ('smaller'_'larger') and the process type can be added in the specifications level. Figure 4 presents the proposed framework and some examples that illustrate the notational procedure. 'f' can also be used to express ratios (e.g. PP/PAO in ASM2d would become f_{PP_PAO}).
As a general rule, simplification is recommended for state variables specified in one of the subscript levels of the parameters. The main letter (X, C, S) is used only if the subscript is not meaningful by itself. Normally, the organism names and the chemical compounds can be written without the main letter (e.g. Bio). Lumped variables will need the main letter (e.g. XC_B). The commas separating the subscripts of a state variable name are not used (e.g. i_{P_XUE}). This applies to the rest of the parameters as well.
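The simplification rule for state variables inside parameter subscripts can be sketched as follows. Both helper names and the `i_{...}` rendering are hypothetical; the 'smaller'_'larger' order, the dropped commas and the optional main letter are taken from the text.

```python
def flatten_state_name(main, levels, keep_main):
    """Write a state variable inside a parameter subscript, without commas.

    keep_main=True is meant for cases where the subscript alone is not
    meaningful (e.g. X with levels U, E), False for organisms and compounds.
    """
    body = "".join(levels)
    return f"{main}{body}" if keep_main else body


def coefficient_symbol(kind, smaller, larger, spec=None):
    """kind is 'i' (composition) or 'f' (fractionation)."""
    if kind not in ("i", "f"):
        raise ValueError("kind must be 'i' or 'f'")
    subscript = f"{smaller}_{larger}"
    if spec is not None:
        subscript += f",{spec}"  # separator before the specification is assumed
    return f"{kind}_{{{subscript}}}"


# Phosphorus content of endogenous undegradable particulates, as in the text.
print(coefficient_symbol("i", "P", flatten_state_name("X", ["U", "E"], True)))
# Ratio of polyphosphate to PAO biomass, as in the text.
print(coefficient_symbol("f", "PP", flatten_state_name("X", ["PAO"], False)))
```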

Rate coefficients and reduction factors
Reaction rates characterize the kinetics of a process. In ASM-type models, process rate equations (r_j) normally include the maximum rate and several saturation terms (e.g. Monod or Michaelis-Menten terms). Reduction factors account for a reduced rate under specific environmental conditions (e.g. anoxic conditions). This framework focuses on the rate coefficients and reduction factors used in these equations.
Specific problems encountered.
• The letter 'k' was used for both rates (lower case 'k') and saturation coefficients (upper case 'K'), which could lead to confusion (e.g. for hydrolysis rate k_H and saturation coefficient K_H).
• Not all rate constants were defined in all models (e.g. maintenance was missing in most notational systems).
[...] 'q' is used for all other rates. Correction factors (main letter 'η') and temperature correction factors (main letter 'θ') are also specified in the framework, since they can be applied to these parameters.
The first subscript is used for the correction factors to specify the main symbol. The second subscript includes the organism in upper case and in the third level the substrate source or the 'reactant_product' pair is specified. Other specifications may be given in the fourth level. Figure 5 provides an overview of the framework and some examples, including one for a reduction factor under anoxic conditions for the heterotrophic growth rate in ASM2d and an example for a temperature correction factor. In the latter case, the equation used for temperature correction should be properly documented ('pow' or 'exp' can be used in the specifications to indicate the type of equation).
Additional explanations and examples. Some common abbreviations for processes (e.g. 'hyd' for hydrolysis, 'ab' for acid-base reactions) can be found in Table A1 of the Appendix. In the last examples of Figure 5, 'η' and 'θ' have been used as main symbols and the parameter symbol to which they refer is found in the subscript.
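To show why the type of temperature correction equation needs to be documented, the sketch below contrasts two forms that the 'pow' and 'exp' specifications might denote. The formulas, the reference temperature of 20 °C and all names are assumptions made here for illustration; the paper itself only asks that the equation used be documented.

```python
import math


def temperature_corrected(k_ref, theta, temp, form, temp_ref=20.0):
    """Correct a rate coefficient k_ref (valid at temp_ref) to temperature temp.

    Assumed forms:
      'pow': k = k_ref * theta ** (temp - temp_ref)
      'exp': k = k_ref * exp(theta * (temp - temp_ref))
    """
    delta = temp - temp_ref
    if form == "pow":
        return k_ref * theta ** delta
    if form == "exp":
        return k_ref * math.exp(theta * delta)
    raise ValueError("form must be 'pow' or 'exp'")


# The same numerical value of the correction factor gives different results,
# so a value reported without its equation type is ambiguous.
print(temperature_corrected(2.0, 1.07, 15.0, "pow"))
print(temperature_corrected(2.0, 0.07, 15.0, "exp"))
```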

Saturation or inhibition coefficients
These coefficients are used in reduction terms (e.g. Monod, inhibition Monod, Haldane, etc.) to reduce the maximum process rate according to the existence or limitation of another compound.
Specific problems encountered.
• Non-unique names for some coefficients (e.g. K_{PP} and K_{IPP} in ASM2d, or K_{O2} without reference to specific biomass or a ratio).
• Additional information is sometimes required to understand the meaning of a parameter.
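For readers less familiar with the reduction terms in which these coefficients appear, the two simplest forms can be written out as below. This is a generic sketch of the standard Monod saturation and inhibition terms; the function names and the numerical values are illustrative assumptions and are not taken from any of the models discussed.

```python
def monod(concentration, k_sat):
    """Saturation term S / (K + S): close to 1 when the compound is abundant."""
    if k_sat <= 0 or concentration < 0:
        raise ValueError("K must be positive and S non-negative")
    return concentration / (k_sat + concentration)


def monod_inhibition(concentration, k_inh):
    """Inhibition term K / (K + S): close to 1 when the compound is absent."""
    if k_inh <= 0 or concentration < 0:
        raise ValueError("K must be positive and S non-negative")
    return k_inh / (k_inh + concentration)


# Illustrative numbers only: a maximum rate reduced by oxygen limitation
# and by inhibition from a second compound.
max_rate = 6.0
rate = max_rate * monod(2.0, 0.2) * monod_inhibition(0.5, 0.5)
print(rate)
```

Because the same letter K appears in both terms, the subscript has to say which role the coefficient plays, which is the motivation for the first subscript level described next.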
Framework. The main symbol is an upper case K in italics. The first subscript level describes the type of the reduction term (saturation or inhibition). The second level relates to [...]

CONTRIBUTIONS OF THE NEW FRAMEWORK
The new framework provides a structured system to specify the symbols for state variables and parameters used in wastewater treatment modelling. Different symbol levels, providing physical, biological and chemical information, are introduced in a systematic and intuitive way with the intention to provide a straightforward, simple and easy to understand framework. Necessarily, there must be compromise in order to keep symbols simple, yet meaningful. This is achieved by providing only those subscript levels that are required to make the symbol unique within the model context. The characters chosen for the framework originated from previously proposed notational examples and the symbols that result are often similar or identical to the ones that were most commonly used in practice. A list of abbreviations is provided as an attempt to standardize selected words and symbols (see Table A1 of the Appendix).

CONCLUSIONS
It is the hope of the authors that the proposed framework combines the advantages of different notational systems, such as the UCT and IWA systems, resulting in a standardised methodology for expressing nomenclature that is useful for the WWT modelling community. Using common notation should facilitate communication amongst modellers and other experts. It should help to achieve better 'readability' of new models and help prevent misinterpretation and implementation errors. Since coding is an essential and error-prone part of model implementation, the new notation also provides naming rules for programming.
In view of emerging fields in WWT modelling, like the fate of micropollutants and the inclusion of water chemistry, or new modelling approaches like metabolic or structured biomass models, a standardised framework for notational expression is a highly valuable means of conveying modelling advances to the entire WWT modelling community. With the proposed framework, it should be possible to give meaningful, distinct and commonly accepted names to the new variables and parameters that will inevitably arise from these future advances.
The next step is to convince modellers around the world to adjust their notation and use the new naming rules. The authors believe that these alterations are necessary in order to ease the transfer of knowledge between modelling studies. The structured framework proposed should be directive, yet flexible enough for the benefit of all model users and for the future of modelling.

• The same abbreviation (F) is used for fermentable organic matter and for the inner biofilm compartment. However, the compartment is specified in the last subscript and the variable name in the first subscript, avoiding confusion.
• All letters are lower case for process abbreviations to minimize confusion (e.g. Stor and stor).