Identi ﬁ cation of behavioural model input data sets for WWTP uncertainty analysis

Uncertainty analysis is important for wastewater treatment plant (WWTP) model applications. An important aspect of uncertainty analysis is the identi ﬁ cation and proper quanti ﬁ cation of sources of uncertainty. In this contribution, a methodology to identify an ensemble of behavioural model representations (combinations of input data, model structure and parameter values) is presented and evaluated. The outcome is a multivariate conditional distribution of input data that is used for generating samples of likely inputs (such as Monte Carlo input samples) to perform WWTP model uncertainty analysis. This article presents an approach to verify uncertainty distributions of input data (otherwise often assumed) by using historical observations and actual plant data.


INTRODUCTION
In this article, a set of dynamic wastewater treatment plant (WWTP) model input data is first identified and then used to run simulations, producing confidence intervals giving an estimate of the prediction uncertainties of further model applications. With prior knowledge of the input data and model parameter values to a dynamic WWTP model of the Henriksdal WWTP in Stockholm, Sweden, behavioural input data sets (BIDS) are identified and form model representations (i.e. combinations of input data, model structure and parameter values) that are: (1) consistent with measured data and (2) useful for scenario analyses by simulation. The concept of rejecting non-behavioural model representations has been adopted from the Generalized Likelihood Uncertainty Estimation (GLUE) methodology introduced by Beven & Binley ().
Dynamic WWTP models are today tools that are commonly used by both researchers and engineers. For most practical applications, an early goal in the modelling procedure is calibration; to make simulation results match a set of measurements of target variables. A typical WWTP model contains several potential target variables, which should be considered simultaneously during model calibration (Hauduc et al. ). There are guidelines for selecting what variables to consider as well as acceptable uncertainty ranges for these, which among other things depend on the type of application (Rieger et al. ). Ranges are provided since a perfect match is not achievable; a set of model outputs y will never exactly equal the corresponding measurements,ỹ, because of uncertainties in model structure, m, in applied parameter values, θ, in model input data, I, as well as experimental errors. An overview of the many uncertainties related to WWTP modelling is found in Belia et al. ().
Therefore, it is necessary to accept a model representation that is a sufficiently good approximation of reality, but it should then be recognized that there are several other possible representations leading (at least almost) to equally good results (Beven & Freer ). This is especially true if the assumptions of the calibration procedure are modified, e.g. if calibration data is added/removed or if an alternative objective function is used. In other words, the uncertainties are lumped into the calibration results and then transformed into prediction uncertainty during model scenario analyses, i.e. simulations of a certain change/deviation from the normal operating point of the plant. A nontrivial question is then to what extent one can trust and apply the results of the scenario analyses.
The objective of this paper is to define and validate a method for quantifying the WWTP prediction uncertainty using a probabilistic/Monte Carlo sampling framework. The method, which is inspired by GLUE (Beven & Binley ), is based on grading and selecting behaviourally good model representations from a larger set, which is obtained from Monte Carlo simulations with input data sampled from a prior uncertainty distribution. In the literature review of Belia et al. (), it is found that the Monte Carlo method where the uncertainty of input parameters is propagated through the model without any feed-back is the most commonly used method of uncertainty analysis when evaluating WWTP designs. Examples of applications with this method are found in Benedetti et al. () and Sin et al. (). Monte Carlo methods like the one presented in this study, where the prior uncertainty is updated by evaluating and comparing the simulations with measured data from the process, are however not as commonly applied, especially not for uncertainty analysis of full-scale WWTP models. Previously Sin et al. () used a Monte Carlo based calibration approach focusing on a limited number of kinetic ASM parameters and on aeration input data and Mannina et al. () used GLUE and varied 29 kinetic and stoichiometric ASM2d parameters to calibrate and assess the uncertainty of a full-scale WWTP.
Although it is possible to include an infinite number and various types of parameters in the uncertainty analysis the authors have, in this paper, chosen to focus on a limited amount of input data only. The choice was based on two criteria: (1) Identifiability analysis of WWTP models have indicated that only a fraction of WWTP kinetic and stoichiometric parameters are identifiable given typical data/ information collected from full-scale plant operations (Brun et al. ; Vanrolleghem et al. ; Sin et al. ).
(2) Previous studies on sensitivity analysis (Sin et al. ), as well as several calibration protocols (Hulsbeek et al. ; Melcer et al. ; Vanrolleghem et al. ), agree that the most significant parameters affecting uncertainty in WWTP plant models are the influent fractions and calibration of the solids balance.
Therefore, in the below case study application, the authors have selected the influent load, influent fractionation, influent temperature and parameters affecting the solids balance (and sludge retention time, SRT) as main input sources of uncertainty. These variables are measured, which facilitates the quantification of the prior uncertainty ranges. The influent temperature was moreover chosen as an uncertain calibration parameter, because it explicitly affects the kinetic parameters (growth rate, decay rate, etc.) of the underlying microbiological processes (ASM models). The outcome from the method is a set of behaviourally good input data, BIDS, which are conditioned on a historical (calibration) data set that can be used in model-based applications, such as scenario analysis of plant operation.

WWTP and problem statement
Henriksdal WWTP (approx. 800,000 p.e.) in Stockholm, Sweden, includes primary treatment, an activated sludge system, tertiary treatment in sand filters and anaerobic digestion (AD) for sludge stabilisation and biogas production, see Figure 1. The plant is currently being operated at its maximum allowed sludge concentration in the bioreactors (MLSS), approximately 2,500 g/m 3 .
With one out of seven biological treatment lines (14%) being out of operation due to reconstruction works, the nitrate recirculation pump (Q Rec ) capacity and the number of available aerators at the studied WWTP have been reduced by 14%. Unavoidably, however, the influent wastewater flow rate and load to the plant increase all the time because of the population increase in Stockholm and densification of the wastewater catchment area. Still, the effluent requirement, 10 gTN m À3 as a yearly average for nitrogen, remains. To evaluate if the nitrogen removal process can be enhanced, either by speeding up the remaining recirculation pumps or by intensifying the aeration, the two scenarios were simulated and evaluated using a dynamic model of the WWTP as will be described in the following.

WWTP model under study
The Benchmark Simulation Model no. 2 (BSM2) (Gernaey et al. ) was a close to reality basis for modelling the plant. Computational speed, however, is an issue for Monte Carlo methods, and this was the main reason for excluding the anaerobic digester (AD) process model of BSM2 from this study. At the WWTP, and in the model, reject water from the AD is mixed with raw wastewater before the influent sampling point. A few minor additional modifications to the BSM2 layout were needed to adjust the model to the Henriksdal WWTP: (1) The total bioreactor volume is distributed into two smaller (Z1, Z8) and six equally sized zones (Z2-7), which is motivated by historical tracer tests and the placement of mixers (Z2-4) and controllable aeration (Z4-7) zones. (2) Dissolved ferrous sulphate (FeSO 4 ) is added to the primary clarifier and to the sand filter inlets and is modelled by assuming an instant production of precipitates (inert suspended solids). (3) Inert suspended solids generation from the influent and iron dosage is modelled using the two particulate dummy state variables of BSM2. (4) Rapid two-media sand filter units are used to remove P and TSS from the secondary clarifier effluent and is modelled as an ideal thickener. In the model, the filter flush water is recirculated to the bioreactor, while in reality there is a possibility to recirculate that water to the inlet of the primary clarifier. However, the settings of the valves regulating the destiny of the flush water is not recorded at Henriksdal WWTP, thus adding to the model structure uncertainty.

Adjustment of site-specific conceptual model parameters
The specific inorganic suspended solids (ISS) production was assigned a value of 2.5 g ISS/g Fe (ATV ). Then, by analysing five years of delivery data of the FeSO 4 product (measured weight, iron content) and measured data on sludge leaving the plant with trucks (weight, iron content, ash content), it could be concluded that, to close the mass balance, the influent wastewater then contains approximately 30 g ISS/m 3 . This is conditional on the modelling assumption that ISS is not formed by other processes and motivates the use of f VSS_TSS ¼ 0.9 as the fixed parameter value for fractionating influent TSS to VSS.
The removal efficiency of the conceptual primary clarifier model was modified to reduce approx. 60% of the particulate material by increasing the efficiency correction factor from 0.65 to 0.73. The removal of Fe 2þ precipitates in the primary settler was reduced by decreasing the specific settleability factor from 1 to 0.75. This was motivated by measurements of total iron in the thickened WAS flow, indicating the fraction of total added iron that ends up in the biological treatment due to the simultaneous precipitation.
The removal efficiency of the simplified sand filter model was set to 90% and together with the secondary settler model (using default BSM2 parameter values) the TSS in the effluent was on an average 4 mg TSS/l, which is similar to measured values. Finally, the removal efficiency of the WAS thickener was set to 90% based on grab samples of TSS in the reject water and experience of the operational staff.

Measurement data and acceptable uncertainty ranges
The first step in the process of identifying BIDS is to decide which model outputs j to consider and the acceptable uncertainty ranges ε ¼ {ε j }, see Figure 2. These were adopted from the 'good modelling practice' (GMP) protocol (Rieger et al. ) and the modelling application type 'assessing the plant capacity for nitrogen removal'.
Measured process dataỹ selected from a period of 200 days (May-Nov. 2016) was selected. Weekly grab samples of MLSS (ỹ MLSS ) and the measured waste activated sludge production (ỹ WAS ) from weekly SS grab samples and online flow measurements of the WAS were used to update the prior uncertainty with information related to the plant sludge production and SRT. Online measurements of NH 4 (ỹ NH ) and NO 3 (ỹ NO ) concentrations from the secondary clarifier effluent contributed with information of the plant nitrogen removal capacity. (ỹ NH ) and (ỹ NO ) data were validated in the sense that the sets used for calibration originated from periods when duplicate sensor readings, providing similar results, were available.
The measured mean values S(ỹ j ) of the target variables are shown later in Table 1 as are the applied acceptable uncertainty ranges ε. The acceptable uncertainty range for both NO 3 -N and NH 4 -N was 0.9 mg/l. As a criterion, the authors thus decided to accept all input data sets leading to almost complete nitrification, here defined as an average NH 4 -N concentration in the aeration tank <1.8 mg N/l. For the WAS mass loading rate, a higher range (±10%) than recommended (±5%) was used. It is believed that 5% is a too narrow range since, in the applied model, uncertainties related to the TSS wasted with the filter flush water as well as WAS being recirculated to the plant with reject water, are simulated as uncertainty in y WAS .

Prior input uncertainty
During the identification phase with access to measured data the BSM framework with parameter values θ was considered well established and kept fixed. Besides site-specific conceptual parameters, all ASM1 and settling parameters of the model were kept constant at their default values. The total uncertainty was projected on part of the model input data by the inclusion of eight uncertainty factors in the vector δ ¼ {δ m }, assumed to be independent and uniformly distributed (U) with min/max ranges and referred to as the multivariate prior uncertainty distribution P. The selection of input data to consider uncertain was concentrated on the influent wastewater loads and fractions, which are fed to the used influent wastewater generator model (Gernaey et al. ) including uncertainty in particulate inert COD (X I ) content (determining the degradability of influent organic matter) and the wastewater temperature, see Box 1 in Figure 2. The influent temperature was chosen as an uncertain calibration parameter, because it explicitly affects the kinetic parameters (growth rate, decay rate, etc.) of the underlying microbiological processes (ASM models). By considering also the operational model inputs Q WAS and iron dosage Fe Dos as uncertain, the uncertainties in the settling tank model and the overall sludge balance were modelled, which affect key variables, such as actual sludge wastage from the plant and SRT (Ekama ).
Online input data sets considered to be known (hence not uncertain) and directly used as inputs to the model include the influent, return activated sludge (RAS) and nitrate recirculation (REC) flow rates and three sensor data readings of dissolved oxygen concentrations in bioreactors Z5,6,7. This choice was deliberately made to keep the model calibration complexity manageable and focus the discussion on analysis of the presented methodology.
Ranges (min/max) of P(δ) were derived from online monitoring and influent sampling results as well as from expert reasoning (Sin et al. ), see Table 1. i ¼ 1,…,N latin hypercube sampling (LHS) of δ from P were used to generate realisations of model Monte Carlo input data I, see Figure 2. For example, the yearly average TN load was estimated to be 11,600 kgN d À1 and to have an uncertainty of ±20% (uniform distribution), thus δ TN ∼ U(0.8,1.2), and the ith modelled influent TN load profile was generated by multiplying the ith sample of δ TN with 11,600 and the estimated yearly and daily load patterns.
The dry weather daily load pattern (DayVar(t)) was obtained by processing online measurements of influent flow rate and NH 4 -N concentrations (results not shown). The yearly pattern was derived as follows (see also Figure 3, left). 18 years of weekly flow proportional TN influent load data to the WWTP was arranged and scaled with the yearly average to produce 18 yearly load variation profiles, each having a mean value of one. The black line shows the average variation as a function of time, which was filtered and used as input data (SeasVar(t)) to the simulations. The red region is delimited by the 5% and 95% percentiles of the 18 yearly patterns. In Figure 3 (right), the result of the anticipated min/max prior uncertainties for the TN load (dashed lines) are shown together with measurements from three years (green line).
The ith NH 4 , COD, TSS and X I load profiles were then generated by multiplying the ith TN load with the ith samples of corresponding uncertainty parameters for wastewater composition: δ NH4:TN , δ COD:TN , δ TSS:COD and δ XI:XCOD . Thus, all pollutants were assumed to have the same diurnal and seasonal variation. Three uncertain input data sets were derived from measured time series data (influent temperature, Temp Inf (t), Q WAS (t) and ferrous iron dosage Fe Dos (t)) by directly multiplying these with their corresponding uncertainty factor.

Selection and grading of BIDS
Having run N Monte Carlo simulations and after comparing the model results with measurements, realisations that differ more than the pre-defined uncertainty ranges, ε j , for any target variable, j, are rejected. The subsets of δ yielding results within the ranges are identified as behavioural (BIDS) and denoted δ B . An essential part of the presented methodology is the rejection (not only assigning a low likelihood) of non-behavioural models. This is because density dependent sampling methods, such as Markov Chain Monte Carlo, is not proposed here, and for the method to be applicable in practice, the number of ensemble simulations needed to be run for the prediction phase must be limited. An input data set, i, was rejected if the resulting mean error (ME) of any of the targets j were outside the selected uncertainty ranges, e.g. if ME j ¼ jS(y (i) j ) À S(ỹ j )j > ε j , where S(·) denotes the mean function. Then an accepted behavioural input data set must provide reasonably good results considering all target variables and thus, a bad prediction of e.g. MLSS cannot be compensated for by good WAS, NH 4 and NO 3 predictions. The choice of using the mean value as a statistic to distinguish between behavioural and non-behavioural model representations was based on the time averaging (month/year) proposed for the application type in the GMP protocol (Rieger et al. ). Also, according to the experience of the authors, a common and pragmatic first step during manual WWTP calibration is to compare results from steady state simulations with averaged results; the applied rejection methodology here is a way to automate this procedure.
Following the GLUE methodology each behavioural model representation is then conditioned on past data and given a likelihood weight, L, according to the modeller's confidence in that particular model representation as a useful predictor of the future. The likelihoods are then normalized to unity and used to compute the likelihoodweighted cumulative density function (cdf) for any model prediction. From the cdf the percentiles of the simulations with δ B is derived. These uncertainty limits reflect what the model can say about the system response after conditioning on the past calibration data.
There are numerous ways to measure the efficiency of dynamic models (Hauduc et al. ) especially, as in the current study, when there are multiple target variables. The subjective choice of the likelihood measure in this paper was based on a product of four (one for each target variable j) factors with exponential functions of the mean squared errors (MSE). For the rth behavioural input data set δ B (r) , r ¼ 1,…,R, it reads The uncertainty range, ε j , is used to model the rate at which the likelihood decreases for an increase in MSE:s (Lindblom et al. ). To exemplify: if a behavioural input data set δ B (r) yields MSE:s of exactly the uncertainty ranges the rth likelihood value becomes exp(À1·4) ¼ 0.018. The selection of likelihood function and statistic (resolution of data) is not obvious and an important topic for further research.

RESULTS AND DISCUSSION
Figure 4 (left) shows histograms and a two-dimensional projection for two out of the eight uncertain input parameters (δ TN and δ COD:TN ). N ¼ 2,000 input data sets from the uniform prior uncertainty distribution are shown using blue markers. With the assumptions made, 9% (R ¼ 179) of those were identified as BIDS forming, in combination with the model structure and default parameter values, plausible validated model representations, which were used further for scenario analyses. The BIDS are shown using red markers and for these two parameters there is a correlation. For high TN loads (δ TN > 1.1) there are e.g. no BIDS with a high influent COD:TN ratio (δ COD:TN > 1.1). This is because, at least partly, the influent COD load would then be too high and rejected by the WAS criterion (leading to extensive sludge production). A future sensitivity analysis (using Sobol indices or partial correlation coefficients) will be used to complement the selection and analysis of uncertain parameters.
In Figure 4 (right) the likelihood-weighted influent TN load is shown together with the initial prior uncertainty ranges (dashed lines) and measured data. For TN, as well as the other model inputs loads (NH 4 , COD, TSS, X I /BOD 7 ), the anticipated prior uncertainty distribution results in simulated uncertainty ranges that cover the measurements. Moreover, after updating the distribution with process measurements the simulated ranges narrow. Figure 5 shows in turquoise part of the process data,ỹ, used to identify and grade the BIDS as well as the resulting uncertainty ranges. The WAS load was operated manually during this period and varies a lot because of limitations in the downstream sludge system. This results in varying MLSS concentrations are well predicted by the simulations with BIDS. Considering the nitrogen target variables, the simulated uncertainty range of NH 4 does not deviate significantly from the prior uncertainty. This is due to the fact that temperature was high during the calibration period and that the DO concentration inputs were considered fixed. Then nitrification was almost completely independent of the prior uncertainty. For NO 3 on the other hand, the uncertainty is reduced significantly after identification and simulation with the BIDS.

Scenario simulations
To test and validate the uncertainty analysis method, the identified 179 BIDS were used to simulate the two scenarios as outlined in the WWTP and problem statement section. The initialization settings were dry weather conditions, an influent temperature of 14 C and a constant DO setpoint in the last aerobic zone (Z7) of 0.5 gO 2 m À3 . The DO setpoints in Z5-6 were adjusted (maximum 4 gO 2 m À3 ) by an effluent ammonia-based aeration controller (ABAC) with setpoint 1.6 gN m À3 . Q Rec was kept constant and Q RAS controlled to 50% of Q Inf (Figure 6(a)). A constraint for the scenarios was that the MLSS concentrations of the bioreactors were not allowed to increase above the average MLSS of the calibration period.
The loading and temperature input data series to the scenarios were obtained by replacing δ with the behavioural uncertainty factors δ B in the equations of Box 1 in Figure 2 (running only the BIDS).
The uncertainty factors for operational input data needed to be translated to uncertainty factors for the setpoints of the controllers. Dosage of Fe 2þ is modelled by a feed-forward controller proportional to the influent flow rate. For each of the 179 BIDS, the setpoint value (g Fe/m 3 ) is set as the average of the corresponding Fe 2þ consumption during calibration. Thus, the BIDS that had a high/low chemical sludge production during calibration will have a similarly high/low production during the prediction. Regarding the uncertainty in Q WAS it was noted that each set of the 179 BIDS defining the loads, temperature, Fe 2þ dosage and Q WAS was associated with a simulated MLSS concentration profile. Since the average MLSS of the calibration period was assumed to be the maximum allowed, the simulated average MLSS of each BIDS was used as the setpoint value for Q WAS in the scenarios. Thus e.g. the F/M ratios of the BIDS was similar in the base scenarios.
For each scenario, the results of two individual simulations, yielding different answers to the stated problem, are shown in Figure 6 using blue and red colouring. By doing so the importance of uncertainty analysis while drawing conclusions from dynamic WWTP model simulations is highlighted. According to the herein presented methodology, although most steps of the 'good modelling practice' is followed, by using and simulating only one calibrated model representation (combination of input data, model structure and parameter values), the resulting conclusion drawn could be either the 'red' one or the 'blue' one (or a third one). Recall that both simulations are based on realistic interpretations of the available input data (defined by the prior distribution) and are judged to be sufficiently consistent with the measured process data (defined by the likelihood function).
Scenario 1: Tuning of the engines of the remaining recirculation pumps, by increasing Q Rec 26% so that Q Rec /Q Inf increase from 3.5 to 4.2 (Figure 6(a)). Figure 6(b)-6(d) show the simulated uncertainty limits of effluent NO 3 with the realisations of two individual BIDS (coloured lines and dots) included for illustrative purpose. The differences in average effluent NO 3 concentrations after and before the increase, ΔNO 3 , were studied. As shown in the histogram (Figure 6(c)), although most results indicate enhanced denitrification, there are also BIDS yielding deteriorated performance (ΔNO 3 > 0). Moreover, as shown in Figure 6(d), BIDS with similar likelihood values might lead to a decrease or an increase of the effluent NO 3 concentration. Among the simulations yielding deteriorated denitrification, the NO 3 in the anoxic zones is already high with the lower pumping rate, leading to that the increased recirculation only means more oxygen being transported to the anoxic zone.
Scenario 2: Increase in influent pollutant loads by increasing δ TN and δ Fe 20% (all others, e.g. the COD and TSS loads increase as well since generated from the behavioural subset of wastewater composition parameters (Table 1).  To maintain the MLSS despite the load increase at t ¼ 360 d, the WAS flow rate increases in all simulations. The DO concentration in the aerobic zones (Figure 7, left) are increased by the ABAC controller to maintain the NH 4 in the effluent at 1.6 gN m À3 although the aerobic SRT is decreased (Figure 7, right). By considering the likelihoodweighted 75/25% and 95/5% prediction limits, the results state that the DO will probably increase to above 2 gO 2 m À3 and possible saturate at 4 gO 2 m À3 while the plant can continue to nitrify (effluent ammonia <4 gNH 4 -N m À3 ).
However, note that among the simulations with BIDS there are those indicating that nitrification cannot be expected to be complete following the load increase, see the 100/0% percentile. This highlights the fact that the result of a simulation depends on the assumptions made during calibration. The blue and red lines, for example,  represent simulations with two of the 179 BIDS that slightly over-and underpredicted the measured MLSS concentration during calibration. The blue simulation is a result of a high anticipated value of the true WAS flow rate and thus a low SRT during calibration. To produce results within the acceptable uncertainty ranges for NH 4 and NO 3 (required as having been identified as BIDS) this set of input data is conditional on a high value of the influent temperature. The opposite holds for the red simulation (low Q WAS , high SRT, low temperature). Obviously, the assumptions done for the 'blue' simulation during calibration involve a higher sensitivity to the increased load since the NH 4 setpoint cannot be maintained.
An added value with applying the presented methodology, compared to more traditional methods based on e.g. expert ranges, is that the results are derived with actual and case-specific data and that the uncertainty ranges therefore will consider and be dependent on the amount and quality of accessible data. Another benefit is that the method is easy to use in practice. Once the BIDS have been identified they can be used for simulating the uncertainty of basically any scenario. Although the total uncertainty might not be the concern of e.g. a process engineer, it is also usually valuable to run at least a few alternative simulations while studying a problem. By sampling e.g. 5-10 input data sets from the BIDS and run the model several times, he/she will get much more insight compared to running one simulation only.
It should finally be noted that the results are dependent on several assumptions (choice of parameters to consider uncertain, the prior distribution, target variables, uncertainty ranges and likelihood function etc). The predicted uncertainty for scenarios that differ significantly from the calibration data period might therefore not be well described.

CONCLUSIONS
Monte Carlo methods are powerful methods that offer a range of applications for WWTP process analysis, optimization and more. An important requirement of these methods is definition of an input uncertainty domain, which directly affects the outcome and quality of the Monte Carlo analysis. The authors have presented a 'Proof of concept' evaluation of a methodology to generate multivariate samples from the input uncertainty domain, which are conditioned on historical plant data and observations. The results show that the applied methodology is promising and useful to verify assumptions related to input data for Monte Carlo analysis.
The authors recognize that many assumptions are made in this study to develop and evaluate the proposed framework, yet they also believe it provides a pragmatic and useful method to improve credibility of Monte Carlo simulations. The later requires random samples of input data, which are often defined by expert knowledge without a systematic way to verify the validity of these 'expert' ranges. Rather than falsification or to inspire to further mechanistic model development, the long-term goal of demonstrating the applicability of the methodology is to enhance the application of already available WWTP simulators combined with powerful Monte Carlo methods.