ABSTRACT
This article presents the development of virtual sensors for estimation of phosphates (PO4) and soluble COD profiles in a novel, continuous flow, moving bed bioreactor with enhanced biological phosphorus removal and simultaneous nitrification and denitrification, the Hias process. The virtual sensors combine online measurements with additional electrical conductivity (EC), oxidation–reduction potential (ORP) measurements, and state-space models at inlet, zone 3 and zone 7. The data were collected from Hias municipal WRRF, Norway from March to July 2023, and include both online data and laboratory data. Input variables were selected using correlation plots. Linear measurement equations were fitted to relate PO4 and COD concentrations in the laboratory data set with the online measurements including EC/ORP measurements. The state-space models were identified from the online data with model accuracy from moderate to strong. The estimated PO4 and COD concentrations correspond to most of the scarce laboratory data points at inlet and zone 3, whereas the model in zone 7 requires more work. A Kalman filter was developed for zone 3 and implemented in KYB IIoT platform. Future work is suggested on improvement of the model accuracy in zone 7, and development of energy-efficient control strategies using the virtual sensors.
HIGHLIGHTS
Virtual sensors for estimation of PO4 and COD in biological wastewater treatment
Combination of EC and ORP sensors with online measurements and models enable estimation of PO4 and COD in different process zones.
Virtual sensors in process zones can enable energy-efficient predictive control.
INTRODUCTION
As the European Commission has proposed an updated urban waste water treatment directive (EuropeanCommission 2022), stricter requirements will be set for the removal of nutrients such as PO4, organic substances and nitrogen compounds. The European Commission also requires energy-neutral operations in the water industry within the next two decades (EuropeanCommission 2021). Currently, the water industry uses approximately 1% of the total energy consumption in the European union. Hence, the development of novel biological nutrient removal processes and energy-efficient control methods for these are essential to meet the strict nutrient recovery requirements while minimizing the energy consumption at water resource recovery facilities (WRRFs).
In general, at municipal WRRFs, the primary wastewater treatment, clarification, is followed by a secondary treatment process that removes nutrients (Henze et al. 2008). Primary wastewater treatment encompasses a series of physical processes designed to remove settleable and floatable solids from wastewater. These processes include screening, grit removal, and sedimentation (Shewa & Dagnew 2020). In the secondary treatment process, nutrient recovery relies either on chemical additions or biological processes (Rezai & Allahkarami 2021) to achieve total nitrogen (TN) and total phosphorus (TP) concentrations below legal limits. Notably, every process step requires accurate monitoring, and real-time measurements using a broad range of sensors is a preferred choice. Water quality monitoring includes measurements of physical characteristics (e.g. temperature, conductivity), chemical parameters (e.g. pH, dissolved oxygen, alkalinity, nitrogen and PO4 compounds), and abundance of certain biological taxa (Korostynska et al. 2012). Monitoring could also include assays of biological activity such as alkaline phosphatase, tests for toxins and direct measurements of pollutants such as heavy metals or hydrocarbons (Frau et al. 2019, 2021). Phosphates can exist in wastewater in several forms depending on the source/nature of the discharge, but are generally grouped within three broad classes: orthophosphates, condensed phosphates (pyro-, meta- and poly-) and organic phosphorus (Korostynska et al. 2012). Municipal wastewater contains both dissolved and particulate forms of phosphorus. The majority, which is present as dissolved phosphate, comprises about 50% orthophosphate, 35% condensed phosphates, and 15% organic phosphates. Municipal wastewater typically contains between 6 and 8 mg/L of total phosphorus (Owodunni et al. 2023).
Conventional biological nutrient removal processes include active sludge systems such as anaerobic–anoxic–oxic (AAO) and sequencing batch reactor (SBR) (Zhou et al. 2022). A novel approach to biological nutrient removal is the Hias process, a moving bed bioreactor with enhanced biological phosphorus removal and simultaneous nitrification and denitrification developed by HiasHow2O (Rudi et al. 2019; Hias 2022). In this compact process, large amounts of small biofilm carriers, submerged in wastewater, circulate through three anaerobic and seven aerobic process zones, while nutrients such as phosphorus, organic substances and nitrogen compounds are removed from the wastewater in simultaneous biological processes in different layers of the biofilm. The phosphorus-rich biofilm carriers are recirculated from the last aerobic process stage to the first anaerobic process stage by a conveyor belt. The treated effluent flows through a disc filter, where the phosphorus rich biomass sludge is removed. Hence, the Hias process is more compact and does not require energy for sludge circulation (Hias 2022). The aeration rates in the Hias process are controlled in real-time using PID controller-based strategy. This work aims to develop virtual sensors that can enable multivariate control strategy to minimize energy consumption and reduce pollution to aquatic environment.
Real-time monitoring of water quality parameters is crucial for optimal operations, but the industry faces several challenges. While some parameters can be efficiently monitored using physical sensors, there are no technologically and economically feasible sensors for others, such as phosphates. An alternative is to use costly online-analyzers, which are prone to bio-fouling and clogging in solutions with a high content of suspended solids (Zidaoui et al. 2023). A virtual (or soft) sensor is a software algorithm that estimates the value of a water quality parameter without the need for the physical sensor itself. It does this by using mathematical equations or machine learning algorithms and the values of other available physical sensor data (Paepae et al. 2021, 2022, 2023). Nair et al. (2019, 2020) have demonstrated a model-based soft-sensor concept with extended Kalman filter and conductivity sensors for anaerobic basins in a laboratory-scale MBBR pilot system. Further, Wang et al. (2022) have demonstrated the use of ORP sensors for virtual measurements of nutrients in aerobic basins. In our previous work, we have studied data-driven models of the effluent PO4 in the Hias process (Komulainen et al. 2023; Nermo 2023).
In this article, first, the Hias process and instrumentation are presented. The tools and methods applied to develop virtual sensors based on state-space models and online measurements with additional EC and ORP sensors are described. Then, the results for virtual sensors estimating the PO4 and COD concentrations in three zones of the Hias process are explained in detail. The virtual sensor approach developed and tested is elaborated in the discussion section. Finally, the conclusions and further work are presented.
MATERIALS AND METHODS
Hias process description
The Hias process with instrumentation. Influent and biomass carriers (black dots) enter anaerobic zones 1–3 (dark blue). Water flows through aerobic zones (light blue) where aeration rates (valves, FO) are used to control the dissolved oxygen concentration (white dots). Excess PO4-rich biomass is separated from treated effluent water in disc filter (yellow).
The Hias process with instrumentation. Influent and biomass carriers (black dots) enter anaerobic zones 1–3 (dark blue). Water flows through aerobic zones (light blue) where aeration rates (valves, FO) are used to control the dissolved oxygen concentration (white dots). Excess PO4-rich biomass is separated from treated effluent water in disc filter (yellow).
The Hias process instrumentation includes continuous measurements of flowrate, temperature, aeration rate, dissolved oxygen, nutrient compositions, and suspended solids. Additional online measurements electrical conductivity and oxidation–reduction potential sensors were installed to the Hias process during the PACBAL research project 2022–2023. Hias laboratory determines nutrient composition of PO4 and soluble COD at inlet, zones 3,4,7,10 and outlet from daily grab samples 5 days a week. Hence, there are 5 samples from laboratory and 1,008 samples of online data for each variable per week. As most phosphorus (P) in wastewater is bound into different phosphate compounds, and the available laboratory measurements were ortho-phosphates, for simplicity in this article, the phosphate concentration is marked as (PO4). The soluble organic substances, or carbon parameters, were measured online at inlet and zone 7 with an optical instrument, which corresponds to laboratory measurements of soluble chemical oxygen demand (COD). Further in this article, the soluble organic substances were marked as COD. The online and laboratory measurements used in this study are listed in Table 1.
Online and laboratory measurements in the Hias process
Symbol . | Description . | Unit . | Online . | Laboratory . |
---|---|---|---|---|
PO![]() | PO4 inlet | ![]() | – | x |
PO4z3 | PO4 zone 3 | ![]() | – | x |
PO![]() | PO4 zone 4 | ![]() | – | x |
PO![]() | PO4 zone 7 | ![]() | – | x |
PO![]() | PO4 effluent after disc filter | ![]() | x | x |
CODin | COD inlet | ![]() | x | x |
COD3 | COD zone 3 | ![]() | – | x |
COD4 | COD zone 4 | ![]() | – | x |
COD7 | COD zone 7 | ![]() | x | x |
NH3in | NH3 inlet | ![]() | – | x |
NO2in | NO2 inlet | ![]() | x | – |
NO![]() | NO2 zone 7 | ![]() | x | – |
NO3in | NO3 inlet | ![]() | x | – |
NO![]() | NO3 zone 7 | ![]() | x | – |
NOXin | ![]() | ![]() | x | – |
NOX7 | ![]() | ![]() | x | – |
ECin | El. conductivity at inlet | ![]() | x | – |
EC3 | El. conductivity at zone 3 | ![]() | x | – |
ORP4 | Ox.Red. potential zone 4 | ![]() | x | – |
ORP7 | Ox.Red. potential zone 7 | ![]() | x | – |
![]() | Water flowrate inlet | ![]() | x | – |
![]() | Water temperature inlet | ![]() | x | – |
FO![]() | Aeration rate zones 4–10 | ![]() | x | – |
DO![]() | Dissolved oxygen zones 4,5,6,8,9 | ![]() | x | – |
SS10 | Suspended solids zone10 | ![]() | x | – |
SS | Suspended solids out after disc filter | ![]() | x | – |
Symbol . | Description . | Unit . | Online . | Laboratory . |
---|---|---|---|---|
PO![]() | PO4 inlet | ![]() | – | x |
PO4z3 | PO4 zone 3 | ![]() | – | x |
PO![]() | PO4 zone 4 | ![]() | – | x |
PO![]() | PO4 zone 7 | ![]() | – | x |
PO![]() | PO4 effluent after disc filter | ![]() | x | x |
CODin | COD inlet | ![]() | x | x |
COD3 | COD zone 3 | ![]() | – | x |
COD4 | COD zone 4 | ![]() | – | x |
COD7 | COD zone 7 | ![]() | x | x |
NH3in | NH3 inlet | ![]() | – | x |
NO2in | NO2 inlet | ![]() | x | – |
NO![]() | NO2 zone 7 | ![]() | x | – |
NO3in | NO3 inlet | ![]() | x | – |
NO![]() | NO3 zone 7 | ![]() | x | – |
NOXin | ![]() | ![]() | x | – |
NOX7 | ![]() | ![]() | x | – |
ECin | El. conductivity at inlet | ![]() | x | – |
EC3 | El. conductivity at zone 3 | ![]() | x | – |
ORP4 | Ox.Red. potential zone 4 | ![]() | x | – |
ORP7 | Ox.Red. potential zone 7 | ![]() | x | – |
![]() | Water flowrate inlet | ![]() | x | – |
![]() | Water temperature inlet | ![]() | x | – |
FO![]() | Aeration rate zones 4–10 | ![]() | x | – |
DO![]() | Dissolved oxygen zones 4,5,6,8,9 | ![]() | x | – |
SS10 | Suspended solids zone10 | ![]() | x | – |
SS | Suspended solids out after disc filter | ![]() | x | – |
Electical conductivity (EC) is a measure of the total amount of dissolved ions in water. The EC sensors in the Hias process are Xylem TetraCon 700 IQ, rugged electrode-style conductivity probe. The oxidation–reduction potential (ORP) is a measure of the tendency of a solution to gain or lose electrons. ORP can be used to monitor water quality in industrial and wastewater treatment applications (Weißbach et al. 2018; Wang et al. 2022). The ORP sensors in the Hias process are Xylem SensoLyt 700 IQ with impedance converter and temperature sensor. The optical measurements of COD, NO2 and NO3 are based on UV-spectrometry. The COD/NO2/NO3 sensors in the Hias process are Xylem WTW NiCaVis 701 IQ NI. The optical measurement of dissolved oxygen (DO) is based on photoluminescence. The sensor type is Xylem WTW FDO 70x IQ F. The effluent PO4 is measured using an online HACH Phosphax sc phosphate analyzer with sampling time of 10 min and minimum detection limit of 0.05 mg/L.
Software tools, data collection and preprocessing methods
An Industrial IoT platform, KYB, developed by Digitread Connect, was used for online data collection and Kalman implementation of the virtual sensors. The Industrial IoT platform KYB was used for uploading and standardizing operational data. The online data sets were collected in .csv format. The laboratory data set was obtained in .xlsx format from the process engineers. Matlab was used for development of data preprocessing and measurement equations, state-space model identification and Kalman filter tuning. Matlab functions ’fillmissing’ and ’filloutliers’ were applied to pre-process the data. Function ’fitlm’ multivariate regression was used to estimate measurement equation parameters. Function ’corrplot’ was applied to find the correlation between the variables. Function ’ssest’ was used to estimate the state-space models. Kalman filter was implemented for the real-time estimation of PO4 and COD in the KYB IIoT platform. Data preprocessing methods suitable for real-time use included outlier removal of values above/below three standard deviations and filling of missing values with the previous feasible value.
Modeling approach
The case study is a continuous large-scale municipal WRRF process, the Hias process. The modeling challenge is to estimate PO4 and COD profile in the Hias process zones using extra instrumentation of two electrical conductivity (EC) and two oxidation–reduction potential (ORP) sensors. Direct online measurements of PO4 are not available in the process zones (only effluent analyzer) and direct online measurements of COD are available at two zones. Possible modeling approach would need to include an equation between the EC and ORP sensor measurements and nutrient concentrations, and a dynamic model accounting for the biological phenomena in the Hias process.
In previous work with the pilot-scale MBBR process, a state-space model structure has been applied in an Extended Kalman filter to estimate the PO4 and COD concentrations at the inlet and anaerobic zones (Nair et al. 2019, 2020). State-space models with input–output delay were applied to estimate the effluent PO4 concentration for the large-scale municipal Hias process in Komulainen et al. (2023).








Model error indices



Measurement equations between EC, ORP, PO4 and COD
Input variable selection using correlation plots
As the Hias process has much more variables available than a state-space model identification algorithm can handle, the input variables (features) were selected using correlation pair plots between the input and output variables. Time delays were first applied for each variable, and then the variables were plotted using the corrplot function in Matlab. In total, three to six input variables with highest correlation to the measured output variables were chosen. High correlation between two input variables would rule out one of the input variables (Ljung 1999).
State-space model identification
State-space models are estimated from the selected input–output variable set with input–output time delays. The data needs to be scaled, so that the mean values and standard deviations become similar between the variables. The data set is divided into two, where first half is used for model estimation (estimation/training data) and the second half for validation (validation/testing data). The ’ssest’ Matlab function with input–output delay and two states is used with the estimation data set. The resulting model is applied to the validation data set, and error between the measurement and model are compared numerically using the R2 error index and visually using a trend figure.
PO4 and COD calculation using state estimates
Kalman filter implementation in IoT platform
A Kalman filter improves the future estimates of PO4 and COD by adjusting an error variable in the algorithm. Kalman filter ensures that the error between the model and measurements is very close to zero. A description of a linear Kalman filter algorithm can be found from text books, for example Ljung (1999).
The Kalman filter code was developed in Matlab m-script and manually translated to Python. NumPy (https://numpy.org/) was used for implementing the Kalman filter, and pandas (https://pandas.pydata.org/) was used for handling and pre-processing data. The Python driver for MongoDB (https://pypi.org/project/pymongo/) was used for fetching online data from the database, and for updating the database with the estimation results. The Kalman filter script in Python was containerized using Docker (https://www.docker.com/), and deployed to KYB as a serverless function in AWS Lambda (https://aws.amazon.com/lambda/). The script was then set up to run every 10 min, pushing the estimation results back into the database as virtual sensor updates.
RESULTS
Data collection and preprocessing
The online data and laboratory data were collected for a period of 21 March–31 July 2023. Prior to the measurement equation work, an extended laboratory data set was created. The laboratory data set with daily measurements was manually filled with values for EC, ORP and COD from the online data set with matching time stamps. During period of 1 April–15 May 2023 many online measurements were missing, hence, for most of the results, the online data set is for period 15 May 2023–31 July 2023, whereas and extended period of 21 March–31 July 2023 is used for the laboratory data set.
As the data is not normally distributed, the outliers in the data set were identified based on more than 1.5 interquartile ranges above the upper quartile (75%) or below the lower quartile (25%). Then, the missing values were filled in using fillmissing based on previous feasible value. In the Kalman filter online implementation, the outlier detection limit was set to three standard deviations and outliers and missing values were filled with previous feasible value.
Input–output time delays θ

Time delays in samples with
Symbol . | Zones in between . | Delay in samples . |
---|---|---|
θ8 | 10 | 29 |
θ7 | 7 | 26 |
θ4 | 4 | 15 |
θ3 | 3 | 11 |
θ2 | 2 | 7 |
θ1 | 1 | 4 |
Symbol . | Zones in between . | Delay in samples . |
---|---|---|
θ8 | 10 | 29 |
θ7 | 7 | 26 |
θ4 | 4 | 15 |
θ3 | 3 | 11 |
θ2 | 2 | 7 |
θ1 | 1 | 4 |
Nutrient estimation at inlet
A state-space system at inlet can be defined between:
the measured outputs (
) are
and
the states (
) consist of unmeasured
and measured
the measured inputs (
) at inlet are
,
,
,
and
Selection of input variables
ECin correlation over 0.4 : NO2in (0.61), NO3in (0.52), NOXin (0.66).
CODin correlation over 0.4: T (-0.42), NO3in (0.78), NOXin (0.72).
Measurement equation
State-space model identification
The state-space model was identified between the inputs and outputs with two state-variables. The resulting state-space model coefficients and error indices are presented in Table 3. The state-space model comparison to the estimation data set gave R2 index of 0.62 and 0.64 for EC and COD, and the comparison to the validation data set gave R2 index of 0.52 and 0.66 for EC and COD. The model accuracy is sufficient.
Inlet SS model parameters and error indices
States . | a11 . | a12 . |
---|---|---|
x1 | 1.078 | -0.2512 |
![]() | ![]() | |
x2 | 0.4884 | -0.3337 |
Inputs | ![]() | ![]() |
![]() | −0.1591 | −0.9459 |
NOXin | 0.1156 | 0.5399 |
Measurements | ![]() | ![]() |
ECin | 0.3488 | 0.6138 |
![]() | ![]() | |
CODin | 0 | 1 |
Error | Estimation | Validation |
index | data | data |
![]() | 0.62 | 0.52 |
![]() | 0.64 | 0.66 |
States . | a11 . | a12 . |
---|---|---|
x1 | 1.078 | -0.2512 |
![]() | ![]() | |
x2 | 0.4884 | -0.3337 |
Inputs | ![]() | ![]() |
![]() | −0.1591 | −0.9459 |
NOXin | 0.1156 | 0.5399 |
Measurements | ![]() | ![]() |
ECin | 0.3488 | 0.6138 |
![]() | ![]() | |
CODin | 0 | 1 |
Error | Estimation | Validation |
index | data | data |
![]() | 0.62 | 0.52 |
![]() | 0.64 | 0.66 |
Inlet PO4 and COD estimates
The inlet PO4 and COD were estimated using the state variables x1 and x2 from the state-space models. The mean and standard deviation of these variables were calculated on the laboratory data. The standard deviation of the state estimates was close to one and the scaling factors w1 and w2 were also 1. The PO4 and COD estimates were calculated by scaling the state estimates x1 and x2 with the laboratory data set mean and standard deviation values as presented in Table 4.
Inlet laboratory data properties and scaling factor w
Variable . | N . | Mean . | Stdev . | w . |
---|---|---|---|---|
![]() | 20 | 5.812 | 1.1426 | 1 |
CODin | 13 | 478.92 | 115.68 | 1 |
Variable . | N . | Mean . | Stdev . | w . |
---|---|---|---|---|
![]() | 20 | 5.812 | 1.1426 | 1 |
CODin | 13 | 478.92 | 115.68 | 1 |
Inlet PO4 laboratory measurements (*) and PO4 model estimate (blue line).
COD laboratory measurements (*), COD online measurement (black like) and COD model estimate (blue line).
COD laboratory measurements (*), COD online measurement (black like) and COD model estimate (blue line).
Nutrient estimation zone 3
A state-space system in zone 3 can be defined as:
the measured outputs (
) are
the states (
) consist of unmeasured
and unmeasured
the measured inputs (
) at inlet with delay θ3 are
,
,
,
,
,
the measured input (
) at inlet with delay θ4 is
Selection of input variables with EC3 correlation plot
The scaled and delayed input variables were plotted against the EC3 measurements, given in Figure 5, to choose a set of input variables with high absolute correlation with EC3.
inlet variable NO2in (0.59): Nitrite has high correlation with EC3, but low correlation with the other variables.
inlet variable SS10 (0.57): Suspended solids in zone10 can represent the biomass capacity on releasing PO4 and binding COD on the next round in the Hias process zones 1–3. As SS10 and SS are highly correlated, only SS10 is selected.
inlet variable CODin (0.73): COD at inlet has high correlation with NO3, SS10, SS.
inlet variable ECin (0.98): is highly correlated with the output variable and most of the input variables.
Measurement equation EC3
State-space model identification
The state-space model was identified between the inputs and outputs with two state-variables. The first half of the data set was used for model estimation and the second half for model validation. The resulting matrix coefficients are presented in Table 5 . The state-space model fitness to estimation data had a R2 value of 0.96 and to validation data a R2 value of 0.97. The model fitness for the measured variable EC3 is very high because the correlations with the input variables are very high.
Zone state-space model parameters and error indices
State . | . | a11 . | a12 . |
---|---|---|---|
x1 | 0.7556 | −0.7581 | |
![]() | ![]() | ||
x2 | −0.3866 | −0.4218 | |
Inputs | Delay | ![]() | ![]() |
CODin | 11 | 0.04415 | 0.06767 |
NO![]() | 11 | 0.05976 | 0.1121 |
SS10 | 15 | -0.02631 | -0.04808 |
ECin | 11 | 0.8372 | 1.49 |
Measurement | ![]() | ![]() | |
EC3 | 0.648 | 0.3332 | |
Error | Estimation | Validation | |
index | data | data | |
R2 EC3 | 0.9578 | 0.9711 |
State . | . | a11 . | a12 . |
---|---|---|---|
x1 | 0.7556 | −0.7581 | |
![]() | ![]() | ||
x2 | −0.3866 | −0.4218 | |
Inputs | Delay | ![]() | ![]() |
CODin | 11 | 0.04415 | 0.06767 |
NO![]() | 11 | 0.05976 | 0.1121 |
SS10 | 15 | -0.02631 | -0.04808 |
ECin | 11 | 0.8372 | 1.49 |
Measurement | ![]() | ![]() | |
EC3 | 0.648 | 0.3332 | |
Error | Estimation | Validation | |
index | data | data | |
R2 EC3 | 0.9578 | 0.9711 |
Zone 3 PO4 and COD estimates
The PO4 and COD at zone 3 were estimated using the state variables x1 and x2 from the state-space model. The mean and standard deviation of these variables were calculated on the laboratory data. The inverse standard deviation of the state estimates x1 and x2 was used as scaling factor w1 and w2. The PO4 and COD estimates were calculated by scaling the state estimates x1 and x2 with the laboratory data set mean and standard deviation values as presented in Table 6.
Zone 3 laboratory data properties and scaling factor w
Variable . | N . | mean . | stdev . | w . |
---|---|---|---|---|
PO4z3 | 20 | 35.36 | 10.91 | 0.99 |
COD3 | 13 | 221.77 | 44.21 | 1.42 |
Variable . | N . | mean . | stdev . | w . |
---|---|---|---|---|
PO4z3 | 20 | 35.36 | 10.91 | 0.99 |
COD3 | 13 | 221.77 | 44.21 | 1.42 |
Zone 3 PO4 laboratory measurements (*) and PO4 model estimate (blue line).
Zone 3 COD laboratory measurements (*) and COD model estimate (blue line).
Nutrient estimation zone 7
A state-space system in zone 7 can be defined as:
the measured outputs (
) are ORP
and
the states (
) consist of unmeasured
and measured
the measured input (
) at the end of Hias process with delay θ8 is
.
the measured inputs (
) at inlet with delay θ7 are
,
,
, EC
,
,
,
.
the measured input (
) at zone 3 with delay θ4 is EC
.
the measured inputs (
) at zone 4 with delay θ3 are DO
, FO
.
the measured inputs (
) at zone 5 with delay θ2 are DO
, FO
.
the measured inputs (
) at zone 6 with delay θ1 are DO
, FO
.
the measured inputs (
) at zone 7 without delay are
,
,
,
.
Selection of data set
The ORP7 measurements have poor data quality for most of the available online data set period. Therefore, the correlations with input variables and ORP7 were close to zero when the whole data set was used for correlation analysis. Hence, two data periods with sufficient ORP7 data quality were chosen, samples 7,000–8,000 for estimation data and samples 9,500–10,500 for validation data. With these data periods, the correlations between input variables and ORP7 were increased up to 0.56.
Selection of input variables with ORP7 and COD7 correlation plots
inlet variable T (0.56 and 0.65): Temperature has high correlation with both output variables, but also with most input variables.
inlet variable SS (0.52 and 0.63): suspended solids from the Hias process outlet are highly correlated with output variables, but also with most input variables. Suspended solids can represent the biomass capacity on releasing/binding PO4 and binding COD in the Hias process. As SS10 and SS are highly correlated, only the variable with highest correlation is selected.
zone 3 variable EC3 (0.38 and 0.72): electrical conductivity at zone 3 is highly correlated especially with COD7. As ECin and EC3 are highly correlated, EC3 with highest correlation to COD7 is chosen. The EC difference variable EC3-ECin has low correlation to both output variables.
zone 7 variable FO7 (0.40 and 0.64): The aeration rate in zone 7 is highly correlated with the output variables.
zone 7 variable NO
(−0.50 and −0.53): Nitrite is negatively correlated with both output variables.
zone 7 variable DO7E (0.11 and −0.03): Dissolved oxygen in zone 7 is a variable in the measurement equation, and hence it should be included to the state-space model even though it has low correlation on both output variables.
Measurement equation








State-space model identification
The state-space model was identified between the inputs and outputs with two state-variables. The first half of the data set was used for model estimation and the second half for model validation. The resulting state-space model coefficients and error indices are presented in Table 7. The state-space model comparison to estimation data set gave R2 index of 0.54 and 0.73 for ORP7 and COD7, and the comparison to validation data set gave R2 index of -0.14 and 0.62 for ORP7 and COD7. The R2 values are sufficient for COD7, but the fit to validation data for ORP7 is insufficient.
State-space model parameters and R2 at zone 7
State . | – . | a11 . | a12 . |
---|---|---|---|
x1 | 0.4838 | −3.076 | |
a21 | a22 | ||
x2 | -0.009 | 0.9097 | |
Input | delay | ![]() | ![]() |
![]() | 4.223 | 0.08713 | |
SS | 2.952 | 0.0516 | |
EC3 | −0.05261 | 0.01237 | |
FO7 | −1.227 | −0.02149 | |
NO![]() | −2.535 | −0.05085 | |
DO7E | 0.1727 | −0.001416 | |
Output | ![]() | ![]() | |
ORP7 | 0.126 | −0.452 | |
![]() | ![]() | ||
COD7 | 0 | 1 | |
Input | delay | ![]() | ![]() |
![]() | 0 | 0 | |
SS | 0 | 0 | |
EC3 | 0 | 0 | |
FO7 | 0 | 0 | |
NO![]() | 0.353 | 0 | |
DO7E | 0.293 | 0 | |
Error | Estimation | Validation | |
index | data | data | |
![]() | 0.5403 | −0.1429 | |
![]() | 0.7282 | 0.6168 |
State . | – . | a11 . | a12 . |
---|---|---|---|
x1 | 0.4838 | −3.076 | |
a21 | a22 | ||
x2 | -0.009 | 0.9097 | |
Input | delay | ![]() | ![]() |
![]() | 4.223 | 0.08713 | |
SS | 2.952 | 0.0516 | |
EC3 | −0.05261 | 0.01237 | |
FO7 | −1.227 | −0.02149 | |
NO![]() | −2.535 | −0.05085 | |
DO7E | 0.1727 | −0.001416 | |
Output | ![]() | ![]() | |
ORP7 | 0.126 | −0.452 | |
![]() | ![]() | ||
COD7 | 0 | 1 | |
Input | delay | ![]() | ![]() |
![]() | 0 | 0 | |
SS | 0 | 0 | |
EC3 | 0 | 0 | |
FO7 | 0 | 0 | |
NO![]() | 0.353 | 0 | |
DO7E | 0.293 | 0 | |
Error | Estimation | Validation | |
index | data | data | |
![]() | 0.5403 | −0.1429 | |
![]() | 0.7282 | 0.6168 |
Zone 7 PO4 and COD estimates
Zone 7 PO4 and COD were estimated using the state variables x1 and x2 from the state-space models. The mean and standard deviation of these variables were calculated on the laboratory data. The scaling factors w1 and w2 were calculated as inverse of the standard deviation in the state-variables. The PO4 and COD estimates were calculated by scaling the state estimates x1 and x2 with the laboratory data set mean and standard deviation values as presented in Table 8.
Zone 7 laboratory data properties and scaling factor w
Variable . | N . | Mean . | Stdev . | w . |
---|---|---|---|---|
PO![]() | 20 | 3.1 | 3.56 | 1/8.53 |
COD7 | 13 | 80.3 | 10.5 | 1/0.56 |
Variable . | N . | Mean . | Stdev . | w . |
---|---|---|---|---|
PO![]() | 20 | 3.1 | 3.56 | 1/8.53 |
COD7 | 13 | 80.3 | 10.5 | 1/0.56 |
Zone 7 PO4 laboratory measurements (*) and PO4 model estimate (blue line).
Zone 7 laboratory measurements (*), COD online measurement (black like) and COD model estimate (blue line).
Zone 7 laboratory measurements (*), COD online measurement (black like) and COD model estimate (blue line).
Kalman filter
The model accuracy was strongest at zone 3, and Kalman filter was implemented at the IIoT platform for zone 3. Kalman filter parameters Q and R were the same as the variance, respectively, in states and measurements. Since the inputs, state variables x1 and x2 and outputs are normalized with zero mean and standard deviation of 1, Q and R became an identity matrices. Initial guess for covariance matrix Pp diagonal elements were 0.1, and initial values for states were chosen 0.5. The Kalman estimated states x1 and x2 were then scaled to PO4 and COD values using the mean and standard deviation from the laboratory measurements.
DISCUSSION
The reduced ASM2d modeling approach of Nair et al. (2020) with Extended Kalman filter was considered first. We developed reduced order ASM2d models using the available laboratory measurements and online data. However, ammonia (NH3) was not included in the online measurements and the model accuracy decreased just within days. The laboratory data could not be made available in the KYB IIoT platform due to Microsoft two-factor authentication, and therefore, it was not possible to calibrate the reduced order ASM2d models online.
In our data-driven approach, each zone with EC or ORP measurement becomes it’s own virtual sensor. The virtual sensor consists of selected physical online measurements, a state-space model and a Kalman filter that continuously updates the error between the model predictions and real measurements. The state-space model consists of dynamic equation and measurement equation.
Input variable selection
The input variables were selected using correlation matrix with time-delayed variables and the EC/ORP measurement. The selection of input variables affects greatly the state-space model accuracy. Selection of too many input variables deteriorates the accuracy of the state-space model. For example in zone 3, selection of 6 input variables gives R2 value of 0.6 at best, where as selection of 4 input variables increases the R2 value up to 0.97. Other input variable selection methods, such as Shapley additive explanations (Paepae et al. 2022) or Matlab stepwise regression could be used in future work.
State-space model – Measurement equation
The measurement equation, a linear multivariate regression was developed, in the anaerobic zones between EC and nutrients PO4 and COD, and, in the anaerobic zones, between ORP and nutrients PO4, COD, NO2 and dissolved oxygen DO. The measurement equation coefficients give values for the C and D matrices. The challenge with the measurement equation is small number of laboratory samples where all the PO4, COD were present. For the inlet and zone 3, the measurement equation accuracy was reasonably strong, but for zone 7, the ORP sensor quality caused lower modeling accuracy. It is possible to increase the modeling accuracy by adding quadratic and interaction terms, but these would not fit with the linear state-space model.
State-space model – dynamic part
The system identification algorithm was forced to use the measurement equation C and D matrix values, and only find optimal parameters of A and B matrices for the state-space model using online data between measured input and output variables. This implementation enables meaningful relationship between the inputs, states and outputs, and hence information of the unmeasured states, or the nutrient concentrations.
Comparison of results
The model comparisons to online and laboratory measurements is illustrated for inlet, zone 3 and zone 7 only with the state-space models, not with Kalman filter. The Kalman filter will minimize the error between the model and output measurements, and not give a realistic picture of the model accuracy. The model accuracy was strongest at zone 3, and Kalman filter as implemented at the IIoT platform for zone 3 (and effluent). The main weakness of the modeling approach is that it is not directly related to material balances as the Extended Kalman Filter approach in Nair et al. (2019, 2020). If the laboratory data could be made available in the IIoT platform, more advanced modeling approaches can be developed. The modeling work for zone 7 was challenging due to very limited period of feasible ORP sensor data, and a few laboratory measurements. However, it was desired to test the same approach as for inlet and zone 3. Hence, the results are not strong.
CONCLUSIONS AND FURTHER WORK
This article presents development work for virtual sensors estimating PO4 and COD using online measurements and state-space models of the Hias process. As the laboratory data set was not available online, the modeling approach utilizing simplified ASM2d models could not be applied. Hence, data-driven approach of state-space models was chosen. Input variables for the dynamic part of the state-space models were selected using correlation plots. Linear measurement equations were developed between the online EC and ORP measurements and phosphate PO4 and carbon COD measurements using aggregated laboratory and online data sets. The modeling accuracy R2 was strong at inlet (0.86), and zone 3 (0.91), but moderate at zone 7 (0.66). The measurement equations were further used to formulate state-space models. The dynamic part of the state-space models were identified from the online data. The R2 error index for the state-space models was 0.6 for EC and COD at inlet, 0.9 for EC zone 3 and 0.6 for COD at zone 7, but less than 0.5 for ORP at zone 7. The estimated states were scaled, and the estimates of the PO4 and carbon concentrations were plotted against the laboratory measurements.The estimated values correspond to most of the laboratory data for inlet and zone 3, whereas for zone 7, there are too few laboratory data points to evaluate the result. There were too few laboratory data points to give statistical evaluation.
Kalman filter was implemented in KYB platform to provide online estimates of the phosphate PO4 and carbon COD concentrations at zone 3 with the best model accuracy. The ORP sensor data quality in zones 4 and 7 was insufficient. For zone 4 no measurements were available, whereas zone 7 had a limited data period of 2,000 online samples and 3 laboratory samples. This was challenging for modeling of the PO4 and COD in zone 7.
Future work is suggested on improvement of the model accuracy in zone 7 with a new data set, and development of design for energy-efficient control strategies using the virtual sensors.
ACKNOWLEDGEMENTS
RFF Innlandet is gratefully acknowledged for funding the PACBAL project (nr 337727). Torgeir Saltnes at Hias How2O, Katrine Marsteng Jansen from Hias IKS water resource recovery facility have provided valuable insights in the Hias process. Katrine Marsteng Jansen and Tobias Korten from Hias IKS have supported data collection. Axel Tveiten Bech and Ola Solli Grønningsæter from Digitread Connect have performed online data collection. Axel Tveiten Bech has implemented the state-space models and Kalman filter in the KYB IIoT platform. Professor Tiina Komulainen is the principal investigator of the project. She has developed the modeling approach, supervised and revised the dynamic modeling work, and prepared the article draft. Professor Olga Korostynska has supervised the sensor part of the work. Research assistant A.Malik Baqeri has worked on testing different modeling approaches. Professor Tiina Komulainen has prepared the final implementation of the models, virtual sensors and figures. Katrine Marsteng Jansen (Hias IKS) and Professor Olga Korostynska have done critical review of this article.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.