This article presents the development of virtual sensors for estimation of phosphates (PO4) and soluble COD profiles in a novel, continuous flow, moving bed bioreactor with enhanced biological phosphorus removal and simultaneous nitrification and denitrification, the Hias process. The virtual sensors combine online measurements with additional electrical conductivity (EC), oxidation–reduction potential (ORP) measurements, and state-space models at inlet, zone 3 and zone 7. The data were collected from Hias municipal WRRF, Norway from March to July 2023, and include both online data and laboratory data. Input variables were selected using correlation plots. Linear measurement equations were fitted to relate PO4 and COD concentrations in the laboratory data set with the online measurements including EC/ORP measurements. The state-space models were identified from the online data with model accuracy from moderate to strong. The estimated PO4 and COD concentrations correspond to most of the scarce laboratory data points at inlet and zone 3, whereas the model in zone 7 requires more work. A Kalman filter was developed for zone 3 and implemented in KYB IIoT platform. Future work is suggested on improvement of the model accuracy in zone 7, and development of energy-efficient control strategies using the virtual sensors.

  • Virtual sensors for estimation of PO4 and COD in biological wastewater treatment

  • Combination of EC and ORP sensors with online measurements and models enable estimation of PO4 and COD in different process zones.

  • Virtual sensors in process zones can enable energy-efficient predictive control.

As the European Commission has proposed an updated urban waste water treatment directive (EuropeanCommission 2022), stricter requirements will be set for the removal of nutrients such as PO4, organic substances and nitrogen compounds. The European Commission also requires energy-neutral operations in the water industry within the next two decades (EuropeanCommission 2021). Currently, the water industry uses approximately 1% of the total energy consumption in the European union. Hence, the development of novel biological nutrient removal processes and energy-efficient control methods for these are essential to meet the strict nutrient recovery requirements while minimizing the energy consumption at water resource recovery facilities (WRRFs).

In general, at municipal WRRFs, the primary wastewater treatment, clarification, is followed by a secondary treatment process that removes nutrients (Henze et al. 2008). Primary wastewater treatment encompasses a series of physical processes designed to remove settleable and floatable solids from wastewater. These processes include screening, grit removal, and sedimentation (Shewa & Dagnew 2020). In the secondary treatment process, nutrient recovery relies either on chemical additions or biological processes (Rezai & Allahkarami 2021) to achieve total nitrogen (TN) and total phosphorus (TP) concentrations below legal limits. Notably, every process step requires accurate monitoring, and real-time measurements using a broad range of sensors is a preferred choice. Water quality monitoring includes measurements of physical characteristics (e.g. temperature, conductivity), chemical parameters (e.g. pH, dissolved oxygen, alkalinity, nitrogen and PO4 compounds), and abundance of certain biological taxa (Korostynska et al. 2012). Monitoring could also include assays of biological activity such as alkaline phosphatase, tests for toxins and direct measurements of pollutants such as heavy metals or hydrocarbons (Frau et al. 2019, 2021). Phosphates can exist in wastewater in several forms depending on the source/nature of the discharge, but are generally grouped within three broad classes: orthophosphates, condensed phosphates (pyro-, meta- and poly-) and organic phosphorus (Korostynska et al. 2012). Municipal wastewater contains both dissolved and particulate forms of phosphorus. The majority, which is present as dissolved phosphate, comprises about 50% orthophosphate, 35% condensed phosphates, and 15% organic phosphates. Municipal wastewater typically contains between 6 and 8 mg/L of total phosphorus (Owodunni et al. 2023).

Conventional biological nutrient removal processes include active sludge systems such as anaerobic–anoxic–oxic (AAO) and sequencing batch reactor (SBR) (Zhou et al. 2022). A novel approach to biological nutrient removal is the Hias process, a moving bed bioreactor with enhanced biological phosphorus removal and simultaneous nitrification and denitrification developed by HiasHow2O (Rudi et al. 2019; Hias 2022). In this compact process, large amounts of small biofilm carriers, submerged in wastewater, circulate through three anaerobic and seven aerobic process zones, while nutrients such as phosphorus, organic substances and nitrogen compounds are removed from the wastewater in simultaneous biological processes in different layers of the biofilm. The phosphorus-rich biofilm carriers are recirculated from the last aerobic process stage to the first anaerobic process stage by a conveyor belt. The treated effluent flows through a disc filter, where the phosphorus rich biomass sludge is removed. Hence, the Hias process is more compact and does not require energy for sludge circulation (Hias 2022). The aeration rates in the Hias process are controlled in real-time using PID controller-based strategy. This work aims to develop virtual sensors that can enable multivariate control strategy to minimize energy consumption and reduce pollution to aquatic environment.

Real-time monitoring of water quality parameters is crucial for optimal operations, but the industry faces several challenges. While some parameters can be efficiently monitored using physical sensors, there are no technologically and economically feasible sensors for others, such as phosphates. An alternative is to use costly online-analyzers, which are prone to bio-fouling and clogging in solutions with a high content of suspended solids (Zidaoui et al. 2023). A virtual (or soft) sensor is a software algorithm that estimates the value of a water quality parameter without the need for the physical sensor itself. It does this by using mathematical equations or machine learning algorithms and the values of other available physical sensor data (Paepae et al. 2021, 2022, 2023). Nair et al. (2019, 2020) have demonstrated a model-based soft-sensor concept with extended Kalman filter and conductivity sensors for anaerobic basins in a laboratory-scale MBBR pilot system. Further, Wang et al. (2022) have demonstrated the use of ORP sensors for virtual measurements of nutrients in aerobic basins. In our previous work, we have studied data-driven models of the effluent PO4 in the Hias process (Komulainen et al. 2023; Nermo 2023).

In this article, first, the Hias process and instrumentation are presented. The tools and methods applied to develop virtual sensors based on state-space models and online measurements with additional EC and ORP sensors are described. Then, the results for virtual sensors estimating the PO4 and COD concentrations in three zones of the Hias process are explained in detail. The virtual sensor approach developed and tested is elaborated in the discussion section. Finally, the conclusions and further work are presented.

Hias process description

The Hias process with instrumentation is illustrated in Figure 1. The clarified influent wastewater and the recirculated biofilm carriers on a conveyor belt enter the anaerobic zones. The water and biofilm carriers float through the process with gravity. The three anaerobic zones are mixed to ensure sufficient distribution of biofilm carriers in the water. Aeration in the following seven zones ensures sufficient dissolved oxygen concentrations for aerobic nutrient removal. The phosphorus rich sludge is removed from the water in disc filter and effluent flows out.
Figure 1

The Hias process with instrumentation. Influent and biomass carriers (black dots) enter anaerobic zones 1–3 (dark blue). Water flows through aerobic zones (light blue) where aeration rates (valves, FO) are used to control the dissolved oxygen concentration (white dots). Excess PO4-rich biomass is separated from treated effluent water in disc filter (yellow).

Figure 1

The Hias process with instrumentation. Influent and biomass carriers (black dots) enter anaerobic zones 1–3 (dark blue). Water flows through aerobic zones (light blue) where aeration rates (valves, FO) are used to control the dissolved oxygen concentration (white dots). Excess PO4-rich biomass is separated from treated effluent water in disc filter (yellow).

Close modal

The Hias process instrumentation includes continuous measurements of flowrate, temperature, aeration rate, dissolved oxygen, nutrient compositions, and suspended solids. Additional online measurements electrical conductivity and oxidation–reduction potential sensors were installed to the Hias process during the PACBAL research project 2022–2023. Hias laboratory determines nutrient composition of PO4 and soluble COD at inlet, zones 3,4,7,10 and outlet from daily grab samples 5 days a week. Hence, there are 5 samples from laboratory and 1,008 samples of online data for each variable per week. As most phosphorus (P) in wastewater is bound into different phosphate compounds, and the available laboratory measurements were ortho-phosphates, for simplicity in this article, the phosphate concentration is marked as (PO4). The soluble organic substances, or carbon parameters, were measured online at inlet and zone 7 with an optical instrument, which corresponds to laboratory measurements of soluble chemical oxygen demand (COD). Further in this article, the soluble organic substances were marked as COD. The online and laboratory measurements used in this study are listed in Table 1.

Table 1

Online and laboratory measurements in the Hias process

SymbolDescriptionUnitOnlineLaboratory
PO PO4 inlet  – 
PO4z3 PO4 zone 3  – 
PO PO4 zone 4  – 
PO PO4 zone 7  – 
PO PO4 effluent after disc filter  
CODin COD inlet  
COD3 COD zone 3  – 
COD4 COD zone 4  – 
COD7 COD zone 7  
NH3in NH3 inlet  – 
NO2in NO2 inlet  – 
NO NO2 zone 7  – 
NO3in NO3 inlet  – 
NO NO3 zone 7  – 
NOXin  inlet  – 
NOX7  zone 7  – 
ECin El. conductivity at inlet  – 
EC3 El. conductivity at zone 3  – 
ORP4 Ox.Red. potential zone 4  – 
ORP7 Ox.Red. potential zone 7  – 
 Water flowrate inlet  – 
 Water temperature inlet  – 
FO Aeration rate zones 4–10  – 
DO Dissolved oxygen zones 4,5,6,8,9  – 
SS10 Suspended solids zone10  – 
SS Suspended solids out after disc filter  – 
SymbolDescriptionUnitOnlineLaboratory
PO PO4 inlet  – 
PO4z3 PO4 zone 3  – 
PO PO4 zone 4  – 
PO PO4 zone 7  – 
PO PO4 effluent after disc filter  
CODin COD inlet  
COD3 COD zone 3  – 
COD4 COD zone 4  – 
COD7 COD zone 7  
NH3in NH3 inlet  – 
NO2in NO2 inlet  – 
NO NO2 zone 7  – 
NO3in NO3 inlet  – 
NO NO3 zone 7  – 
NOXin  inlet  – 
NOX7  zone 7  – 
ECin El. conductivity at inlet  – 
EC3 El. conductivity at zone 3  – 
ORP4 Ox.Red. potential zone 4  – 
ORP7 Ox.Red. potential zone 7  – 
 Water flowrate inlet  – 
 Water temperature inlet  – 
FO Aeration rate zones 4–10  – 
DO Dissolved oxygen zones 4,5,6,8,9  – 
SS10 Suspended solids zone10  – 
SS Suspended solids out after disc filter  – 

Electical conductivity (EC) is a measure of the total amount of dissolved ions in water. The EC sensors in the Hias process are Xylem TetraCon 700 IQ, rugged electrode-style conductivity probe. The oxidation–reduction potential (ORP) is a measure of the tendency of a solution to gain or lose electrons. ORP can be used to monitor water quality in industrial and wastewater treatment applications (Weißbach et al. 2018; Wang et al. 2022). The ORP sensors in the Hias process are Xylem SensoLyt 700 IQ with impedance converter and temperature sensor. The optical measurements of COD, NO2 and NO3 are based on UV-spectrometry. The COD/NO2/NO3 sensors in the Hias process are Xylem WTW NiCaVis 701 IQ NI. The optical measurement of dissolved oxygen (DO) is based on photoluminescence. The sensor type is Xylem WTW FDO 70x IQ F. The effluent PO4 is measured using an online HACH Phosphax sc phosphate analyzer with sampling time of 10 min and minimum detection limit of 0.05 mg/L.

Software tools, data collection and preprocessing methods

An Industrial IoT platform, KYB, developed by Digitread Connect, was used for online data collection and Kalman implementation of the virtual sensors. The Industrial IoT platform KYB was used for uploading and standardizing operational data. The online data sets were collected in .csv format. The laboratory data set was obtained in .xlsx format from the process engineers. Matlab was used for development of data preprocessing and measurement equations, state-space model identification and Kalman filter tuning. Matlab functions ’fillmissing’ and ’filloutliers’ were applied to pre-process the data. Function ’fitlm’ multivariate regression was used to estimate measurement equation parameters. Function ’corrplot’ was applied to find the correlation between the variables. Function ’ssest’ was used to estimate the state-space models. Kalman filter was implemented for the real-time estimation of PO4 and COD in the KYB IIoT platform. Data preprocessing methods suitable for real-time use included outlier removal of values above/below three standard deviations and filling of missing values with the previous feasible value.

Modeling approach

The case study is a continuous large-scale municipal WRRF process, the Hias process. The modeling challenge is to estimate PO4 and COD profile in the Hias process zones using extra instrumentation of two electrical conductivity (EC) and two oxidation–reduction potential (ORP) sensors. Direct online measurements of PO4 are not available in the process zones (only effluent analyzer) and direct online measurements of COD are available at two zones. Possible modeling approach would need to include an equation between the EC and ORP sensor measurements and nutrient concentrations, and a dynamic model accounting for the biological phenomena in the Hias process.

In previous work with the pilot-scale MBBR process, a state-space model structure has been applied in an Extended Kalman filter to estimate the PO4 and COD concentrations at the inlet and anaerobic zones (Nair et al. 2019, 2020). State-space models with input–output delay were applied to estimate the effluent PO4 concentration for the large-scale municipal Hias process in Komulainen et al. (2023).

A state-space model relates the measured variables to unmeasured state variables and measured input variables . A dynamic model with coefficient matrices A and B relates states at next time step with states and inputs at current time step k as described in Equation (1). A linear measurement model with coefficient matrices C and D relates measurements and states at current time step k as described in Equation (2). Further, the state-space model structure is used to develop a methodology that combines measurement equations between EC, ORP, and nutrients together with dynamic state-space models of the Hias process.
(1)
(2)

Model error indices

The measurement equation and state-space model fit to data are evaluated using the R2 index based on the real measurements , the mean value of the real measurements , the model calculated output for N data points as presented in the following equation.
(3)
The R2 index can be interpreted as value between 0 and 0.3 is weak, 0.3 and 0.5 is moderate, 0.5 and 0.7 is sufficient, and value between 0.7 and 1 is strong explanation of variability between the measurements and the model.

Measurement equations between EC, ORP, PO4 and COD

In the anaerobic part of the Hias process, electrical conductivity (EC) sensors are placed at inlet and zone 3. According to Kim et al. (2007), Aguado et al. (2006), and Nair et al. (2020), the relationship between electrical conductivity EC and nutrient concentrations (PO4 and COD) in anaerobic conditions can be modeled using multivariate regression with linear, quadratic and interaction terms between the input variables (predictors). In our work, we look into first order models to fit the C matrix in the state-space model for anaerobic zone 3 and inlet, as presented in Equation (4). Effect of temperature and other variables will be taken into account by other parts of the dynamic model. The laboratory data set was used to fit the model, as PO4 and COD are not measured online in anaerobic zones (except COD at inlet). Matlab function fitlm was used to fit the ci parameters of the following multivariate regression:
(4)
In the aerobic part of the Hias process, oxidation–reduction (ORP) sensors are placed at zone 4 and zone 7. The ORP sensor at aerobic zone 4 was defective during the data collection period. Nair et al. (2019) have used pH as input for measurement equation in aerobic conditions. However, at Hias WRRF two ORP sensors are available instead of pH sensors. Wang et al. (2022) have developed linear multivariate regression for nutrient monitoring using ORP sensors in anoxic denitrification process. In our work, we look into first order models to fit the C matrix in the state-space model for aerobic zone 7, as presented in Equation (5). Effect of temperature and other variables will be taken into account by other parts of the dynamic model. The laboratory data set was used to fit the model, as PO4 is not measured online in aerobic zones. COD and NO2 and DO are measured at zone 7. Matlab function fitlm was used to fit the ci parameters of the following multivariate regression:
(5)

Input variable selection using correlation plots

As the Hias process has much more variables available than a state-space model identification algorithm can handle, the input variables (features) were selected using correlation pair plots between the input and output variables. Time delays were first applied for each variable, and then the variables were plotted using the corrplot function in Matlab. In total, three to six input variables with highest correlation to the measured output variables were chosen. High correlation between two input variables would rule out one of the input variables (Ljung 1999).

State-space model identification

The estimation of PO4 and COD can be formulated as a state-space system with two states (x1 and x2), multiple inputs (ui), output measurements (y1 and y2) and input–output delays θi. State-space model identification requires scaled continuous data of the inputs and outputs, but not the states. The state-space model will estimate the values of the state variables. The state-space model for states x1 and x2, corresponding to scaled values of PO4 and COD can be represented as:
(6)
Theory on identification of state-space systems is presented in Ljung (1999) and for state-delayed state-space systems in Gu & Li (2023). Matlab function ’ssest’ uses the subspace identification algorithm to identify the model parameter matrices A and B. Model parameters in matrices C and D are defined by the measurement equation.

State-space models are estimated from the selected input–output variable set with input–output time delays. The data needs to be scaled, so that the mean values and standard deviations become similar between the variables. The data set is divided into two, where first half is used for model estimation (estimation/training data) and the second half for validation (validation/testing data). The ’ssest’ Matlab function with input–output delay and two states is used with the estimation data set. The resulting model is applied to the validation data set, and error between the measurement and model are compared numerically using the R2 error index and visually using a trend figure.

PO4 and COD calculation using state estimates

The state variable estimates x1 and x2 can be used to calculate the PO4 and COD concentrations in different zones of the Hias process. The mean and standard deviation of PO4 and COD were calculated from the laboratory data. The standard deviation of the state estimates x1 and x2 was calculated from the state-space model results, and used as scaling factors w1 and w2 for the calculation of the PO4 and COD estimates with the laboratory data mean and standard deviation values.
(7)
(8)
The scaling factors w1 and w2 were calculated from the estimation data set as:
(9)

Kalman filter implementation in IoT platform

A Kalman filter improves the future estimates of PO4 and COD by adjusting an error variable in the algorithm. Kalman filter ensures that the error between the model and measurements is very close to zero. A description of a linear Kalman filter algorithm can be found from text books, for example Ljung (1999).

The Kalman filter code was developed in Matlab m-script and manually translated to Python. NumPy (https://numpy.org/) was used for implementing the Kalman filter, and pandas (https://pandas.pydata.org/) was used for handling and pre-processing data. The Python driver for MongoDB (https://pypi.org/project/pymongo/) was used for fetching online data from the database, and for updating the database with the estimation results. The Kalman filter script in Python was containerized using Docker (https://www.docker.com/), and deployed to KYB as a serverless function in AWS Lambda (https://aws.amazon.com/lambda/). The script was then set up to run every 10 min, pushing the estimation results back into the database as virtual sensor updates.

Data collection and preprocessing

The online data and laboratory data were collected for a period of 21 March–31 July 2023. Prior to the measurement equation work, an extended laboratory data set was created. The laboratory data set with daily measurements was manually filled with values for EC, ORP and COD from the online data set with matching time stamps. During period of 1 April–15 May 2023 many online measurements were missing, hence, for most of the results, the online data set is for period 15 May 2023–31 July 2023, whereas and extended period of 21 March–31 July 2023 is used for the laboratory data set.

As the data is not normally distributed, the outliers in the data set were identified based on more than 1.5 interquartile ranges above the upper quartile (75%) or below the lower quartile (25%). Then, the missing values were filled in using fillmissing based on previous feasible value. In the Kalman filter online implementation, the outlier detection limit was set to three standard deviations and outliers and missing values were filled with previous feasible value.

Input–output time delays θ

The time delay in samples between two zones was defined as: the volume of the N zones between these, divided by average flow rate and the sampling time Ts of 10 min, as presented in Equation (10). The resulting time delays in are given in Table 2.
(10)
Table 2

Time delays in samples with

SymbolZones in betweenDelay in samples
θ8 10 29 
θ7 26 
θ4 15 
θ3 11 
θ2 
θ1 
SymbolZones in betweenDelay in samples
θ8 10 29 
θ7 26 
θ4 15 
θ3 11 
θ2 
θ1 

Nutrient estimation at inlet

A state-space system at inlet can be defined between:

  • the measured outputs () are and

  • the states () consist of unmeasured and measured

  • the measured inputs () at inlet are , , , and

It is assumed that there are no time delays between the input and output variables.

Selection of input variables

The input variables were plotted against the ECin and CODin measurements to choose a set of input variables with high correlation. It is desired to have two to six variables as inputs to the state-space model. Based on the correlation plot, Figure 2, the following input variables with absolute correlation value over 0.4 were considered:
  • ECin correlation over 0.4 : NO2in (0.61), NO3in (0.52), NOXin (0.66).

  • CODin correlation over 0.4: T (-0.42), NO3in (0.78), NOXin (0.72).

Figure 2

Matrix of correlation plots between variables at inlet.

Figure 2

Matrix of correlation plots between variables at inlet.

Close modal
Flowrate F was not correlated to either of the output variables. Temperature T was negatively correlated with COD. As NOX is strongly correlated with NO2 (0.74) and NO3 (0.90) and all the NO measurements are correlated with the output variables, NOX with was chosen. Hence, the final selection of input variables includes: T, NOXin. Applying the state-space approach of Equation (6) gives the following state-space presentation of the inputs, states and outputs:
(11)

Measurement equation

Based on an extended laboratory data set for period 21 March–30 July 2023, in total 32 unique laboratory data points with ECin, PO4in and CODin were used to fit a linear regression in Equation (16). The parameter values were and with strong modeling accuracy of .
(12)

State-space model identification

The state-space model was identified between the inputs and outputs with two state-variables. The resulting state-space model coefficients and error indices are presented in Table 3. The state-space model comparison to the estimation data set gave R2 index of 0.62 and 0.64 for EC and COD, and the comparison to the validation data set gave R2 index of 0.52 and 0.66 for EC and COD. The model accuracy is sufficient.

Table 3

Inlet SS model parameters and error indices

Statesa11a12
x1 1.078 -0.2512 
   
x2 0.4884 -0.3337 
Inputs   
 −0.1591 −0.9459 
NOXin 0.1156 0.5399 
Measurements   
ECin 0.3488 0.6138 
   
CODin 
Error Estimation Validation 
index data data 
 0.62 0.52 
 0.64 0.66 
Statesa11a12
x1 1.078 -0.2512 
   
x2 0.4884 -0.3337 
Inputs   
 −0.1591 −0.9459 
NOXin 0.1156 0.5399 
Measurements   
ECin 0.3488 0.6138 
   
CODin 
Error Estimation Validation 
index data data 
 0.62 0.52 
 0.64 0.66 

Inlet PO4 and COD estimates

The inlet PO4 and COD were estimated using the state variables x1 and x2 from the state-space models. The mean and standard deviation of these variables were calculated on the laboratory data. The standard deviation of the state estimates was close to one and the scaling factors w1 and w2 were also 1. The PO4 and COD estimates were calculated by scaling the state estimates x1 and x2 with the laboratory data set mean and standard deviation values as presented in Table 4.

Table 4

Inlet laboratory data properties and scaling factor w

VariableNMeanStdevw
 20 5.812 1.1426 
CODin 13 478.92 115.68 
VariableNMeanStdevw
 20 5.812 1.1426 
CODin 13 478.92 115.68 

The estimated PO4 and COD are presented in Figures 3 and 4. The PO4 state estimates fit with about half of the laboratory samples. The COD estimates are following the trends in the online COD measurement, and most of the laboratory samples match with the COD estimates and the online data. It should be noted that the online data and laboratory data do not match at all points of time due to different measurement principles.
Figure 3

Inlet PO4 laboratory measurements (*) and PO4 model estimate (blue line).

Figure 3

Inlet PO4 laboratory measurements (*) and PO4 model estimate (blue line).

Close modal
Figure 4

COD laboratory measurements (*), COD online measurement (black like) and COD model estimate (blue line).

Figure 4

COD laboratory measurements (*), COD online measurement (black like) and COD model estimate (blue line).

Close modal

Nutrient estimation zone 3

A state-space system in zone 3 can be defined as:

  • the measured outputs () are

  • the states () consist of unmeasured and unmeasured

  • the measured inputs () at inlet with delay θ3 are , , , , ,

  • the measured input () at inlet with delay θ4 is

Selection of input variables with EC3 correlation plot

The scaled and delayed input variables were plotted against the EC3 measurements, given in Figure 5, to choose a set of input variables with high absolute correlation with EC3.

Based on the absolute correlation values, the following input variables were chosen for the state-space model for zone 3:
  • inlet variable NO2in (0.59): Nitrite has high correlation with EC3, but low correlation with the other variables.

  • inlet variable SS10 (0.57): Suspended solids in zone10 can represent the biomass capacity on releasing PO4 and binding COD on the next round in the Hias process zones 1–3. As SS10 and SS are highly correlated, only SS10 is selected.

  • inlet variable CODin (0.73): COD at inlet has high correlation with NO3, SS10, SS.

  • inlet variable ECin (0.98): is highly correlated with the output variable and most of the input variables.

Flow, F, and temperature, T, have both low absolute correlation values with EC3 and are discarded from the input variables. NO2 has higher correlation than NO3 and is chosen. SS10 has higher correlation than SS, and is chosen. The final combination of input variables are: CODin, NO2, SS10 and ECin Applying the state-space approach of Equation (6) gives the following state-space presentation of the inputs, states and outputs:
(13)

Measurement equation EC3

Based on an extended laboratory data set for period 21 March–30 July 2023, in total 32 unique laboratory data points with EC3, PO4z3 and COD3 were used to fit a linear regression to Equation (16). The parameters were and with a good fitness of .
(14)

State-space model identification

The state-space model was identified between the inputs and outputs with two state-variables. The first half of the data set was used for model estimation and the second half for model validation. The resulting matrix coefficients are presented in Table 5 . The state-space model fitness to estimation data had a R2 value of 0.96 and to validation data a R2 value of 0.97. The model fitness for the measured variable EC3 is very high because the correlations with the input variables are very high.

Table 5

Zone state-space model parameters and error indices

Statea11a12
x1  0.7556 −0.7581 
    
x2  −0.3866 −0.4218 
Inputs Delay   
CODin 11 0.04415 0.06767 
NO 11 0.05976 0.1121 
SS10 15 -0.02631 -0.04808 
ECin 11 0.8372 1.49 
Measurement    
EC3  0.648 0.3332 
Error  Estimation Validation 
index  data data 
R2 EC3  0.9578 0.9711 
Statea11a12
x1  0.7556 −0.7581 
    
x2  −0.3866 −0.4218 
Inputs Delay   
CODin 11 0.04415 0.06767 
NO 11 0.05976 0.1121 
SS10 15 -0.02631 -0.04808 
ECin 11 0.8372 1.49 
Measurement    
EC3  0.648 0.3332 
Error  Estimation Validation 
index  data data 
R2 EC3  0.9578 0.9711 

Zone 3 PO4 and COD estimates

The PO4 and COD at zone 3 were estimated using the state variables x1 and x2 from the state-space model. The mean and standard deviation of these variables were calculated on the laboratory data. The inverse standard deviation of the state estimates x1 and x2 was used as scaling factor w1 and w2. The PO4 and COD estimates were calculated by scaling the state estimates x1 and x2 with the laboratory data set mean and standard deviation values as presented in Table 6.

Table 6

Zone 3 laboratory data properties and scaling factor w

VariableNmeanstdevw
PO4z3 20 35.36 10.91 0.99 
COD3 13 221.77 44.21 1.42 
VariableNmeanstdevw
PO4z3 20 35.36 10.91 0.99 
COD3 13 221.77 44.21 1.42 

The PO4 zone 3 estimates with the training and validation data were plotted in the same figure with the 20 laboratory data points, as presented in Figure 6. Due to the scarceness of the laboratory data (20 points), a statistical evaluation of model fitness would not give reliable results. The PO4 estimate seems to fit with half of the laboratory data points.
Figure 5

Correlation plot matrix for variables at zone 3.

Figure 5

Correlation plot matrix for variables at zone 3.

Close modal
Figure 6

Zone 3 PO4 laboratory measurements (*) and PO4 model estimate (blue line).

Figure 6

Zone 3 PO4 laboratory measurements (*) and PO4 model estimate (blue line).

Close modal
The COD zone 3 estimates with the training and validation data were plotted in the same figure with the 13 laboratory data points, as presented in Figure 7. Due to the scarceness of the laboratory data (13 points), a statistical evaluation of model fitness would not give reliable results. The majority of the laboratory data points seem to fit with the COD estimate.
Figure 7

Zone 3 COD laboratory measurements (*) and COD model estimate (blue line).

Figure 7

Zone 3 COD laboratory measurements (*) and COD model estimate (blue line).

Close modal

Nutrient estimation zone 7

A state-space system in zone 7 can be defined as:

  • the measured outputs () are ORP and

  • the states () consist of unmeasured and measured

  • the measured input () at the end of Hias process with delay θ8 is .

  • the measured inputs () at inlet with delay θ7 are , , , EC, , , .

  • the measured input () at zone 3 with delay θ4 is EC.

  • the measured inputs () at zone 4 with delay θ3 are DO, FO.

  • the measured inputs () at zone 5 with delay θ2 are DO, FO.

  • the measured inputs () at zone 6 with delay θ1 are DO, FO.

  • the measured inputs () at zone 7 without delay are ,, , .

Selection of data set

The ORP7 measurements have poor data quality for most of the available online data set period. Therefore, the correlations with input variables and ORP7 were close to zero when the whole data set was used for correlation analysis. Hence, two data periods with sufficient ORP7 data quality were chosen, samples 7,000–8,000 for estimation data and samples 9,500–10,500 for validation data. With these data periods, the correlations between input variables and ORP7 were increased up to 0.56.

Selection of input variables with ORP7 and COD7 correlation plots

The delayed input variables were plotted against output variables OPR7 and COD7 to choose a set of input variables with high correlation to the output variables. As there are 19 input variables, the correlation matrices were plotted for subsets of input variables. As an example, the selected input–output pair plots are illustrated in Figure 8.
Figure 8

Correlation plot matrix for selected input variables at zone 7.

Figure 8

Correlation plot matrix for selected input variables at zone 7.

Close modal
Based on the correlation plots, input variables with absolute correlation value over 0.5 for ORP7 or COD7 were chosen to form a set of 6 input variables. The following input variables were chosen for the state-space model at zone 7, correlations are given, respectively, for ORP7 and COD7:
  • inlet variable T (0.56 and 0.65): Temperature has high correlation with both output variables, but also with most input variables.

  • inlet variable SS (0.52 and 0.63): suspended solids from the Hias process outlet are highly correlated with output variables, but also with most input variables. Suspended solids can represent the biomass capacity on releasing/binding PO4 and binding COD in the Hias process. As SS10 and SS are highly correlated, only the variable with highest correlation is selected.

  • zone 3 variable EC3 (0.38 and 0.72): electrical conductivity at zone 3 is highly correlated especially with COD7. As ECin and EC3 are highly correlated, EC3 with highest correlation to COD7 is chosen. The EC difference variable EC3-ECin has low correlation to both output variables.

  • zone 7 variable FO7 (0.40 and 0.64): The aeration rate in zone 7 is highly correlated with the output variables.

  • zone 7 variable NO (−0.50 and −0.53): Nitrite is negatively correlated with both output variables.

  • zone 7 variable DO7E (0.11 and −0.03): Dissolved oxygen in zone 7 is a variable in the measurement equation, and hence it should be included to the state-space model even though it has low correlation on both output variables.

Applying the state-space approach of Equation (6) gives the following state-space presentation of the inputs, states and outputs:
(15)

Measurement equation

The laboratory data set for the period 21 March–30 July 2023 with PO4 and COD measurements in zone 7 was extended with online measurements of ORP7, COD7, , NO, DO. Total number of laboratory measurements with all these variables was 20, but with poor ORP7 measurement quality for 15 of 20 measurements. These measurements were used to fit a linear regression. The parameters were . The model fitness is much lower than for the EC measurement equations, because issues with the ORP7 data quality for most of the period.
(16)

State-space model identification

The state-space model was identified between the inputs and outputs with two state-variables. The first half of the data set was used for model estimation and the second half for model validation. The resulting state-space model coefficients and error indices are presented in Table 7. The state-space model comparison to estimation data set gave R2 index of 0.54 and 0.73 for ORP7 and COD7, and the comparison to validation data set gave R2 index of -0.14 and 0.62 for ORP7 and COD7. The R2 values are sufficient for COD7, but the fit to validation data for ORP7 is insufficient.

Table 7

State-space model parameters and R2 at zone 7

Statea11a12
x1  0.4838 −3.076 
  a21 a22 
x2  -0.009 0.9097 
Input delay   
  4.223 0.08713 
SS  2.952 0.0516 
EC3  −0.05261 0.01237 
FO7  −1.227 −0.02149 
NO  −2.535 −0.05085 
DO7E  0.1727 −0.001416 
Output    
ORP7  0.126 −0.452 
    
COD7  
Input delay   
  
SS  
EC3  
FO7  
NO  0.353 
DO7E  0.293 
Error  Estimation Validation 
index  data data 
  0.5403 −0.1429 
  0.7282 0.6168 
Statea11a12
x1  0.4838 −3.076 
  a21 a22 
x2  -0.009 0.9097 
Input delay   
  4.223 0.08713 
SS  2.952 0.0516 
EC3  −0.05261 0.01237 
FO7  −1.227 −0.02149 
NO  −2.535 −0.05085 
DO7E  0.1727 −0.001416 
Output    
ORP7  0.126 −0.452 
    
COD7  
Input delay   
  
SS  
EC3  
FO7  
NO  0.353 
DO7E  0.293 
Error  Estimation Validation 
index  data data 
  0.5403 −0.1429 
  0.7282 0.6168 

Zone 7 PO4 and COD estimates

Zone 7 PO4 and COD were estimated using the state variables x1 and x2 from the state-space models. The mean and standard deviation of these variables were calculated on the laboratory data. The scaling factors w1 and w2 were calculated as inverse of the standard deviation in the state-variables. The PO4 and COD estimates were calculated by scaling the state estimates x1 and x2 with the laboratory data set mean and standard deviation values as presented in Table 8.

Table 8

Zone 7 laboratory data properties and scaling factor w

VariableNMeanStdevw
PO 20 3.1 3.56 1/8.53 
COD7 13 80.3 10.5 1/0.56 
VariableNMeanStdevw
PO 20 3.1 3.56 1/8.53 
COD7 13 80.3 10.5 1/0.56 

The estimated PO4 and COD are presented in Figures 9 and 10. For the period of reasonable ORP7 data quality, very few laboratory measurement points were available, three (3) for PO4 and two (2) for COD. As negative values of PO4 are not possible, more work should be done on estimation and scaling the model parameters. The COD model estimates have bias, but follow some of the peaks in the online COD measurement. More work is needed to improve accuracy. To improve the estimation approach with state-space model and measurement model in zone 7, a larger laboratory data set and a new online data set after the maintenance of the ORP sensor is required.
Figure 9

Zone 7 PO4 laboratory measurements (*) and PO4 model estimate (blue line).

Figure 9

Zone 7 PO4 laboratory measurements (*) and PO4 model estimate (blue line).

Close modal
Figure 10

Zone 7 laboratory measurements (*), COD online measurement (black like) and COD model estimate (blue line).

Figure 10

Zone 7 laboratory measurements (*), COD online measurement (black like) and COD model estimate (blue line).

Close modal

Kalman filter

The model accuracy was strongest at zone 3, and Kalman filter was implemented at the IIoT platform for zone 3. Kalman filter parameters Q and R were the same as the variance, respectively, in states and measurements. Since the inputs, state variables x1 and x2 and outputs are normalized with zero mean and standard deviation of 1, Q and R became an identity matrices. Initial guess for covariance matrix Pp diagonal elements were 0.1, and initial values for states were chosen 0.5. The Kalman estimated states x1 and x2 were then scaled to PO4 and COD values using the mean and standard deviation from the laboratory measurements.

The reduced ASM2d modeling approach of Nair et al. (2020) with Extended Kalman filter was considered first. We developed reduced order ASM2d models using the available laboratory measurements and online data. However, ammonia (NH3) was not included in the online measurements and the model accuracy decreased just within days. The laboratory data could not be made available in the KYB IIoT platform due to Microsoft two-factor authentication, and therefore, it was not possible to calibrate the reduced order ASM2d models online.

In our data-driven approach, each zone with EC or ORP measurement becomes it’s own virtual sensor. The virtual sensor consists of selected physical online measurements, a state-space model and a Kalman filter that continuously updates the error between the model predictions and real measurements. The state-space model consists of dynamic equation and measurement equation.

Input variable selection

The input variables were selected using correlation matrix with time-delayed variables and the EC/ORP measurement. The selection of input variables affects greatly the state-space model accuracy. Selection of too many input variables deteriorates the accuracy of the state-space model. For example in zone 3, selection of 6 input variables gives R2 value of 0.6 at best, where as selection of 4 input variables increases the R2 value up to 0.97. Other input variable selection methods, such as Shapley additive explanations (Paepae et al. 2022) or Matlab stepwise regression could be used in future work.

State-space model – Measurement equation

The measurement equation, a linear multivariate regression was developed, in the anaerobic zones between EC and nutrients PO4 and COD, and, in the anaerobic zones, between ORP and nutrients PO4, COD, NO2 and dissolved oxygen DO. The measurement equation coefficients give values for the C and D matrices. The challenge with the measurement equation is small number of laboratory samples where all the PO4, COD were present. For the inlet and zone 3, the measurement equation accuracy was reasonably strong, but for zone 7, the ORP sensor quality caused lower modeling accuracy. It is possible to increase the modeling accuracy by adding quadratic and interaction terms, but these would not fit with the linear state-space model.

State-space model – dynamic part

The system identification algorithm was forced to use the measurement equation C and D matrix values, and only find optimal parameters of A and B matrices for the state-space model using online data between measured input and output variables. This implementation enables meaningful relationship between the inputs, states and outputs, and hence information of the unmeasured states, or the nutrient concentrations.

Comparison of results

The model comparisons to online and laboratory measurements is illustrated for inlet, zone 3 and zone 7 only with the state-space models, not with Kalman filter. The Kalman filter will minimize the error between the model and output measurements, and not give a realistic picture of the model accuracy. The model accuracy was strongest at zone 3, and Kalman filter as implemented at the IIoT platform for zone 3 (and effluent). The main weakness of the modeling approach is that it is not directly related to material balances as the Extended Kalman Filter approach in Nair et al. (2019, 2020). If the laboratory data could be made available in the IIoT platform, more advanced modeling approaches can be developed. The modeling work for zone 7 was challenging due to very limited period of feasible ORP sensor data, and a few laboratory measurements. However, it was desired to test the same approach as for inlet and zone 3. Hence, the results are not strong.

This article presents development work for virtual sensors estimating PO4 and COD using online measurements and state-space models of the Hias process. As the laboratory data set was not available online, the modeling approach utilizing simplified ASM2d models could not be applied. Hence, data-driven approach of state-space models was chosen. Input variables for the dynamic part of the state-space models were selected using correlation plots. Linear measurement equations were developed between the online EC and ORP measurements and phosphate PO4 and carbon COD measurements using aggregated laboratory and online data sets. The modeling accuracy R2 was strong at inlet (0.86), and zone 3 (0.91), but moderate at zone 7 (0.66). The measurement equations were further used to formulate state-space models. The dynamic part of the state-space models were identified from the online data. The R2 error index for the state-space models was 0.6 for EC and COD at inlet, 0.9 for EC zone 3 and 0.6 for COD at zone 7, but less than 0.5 for ORP at zone 7. The estimated states were scaled, and the estimates of the PO4 and carbon concentrations were plotted against the laboratory measurements.The estimated values correspond to most of the laboratory data for inlet and zone 3, whereas for zone 7, there are too few laboratory data points to evaluate the result. There were too few laboratory data points to give statistical evaluation.

Kalman filter was implemented in KYB platform to provide online estimates of the phosphate PO4 and carbon COD concentrations at zone 3 with the best model accuracy. The ORP sensor data quality in zones 4 and 7 was insufficient. For zone 4 no measurements were available, whereas zone 7 had a limited data period of 2,000 online samples and 3 laboratory samples. This was challenging for modeling of the PO4 and COD in zone 7.

Future work is suggested on improvement of the model accuracy in zone 7 with a new data set, and development of design for energy-efficient control strategies using the virtual sensors.

RFF Innlandet is gratefully acknowledged for funding the PACBAL project (nr 337727). Torgeir Saltnes at Hias How2O, Katrine Marsteng Jansen from Hias IKS water resource recovery facility have provided valuable insights in the Hias process. Katrine Marsteng Jansen and Tobias Korten from Hias IKS have supported data collection. Axel Tveiten Bech and Ola Solli Grønningsæter from Digitread Connect have performed online data collection. Axel Tveiten Bech has implemented the state-space models and Kalman filter in the KYB IIoT platform. Professor Tiina Komulainen is the principal investigator of the project. She has developed the modeling approach, supervised and revised the dynamic modeling work, and prepared the article draft. Professor Olga Korostynska has supervised the sensor part of the work. Research assistant A.Malik Baqeri has worked on testing different modeling approaches. Professor Tiina Komulainen has prepared the final implementation of the models, virtual sensors and figures. Katrine Marsteng Jansen (Hias IKS) and Professor Olga Korostynska have done critical review of this article.

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Aguado
D.
,
Ferrer
A.
,
Seco
A.
&
Ferrer
J.
2006
Comparison of different predictive models for nutrient estimation in a sequencing batch reactor for wastewater treatment
.
Chemometrics and Intelligent Laboratory Systems
84
(
1
),
75
81
.
Selected papers presented at the 9th Scandinavian Symposium on Chemometrics Reykjavik, Iceland 21–25 August 2005. https://doi.org/10.1016/j.chemolab.2006.03.009
.
European Commission
2021
.
European Commission
2022
.
Frau
I.
,
Wylie
S.
,
Byrne
P.
,
Cullen
J.
,
Korostynska
O.
&
Mason
A.
2019
Detection of Zn in water using novel functionalised planar microwave sensors
.
Materials Science and Engineering: B
247
,
114382
.
https://doi.org/10.1016/j.mseb.2019.114382
.
Frau
I.
,
Wylie
S.
,
Byrne
P.
,
Onnis
P.
,
Cullen
J.
,
Mason
A.
&
Korostynska
O.
2021
Microwave sensors for in situ monitoring of trace metals in polluted water
.
Sensors
21
(
9
),
3147
.
https://doi.org/10.3390/s21093147
.
Gu
Y.
&
Li
C.
2023
State Space Systems with Time-Delays Analysis, Identification, and Applications
, 1st edn.
Academic Press, London, UK
.
https://doi.org/10.1016/C2021-0-01181-0
.
Henze
M.
,
van Loosdrecht
M. C. M.
,
Ekama
G.
&
Brdjanovic
D.
2008
Biological Wastewater Treatment: Principles, Modelling and Design
.
IWA Publishing, London, UK
.
https://doi.org/10.2166/9781780401867
.
Hias
2022
The Hias ® process – probably the world’s most compact biological nutrient removal technology. https://www.hias.as/home
.
Kim
K. S.
,
Yoo
J. S.
,
Kim
S.
,
Lee
H. J.
,
Ahn
K. H.
&
Kim
I. S.
2007
Relationship between the electric conductivity and phosphorus concentration variations in an enhanced biological nutrient removal process
.
Water Science and Technology
55
(
1–2
),
203
208
.
https://doi.org/10.2166/wst.2007.053
.
Komulainen
T. M.
,
Baqeri
A. M.
,
Nermo
E.
,
Keprate
A.
,
Saltnes
T.
,
Jansen
K. M.
&
Korostynska
O.
2023
Estimation of effluent nutrients in municipal MBBR process. In: Proceedings of the 64th International Conference of Scandinavian Simulation Society. Linköping University Electronic Press, Västerås, Sverige September 26-27 2023. https://doi.org/10.3384/ecp200037
.
Korostynska
O.
,
Mason
A.
&
Al-Shamma’a
A.
2012
Monitoring of nitrates and phosphates in wastewater: Current technologies and further challenges
.
International Journal on Smart Sensing and Intelligent Systems
5
(
1
),
149
176
.
https://doi.org/10.21307/ijssis-2017-475
.
Ljung
L.
1999
System Identification – Theory for the User
, 2nd edn.
Prentice Hall, Englewood Cliffs, NJ, USA
.
https://doi.org/10.1002/047134608X.W1046
.
Nair
A. M.
,
Fanta
A.
,
Haugen
F. A.
&
Ratnaweera
H.
2019
Implementing and extended kalman filter for estimating nutrient composition in a sequential batch MBBR pilot plant
.
Water Science and Technology
80
(
2
),
317
328
.
https://doi.org/10.2166/wst.2019.272
.
Nair
A. M.
,
Gonzales-Silva
B. M.
,
Haugen
F. A.
,
Ratnaweera
H.
&
Østerhus
S. W.
2020
Real-time monitoring of enhanced biological phosphorus removal in a multistage EBPR-MBBR using a soft-sensor for phosphates
.
Journal of Water Process Engineering
37
(
101494
),
13
.
https://doi.org/10.1016/j.jwpe.2020.101494
.
Nermo
E.
2023
Mpc of Nutrient Removal in Wastewater Treatment Process. Master Thesis, Oslo Metropolitan University
.
Owodunni
A. A.
,
Ismail
S.
,
Kurniawan
S. B.
,
Ahmad
A.
,
Imron
M. F.
&
Abdullah
S. R. S.
2023
A review on revolutionary technique for phosphate removal in wastewater using green coagulant
.
Journal of Water Process Engineering
52
,
103573
.
https://doi.org/10.1016/j.jwpe.2023.103573
.
Paepae
T.
,
Bokoro
P. N.
&
Kyamakya
K.
2022
A virtual sensing concept for nitrogen and phosphorus monitoring using machine learning techniques
.
Sensors
22
(
19
),
7338
.
https://doi.org/10.3390/s22197338
.
Paepae
T.
,
Bokoro
P. N.
&
Kyamakya
K.
2023
Data augmentation for a virtual-sensor-based nitrogen and phosphorus monitoring
.
Sensors
23
(
3
),
1061
.
https://doi.org/10.3390/s23031061
.
Rezai
B.
&
Allahkarami
E.
2021
Chapter 2 – wastewater treatment processes–techniques, technologies, challenges faced, and alternative solutions. In: Soft Computing Techniques in Solid Waste and Wastewater Management (R. R. Karri, G. Ravindran & M. H. Dehghani edn.). Elsevier, pp. 35–53. doi: https://doi.org/10.1016/B978-0-12-824463-0.00004-5
.
Rudi
K.
,
Goa
I. A.
,
Saltnes
T.
,
Sørensen
G.
,
Angell
I. L.
&
Eikås
S.
2019
Microbial ecological processes in MBBR biofilms for biological phosphorus removal from wastewater
.
Water Science and Technology
79
(
8
),
1467
1473
.
https://doi.org/10.2166/wst.2019.149
.
Shewa
W. A.
&
Dagnew
M.
2020
Revisiting chemically enhanced primary treatment of wastewater: A review
.
Sustainability
12
(
15
),
5928
.
https://doi.org/10.3390/su12155928
.
Wang
X.
,
Wu
Y.
,
Chen
N.
,
Piao
H.
,
Sun
D.
,
Ratnaweera
H.
,
Maletskyi
Z.
&
Bi
X.
2022
Characterization of oxidation-reduction potential variations in biological wastewater treatment processes: A study from mechanism to application
.
Processes
10
(
2607
),
11
.
https://doi.org/10.3390/pr10122607
.
Weißbach
M.
,
Drewes
J. E.
&
Koch
K.
2018
Application of the oxidation reduction potential (ORP) for process control and monitoring nitrite in a coupled aerobic-anoxic nitrous decomposition operation (CANDO)
.
Chemical Engineering Journal
343
,
484
491
.
https://doi.org/10.1016/j.cej.2018.03.038
.
Zhou
Q.
,
Sun
H.
,
Jia
L.
,
Wu
W.
&
Wang
J.
2022
Simultaneous biological removal of nitrogen and phosphorus from secondary effluent of wastewater treatment plants by advanced treatment: A review
.
Chemosphere
296
,
134054
.
https://doi.org/10.1016/j.chemosphere.2022.134054
.
Zidaoui
I.
,
Wemmert
C.
,
Dufresne
M.
,
Joannis
C.
,
Isel
S.
,
Wertel
J.
&
Vazquez
J.
2023
Validation of wastewater data using artificial intelligence tools and the evaluation of their performance regarding annotator agreement
.
Water Science and Technology
87
(
12
),
2957
2970
.
https://doi.org/10.2166/wst.2023.174
.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).