Comparative analysis of select techniques and metrics for data reconciliation in smart energy distribution network

The reliability of each process stage in many chemical process industries depends largely on water and energy supplies. Hence, there is a strong need for an improved and well-controlled Smart Energy Distribution Network (SEDN) in industry. In SEDNs, sensor data related to flow control and optimization serve as the basis for modelling energy management systems, so it is important to ensure that sensor data are accurate and precise. However, measurements are affected by random noise and measurement biases, which compromise their quality. Data Reconciliation (DR) is an approach widely used in industry to reduce the adverse impact of random errors present in pipe flow measurements. In this study, Python-based simulations of weighted least squares (WLS) and principal component analysis (PCA) based DR techniques are applied to selected flow streams of an SEDN, and reconciled estimates are obtained. The results show that Root Mean Square Error (RMSE) is the best performance metric, since it is the most sensitive to small changes in the measured values and the reconciled estimates. Further, PCA-DR is observed to outperform WLS-DR in reducing random error, thereby achieving greater precision of the measured values.


INTRODUCTION
In most chemical industries, utilities such as water and energy play an important role. A sensor-based smart energy distribution network (SEDN) is required to monitor the consumption of these supplies. An SEDN can help provide better process quality, more efficient operation, and more accurate forecasting of supply and demand. SEDN operation is usually supported by Supervisory Control and Data Acquisition (SCADA) systems (Park & Jung 2014; Quevedo et al.; Krȏcová). Measurement of process flow variables is therefore an essential part of this process. The precision of the measurements is very important; without it, modelling and analysis can be misleading. However, measured variables are usually contaminated by fixed and random errors (Schönenberger; Tran et al.).
These random errors creep into measurements from various sources such as high-frequency pickups, low resolution, and signal converters (Câmara et al.). Data reconciliation (DR) is an approach usually applied to treat random errors present in measured variables under a constrained process environment. The most important difference between DR and other signal processing techniques is that DR uses process constraints, that is, the mass and energy balances of the process network, while the latter do not. At times it is not possible to measure all the process variables due to practical difficulty; in such situations, unmeasured variables can be estimated through soft sensors or by solving the DR problem (Miao et al.). Principal Component Analysis (PCA) is a multivariate data processing technique extensively used for dimension reduction when data on a large number of variables are available. It is also widely used in behavioural modelling of large water management systems; monitoring of water distribution, including leakage, abnormal use of water, and illegal connections; process monitoring for multi-input multi-output (MIMO) processes; soft sensor modelling; data reconciliation (DR); and gross error detection (GED). In order to establish that DR techniques actually improve the precision of measured data, performance metrics are needed (Zhang et al.). To find the best metric, the factors that lead to deterioration of data should be examined; these are discussed in the following section of this study.
In this paper, integrated water supply networks are considered whose pipe flow streams are assumed to be contaminated by Gaussian noise. Two DR techniques, WLS-DR and PCA-based DR, are applied to the measurements, and reconciled estimates are obtained. The selected metrics are applied to evaluate the performance of the DR techniques, and the best metric is identified. The same procedure is then implemented and evaluated for other datasets that have different variances and serially correlated errors.

DATA RECONCILIATION (DR) TECHNIQUES
Data reconciliation is generally applied to reduce the effect of random errors present in the process variables. The reconciled estimates are obtained by minimising a cost function subject to the process constraints:

$\min_{\hat{M}} \; \rho = \sum_{i=1}^{n} \sum_{j=1}^{N} \left( \frac{m_{ij} - \hat{m}_{ij}}{\sigma_i} \right)^{2}, \quad \text{subject to } F(\hat{M}) = 0$

where M is the raw measurement matrix, $\hat{M}$ is the reconciled estimate of M, ρ is the cost function, m_ij is the j-th measurement of the i-th variable, m̂_ij is the j-th reconciled estimate of the i-th variable, n is the number of process variables, N is the sample size, σ_i is the standard deviation of the i-th process variable, and F(M̂) is the process constraint.
Optimised measurements of a system, less affected by random noise, can thus be estimated. Certain prerequisites, given below, are essential in order to apply DR to a dataset.
i. The process constraints, which consist of material and energy balance equations, should be defined.
ii. The set of measured process variables must be specified, along with the inaccuracies in these measurements in terms of the associated variances and covariances.
Application of DR to linear steady-state processes is discussed below. Consider a(j) to be the true measurement vector of a steady-state system for each sample j, where j = 1, 2, 3, …, N. For a steady-state process, the equality constraints are derived as

$C\,a(j) = 0, \qquad j = 1, 2, \ldots, N$

where C is the incidence matrix of the process, a(j) = [a_1, a_2, …, a_n]^T is the j-th true measurement vector, N is the sample size, and n is the number of process variables. The process constraint matrix is derived from the incidence matrix C.
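To make the incidence-matrix convention concrete, here is a minimal Python sketch. The two-node, four-stream network below is a hypothetical illustration, not the paper's SEDN:

```python
import numpy as np

# Hypothetical network: stream 1 enters node A; streams 2 and 3 leave
# node A and enter node B; stream 4 leaves node B.
# Rows = nodes, columns = streams; +1 for an inflow, -1 for an outflow.
C = np.array([
    [1., -1., -1.,  0.],   # node A: F1 - F2 - F3 = 0
    [0.,  1.,  1., -1.],   # node B: F2 + F3 - F4 = 0
])

# A true steady-state flow vector must satisfy C @ a = 0.
a = np.array([10.0, 4.0, 6.0, 10.0])
print(C @ a)  # -> [0. 0.]
```

Each row of C encodes one node's mass balance, so the constraint C a(j) = 0 simply states that what flows into each node flows out again.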
Let the measurement of the process variables be m(j); then the generalised measurement model (Narasimhan & Bhatt) is represented in Equation (3):

$m(j) = a(j) + r(j) + \delta(j) \qquad (3)$

where r(j) is a random error vector and δ(j) is the measurement bias.
The measurement model shown in Equation (3) represents a realistic model since, in practice, over a long period measurements may change only due to random error. Here, the true value vector a(j), the variance of the measurements (σ²), and the process incidence matrix (C) are assumed to be fixed, which is a realistic assumption.
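The measurement model of Equation (3) can be simulated for a single flow variable as below. The numeric values (true value, noise level, bias) are assumed purely for illustration:

```python
import numpy as np

# Simulate m(j) = a(j) + r(j) + delta(j) for one variable, N samples.
rng = np.random.default_rng(0)
N = 1000
a = 10.0                          # fixed true value
sigma = 0.2                       # standard deviation of the random error
delta = 0.5                       # constant measurement bias
r = rng.normal(0.0, sigma, N)     # i.i.d. Gaussian random error
m = a + r + delta                 # measured values

# The sample mean is shifted by the bias; the spread comes from r alone.
print(m.mean(), m.std())
```

DR addresses the random component r(j); the constant bias δ(j) is the target of gross error detection rather than reconciliation.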
A few further assumptions are made about the random errors.
For non-i.i.d. data, an Auto Regressive Moving Average (ARMA) model is considered for the random error, with structure expressed as in Equation (4):

$r(j) = \sum_{i=1}^{p} \varphi_i\, r(j-i) + \omega(j) + \sum_{k=1}^{q} \theta_k\, \omega(j-k) \qquad (4)$

where φ is an autoregressive parameter, θ is a moving average parameter, ω is white noise, and p and q are the orders of the model.
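A serially correlated error sequence of this form can be generated as follows; the ARMA(1,1) parameters chosen at the end are assumptions for illustration only:

```python
import numpy as np

def arma_noise(n, phi, theta, sigma=1.0, seed=0):
    """Simulate an ARMA(p, q) random-error sequence:
    r[t] = sum_i phi[i]*r[t-1-i] + w[t] + sum_k theta[k]*w[t-1-k]."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, sigma, n)   # white-noise innovations
    r = np.zeros(n)
    for t in range(n):
        ar = sum(phi[i] * r[t - 1 - i]
                 for i in range(len(phi)) if t - 1 - i >= 0)
        ma = sum(theta[k] * w[t - 1 - k]
                 for k in range(len(theta)) if t - 1 - k >= 0)
        r[t] = ar + w[t] + ma
    return r

# Example: ARMA(1,1) error with phi = 0.5, theta = 0.3
r = arma_noise(500, phi=[0.5], theta=[0.3])
```

Unlike i.i.d. Gaussian noise, successive values of r are correlated, which is the "serially correlated errors" case evaluated later in the paper.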

The weighted least squared-DR
The conventional approach to implementing DR is weighted least squares data reconciliation (WLS-DR), which is used to obtain higher-precision estimates of the process variables from noisy measurements.
The basic idea of this approach is to minimise the weighted residuals, Equation (5):

$\min_{\hat{A}} \sum_{i=1}^{n} \sum_{j=1}^{N} w_{ij} \left( m_{ij} - \hat{a}_{ij} \right)^{2} \qquad (5)$

subject to f(A) = 0 and g(A) ≤ 0, where M is the raw measurement, A is the reconciled estimate of M, w_ij is the j-th weight of the i-th measurement variable, and f(a) and g(a) represent the constraints of the process. The optimum value of w_ij will provide good reconciled estimates for m_ij. For a known covariance matrix (Λ), the reconciled estimates â(j) of the measurements can be obtained by Equation (6):

$\hat{a}(j) = m(j) - \Lambda C^{T} \left( C \Lambda C^{T} \right)^{-1} C\, m(j) \qquad (6)$

where â(j) is the reconciled estimate of m(j), Λ is the error covariance matrix, and C is the constraint matrix.
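The closed-form WLS-DR estimate can be sketched in a few lines of Python. The network, covariance values, and measurements below are assumed for illustration and are not taken from the paper:

```python
import numpy as np

# Hypothetical 2-node, 4-stream balance: C @ a = 0 at steady state.
C = np.array([[1., -1., -1.,  0.],
              [0.,  1.,  1., -1.]])
Lam = np.diag([0.04, 0.01, 0.02, 0.04])   # assumed error covariance matrix

def wls_reconcile(m, C, Lam):
    """Closed-form WLS-DR: a_hat = m - Lam C' (C Lam C')^-1 C m."""
    K = Lam @ C.T @ np.linalg.inv(C @ Lam @ C.T)
    return m - K @ (C @ m)

m = np.array([10.21, 3.95, 6.11, 9.88])   # noisy measurements
a_hat = wls_reconcile(m, C, Lam)
print(C @ a_hat)  # residuals ~ 0: the estimates satisfy the balances
```

Note that the correction to each stream is proportional to its variance, so less reliable measurements are adjusted more.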
The obtained estimates are normally distributed and satisfy the process constraints.

The PCA-based DR

In PCA-based DR, the measurements are first scaled as in Equation (7):

$M_s = W^{-1} M \qquad (7)$

where W is obtained from a decomposition $\Lambda = W W^{T}$ of the error covariance matrix (Λ) of the process network. Therefore the covariance matrix of M_s can be calculated by Equation (9):

$\mathrm{cov}(M_s) = W^{-1} \Lambda \left( W^{-1} \right)^{T} = I \qquad (9)$
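The core projection step of a PCA-based reconciliation can be sketched as below. This is an illustration of the general idea (project measurements onto the principal subspace spanned by the true process variation), not the paper's exact algorithm; the network and noise levels are assumed:

```python
import numpy as np

# Columns of M are samples, rows are the n flow variables.
rng = np.random.default_rng(1)
N = 2000
# True flows obeying F1 = F2 + F3 and F4 = F2 + F3 (two constraints).
f2, f3 = rng.uniform(3, 5, N), rng.uniform(5, 7, N)
A = np.vstack([f2 + f3, f2, f3, f2 + f3])        # true values, shape (4, N)
M = A + rng.normal(0.0, 0.1, A.shape)            # add random error

Mc = M - M.mean(axis=1, keepdims=True)           # mean-centre
U, S, _ = np.linalg.svd(Mc, full_matrices=False) # principal directions
k = 2                                            # retained components (signal rank)
P = U[:, :k] @ U[:, :k].T                        # projector onto PCA subspace
M_hat = P @ Mc + M.mean(axis=1, keepdims=True)   # reconciled estimates

# Projecting out the noise-only directions reduces the error variance:
print(np.mean((M - A) ** 2), np.mean((M_hat - A) ** 2))
```

Because the two balance constraints confine the true data to a two-dimensional subspace, discarding the remaining principal directions removes the noise component orthogonal to that subspace.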

Global deviation (GD)
This is a measure of average squared difference between true and measured values.

Correlation coefficient (CC)
The Pearson correlation coefficient is a statistical measure of the strength and direction of the linear relationship between two variables. The value of CC lies between −1 and 1. It cannot capture non-linear relationships between two variables.
Signal to noise ratio (SNR)

In statistics, SNR is defined as the ratio of the variance of the mean deviation from the true value of the measured variable to the variance of the measured variable.

Relative error reduction (RER)
Relative Error Reduction (RER) is another measure used to evaluate the performance of reconciliation techniques. It is the ratio of the relative errors of the raw measurements and the reconciled estimates; a higher value of RER indicates a better reconciliation technique. Here, m is the measurements, a is the true value of the measurement, m̄ is the mean of the measurement variable, and m̂ is the reconciled estimate. For raw measurements, RER is considered to be zero.

Example 1

For the recycled network, the node balances take the form of Equation (17):

Input flow variable of the i-th node − Output flow variable of the i-th node = 0

In the incidence matrix for this process, input and output flow variables are denoted as '1' and '−1' respectively. The process constraint matrix is derived from the incidence matrix by removing node '5', which has non-recycled variables in the network.
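The performance metrics described above can be implemented compactly; the formulas below are one plausible reading of the definitions in this section (the paper's exact normalisations may differ), and the data at the end are synthetic stand-ins:

```python
import numpy as np

# a = true values, m = raw measurements, m_hat = reconciled estimates.
def rmse(x, a):
    return float(np.sqrt(np.mean((x - a) ** 2)))

def gd(x, a):
    """Global deviation: average squared difference from the true values."""
    return float(np.mean((x - a) ** 2))

def cc(x, a):
    """Pearson correlation coefficient between two series."""
    return float(np.corrcoef(x, a)[0, 1])

def rer(m, m_hat, a):
    """Ratio of relative errors, raw vs reconciled (higher = better)."""
    rel_raw = np.mean(np.abs((m - a) / a))
    rel_rec = np.mean(np.abs((m_hat - a) / a))
    return float(rel_raw / rel_rec)

rng = np.random.default_rng(2)
a = np.full(1000, 10.0)                   # true flow
m = a + rng.normal(0, 0.2, a.size)        # raw measurement
m_hat = a + rng.normal(0, 0.05, a.size)   # stand-in "reconciled" estimate
print(rmse(m, a), rmse(m_hat, a), rer(m, m_hat, a))
```

With these definitions, a successful reconciliation shows a lower RMSE and GD for m̂ than for m, and an RER greater than one.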
In order to evaluate the performance of each technique, a few flow variables with specific magnitudes are identified, and the performance indices are calculated as explained in the previous section. The reconciled estimates are obtained using the corresponding DR techniques as explained above, and the results are compared with the raw data. The performance index calculation procedure is as follows:

Step 1: Obtain the raw measurements (M)
Step 2: Apply the DR technique
Step 3: Calculate the reconciled estimates (Â)
Step 4: Calculate the performance index

The performance indices obtained for the recycled network are then compared.

Example 2
The large process network (Varshith et al.) shown in Figure 2 consists of 11 nodes representing the balance equations and 28 flow variables. The base value of each variable is listed in Table 2. The constraint matrix for this process is calculated as in Equation (17). From Figure 3(a), it is observed that RMSE is the best metric for capturing minute changes in the noise present in the data, followed by RER and CC. SNR and GD remain constant throughout, proving to be of no significant use in differentiating the datasets. For variables F16 and F22, shown in Figure 3(b) and 3(c) respectively, SNR and GD remain constant as well. The performance of PCA-DR changes linearly as the magnitude of the variables increases.
For high magnitudes, its estimates are less accurate than those of WLS-DR. Thus, we can conclude that the performance of PCA-DR decreases as the base magnitude of the flow variables increases.

DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.