Abstract
Because of the static nature of conventional principal component analysis (PCA), natural process variations may be interpreted as faults when it is applied to processes with time-varying behavior. In this paper, therefore, we propose a complete adaptive process monitoring framework based on incremental principal component analysis (IPCA). This framework updates the eigenspace by incorporating new data into the PCA model at a low computational cost. Moreover, variable contributions are computed recursively using complete decomposition contribution (CDC). To impute missing values, the empirical best linear unbiased prediction (EBLUP) method is incorporated into the framework. The effectiveness of the framework is evaluated using benchmark simulation model No. 2 (BSM2). Our simulation results show that the proposed approach can distinguish between time-varying behavior and faulty events while correctly isolating sensor faults, even when these faults are relatively small.
HIGHLIGHTS
A novel framework is proposed for adaptive fault detection and diagnosis based on IPCA.
The validity and applicability of the proposed framework are examined using benchmark simulation model No.2 (BSM2).
The realtime imputation of intricate patterns of missing values is performed using EBLUP.
The proposed framework can detect small-magnitude drift faults while tracking time-varying behavior.
Graphical Abstract
ABBREVIATIONS
- X
Data matrix
- T
Score matrix
- P
Loading matrix
- E
Residual matrix
- S
Covariance matrix
- n
Data matrix row number
- m
Data matrix column number
Eigenvector matrix for kth sample
Eigenvalues matrix for kth sample
jth eigenvalue
Number of principal components to be retained for kth sample
Hotelling's T-squared for kth sample
Square prediction error for kth sample
kth sample
Standardized kth sample
Mean of kth sample
- f
Forgetting factor
Standard deviation for kth sample
Projection of kth sample using the current eigenvector
Residual vector for kth sample
Normalized residual vector for kth sample
Euclidean norm of matrix
Rotation matrix for kth sample
A matrix that is decomposed to obtain
Cumulative percent variance with retained principal components
Chi-distribution with degrees of freedom and a confidence limit
T2 threshold for kth sample
SPE threshold for kth sample
Normal deviate corresponding to the (1 − α) percentile
Complete decomposition contribution for kth sample using Hotelling's T-squared
Complete decomposition contribution for kth sample using SPE
Vector of kth sample with observed values
Vector of kth sample with missing values
Mean of observed values of k − 1th sample
Mean of missing values of k − 1th sample
Eigenvector matrix of observed values of k − 1th sample
Eigenvector matrix of missing values of k − 1th sample
Imputed missing values of kth sample
Conditional expectation
- Sin
BSM2 inorganic nitrogen concentration in anaerobic digester
BSM2 secondary clarifier exponential settling velocity function
BSM2 growth rate for the autotrophs
- PCA
Principal component analysis
- IPCA
Incremental principal component analysis
- EBLUP
Empirical best linear unbiased prediction
- BSM2
Benchmark simulation model No.2
- WRRF
Water resource recovery facilities
- WWTP
Wastewater treatment plant
- MSPCs
Multivariate statistical process controls
- RPCA
Recursive principal component analysis
- EWMA
Exponential weighted moving average
- FOP
First-order perturbation
- PCs
Principal components
- TSS
Total suspended solids
- PI
Proportional integral
- AD
Anaerobic digestion
- COD
Chemical oxygen demand
- FAR
False alarm rate
- MDR
Missed detection rate
INTRODUCTION
Because of the complexity of modern industrial processes, the demand has increased to implement a fault detection and diagnosis framework for those processes. Water resource recovery facilities (WRRF), formerly called wastewater treatment plants (WWTP), are no exception (Sánchez-Fernández et al. 2018). Abnormal or faulty events are those that occur when the process deviates from its normal behavior. In WRRF, abnormal events include changes in influent quality (e.g. rainfall, industrial discharge), outbreaks of microorganisms (e.g. filamentous bacteria, algae) that impact treatment quality, irregularities or damage to treatment units (e.g. membranes, clarifiers), mechanical failures (e.g. pumps, air blowers) and sensor failures (e.g. drift, bias, electrical interference) (Newhart et al. 2019). All these faults can undermine process performance, so they must be detected and diagnosed as soon as they occur. Fault detection techniques are generally divided into three main groups: model-based methods, knowledge-based methods, and data-driven methods. To implement techniques in the first two groups, an in-depth knowledge of the system's behavior is required. However, obtaining in-depth knowledge of reaction or pathway phenomena (especially for WRRFs, which are very complicated processes) is both time-consuming and challenging.
Data-driven methods, on the other hand, rely only on historical and online data and do not require in-depth prior knowledge of the process (Kazemi et al. 2020). Thanks to advanced process control frameworks, WRRFs collect large amounts of data using measurements from numerous online sensors (Wang & Shi 2010). Data-driven techniques have therefore received significant attention in process monitoring over the last few years. Multivariate statistical process controls (MSPCs) are a subset of data-driven methods used to analyze the performance of processes and identify the parameters that govern them. One of the most widely used techniques for MSPC is principal component analysis (PCA) (Newhart et al. 2019). Various types of PCA methods, such as conventional PCA (Garcia-Alvarez et al. 2009; Sanchez-Fernández et al. 2015; Xiao et al. 2017), kernel PCA (Lee et al. 2004; Jun et al. 2006; Xiao et al. 2017) and dynamic PCA (Lee et al. 2006; Mina & Verde 2006) have been successfully used to monitor processes and detect faults in time-invariant processes. However, because of the complexity of the biological reactions and non-stationary plant influent, WRRFs have time-varying characteristics. Using conventional PCA to monitor such processes may therefore lead to excessive rates of false alarms and missed detections. An adaptive process monitoring framework is therefore needed. Although adaptive PCAs are more suitable for non-stationary processes, not every adaptive approach is applicable for realtime fault detection because of high computational costs. Several adaptive approaches, including recursive PCA (RPCA), moving window PCA (MWPCA) and exponential weighted moving average (EWMA), have recently been used for fault detection (Li et al. 2000; Rosen & Lennox 2001; Choi et al. 2006; Elshenawy et al. 2010; Shang et al. 2015). Haimi et al. implemented a moving window PCA technique to detect and isolate faults in WRRF (Haimi et al. 2016).
These authors took into account the variable length of the historical data in the model's construction to prevent sub-optimal monitoring performance. Li et al. applied an RPCA monitoring framework that recursively updates the correlation matrix. These authors used two approaches, namely rank-one modification and Lanczos tridiagonalization, to compute the eigenvalues of the updated correlation matrix (Li et al. 2000). Recently, Elshenawy et al. proposed RPCA based on first-order perturbation analysis (FOP), which is a rank-one update of the eigenvalues and their corresponding eigenvectors from a sample covariance matrix (Elshenawy et al. 2010; Elshenawy & Mahmoud 2018). These authors stated that the computational cost of their approach is lower than that of previously used methods such as RPCA using Lanczos tridiagonalization and MWPCA.
In this paper we propose a new low-computational-cost adaptive fault detection framework for WRRF that uses incremental PCA (IPCA). This approach is based on the IPCA proposed by Hall et al., which was motivated by developments in the field of computer vision (Hall et al. 1998; Brand 2002; Arora et al. 2012). The main benefit of IPCA over other adaptive PCA approaches is that it does not require all eigenvalues to be computed, since only the largest ones are needed. This considerably accelerates computation. Cardot and Degras (Cardot & Degras 2018) compared the computation time of adaptive PCA methods such as perturbation and stochastic approximation and concluded that IPCA outperforms the other methods, particularly when the data have many features (sensor measurements). These authors also studied the accuracy of adaptive methods in determining the eigenspace. Their results revealed that IPCA is more accurate than methods such as RPCA based on FOP theory. The IPCA algorithms offer an excellent compromise between statistical accuracy and computational speed. Another major drawback of PCA approaches is missing data during realtime monitoring. In practice, due to sudden mechanical breakdown, maintenance, hardware sensor failure or malfunction of the data acquisition system, some sensor signals may become unavailable (Zhang & Dong 2014). To meet the requirements of realtime fault detection and diagnosis, therefore, the problem of missing data needs to be considered. To handle missing data, the empirical best linear unbiased prediction (EBLUP) approach is incorporated into the proposed IPCA method (Cardot & Degras 2018).
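To illustrate the incremental idea, the sketch below uses scikit-learn's `IncrementalPCA`, a related mini-batch incremental method; it is an assumption for illustration only, not the exact rank-one eigenspace update of Hall et al. used in the paper. Each `partial_fit` call refines the eigenspace from new samples without revisiting earlier data.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 8))            # fixed mixing matrix to induce correlations

ipca = IncrementalPCA(n_components=3)  # only the leading eigenpairs are tracked

# Feed the data stream in small batches; each call updates the model
# at a cost independent of the number of samples seen so far.
for _ in range(20):
    batch = rng.normal(size=(50, 8)) @ A
    ipca.partial_fit(batch)

x_new = rng.normal(size=(1, 8)) @ A
scores = ipca.transform(x_new)         # projection onto the current PCs
```

Because only `n_components` directions are maintained, memory and per-update cost stay constant as the stream grows, which is the property that makes this family of methods attractive for realtime monitoring.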
Once a fault is detected, it is essential to find out its primary source. The most common fault isolation method is the contribution chart (Nomikos & MacGregor 1995). With this method, the contribution of each variable to the T2 (Hotelling's T-squared) and SPE (square prediction error) statistics is calculated, and the variable with the highest contribution is considered the primary cause of the fault. Because of the recursive nature of IPCA, the contribution plots of T2 and SPE are updated in accordance with the adaptive eigendecomposition. To test the proposed framework, several types of process parameter and sensor faults were simulated using Benchmark Simulation Model No. 2 (BSM2) (Jeppsson et al. 2006). This paper is organized as follows: first we provide a brief review of the theoretical background behind the conventional PCA method; then we discuss the IPCA algorithm and its realtime application framework for fault detection and diagnosis; and finally we analyze the performance of the monitoring framework on complete and incomplete (i.e. with missing values) data sets using BSM2.
CONVENTIONAL PCA
Here, as mentioned earlier, is the number of PCs to retain. To perform fault detection, suitable thresholds for these two statistical indices must be obtained. This enables small faults to be detected with a minimum rate of false alarms. Process operation is normal if these statistical indices remain below these thresholds. The procedure for estimating the thresholds of T2 and SPE will be discussed later.
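As a concrete illustration of the two statistics and their thresholds, here is a minimal PCA monitoring sketch. The F-distribution limit for T2 and the Jackson-Mudholkar approximation for SPE are common textbook choices assumed here for illustration; they may differ in detail from the paper's Equations (15) and (16).

```python
import numpy as np
from scipy import stats

def train_pca_monitor(X, n_pc, alpha=0.01):
    """Fit a PCA monitoring model on normal-operation data X (n x m)."""
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Xs = (X - mu) / sd                       # zero mean, unit variance
    lam, P = np.linalg.eigh(np.cov(Xs, rowvar=False))
    order = np.argsort(lam)[::-1]            # sort eigenpairs descending
    lam, P = lam[order], P[:, order]
    n, a = X.shape[0], n_pc
    # T2 threshold from the F-distribution (common textbook choice)
    t2_lim = a * (n - 1) * (n + 1) / (n * (n - a)) * stats.f.ppf(1 - alpha, a, n - a)
    # SPE threshold via the Jackson-Mudholkar approximation
    th = [np.sum(lam[a:] ** i) for i in (1, 2, 3)]
    h0 = 1 - 2 * th[0] * th[2] / (3 * th[1] ** 2)
    z = stats.norm.ppf(1 - alpha)            # normal deviate at (1 - alpha)
    spe_lim = th[0] * (z * np.sqrt(2 * th[1] * h0 ** 2) / th[0]
                       + 1 + th[1] * h0 * (h0 - 1) / th[0] ** 2) ** (1 / h0)
    return dict(mu=mu, sd=sd, P=P[:, :a], lam=lam[:a], t2_lim=t2_lim, spe_lim=spe_lim)

def monitor(model, x):
    """Return (T2, SPE) for one new sample x."""
    xs = (x - model['mu']) / model['sd']
    t = model['P'].T @ xs                    # score vector
    t2 = np.sum(t ** 2 / model['lam'])       # Hotelling's T-squared
    resid = xs - model['P'] @ t              # residual vector
    return t2, resid @ resid                 # (T2, squared prediction error)
```

A sample is flagged as faulty when `t2 > model['t2_lim']` or `spe > model['spe_lim']`.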
INCREMENTAL PCA
Estimating the number of PCs
Process monitoring statistics thresholds
A fault is detected once the T2 and SPE statistics exceed their respective adaptive thresholds.
Fault isolation
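A minimal sketch of complete decomposition contributions follows. We assume the common square-root form in which variable j's contribution is (xi_j' M^(1/2) x)^2, with M = P Lam^-1 P' for T2 and M = I - P P' for SPE; this is an assumption about the exact CDC variant, chosen so that the per-variable contributions sum exactly to each statistic.

```python
import numpy as np

def cdc_contributions(xs, P, lam):
    """Complete decomposition contributions of each variable.

    xs  : standardized sample (m,)
    P   : loading matrix of the retained PCs (m x a)
    lam : retained eigenvalues (a,)
    Returns per-variable contributions to T2 and to SPE; each vector
    sums to the corresponding monitoring statistic.
    """
    m = len(xs)
    # Symmetric square root of P diag(1/lam) P' on the PC subspace
    M_t2_half = P @ np.diag(lam ** -0.5) @ P.T
    # The SPE matrix I - P P' is a projector, hence its own square root
    M_spe_half = np.eye(m) - P @ P.T
    c_t2 = (M_t2_half @ xs) ** 2
    c_spe = (M_spe_half @ xs) ** 2
    return c_t2, c_spe
```

The variable with the largest contribution is then reported as the most likely fault source, as in the contribution charts discussed in the paper.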
Missing data imputation
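In its simplest, non-empirical form, the BLUP underlying EBLUP is the Gaussian conditional expectation of the missing block given the observed one; a minimal sketch, assuming the mean vector and covariance matrix are already estimated (the empirical version would update these estimates recursively from the data stream):

```python
import numpy as np

def impute_conditional_mean(x, miss, mu, S):
    """Impute missing entries of x by their conditional expectation.

    x    : sample with NaNs at missing positions (m,)
    miss : boolean mask, True where the value is missing (m,)
    mu   : current mean estimate (m,)
    S    : current covariance estimate (m x m)

    Uses E[x_miss | x_obs] = mu_m + S_mo S_oo^-1 (x_obs - mu_o).
    """
    obs = ~miss
    S_oo = S[np.ix_(obs, obs)]               # covariance of observed block
    S_mo = S[np.ix_(miss, obs)]              # cross-covariance missing/observed
    x_filled = x.copy()
    x_filled[miss] = mu[miss] + S_mo @ np.linalg.solve(S_oo, x[obs] - mu[obs])
    return x_filled
```

Because the imputation needs only the current mean and covariance, it can be applied sample by sample as gaps appear, which matches the realtime requirement of the monitoring loop.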
ADAPTIVE FAULT DETECTION AND ISOLATION
In this section we discuss the complete structure of the adaptive fault detection and isolation scheme. The proposed adaptive fault detection and isolation framework can be divided into two stages: (i) offline training and (ii) online monitoring. The following steps summarize the overall procedure.
Offline training
- (1)
Collect the training data matrix ∈ ℜn×m and normalize it to zero mean and unit variance
- (2)
Use Equation (2) to compute S and its corresponding eigenpairs , .
- (3)
Calculate the number of retained PCs () by applying Equation (14).
- (4)
Calculate the thresholds of the fault detection statistics and using Equations (15) and (16).
Online monitoring
- (1)
Collect a new data vector and normalize it to zero mean and unit variance using Equations (5)–(7). If there are any missing values, compute them using Equation (21).
- (2)
Calculate the monitoring statistics and using Equations (3) and (4).
- (3)
If the values of the monitoring statistics are below their corresponding thresholds, the process status is normal and the model updating procedure is carried out.
- (4)
If the values of the monitoring statistics exceed their corresponding thresholds, the process status is faulty and the updating procedure should be stopped. The fault isolation procedure (CDC) needs to be run to identify the process variables responsible for the detected fault.
The complete diagram for the proposed adaptive IPCA monitoring procedure is shown in Figure 1.
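The recursive normalization in the online steps (the role of Equations (5)-(7), with forgetting factor f) and the adapt-only-while-normal logic of steps (3) and (4) can be sketched as follows; the value f = 0.99 and the `detect` callback are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def update_stats(mu, var, x, f=0.99):
    """Recursively update mean and variance with forgetting factor f,
    so older samples fade out exponentially."""
    mu_new = f * mu + (1 - f) * x
    var_new = f * var + (1 - f) * (x - mu_new) ** 2
    return mu_new, var_new

def online_step(x, state, detect):
    """One pass of the online loop: standardize, test, and adapt only
    while the process is in control."""
    mu, var = state
    xs = (x - mu) / np.sqrt(var)       # standardized sample
    faulty = detect(xs)                # e.g. T2/SPE against their thresholds
    if not faulty:                     # freeze the model during faults
        state = update_stats(mu, var, x)
    return faulty, state
```

Freezing the update during faults is what prevents the adaptive model from gradually absorbing a fault into its notion of normal operation.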
SIMULATION RESULTS
Description of the water resource recovery facility
In this section, the proposed adaptive IPCA framework is applied to Benchmark Simulation Model No. 2 (BSM2) (Jeppsson et al. 2006; Nopens et al. 2010). BSM2 is a well-known and powerful dynamic mathematical simulation model able to simulate the physicochemical and biological phenomena in a WRRF. BSM2 comprises numerous unit operations, including primary clarifier, activated sludge biological reactor, secondary clarifier, thickener, anaerobic digester, dewatering unit, and storage tank. Five reactors are included in the activated sludge process: two anoxic reactors (used for denitrification) with a total volume of 3,000 m3 and three aerobic reactors (used for nitrification) with a total volume of 9,000 m3. The average dry-weather influent flow rate is 20,648 m3 d−1, with an average influent biodegradable COD concentration of 592 mg L−1. Activated Sludge Model No. 1 (ASM1) and Anaerobic Digestion Model No. 1 (ADM1) were used to describe the biological phenomena that take place in the activated sludge and AD reactors, respectively. The influent characteristics consist of a 609-day dynamic influent data file (with a sampling frequency of one data point every 15 min) that includes rainfall and seasonal temperature variations over the year (Jeppsson et al. 2006; Nopens et al. 2010).
Fault detection and isolation
The BSM2 simulation was modified to obtain sensor data from various parts of the process by simulating different types of fault. With this modification, 320 data measurements (16 state variables × 20 measurement points) can be obtained. However, measuring all these variables is not common practice in a real WRRF. In this paper, therefore, we considered only realistic and commonly available sensor measurements. Twenty-eight process measurements obtained from various parts of the BSM2 simulation (Figure 2) were recorded every 15 min as the inputs for the proposed monitoring framework. All these variables were corrupted with white noise to simulate real sensor measurements.
To build the offline adaptive IPCA process monitoring framework, 21 days of data measurements (days 435-456) were recorded under normal operating conditions. We chose this period because there were no rain or storm events on those days. The measurements from days 457 to 530 were used to test the performance of the IPCA under time-varying characteristics and abnormal behavior. On those days there were three storm events, which we discuss in the next section. The main sources of time-variant behavior in this simulation are drift and abrupt changes in the total suspended solids (TSS) concentration of the reject flow of the dewatering process. These changes in the TSS concentration of the dewatering unit are due to seasonal and diurnal patterns. The number of retained PCs, , was estimated recursively using the CPV method in such a way that the retained PCs explained almost 99% of total variance. Several varieties of faults in the sensors, process variables and process parameters were simulated according to Table 1. Each fault began on day 470 and continued until day 530. The confidence limit for the thresholds in all charts was set to 99%.
| Fault# | Description | Variable | Fault type |
|---|---|---|---|
| 1 | Sensor fault | Sensor no. 7 | Drift fault (0.1 g·m−3·d−1) |
| 2 | Change in inorganic nitrogen of anaerobic digester | Sin | Added 0.2 kmol·m−3 to its normal value |
| 3 | Secondary clarifier parameter | | Decreased by 50% |
| 4 | Step change in bioreactor parameters | | Decreased from 0.5 to 0.1 d−1 |
Normal operation of the WRRF
First, we simulated the normal operation of the process (i.e. with no fault added) to show that conventional PCA is unable to deal with the time-variant characteristics of WRRF. The monitoring statistics obtained by conventional PCA and IPCA are shown in Figure 3.
Before we discuss the monitoring results, note that the three severe peaks in this figure for both PCA and IPCA represent storm events (around days 490, 505 and 523) during normal operation of the WRRF. These storm events, which appear in all the fault detection scenarios studied in this paper, are classified as faults. As Figure 3 shows, the statistics for conventional PCA exceeded their thresholds for several normal operational points in addition to these storm events. In other words, these results indicate an excessive rate of false alarms, which illustrates a major limitation of using conventional PCA for time-varying processes such as WRRF. What conventional PCA captures in this case is only the dynamic and statistical characteristics of the process supplied by the training data set. However, in time-varying processes, these dynamic or statistical characteristics change over time and must therefore be updated rather than considered constant. Unlike PCA (which produces an excessive rate of false alarms), IPCA produces a very low rate of false alarms. Apart from the storm events, which are correctly labeled as faults, data points mislabeled by the T2 and SPE statistics of IPCA are very few (shown in red). Also, since the correlation for T2 did not change, the threshold for T2 remained constant. On the other hand, due to adaptation, the threshold for SPE varied more, although its range remained quite low, i.e. between 0.115 and 0.121 (see Figure 3).
Drift fault in the dissolved oxygen sensor
The first fault, shown in Table 1, is simulated by applying a drift to the oxygen sensor of the fourth reactor at day 470. This sensor is used in combination with a proportional-integral (PI) controller to manipulate the concentration of dissolved oxygen in the third, fourth, and fifth reactors. The size of the drift was 0.1 g·m−3·d−1. This was intentionally small in order to evaluate the adaptive behavior of IPCA during such a fault. The monitoring statistics and diagnosis results for IPCA are shown in Figure 4. As we can see, both monitoring statistics detected the fault, though with some delay. These delays in detection were due to the fact that, as the drift starts, the PI controller tries to compensate for it and bring the dissolved oxygen concentration back to the setpoint by decreasing the blower speed. The fault therefore cannot be detected directly from the faulty sensor at the initial stage of its occurrence. The impact of the PI controller action is more significant on dissolved oxygen in the third and fifth reactors at the initial stage of the fault occurrence.
The SPE statistic therefore begins to violate the threshold around day 475. This is due to the change in the dissolved oxygen concentrations in the third and fifth reactors. However, it is difficult to isolate the drift because it is very small. The T2 statistic begins to violate the threshold significantly around day 483, by which time the PI controller has reduced the blower speed to its minimum. At this point, most of the variables begin to deviate and the fault becomes more visible and easier to detect. The contribution chart for T2 shows that the ammonia, H2, TSS, and O2 sensors (numbers 16, 25, 17, and 11, respectively; see Figure 2 for the type and location of sensors) contribute the most. Of these sensors, sensor 11, which represents the dissolved oxygen in the fifth reactor, is isolated correctly. However, the other sensors, such as the concentration of ammonia in the effluent stream (number 16), may provide indirect clues as to the root of the fault. As the concentration of dissolved oxygen decreases due to the action of the PI controller, the nitrification rate decreases, so the concentration of ammonia increases. The sensors contributing most to the SPE statistic are numbers 6 and 7, which represent the concentration of dissolved oxygen in the third and fourth reactors, respectively. Sensor 6 contributes more than faulty sensor number 7, which shows that the PI controller has more impact on the concentration of dissolved oxygen in the third reactor at the initial stage of the fault.
Step change in inorganic nitrogen in the anaerobic digester
The nitrogen level in the anaerobic digestion (AD) process is critical because of its inhibitory impact on microbial activity and needs to be monitored carefully. Fault number 2 (Table 1) is simulated by inducing a step change at day 470 equal to +0.2 kmol·m−3 in the AD inorganic nitrogen level. The monitoring statistics contribution charts are shown in Figure 5.
As we can see, both monitoring statistics detect the fault, though with some delay. The delay is shorter for T2 than for SPE. These delays are due to the small size of the fault and its lesser impact at the initial stage of its occurrence. Figure 5 shows that sensor 20 has the largest contribution to the T2 statistic. Sensor 20 measures the gas flow of the AD unit, which is correctly isolated. Due to the inhibition caused by the increase in inorganic nitrogen, the activity of the microbial communities decreases, which accounts for the reduction in gas flow. The contribution chart for SPE could not isolate the fault correctly.
Step change in the settling velocity of the secondary clarifier
To simulate fault number 3 (Table 1), the double exponential settling velocity function () of the secondary clarifier was reduced by 50%. This fault can be classified as a change in the process parameters. Figure 6 shows the detection performance and contribution charts for the T2 and SPE statistics.
Figure 6 shows that both statistics detect the fault immediately after its occurrence. However, SPE shows a stronger response than T2. Both statistics also provide accurate isolation. Sensors 13 (chemical oxygen demand (COD)) and 14 (TSS) contribute the most to T2 and SPE, respectively. Since there is no sensor to measure settling velocity, COD and TSS are considered the most relevant sensors (these variables are directly related to the settling velocity). Note that a smaller magnitude of this fault could still be detected, though the detection rate may be lower because of its lesser impact on the process.
Step change in the bioreactor parameters
One of the most crucial parameters in a WRRF is the specific growth rate of the autotrophs (), which determines the speed of ammonia conversion into nitrite (the nitrification rate). If inhibition occurs in the activated sludge reactors (due to toxicity, changes in pH, etc.), the nitrification rate is altered because of the lower microbial activity. To simulate fault number 4 (Table 1), the value of was changed from 0.5 to 0.1 d−1. The monitoring statistics and contribution charts are shown in Figure 7.
This figure shows that the T2 statistic begins to violate the threshold earlier than the SPE statistic. Although T2 begins violating the threshold at the initial stage of the fault occurrence, it took almost 30 days for it to remain completely above the threshold. As the nature of this fault is very similar to a drift fault, the detection delay for both statistics is very high because of the fault's lesser impact at the initial stage of its occurrence. The variable contribution chart shows that sensor 10 (the concentration of ammonia in reactor 5) clearly contributes the most to the deviation in the T2 statistic. As the nitrification rate decreases, the concentration of ammonia increases. The contribution chart of the SPE statistic provides no information about the root cause of the fault.
Storm events
As we mentioned earlier, there are three storm events in the simulated data set. These events are visible in Figure 3 around days 490, 505, and 523. Figure 3 also shows that both the T2 and SPE statistics were able to detect these events accurately. Moreover, as the storm events end, the statistical indices immediately go back to their normal values, which indicates the robustness of the method during such a fault. Figure 8 shows the variables responsible for the deviation in the T2 and SPE statistics during the storm events. The sensors' contributions to these three storm events are quite similar. As well as the increasing influent flowrate, many other variables are influenced by storm events; therefore, it is difficult to isolate the fault just by looking at the contribution charts. The trends in all correlated variables therefore need to be examined by experienced process operators. The contribution chart shows that sensors 13 and 19 (effluent COD and thickener underflow flow rate) contribute the most to the deviation in T2. During the storm events, the COD also decreases due to the dilution effect caused by the increase in plant influent flow.
The thickener underflow flow rate also increases as a result of the increase in plant influent flow. Sensor 9 (inlet flow rate of the secondary clarifier) contributes the most to the SPE deviation; this measurement is directly related to the plant influent flow rate, which increases during the storm events.
FAULT DETECTION WITH MISSING VALUES
Missing values in the input data vector of the monitoring framework are a complex and challenging issue. In practice, however, missing values due to sensor failure or maintenance activities are common. A robust monitoring framework must therefore be designed to handle them. In this study, missing values are imputed using EBLUP. The advantage of this method is that values can be imputed as they appear during the fault detection procedure, which is useful for realtime applications. To assess the accuracy of our proposed method in the presence of missing sensor signals, several measurements were deleted randomly from the data set. Figure 9 shows the pattern of missing values from day 457 to day 488 during normal operation of the WRRF. This period was chosen to avoid the storm events during the normal running of the plant. Sampled data are shown in a continuous grey/black color scheme (the darker the color, the higher the value of the sensor), while missing values are shown in red. As we can see, the pattern of missing values is extremely intricate since many sensor signals can become unavailable at the same time, which is challenging for any fault detection and monitoring framework.
| Metric | Fault# | T2 (0%a) | SPE (0%a) | T2 (2%a) | SPE (2%a) | T2 (6%a) | SPE (6%a) | T2 (10%a) | SPE (10%a) |
|---|---|---|---|---|---|---|---|---|---|
| FAR% | normalb | 0.45 | 0.26 | 0.59 | 3.87 | 0.49 | 14.05 | 0.52 | 26.49 |
| MDR% | normalb | – | – | – | – | – | – | – | – |
| FAR% | 1 | 0.07 | 0 | 0.22 | 3.20 | 0.15 | 15.70 | 0.15 | 30.10 |
| MDR% | 1 | 18.53 | 14.70 | 18.4 | 12.99 | 18.5 | 10.33 | 18.58 | 8.80 |
| FAR% | 2 | 0.07 | 0 | 0.22 | 3.20 | 0.15 | 15.70 | 0.15 | 30.10 |
| MDR% | 2 | 8.56 | 34.26 | 8.57 | 29.95 | 8.44 | 24.07 | 8.68 | 19.86 |
| FAR% | 3 | 0.07 | 0 | 0.22 | 3.20 | 0.15 | 15.7 | 0.15 | 30.10 |
| MDR% | 3 | 7.91 | 0.06 | 8.45 | 0.08 | 9.72 | 0.08 | 11.41 | 0.06 |
| FAR% | 4 | 0.07 | 0 | 0.22 | 3.20 | 0.15 | 15.70 | 0.15 | 30.10 |
| MDR% | 4 | 37.86 | 55.99 | 37.84 | 55.40 | 38.10 | 54.16 | 38.34 | 53.90 |
a Percent of missing values. b From day 457 to day 488 during normal operation. MDR%, missed detection rate; FAR%, false alarm rate.
We can see that the FAR% for the T2 statistic is not affected by the missing values. On the other hand, the FAR% for the SPE statistic increases significantly in the presence of missing values. The most significant changes to the FAR% occurred at the highest level of missingness (10%). This may be because the imputation of missing values has a greater impact on the residual subspace than on the mean and covariance of the retained PCs. Since the variation in the residual subspace is measured by SPE, that statistic is more sensitive to the amount of imputed missing values.
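For reference, the FAR% and MDR% figures reported in the table can be computed from the alarm sequence and the known fault window; the helper below is our own minimal sketch, not code from the paper.

```python
import numpy as np

def far_mdr(alarm, fault):
    """False alarm rate and missed detection rate, in percent.

    alarm : boolean array, True where the statistic exceeds its threshold
    fault : boolean array, True where a fault is actually present
    """
    alarm, fault = np.asarray(alarm), np.asarray(fault)
    # False alarms: alarms raised on fault-free samples
    far = 100.0 * np.sum(alarm & ~fault) / max(np.sum(~fault), 1)
    # Missed detections: faulty samples with no alarm
    mdr = 100.0 * np.sum(~alarm & fault) / max(np.sum(fault), 1)
    return far, mdr
```

Applying this separately to the T2 and SPE alarm sequences for each fault scenario reproduces the structure of the table above.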
CONCLUSION
We have proposed a novel fault detection and isolation framework based on incremental PCA (IPCA) and complete decomposition contribution (CDC). The advantage of IPCA over other adaptive PCA approaches is that it does not require all eigenvalues to be computed, which speeds up computation. The proposed adaptive IPCA monitoring framework consists of (i) adaptive estimation of the mean and variance of the data, (ii) adaptive estimation of the selected PCs and thresholds, (iii) updating of the eigendecomposition by incorporating new data into the PCA model, and (iv) adaptive estimation of the contribution charts for T2 and SPE. This framework was applied to BSM2. Our results, demonstrated with simulated faulty scenarios, show that IPCA is able to adapt to time-varying process behavior while detecting and isolating faults. Although there are some delays in detecting the faults, the performance is acceptable since the faults under study were small at the initial stage of their occurrence. The contribution charts showed that this method was able to correctly isolate the faults. Sometimes, however, due to a lack of sensor measurements, it was not possible to isolate the faulty sensor directly and only the impact of the fault on the other measurements could be detected. Our proposed framework can also handle highly intricate patterns of missing values (e.g. more than one sensor signal failing at the same time) in realtime by applying the EBLUP method. Our study of fault detection performance with missing values shows that SPE is more sensitive to the imputation of missing values than T2. This is due to the significant impact of the imputed values on the residual subspace.
ACKNOWLEDGEMENTS
We gratefully thank the Spanish Ministry of Economy and Competitiveness for providing financial support through grant CTM2015-67970-P and for funding the doctoral scholarship (BES-2012-059675). Some authors are recognized by the Commissioner for Universities and Research of the DIUE (Department of Innovation, Universities and Business) of the autonomous government of Catalonia (2017 SGR 396) and supported by the Universitat Rovira i Virgili (2017PFR-URV-B2-33) and Fundació Bancària ‘la Caixa’ (2017ARES-06). The authors are also grateful to Dr Ulf Jeppsson of Lund University, Sweden, for kindly providing the BSM2 Matlab code.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.