Water treatment processes are required to be in statistical control and capable of meeting drinking water specifications. Control charts are used to monitor the stability of quality parameters by distinguishing between in-control and out-of-control states. The basic assumption in standard applications of control charts is that observed data from the process are independent and identically distributed. However, the independence assumption is often violated in chemical processes such as water treatment. Autocorrelation, a measure of dependency, is the correlation between members of a series arranged in time. To avoid the misleading signals that standard control charts give in the presence of autocorrelation, the residuals obtained from an autoregressive integrated moving average (ARIMA) time series model are plotted on a standard control chart. In this study, a special cause control (SCC) chart, also called a chart of residuals from the fitted ARIMA model, was used for turbidity and pH data from a drinking water treatment plant in Samsun, Turkey. ARIMA (3,1,0) for turbidity and ARIMA (1,1,1) for pH were determined as the best time series models to remove autocorrelation. The results showed that the SCC chart is more appropriate than an individual control chart for evaluating the stability of the water treatment process when the data are autocorrelated, since it provides a higher probability of coverage.
INTRODUCTION
Drinking water treatment plants are responsible for providing high quality drinking water to consumers. High quality drinking water refers to esthetically appealing water that is free of both pathogens (disease causing organisms) and chemical contaminants that have been known to cause undesirable health effects. Monitoring and control are indispensable actions for ensuring the production of high quality drinking water. Recently, quality control charts have been used for the detection of changes in biological and chemical water quality parameters (Mitsakos & Psarakis 2005; Smeti et al. 2005, 2006, 2007a, 2007b; Usman & Kontagora 2010). The selection of quality parameters for monitoring water depends upon the purpose for which we are going to use that water, and the extent to which we need its quality and purity. pH is one of the most important operational water quality parameters in determining the corrosive nature of water. pH control is necessary at all stages of water treatment to ensure satisfactory water clarification and disinfection. Turbidity also adversely affects the efficiency of disinfection, and it is also measured to determine what type and level of treatment are needed. Turbidity in water is caused by suspended matter, such as clay, silt, finely divided organic and inorganic matter, soluble colored organic compounds, plankton and other microscopic organisms (Patil et al. 2012).
Control charts are used to monitor two types of process variation, common cause variation and special cause variation. The presence of an assignable variation is an indicator of an out-of-control situation, and is characterized by non-randomness in the data. A standard assumption of traditional control charts is that observations taken from the process are independently distributed, with a constant mean and variance. However, observations depart from the assumption of independence for many industrial processes because of the advanced measurement technology, shortened sampling interval, and the nature of the process. The existence of autocorrelation in observations causes problems: detecting ‘special causes’ that do not exist and not detecting ‘special causes’ that truly exist, implying a high probability of false positives and/or false negatives (Smeti et al. 2007a). Alwan & Roberts (1988) proposed a method to deal with data autocorrelation. The method consists of selecting and fitting an autoregressive integrated moving average (ARIMA) model to the process, and then applying a traditional control chart to the residuals. The reason for monitoring residuals instead of actual observations is that they are independent and identically distributed with mean zero when the process is controlled, and remain independent of possible differences in the mean when the process gets out of control (Russo et al. 2012). Some examples considering data autocorrelation are given by Stone & Taylor (1995), Reynolds & Lu (1997), Young & Winistorfer (2001), Bisgaard & Kulahci (2005), Mitsakos & Psarakis (2005), Smeti et al. (2005, 2006, 2007a), Elevli et al. (2009), Noskievicova (2009), Karaoglan & Bayhan (2011), Martin & Petr (2012), Tasdemir (2012), Kandananond (2014), and Perzyck & Rodziewicwz (2015).
Since quality parameters of drinking water need to be monitored to guarantee the final water quality, it is necessary to use the appropriate control chart based on the data structure. The objective of this study is to compare the results of the traditional control chart (individual control (IC) chart) and the chart of residuals from a fitted ARIMA model (special cause control (SCC) chart) when the independence assumption is not valid. The data under investigation are turbidity and pH, which are routinely monitored on a daily basis.
THEORETICAL BACKGROUND OF CONTROL CHARTS FOR AUTOCORRELATED DATA
SCC chart
Setting control limits correctly in control charts is one of the main requirements for statistical quality control to verify the statistical stability of the analyzed process. Autocorrelation is a measure of dependency between observations. Autocorrelated data are data that are ‘self-correlated’. If the quality characteristic that is under investigation exhibits various levels of autocorrelation, control charts that are based on the assumption of independence will give a higher false alarm rate. In other words, the main effect of autocorrelation in the process data on control charts is that it produces control limits that are much tighter than desired (Mitsakos & Psarakis 2005). Therefore, if significant autocorrelation in the process data is present, it is necessary to modify traditional methodology to account for this autocorrelation.
The autocorrelation function (ACF) and partial autocorrelation function (PACF) are used to detect whether autocorrelation is present in the data. The ACF shows how the correlation between pairs of values varies as a function of the lag, or number of periods (k), between them. The PACF measures the correlation between observations that are k time periods apart after controlling for correlations at intervening lags. To assess autocorrelation visually in a set of time-ordered data, the ACF and PACF plots are examined for autocorrelations that are significantly different from zero.
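As an illustration of how the ACF is computed, the following minimal Python sketch estimates sample autocorrelations for lags 1 to k and compares them with an approximate significance bound. The function names, the made-up series `demo`, and the ±1.96/√n bound (a common approximation for a white-noise series) are assumptions introduced here for illustration and are not taken from the study.

```python
import math

def sample_acf(series, max_lag):
    """Return the sample autocorrelations r_1..r_max_lag of a sequence."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]

def significance_bound(n, z=1.96):
    """Approximate 95% bound on |r_k| for white noise of length n (assumption)."""
    return z / math.sqrt(n)

def difference(series):
    """First difference: w_t = y_t - y_(t-1)."""
    return [b - a for a, b in zip(series, series[1:])]

# Hypothetical, made-up series with a steady upward drift (not plant data).
demo = [0.1 * t + (0.05 if t % 2 else -0.05) for t in range(40)]

acf_raw = sample_acf(demo, 5)               # dominated by the drift
acf_diff = sample_acf(difference(demo), 5)  # drift removed by differencing
bound = significance_bound(len(demo))
```

In this sketch, a lag-1 value of `acf_raw` far above `bound` would be read as evidence of autocorrelation (here caused by the drift), which is the kind of pattern the ACF plot is meant to reveal.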
There are two main approaches for constructing control charts for autocorrelated data. The first approach uses standard control charts, but adjusts the method of estimating the process variance so that the true process variance is being estimated. Therefore, in this chart, correlation is taken care of by placing the control limits according to the changed variability (Samanta & Bhattacherjee 2001).
The second approach that has proven useful in dealing with autocorrelated data is to model the correlative structure directly with an appropriate time series model, use that model to remove the autocorrelation from the data, and then apply control charts to the residuals (Montgomery 1997). This type of control chart is called an SCC chart.
The SCC chart plots the residuals that are obtained after fitting the process to an ARIMA model, rather than plotting the actual observations. All the traditional tools of process control are applicable to this kind of chart (Alwan & Roberts 1988). Since the mean of the residuals is zero, the centerline is zero. The standard deviation used in this case is the standard deviation of the residuals, σa. The limits of the chart are given by:
$$\mathrm{UCL} = L\,\sigma_a, \qquad \mathrm{CL} = 0, \qquad \mathrm{LCL} = -L\,\sigma_a$$
where L is a constant multiplier, usually taken to be 3 (Mitsakos & Psarakis 2005). Since the residuals will be independently and identically distributed random variables, all the assumptions of traditional control charts will be met.
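The limits described above (centerline at zero, limits at a multiple L of σa on either side) can be sketched in Python as follows. This is a minimal illustration: the function names and the demo residuals are hypothetical, and estimating σa with the ordinary sample standard deviation is an assumption, since the estimator used by the software is not stated.

```python
import statistics

def scc_limits(residuals, L=3.0):
    """Return (LCL, CL, UCL) for a chart of model residuals."""
    # Assumption: sigma_a is estimated by the sample standard deviation.
    sigma_a = statistics.stdev(residuals)
    return (-L * sigma_a, 0.0, L * sigma_a)

def out_of_control(residuals, L=3.0):
    """Return the indices of residuals that fall outside the limits."""
    lcl, _, ucl = scc_limits(residuals, L)
    return [i for i, r in enumerate(residuals) if r < lcl or r > ucl]

# Hypothetical residuals: small alternating noise followed by one large shock.
demo_residuals = [0.1, -0.1] * 10 + [2.0]
lcl, cl, ucl = scc_limits(demo_residuals)
flagged = out_of_control(demo_residuals)
```

Only the final, unusually large residual in the demo data lies outside the limits, which is how a special cause would show up on the chart.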
ARIMA models
A time series is defined as a set of observations that are ordered in time. Time series analysis aims to investigate the mechanism generating the time series, to forecast future values of the series, and to describe the behavior of the series (Russo et al. 2012). ARIMA models are flexible and widely used in time series analysis. They are like regression models, but the independent variables are past values of the time series and past and current values of random terms.
An ARIMA model expresses Yt, the observed value at time period t, as a linear combination of the values observed at the last p time periods, a random shock to the system at the current time period, at, and the random shocks that occurred at the previous q time periods.
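For reference, the verbal description above corresponds to the general ARIMA (p,d,q) form sketched below. The symbols c, φ, θ, B and W are assumptions introduced here, and the minus signs on the moving average terms follow the common Box-Jenkins convention, which may differ from the notation of the original equation.

```latex
% d-times differenced series; B is the backshift operator, B Y_t = Y_{t-1}
W_t = (1 - B)^d \, Y_t
% ARMA(p, q) structure applied to W_t (for d = 0, W_t = Y_t)
W_t = c + \phi_1 W_{t-1} + \dots + \phi_p W_{t-p}
      + a_t - \theta_1 a_{t-1} - \dots - \theta_q a_{t-q}
```

Here p is the number of autoregressive terms, d is the number of differences, q is the number of moving average terms, and at is the random shock at time t.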
ANALYSIS
Data
The data, gathered on a daily basis, cover the period from August to November 2013. Descriptive statistics of turbidity and pH are given in Table 1. STATGRAPHICS Centurion XVI.I was used to analyze the data and to construct the control charts.
Table 1. Descriptive statistics
Parameter | Allowed limits | Month | Count | Mean | StDev | Minimum | Q1 | Median | Q3 | Maximum | Range |
---|---|---|---|---|---|---|---|---|---|---|---|
Turbidity | 1 NTU | August | 31 | 0.6935 | 0.1365 | 0.4000 | 0.6000 | 0.7000 | 0.8000 | 0.9000 | 0.5000 |
Turbidity | 1 NTU | September | 30 | 0.8300 | 0.1643 | 0.5000 | 0.7000 | 0.8500 | 1.0000 | 1.1000 | 0.6000 |
Turbidity | 1 NTU | October | 31 | 0.8355 | 0.1644 | 0.5000 | 0.7000 | 0.9000 | 1.0000 | 1.1000 | 0.6000 |
Turbidity | 1 NTU | November | 30 | 1.1300 | 0.2891 | 0.6000 | 0.8750 | 1.1000 | 1.4000 | 1.6000 | 1.0000 |
pH | 6.5–9.5 | August | 31 | 8.0610 | 0.1293 | 7.7800 | 8.0000 | 8.0600 | 8.1800 | 8.2500 | 0.4700 |
pH | 6.5–9.5 | September | 30 | 8.0873 | 0.0829 | 7.8500 | 8.0175 | 8.1050 | 8.1600 | 8.2300 | 0.3800 |
pH | 6.5–9.5 | October | 31 | 8.0723 | 0.0619 | 7.8600 | 8.0500 | 8.0800 | 8.1100 | 8.2000 | 0.3400 |
pH | 6.5–9.5 | November | 30 | 7.8867 | 0.1027 | 7.6000 | 7.8400 | 7.8750 | 7.9225 | 8.1300 | 0.5300 |
IC charts
IC charts are used to track the process level and detect the presence of assignable causes when the data have been collected one at a time rather than in groups. This type of chart uses the moving range of two successive observations to estimate the process variability.
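The moving-range calculation described above can be sketched in Python as follows. The function name and the demo readings are hypothetical. The 3-sigma limits and the constant 1.128 (the standard d2 value for moving ranges of two observations) are the usual textbook choices and are assumed here, not quoted from the study.

```python
def individuals_chart_limits(observations):
    """Return (LCL, CL, UCL) for an individuals chart based on moving ranges."""
    # Moving range of two successive observations.
    moving_ranges = [abs(b - a) for a, b in zip(observations, observations[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    center = sum(observations) / len(observations)
    # Assumption: sigma is estimated as MR-bar / d2, with d2 = 1.128 for n = 2.
    sigma_hat = mr_bar / 1.128
    return (center - 3 * sigma_hat, center, center + 3 * sigma_hat)

# Hypothetical pH-like readings, made up for illustration.
demo_readings = [8.0, 8.1, 8.0, 8.1, 8.0]
ic_lcl, ic_cl, ic_ucl = individuals_chart_limits(demo_readings)
```

Because the variability estimate comes from successive differences, positively autocorrelated data tend to produce small moving ranges and therefore narrow limits, which is the weakness of this chart that the residual-based approach addresses.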
Testing autocorrelation and trend
Estimated autocorrelations for turbidity (a) and pH (b) after first differencing.
ARIMA model
Estimated partial autocorrelations for turbidity (a) and pH (b) after first differencing.
In addition to the above approach based on the ACF and PACF plots of the first-differenced series, several alternative models were also examined for model selection (Table 2). At this stage, statistics calculated from the one-step-ahead forecast errors were considered. Better ARIMA models have smaller root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) values, which measure the variability of the forecasting errors. If the forecasts are not biased, the mean error (ME) and mean percentage error (MPE) should be close to zero. ARIMA (3,1,0) and ARIMA (1,1,1) were found to be suitable models for turbidity and pH, respectively. The number of differences (d) is 1 for both models, since the data were first differenced to achieve stationarity.
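The five statistics named above can be computed from the one-step-ahead forecast errors as in the following Python sketch. The function name and the demo values are hypothetical. Dividing by the number of errors n is an assumption, since statistical packages sometimes divide the RMSE by the residual degrees of freedom instead.

```python
import math

def forecast_error_summary(actual, forecast):
    """Summarize forecast errors e_t = actual_t - forecast_t."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    # Percentage errors are taken relative to the actual values.
    pct = [100.0 * e / a for e, a in zip(errors, actual)]
    return {
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": sum(abs(p) for p in pct) / n,
        "ME": sum(errors) / n,
        "MPE": sum(pct) / n,
    }

# Hypothetical observed values and one-step-ahead forecasts.
demo_actual = [1.0, 2.0, 4.0]
demo_forecast = [1.1, 1.8, 4.0]
summary = forecast_error_summary(demo_actual, demo_forecast)
```

RMSE, MAE and MAPE measure the size of the errors regardless of sign, whereas ME and MPE keep the sign and therefore indicate bias.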
Table 2. Model comparison
Parameter | Model | RMSE | MAE | MAPE | ME | MPE |
---|---|---|---|---|---|---|
Turbidity | ARIMA (3,1,0) | 0.1720 | 0.1278 | 16.0378 | −0.0012 | −3.1232 |
Turbidity | ARIMA (2,1,0) | 0.1775 | 0.1292 | 16.4177 | −0.0004 | −3.0140 |
Turbidity | ARIMA (3,2,0) | 0.2029 | 0.1534 | 18.5821 | 0.0001 | −1.5897 |
Turbidity | ARIMA (3,1,1) | 0.1729 | 0.1284 | 16.1178 | −0.0013 | −3.1543 |
Turbidity | ARIMA (0,1,2) | 0.1726 | 0.1305 | 16.4740 | −0.0015 | −3.5570 |
pH | ARIMA (1,1,1) | 0.0844 | 0.0536 | 0.6713 | 0.0016 | 0.0085 |
pH | ARIMA (1,1,2) | 0.0845 | 0.0536 | 0.6712 | 0.0017 | 0.0104 |
pH | ARIMA (3,1,1) | 0.0847 | 0.0540 | 0.6752 | 0.0056 | 0.0584 |
pH | ARIMA (0,0,1) | 0.0993 | 0.0746 | 0.9334 | 0.0003 | −0.0137 |
pH | ARIMA (1,0,0) | 0.0848 | 0.0529 | 0.6625 | −0.0001 | −0.0123 |
RMSE, root mean squared error; MAE, mean absolute error; MAPE, mean absolute percentage error; ME, mean error; MPE, mean percentage error.
Table 3 provides the final estimates of the parameters. Each estimated model coefficient is shown together with its t-test. Since the p-values associated with the AR and MA coefficients are less than 0.05, these coefficients are significantly different from 0 at the 5% significance level. The mean terms are not significantly different from 0 (p > 0.05).
Table 3. Parameter estimates
Parameter | Model | Model parameter | Estimate | Std. error | t | P-value |
---|---|---|---|---|---|---|
Turbidity | ARIMA (3,1,0) | AR (1) | −0.232335 | 0.0886801 | −2.61993 | 0.009961 |
Turbidity | ARIMA (3,1,0) | AR (2) | −0.250073 | 0.0880416 | −2.8404 | 0.005316 |
Turbidity | ARIMA (3,1,0) | AR (3) | −0.261968 | 0.089041 | −2.9421 | 0.003931 |
Turbidity | ARIMA (3,1,0) | Mean | 0.00133828 | 0.00913336 | 0.146527 | 0.883758 |
Turbidity | ARIMA (3,1,0) | Constant | 0.00233447 | | | |
pH | ARIMA (1,1,1) | AR (1) | 0.610754 | 0.098874 | 6.17709 | 0.000000 |
pH | ARIMA (1,1,1) | MA (1) | 0.937621 | 0.0471965 | 19.8663 | 0.000000 |
pH | ARIMA (1,1,1) | Mean | −0.00135921 | 0.00160166 | −0.848629 | 0.397806 |
pH | ARIMA (1,1,1) | Constant | −0.000529069 | | | |
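As a quick arithmetic check on the entries of Table 3, the following Python sketch recomputes the t statistics as estimate divided by standard error, and the constants from the means. The relation constant = mean × (1 − sum of AR coefficients) is the standard one for models with autoregressive terms and is assumed here, not stated in the text. The variable and function names are hypothetical.

```python
# (estimate, standard error) pairs copied from Table 3.
turbidity_ar = [(-0.232335, 0.0886801), (-0.250073, 0.0880416), (-0.261968, 0.089041)]
ph_ar = [(0.610754, 0.098874)]
ph_ma = [(0.937621, 0.0471965)]
turbidity_mean = 0.00133828
ph_mean = -0.00135921

def t_statistics(pairs):
    """t = estimate / standard error for each coefficient."""
    return [estimate / std_error for estimate, std_error in pairs]

def arima_constant(mean, ar_pairs):
    """Assumed relation: constant = mean * (1 - sum of AR coefficients)."""
    return mean * (1.0 - sum(estimate for estimate, _ in ar_pairs))

turbidity_t = t_statistics(turbidity_ar)
ph_t = t_statistics(ph_ar + ph_ma)
turbidity_constant = arima_constant(turbidity_mean, turbidity_ar)
ph_constant = arima_constant(ph_mean, ph_ar)
```

Under this assumption, the recomputed values agree with the tabulated t statistics and constants to within rounding.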
SCC chart
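The residuals plotted on an SCC chart are the one-step-ahead forecast errors of the fitted model. As a minimal sketch, the Python code below computes them for an ARIMA (p,1,0) model with known coefficients, using the turbidity estimates from Table 3. The function name and the short demo series are hypothetical, and the form of the recursion (constant plus AR terms applied to the first-differenced series) is an assumption about how the model is written, not a description of the software's internals.

```python
def arima_p10_residuals(series, ar_coeffs, constant=0.0):
    """One-step-ahead residuals of an ARIMA(p,1,0) model with known coefficients."""
    # First difference: w_t = y_t - y_(t-1).
    w = [b - a for a, b in zip(series, series[1:])]
    p = len(ar_coeffs)
    residuals = []
    # The first p differences are needed as starting values, so they get no residual.
    for t in range(p, len(w)):
        predicted = constant + sum(
            phi * w[t - i - 1] for i, phi in enumerate(ar_coeffs)
        )
        residuals.append(w[t] - predicted)
    return residuals

# AR(1)..AR(3) estimates and constant for turbidity, copied from Table 3.
turbidity_phi = [-0.232335, -0.250073, -0.261968]
turbidity_const = 0.00233447

# Hypothetical turbidity readings in NTU, made up for illustration.
demo_turbidity = [0.7, 0.6, 0.8, 0.7, 0.9, 0.8]
demo_resid = arima_p10_residuals(demo_turbidity, turbidity_phi, turbidity_const)
```

The resulting residuals would then be plotted against a centerline of zero with limits at a multiple of their standard deviation. A model with moving average terms, such as the ARIMA (1,1,1) model for pH, would also need the previous residuals in the recursion and is not covered by this sketch.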
RESULTS
Traditional control charts require that observations from the process are independent of one another. Failure to meet this requirement causes traditional control charts to give an excessive number of false alarms. Even a very low degree of autocorrelation in the observations has a substantial effect on the performance of these charts. SCC charts, which are traditional control charts applied to the residuals of the ARIMA model, are more appropriate in a correlated environment, as they provide a higher probability of coverage than the individual charts.
In this study, SCC charts for turbidity and pH are based on ARIMA (3,1,0) and ARIMA (1,1,1) models, respectively. In other studies, Smeti et al. (2006, 2007a) used ARIMA (0,1,1). Elevli et al. (2009), Noskievicova (2009), and Karaoglan & Bayhan (2011) used ARIMA (1,0,0) or AR (1). ARIMA (1,0,1), ARIMA (0,1,1) and ARIMA (0,1,2) were used by Tasdemir (2012) and Kandananond (2014). The models differ between studies because each was selected from the ACF and PACF plots of the data set under study.
According to the SCC charts given in Figure 7, the water treatment process is not in statistical control for either parameter. The out-of-control points for turbidity are observed after the 90th data point. This means that the variability increased in November compared to the other months. According to the descriptive statistics in Table 1, the process is also not capable of meeting the allowed upper limit (1 NTU). Consequently, from the turbidity point of view, some out-of-specification drinking water is supplied to consumers. The number of out-of-control points for pH is higher than for turbidity. However, the observed variation for pH is narrower than the specification range (6.5–9.5), which indicates that the process is capable. Since the objective of water treatment plants is to provide drinking water that is safe and acceptable to consumers, it is necessary to investigate the assignable causes of process variation and to eliminate them.