Water treatment processes are required to be in statistical control and capable of meeting drinking water specifications. Control charts are used to monitor the stability of quality parameters by distinguishing the in-control and out-of-control states. The basic assumption in standard applications of control charts is that observed data from the process are independent and identically distributed. However, the independence assumption is often violated in chemical processes such as water treatment. Autocorrelation, a measure of dependency, is a correlation between members of a series arranged in time. Plotting the residuals obtained from an autoregressive integrated moving average (ARIMA) time series model on a standard control chart overcomes the misleading signals that standard control charts give in the presence of autocorrelation. In this study, a special cause control (SCC) chart, also called a chart of residuals from the fitted ARIMA model, has been used for turbidity and pH data from a drinking water treatment plant in Samsun, Turkey. ARIMA (3,1,0) for turbidity and ARIMA (1,1,1) for pH were determined as the best time series models to remove autocorrelation. The results showed that the SCC chart is more appropriate for autocorrelated data to evaluate the stability of the water treatment process, since it provides a higher probability of coverage than an individual control chart.

## INTRODUCTION

Drinking water treatment plants are responsible for providing high quality drinking water to consumers. High quality drinking water refers to esthetically appealing water that is free of both pathogens (disease causing organisms) and chemical contaminants that have been known to cause undesirable health effects. Monitoring and control are indispensable actions for ensuring the production of high quality drinking water. Recently, quality control charts have been used for the detection of changes in biological and chemical water quality parameters (Mitsakos & Psarakis 2005; Smeti *et al.* 2005, 2006, 2007a, 2007b; Usman & Kontagora 2010). The selection of quality parameters for monitoring water depends on the intended use of the water and on the level of quality and purity required. pH is one of the most important operational water quality parameters in determining the corrosive nature of water. pH control is necessary at all stages of water treatment to ensure satisfactory water clarification and disinfection. Turbidity adversely affects the efficiency of disinfection, and it is also measured to determine what type and level of treatment are needed. Turbidity in water is caused by suspended matter, such as clay, silt, finely divided organic and inorganic matter, soluble colored organic compounds, plankton and other microscopic organisms (Patil *et al.* 2012).

Control charts are used to monitor two types of process variation, common cause variation and special cause variation. The presence of an assignable variation is an indicator of an out-of-control situation, and is characterized by non-randomness in the data. A standard assumption of traditional control charts is that observations taken from the process are independently distributed, with a constant mean and variance. However, observations depart from the assumption of independence for many industrial processes because of the advanced measurement technology, shortened sampling interval, and the nature of the process. The existence of autocorrelation in observations causes problems: detecting ‘special causes’ that do not exist and not detecting ‘special causes’ that truly exist, implying a high probability of false positives and/or false negatives (Smeti *et al.* 2007a). Alwan & Roberts (1988) proposed a method to deal with data autocorrelation. The method consists of selecting and fitting an autoregressive integrated moving average (ARIMA) model to the process, and then applying a traditional control chart to the residuals. The reason for monitoring residuals instead of actual observations is that they are independent and identically distributed with mean zero when the process is controlled, and remain independent of possible differences in the mean when the process gets out of control (Russo *et al.* 2012). Some examples considering data autocorrelation are given by Stone & Taylor (1995), Reynolds & Lu (1997), Young & Winistorfer (2001), Bisgaard & Kulahci (2005), Mitsakos & Psarakis (2005), Smeti *et al.* (2005, 2006, 2007a), Elevli *et al.* (2009), Noskievicova (2009), Karaoglan & Bayhan (2011), Martin & Petr (2012), Tasdemir (2012), Kandananond (2014), and Perzyck & Rodziewicwz (2015).

Since quality parameters of drinking water need to be monitored to guarantee the final water quality, it is necessary to use the appropriate control chart based on the data structure. The objective of this study is to compare the results of the traditional control chart (individual control (IC) chart) and the chart of residuals from a fitted ARIMA model (special cause control (SCC) chart) when the independence assumption is not valid. The data under investigation are turbidity and pH, which are routinely monitored on a daily basis.

## THEORETICAL BACKGROUND OF CONTROL CHARTS FOR AUTOCORRELATED DATA

### SCC chart

Setting control limits correctly in control charts is one of the main requirements for statistical quality control to verify the statistical stability of the analyzed process. Autocorrelation is a measure of dependency between observations. Autocorrelated data are data that are ‘self-correlated’. If the quality characteristic that is under investigation exhibits various levels of autocorrelation, control charts that are based on the assumption of independence will give a higher false alarm rate. In other words, the main effect of autocorrelation in the process data on control charts is that it produces control limits that are much tighter than desired (Mitsakos & Psarakis 2005). Therefore, if significant autocorrelation in the process data is present, it is necessary to modify traditional methodology to account for this autocorrelation.

The autocorrelation function (ACF) and partial autocorrelation function (PACF) are used to detect whether autocorrelation is present in the data or not. ACF simply shows how correlation between pairs of values varies as a function of the lag or number of periods (*k*) between them. PACF measures correlation between observations that are *k* time periods apart after controlling for correlations at intervening lags. In order to visually assess the autocorrelation in a set of time-ordered data, ACF and PACF plots that show non-zero or significant autocorrelations are usually used.
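The quantities described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the study: the function names are hypothetical, the PACF is obtained with the Durbin-Levinson recursion, and the significance bound is the usual large-sample approximation for white noise (about ±1.96/√n at the 95% level).

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_k for lags k = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.dot(d, d)
    return np.array([np.dot(d[: len(d) - k], d[k:]) / denom
                     for k in range(max_lag + 1)])

def sample_pacf(x, max_lag):
    """Partial autocorrelations for lags 0..max_lag (Durbin-Levinson recursion)."""
    r = sample_acf(x, max_lag)
    pacf = np.zeros(max_lag + 1)
    pacf[0] = 1.0
    phi_prev = np.array([])
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = r[1]
            phi = np.array([phi_kk])
        else:
            # phi_kk = (r_k - sum_j phi_{k-1,j} r_{k-j}) / (1 - sum_j phi_{k-1,j} r_j)
            num = r[k] - np.dot(phi_prev, r[k - 1:0:-1])
            den = 1.0 - np.dot(phi_prev, r[1:k])
            phi_kk = num / den
            # phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}
            phi = np.append(phi_prev - phi_kk * phi_prev[::-1], phi_kk)
        pacf[k] = phi_kk
        phi_prev = phi
    return pacf

def significance_bound(n, z=1.96):
    """Approximate bound beyond which a coefficient is judged non-zero."""
    return z / np.sqrt(n)
```

Coefficients whose absolute value exceeds `significance_bound(len(x))` are the ones that would appear as significant spikes on an ACF or PACF plot.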

There are two main approaches for constructing control charts for autocorrelated data. The first approach uses standard control charts, but adjusts the method of estimating the process variance so that the true process variance is being estimated. Therefore, in this chart, correlation is taken care of by placing the control limits according to the changed variability (Samanta & Bhattacherjee 2001).

The second approach that has proven useful in dealing with autocorrelated data is to directly model the correlative structure with an appropriate time series model, use that model to remove autocorrelation from the data, and then apply control charts to the residuals (Montgomery 1997). This type of control chart is called an SCC chart.

The SCC chart plots the residuals that are obtained after fitting an ARIMA model to the process data, rather than plotting the actual observations. All the traditional tools of process control are applicable to this kind of chart (Alwan & Roberts 1988). Since the mean of the residuals is zero, the centerline is zero. The standard deviation used in this case is the standard deviation of the residuals, *σ _{a}*. The limits of the chart are given by:

$$
\mathrm{UCL} = +L\sigma_a, \qquad \mathrm{CL} = 0, \qquad \mathrm{LCL} = -L\sigma_a
$$

where *L* is a constant multiplier, and is usually assumed to be equal to 3 (Mitsakos & Psarakis 2005). Since the residuals will be independently and identically distributed random variables, all the assumptions of traditional control charts will be met.
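The residual chart described above can be sketched as follows. This is a simplified stand-in, not the STATGRAPHICS procedure used in the study: it assumes a pure autoregressive model on the differenced series, that is ARIMA (p,d,0), fitted by ordinary least squares, and it estimates *σ _{a}* as the sample standard deviation of the residuals. The function names are hypothetical.

```python
import numpy as np

def ar_residuals(w, p):
    """Fit AR(p) with an intercept to w by least squares; return (coefficients, residuals)."""
    w = np.asarray(w, dtype=float)
    # Column j holds the series lagged by j + 1 periods
    lags = [w[p - j - 1: len(w) - j - 1] for j in range(p)]
    X = np.column_stack([np.ones(len(w) - p)] + lags)
    y = w[p:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def scc_limits(residuals, L=3.0):
    """Return (LCL, CL, UCL) with centerline 0 and half-width L * sigma_a."""
    sigma_a = np.std(residuals, ddof=1)
    return -L * sigma_a, 0.0, L * sigma_a

def scc_chart(y, p, d=1, L=3.0):
    """Difference y d times, fit AR(p), and flag residuals outside the limits."""
    w = np.diff(np.asarray(y, dtype=float), n=d)
    _, resid = ar_residuals(w, p)
    lcl, cl, ucl = scc_limits(resid, L)
    out = np.flatnonzero((resid < lcl) | (resid > ucl))
    return resid, (lcl, cl, ucl), out
```

For example, `scc_chart(y, p=3, d=1)` would mirror the structure of an ARIMA (3,1,0) residual chart; a model with moving average terms would need a full ARIMA estimation routine instead of least squares.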

### ARIMA models

A time series is defined as a set of observations that are ordered in time. Time series analysis aims to investigate the mechanism generating the time series, to forecast future values of the series, and to describe the behavior of the series (Russo *et al.* 2012). ARIMA models are flexible and widely used in time series analysis. They are like regression models, but the independent variables are past values of the time series and past and current values of random terms.

In an ARIMA (p,d,q) model, *p*, *d*, and *q* denote the number of autoregressive terms, the number of times the series has to be differenced before it becomes stationary, and the number of moving average terms, respectively. The error terms are generally assumed to be independent and identically distributed variables sampled from a normal distribution with zero mean. The general mathematical formula of ARIMA models expresses *Y _{t}*, the observed value at time period *t*, as a linear combination of the values observed at the last *p* time periods, a random shock to the system at the current time period, *a _{t}*, and random shocks that occurred at the previous *q* time periods.
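For reference, the verbal description above corresponds to the standard ARMA (p,q) form sketched below. The symbols $W_t$ (the series after *d* differences), $B$ (the backshift operator), $c$, $\phi_i$, and $\theta_j$ are notation assumed here for illustration. The minus signs on the moving average terms follow the Box–Jenkins convention, which is also an assumption; some texts and software packages write these terms with plus signs.

$$
W_t = c + \phi_1 W_{t-1} + \cdots + \phi_p W_{t-p} + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q}, \qquad W_t = (1 - B)^d Y_t
$$

With $d = 0$, $W_t$ is simply the observed series $Y_t$; with $d = 1$, $W_t = Y_t - Y_{t-1}$.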

## ANALYSIS

### Data

The data, gathered on a daily basis, cover August to November 2013. Descriptive statistics of turbidity and pH are given in Table 1. STATGRAPHICS Centurion XVI.I was used to analyze data and to construct control charts.

| Parameter | Allowed limits | Month | Count | Mean | StDev. | Minimum | Q1 | Median | Q3 | Maximum | Range |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Turbidity | 1 NTU | August | 31 | 0.6935 | 0.1365 | 0.4000 | 0.6000 | 0.7000 | 0.8000 | 0.9000 | 0.5000 |
| | | September | 30 | 0.8300 | 0.1643 | 0.5000 | 0.7000 | 0.8500 | 1.0000 | 1.1000 | 0.6000 |
| | | October | 31 | 0.8355 | 0.1644 | 0.5000 | 0.7000 | 0.9000 | 1.0000 | 1.1000 | 0.6000 |
| | | November | 30 | 1.1300 | 0.2891 | 0.6000 | 0.8750 | 1.1000 | 1.4000 | 1.6000 | 1.0000 |
| pH | 6.5–9.5 | August | 31 | 8.0610 | 0.1293 | 7.7800 | 8.0000 | 8.0600 | 8.1800 | 8.2500 | 0.4700 |
| | | September | 30 | 8.0873 | 0.0829 | 7.8500 | 8.0175 | 8.1050 | 8.1600 | 8.2300 | 0.3800 |
| | | October | 31 | 8.0723 | 0.0619 | 7.8600 | 8.0500 | 8.0800 | 8.1100 | 8.2000 | 0.3400 |
| | | November | 30 | 7.8867 | 0.1027 | 7.6000 | 7.8400 | 7.8750 | 7.9225 | 8.1300 | 0.5300 |


### IC charts

IC charts are used to track the process level and detect the presence of assignable causes when the data have been collected one at a time rather than in groups. This type of chart uses the moving range of two successive observations to estimate the process variability.
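A minimal sketch of the IC chart calculation follows. It assumes the conventional individuals chart formulas, in which the process standard deviation is estimated as the average moving range divided by the constant d2 = 1.128 (the value for ranges of two observations), with limits at the mean plus or minus *L* = 3 estimated standard deviations. The function name is hypothetical.

```python
import numpy as np

D2 = 1.128  # control chart constant d2 for moving ranges of two observations

def ic_chart(x, L=3.0):
    """Return (center, LCL, UCL, indices of out-of-control points) for an individuals chart."""
    x = np.asarray(x, dtype=float)
    moving_range = np.abs(np.diff(x))       # |x_t - x_{t-1}|
    sigma_hat = moving_range.mean() / D2    # short-term variability estimate
    center = x.mean()
    lcl = center - L * sigma_hat
    ucl = center + L * sigma_hat
    out = np.flatnonzero((x < lcl) | (x > ucl))
    return center, lcl, ucl, out
```

Because the moving range only reflects point-to-point differences, positively autocorrelated data tend to yield a small `sigma_hat` and therefore narrow limits, which is the behavior that motivates charting residuals instead.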

### Testing autocorrelation and trend

### ARIMA model

After differencing to the *d*th order, the stationary series is described by an ARMA (p,q) model. The *p* and *q* in ARMA (p,q) measure the orders of the autoregressive and moving average components, respectively. To get an idea of what orders to consider, the PACF and ACF were examined. In Figure 5, there is no significant autocorrelation coefficient for turbidity; that is, the ARIMA model for turbidity will not have a moving average component. However, one significant autocorrelation coefficient exists at lag 1 for pH. Therefore, MA(1), having a memory of only one period, should be considered for pH. Notice from Figure 6 that the second and third partial autocorrelations are statistically significant for turbidity. This suggests the AR (3) model. There is another statistically significant partial autocorrelation at lag 6, but this can be ignored. Since only the first partial autocorrelation coefficient is significant, the time series for pH has an autoregressive order of 1, called AR (1).

In addition to the above approach based on the ACF and PACF plots of the first-differenced series, several alternative models were also examined for model selection (Table 2). At this stage, statistics calculated from the one-step-ahead forecast errors were considered. Better ARIMA models have smaller root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) values, which measure the variability of the forecasting errors. If the forecasts are not biased, mean error (ME) and mean percentage error (MPE) should be close to zero. ARIMA (3,1,0) and ARIMA (1,1,1) were found to be suitable models for turbidity and pH, respectively. The number of differences (*d*) is 1 for both models, since the data were first differenced to achieve stationarity.

| Parameter | Model | RMSE | MAE | MAPE | ME | MPE |
|---|---|---|---|---|---|---|
| Turbidity | ARIMA (3,1,0) | 0.1720 | 0.1278 | 16.0378 | −0.0012 | −3.1232 |
| | ARIMA (2,1,0) | 0.1775 | 0.1292 | 16.4177 | −0.0004 | −3.0140 |
| | ARIMA (3,2,0) | 0.2029 | 0.1534 | 18.5821 | 0.0001 | −1.5897 |
| | ARIMA (3,1,1) | 0.1729 | 0.1284 | 16.1178 | −0.0013 | −3.1543 |
| | ARIMA (0,1,2) | 0.1726 | 0.1305 | 16.4740 | −0.0015 | −3.5570 |
| pH | ARIMA (1,1,1) | 0.0844 | 0.0536 | 0.6713 | 0.0016 | 0.0085 |
| | ARIMA (1,1,2) | 0.0845 | 0.0536 | 0.6712 | 0.0017 | 0.0104 |
| | ARIMA (3,1,1) | 0.0847 | 0.0540 | 0.6752 | 0.0056 | 0.0584 |
| | ARIMA (0,0,1) | 0.0993 | 0.0746 | 0.9334 | 0.0003 | −0.0137 |
| | ARIMA (1,0,0) | 0.0848 | 0.0529 | 0.6625 | −0.0001 | −0.0123 |


RMSE, root mean squared error; MAE, mean absolute error; MAPE, mean absolute percentage error; ME, mean error; MPE, mean percentage error.
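The five selection statistics can be computed from one-step-ahead forecasts as sketched below. The definitions are the conventional ones, with the error taken as actual minus forecast and percentages taken relative to the actual value; whether STATGRAPHICS uses exactly these conventions is an assumption, and the function name is hypothetical.

```python
import numpy as np

def forecast_error_stats(actual, forecast):
    """Return RMSE, MAE, MAPE, ME and MPE for paired actual and forecast values."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    error = actual - forecast
    pct_error = 100.0 * error / actual  # requires non-zero actual values
    return {
        "RMSE": float(np.sqrt(np.mean(error ** 2))),
        "MAE": float(np.mean(np.abs(error))),
        "MAPE": float(np.mean(np.abs(pct_error))),
        "ME": float(np.mean(error)),
        "MPE": float(np.mean(pct_error)),
    }
```

RMSE, MAE, and MAPE ignore the sign of the errors and so measure their spread, while ME and MPE keep the sign and therefore reveal systematic over- or under-forecasting.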

Table 3 provides the final estimates of the parameters. Each of the estimated model coefficients is shown together with a *t*-test. Since the *p*-values associated with the AR and MA coefficients are less than 0.05, these coefficients are significantly different from 0 at the 5% significance level.

| Parameter | Model | Model parameter | Estimate | Std. Error | t | P-value |
|---|---|---|---|---|---|---|
| Turbidity | ARIMA (3,1,0) | AR (1) | −0.232335 | 0.0886801 | −2.61993 | 0.009961 |
| | | AR (2) | −0.250073 | 0.0880416 | −2.8404 | 0.005316 |
| | | AR (3) | −0.261968 | 0.089041 | −2.9421 | 0.003931 |
| | | Mean | 0.00133828 | 0.00913336 | 0.146527 | 0.883758 |
| | | Constant | 0.00233447 | | | |
| pH | ARIMA (1,1,1) | AR (1) | 0.610754 | 0.098874 | 6.17709 | 0.000000 |
| | | MA (1) | 0.937621 | 0.0471965 | 19.8663 | 0.000000 |
| | | Mean | −0.00135921 | 0.00160166 | −0.848629 | 0.397806 |
| | | Constant | −0.000529069 | | | |

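As a quick check on how the *t* statistics in Table 3 arise, each one is the ratio of an estimate to its standard error. Using the turbidity AR (1) row, with the symbol $\hat{\phi}_1$ assumed here for that coefficient and the result rounded to two decimals:

$$
t = \frac{\hat{\phi}_1}{\mathrm{SE}(\hat{\phi}_1)} = \frac{-0.232335}{0.0886801} \approx -2.62
$$

The same ratio for the pH MA (1) term is 0.937621 / 0.0471965 ≈ 19.87, in agreement with the tabulated value.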

### SCC chart

## RESULTS

Traditional control charts require that observations from the process are independent of one another. Failure to meet this requirement causes the traditional control charts to give a huge number of false alarms. Even a very low degree of autocorrelation in observations has a substantial effect on the performance of these charts. SCC charts, traditional control charts applied to monitor the residuals of the ARIMA model, are more appropriate in a correlated environment as they provide a higher probability of coverage than the individual charts.

In this study, SCC charts for turbidity and pH are based on ARIMA (3,1,0) and ARIMA (1,1,1) models, respectively. In other studies, Smeti *et al.* (2006, 2007a) used ARIMA (0,1,1). Elevli *et al.* (2009), Noskievicova (2009), and Karaoglan & Bayhan (2011) used ARIMA (1,0,0) or AR (1). ARIMA (1,0,1), ARIMA (0,1,1) and ARIMA (0,1,2) were used by Tasdemir (2012) and Kandananond (2014). The models differ between studies because model identification depends on the ACF and PACF plots of the particular data set.

According to the SCC charts given in Figure 7, the water treatment process is not in statistical control for either parameter. The out-of-control points for turbidity are observed after the 90th data point. This means that the variability in November increased compared with the other months. According to the descriptive statistics of Table 1, the process is also not capable of meeting the allowed upper limit (1 NTU). Consequently, some out-of-specification drinking water is supplied to consumers from the turbidity point of view. The number of out-of-control points for pH is higher than that for turbidity. However, the observed variation for pH is narrower than the specifications (6.5–9.5). This is an indication that the process is capable with respect to pH. Since the objective of water treatment plants is to provide drinking water that is safe and acceptable to consumers, it is necessary to investigate the assignable causes of process variation and to eliminate them.