Metodoloski zvezki, Vol. 11, No. 2, 2014, 79-92

Confidence Interval for the Process Capability Index $C_p$ Based on the Bootstrap-t Confidence Interval for the Standard Deviation

Wararit Panichkitkosolkul1

1 Department of Mathematics and Statistics, Faculty of Science and Technology, Thammasat University, Thailand; wararit@mathstat.sci.tu.ac.th

Abstract

This paper proposes a confidence interval for the process capability index based on the bootstrap-t confidence interval for the standard deviation. A Monte Carlo simulation study was conducted to compare the performance of the proposed confidence interval with the existing confidence interval based on the confidence interval for the standard deviation. Simulation results show that the proposed confidence interval performs well in terms of coverage probability for more skewed distributions. On the other hand, the existing confidence interval has a coverage probability close to the nominal level for symmetrical or less skewed distributions. The code to estimate the confidence interval in the R language is provided.

1 Introduction

Statistical process quality control has been widely applied in many industries. One of the quality measurement tools used for improvement of quality and productivity is the process capability index (PCI). Process capability indices are practical tools for establishing the relationship between the actual process performance and the manufacturing specifications. Although there are many process capability indices, the most commonly used index is $C_p$ (Kane, 1986; Zhang, 2010). In this paper, we focus on the process capability index $C_p$, defined by Kane (1986) as

$$C_p = \frac{USL - LSL}{6\sigma}, \tag{1}$$

where $USL$ is the upper specification limit, $LSL$ is the lower specification limit, and $\sigma$ is the process standard deviation. The numerator of $C_p$ gives the size of the range over which the process measurements can vary.
The denominator gives the size of the range over which the process actually varies (Kotz and Lovelace, 1998). Because the process standard deviation is unknown, it must be estimated from the sample data $\{X_1, \ldots, X_n\}$. The sample standard deviation

$$S = \left[ (n-1)^{-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \right]^{1/2}$$

is used to estimate the unknown parameter $\sigma$ in Equation (1). The estimator of the process capability index $C_p$ is therefore

$$\hat{C}_p = \frac{USL - LSL}{6S}. \tag{2}$$

Although the point estimator of the capability index $C_p$ shown in Equation (2) can be a useful measure, the confidence interval is more useful. A confidence interval provides much more information about the population characteristic of interest than does a point estimate (e.g., Smithson, 2001; Thompson, 2002; Steiger, 2004). The confidence interval for the capability index $C_p$ is constructed by using the pivotal quantity $Q = (n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$. Therefore, the $(1-\alpha)100\%$ confidence interval for the capability index $C_p$ is

$$\hat{C}_p \sqrt{\frac{\chi^2_{\alpha/2,n-1}}{n-1}} \le C_p \le \hat{C}_p \sqrt{\frac{\chi^2_{1-\alpha/2,n-1}}{n-1}}, \tag{3}$$

where $\chi^2_{\alpha/2,n-1}$ and $\chi^2_{1-\alpha/2,n-1}$ are the $(\alpha/2)100$th and $(1-\alpha/2)100$th percentiles of the central chi-square distribution with $n-1$ degrees of freedom.

The confidence interval for the process capability index $C_p$ shown in Equation (3) is intended for normally distributed data. The coverage probability of this confidence interval is close to the nominal value of $1-\alpha$ when the data are normally distributed. However, the underlying process distributions are non-normal in many industrial processes (e.g., Chen and Pearn, 1997; Bittanti et al., 1998; Wu et al., 1999; Chang et al., 2002; Ding, 2004). In these cases, the coverage probability of the confidence interval can be appreciably below $1-\alpha$. Cojbasic and Tomovic (2007) presented a nonparametric confidence interval for the population variance based on ordinary t-statistics combined with the bootstrap method for a skewed distribution.
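As a numeric illustration of Equations (2) and (3), the following R sketch computes the point estimate and the chi-square-based limits. The simulated sample, the specification limits 47 and 53, and the level $\alpha = 0.05$ are assumptions made purely for illustration and are not taken from the paper.

```r
# Hypothetical illustration: simulated normal measurements and made-up
# specification limits (not data from the paper).
set.seed(1)
x     <- rnorm(50, mean = 50, sd = 1)   # assumed sample
LSL   <- 47                             # assumed lower specification limit
USL   <- 53                             # assumed upper specification limit
alpha <- 0.05                           # assumed significance level

n      <- length(x)
S      <- sd(x)                         # sample standard deviation
Cp.hat <- (USL - LSL) / (6 * S)         # point estimator, Equation (2)

# Chi-square-based limits, Equation (3)
ci.low <- Cp.hat * sqrt(qchisq(alpha / 2,     df = n - 1) / (n - 1))
ci.up  <- Cp.hat * sqrt(qchisq(1 - alpha / 2, df = n - 1) / (n - 1))

c(estimate = Cp.hat, lower = ci.low, upper = ci.up)
```

Because the lower chi-square percentile divided by $n-1$ is below one and the upper one is above one, the point estimate always lies inside the interval.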
In this paper, we propose a new confidence interval for the process capability index $C_p$ based on the bootstrap-t confidence interval proposed by Cojbasic and Tomovic (2007). The paper is organized as follows. In Section 2, the theoretical background of the existing confidence interval for $C_p$ is discussed. In Section 3, we provide an analytical formula for the confidence interval for $C_p$ based on the bootstrap-t confidence interval for the standard deviation. In Section 4, the performance of the confidence intervals for $C_p$ is investigated through a Monte Carlo simulation study. Conclusions are provided in the final section.

2 Existing confidence interval for the process capability index

Suppose $X_i \sim N(\mu, \sigma^2)$, $i = 1, 2, \ldots, n$. A well-known $(1-\alpha)100\%$ confidence interval for the population variance $\sigma^2$, using the pivotal quantity $Q = (n-1)S^2/\sigma^2$, is (Cojbasic and Loncar, 2011)

$$\frac{(n-1)S^2}{\chi^2_{1-\alpha/2,n-1}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{\alpha/2,n-1}}, \tag{4}$$

where $S^2 = (n-1)^{-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$, and $\chi^2_{\alpha/2,n-1}$ and $\chi^2_{1-\alpha/2,n-1}$ are the $(\alpha/2)100$th and $(1-\alpha/2)100$th percentiles of the central chi-square distribution with $n-1$ degrees of freedom, respectively. From Equation (4), we have

$$P\left( \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,n-1}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{\alpha/2,n-1}} \right) = 1-\alpha.$$

Taking square roots, inverting, and multiplying by $(USL - LSL)/6$ gives

$$P\left( \hat{C}_p \sqrt{\frac{\chi^2_{\alpha/2,n-1}}{n-1}} \le C_p \le \hat{C}_p \sqrt{\frac{\chi^2_{1-\alpha/2,n-1}}{n-1}} \right) = 1-\alpha.$$

Therefore, the $(1-\alpha)100\%$ confidence interval for $C_p$ based on the confidence interval for the standard deviation is

$$CI_1 = \left[ \hat{C}_p \sqrt{\frac{\chi^2_{\alpha/2,n-1}}{n-1}},\ \hat{C}_p \sqrt{\frac{\chi^2_{1-\alpha/2,n-1}}{n-1}} \right]. \tag{5}$$

3 Proposed confidence interval for the process capability index

The bootstrap, introduced by Efron (1979), is a computer-based resampling method for assigning measures of accuracy to statistical estimates (Efron and Tibshirani, 1993). For a sequence of independent and identically distributed (i.i.d.) random variables, the bootstrap procedure can be defined as follows (Tosasukul et al., 2009). Let $X_1, X_2, \ldots, X_n$ be independently and identically distributed random variables from some distribution with mean $\mu$ and variance $\sigma^2$. Let the random variables $\{X_j^*, 1 \le j \le m\}$ be the result of sampling $m$ times with replacement from the $n$ observations $X_1, X_2, \ldots, X_n$.
The random variables $\{X_j^*, 1 \le j \le m\}$ are called the bootstrap samples from the original data $X_1, X_2, \ldots, X_n$.

A confidence interval for the population variance can be constructed using the aforementioned pivotal quantity $Q = (n-1)S^2/\sigma^2$. For large sample sizes, a central chi-square distribution with $n-1$ degrees of freedom can be approximated by a normal distribution with mean $n-1$ and variance $2(n-1)$ (Cojbasic and Tomovic, 2007). Therefore, the distribution of the standardized variable

$$Z = \frac{(n-1)S^2/\sigma^2 - (n-1)}{\sqrt{2(n-1)}} = \frac{S^2 - \sigma^2}{\sqrt{\operatorname{var}(S^2)}}$$

converges to a standard normal distribution as $n$ increases to infinity. The bootstrap confidence interval for $\sigma^2$ is calculated based on the statistic

$$T = \frac{S^2 - \sigma^2}{\sqrt{\widehat{\operatorname{var}}(S^2)}},$$

where $\widehat{\operatorname{var}}(S^2)$ is a consistent estimator of the variance of $S^2$. Casella and Berger (2001) give the estimator of $\operatorname{var}(S^2)$ for a non-normal distribution as

$$\widehat{\operatorname{var}}(S^2) = \frac{1}{n}\left( \hat{\mu}_4 - \frac{n-3}{n-1} S^4 \right), \quad \text{where} \quad \hat{\mu}_4 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^4.$$

After drawing $B$ bootstrap samples, we compute in each bootstrap sample the value of the statistic

$$T^* = \frac{S^{*2} - S^2}{\sqrt{\widehat{\operatorname{var}}(S^{*2})}}, \tag{6}$$

where $S^{*2}$ is a bootstrap replication of the statistic $S^2$,

$$\widehat{\operatorname{var}}(S^{*2}) = \frac{1}{n}\left( \hat{\mu}_4^* - \frac{n-3}{n-1} S^{*4} \right), \quad \text{and} \quad \hat{\mu}_4^* = \frac{1}{n} \sum_{i=1}^{n} (X_i^* - \bar{X}^*)^4.$$

The $(1-\alpha)100\%$ bootstrap-t confidence interval for $\sigma^2$ is

$$\left( \frac{S^2 \sqrt{2(n-1)}}{2 t^*_{1-\alpha/2} + \sqrt{2(n-1)}},\ \frac{S^2 \sqrt{2(n-1)}}{2 t^*_{\alpha/2} + \sqrt{2(n-1)}} \right),$$

where $t^*_{\alpha/2}$ and $t^*_{1-\alpha/2}$ are the $(\alpha/2)100$th and $(1-\alpha/2)100$th percentiles of $T^*$ shown in Equation (6). Additionally, the $(1-\alpha)100\%$ confidence interval for the standard deviation $\sigma$ is

$$\left( \left[ \frac{S^2 \sqrt{2(n-1)}}{2 t^*_{1-\alpha/2} + \sqrt{2(n-1)}} \right]^{1/2},\ \left[ \frac{S^2 \sqrt{2(n-1)}}{2 t^*_{\alpha/2} + \sqrt{2(n-1)}} \right]^{1/2} \right). \tag{7}$$

Then, from Equation (7), we construct the confidence interval for $C_p$ based on the bootstrap-t confidence interval for the standard deviation, which is

$$CI_2 = \left[ \frac{USL - LSL}{6} \left( \frac{S^2 \sqrt{2(n-1)}}{2 t^*_{\alpha/2} + \sqrt{2(n-1)}} \right)^{-1/2},\ \frac{USL - LSL}{6} \left( \frac{S^2 \sqrt{2(n-1)}}{2 t^*_{1-\alpha/2} + \sqrt{2(n-1)}} \right)^{-1/2} \right].$$
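The bootstrap-t replications in Equation (6) can be sketched in R as follows. This is a minimal sketch and not the author's implementation: the helper name boot.t.star, the exponential test sample, and the seed are hypothetical choices made for illustration. It uses the fourth-moment variance estimator as written in Equation (6), whereas the appendix code standardizes the bootstrap variance with a simpler form.

```r
# Sketch of the bootstrap-t replications in Equation (6).
# boot.t.star is a hypothetical helper name, not part of the paper's appendix.
boot.t.star <- function(x, B = 1000) {
  n  <- length(x)
  S2 <- var(x)                                # sample variance of the original data
  T.star <- numeric(B)
  for (b in seq_len(B)) {
    xs      <- sample(x, n, replace = TRUE)   # bootstrap sample
    s2.star <- var(xs)                        # bootstrap replication S*^2
    mu4     <- mean((xs - mean(xs))^4)        # fourth central sample moment
    v.star  <- (mu4 - (n - 3) / (n - 1) * s2.star^2) / n  # estimated var(S*^2)
    T.star[b] <- (s2.star - S2) / sqrt(v.star)
  }
  T.star
}

set.seed(123)
x      <- rexp(30, rate = 1)                  # assumed skewed sample, for illustration
T.star <- boot.t.star(x, B = 1000)
alpha  <- 0.05                                # assumed significance level

# Empirical (alpha/2)100th and (1 - alpha/2)100th percentiles of T*
t.pct <- quantile(T.star, probs = c(alpha / 2, 1 - alpha / 2))
t.pct
```

The two percentiles play the roles of $t^*_{\alpha/2}$ and $t^*_{1-\alpha/2}$ in the intervals above. A bootstrap sample consisting of a single repeated value would give a zero variance estimate; for continuous data and moderate $n$ this is so unlikely that the sketch does not guard against it.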
On the other hand, when the data are symmetrical or have a coefficient of skewness < 2, the estimated coverage probability of the existing confidence interval can be close to the nominal level.

Appendix: Source R code for all confidence intervals

# Existing interval CI1, based on the chi-square interval for sigma (Equation (5))
CI1 <- function(x, LSL, USL, alpha) {
  n <- length(x)
  S <- sd(x)
  chisq1 <- qchisq(alpha/2, df = n-1)
  chisq2 <- qchisq(1-alpha/2, df = n-1)
  K <- (USL-LSL)/(6*S)
  ci.low <- K*sqrt(chisq1/(n-1))
  ci.up <- K*sqrt(chisq2/(n-1))
  out <- cbind(ci.low, ci.up)
  return(out)
}

# Proposed interval CI2, based on the bootstrap-t interval for sigma
CI2 <- function(x, LSL, USL, alpha) {
  n <- length(x)
  s2 <- var(x)
  percentile.T.S <- percentile.T.star(x, alpha)
  T1 <- percentile.T.S[1]
  T2 <- percentile.T.S[2]
  K1 <- (USL-LSL)/6
  K2 <- s2*sqrt(2*(n-1))
  ci.low <- K1*(K2/(2*T1+sqrt(2*(n-1))))^(-1/2)
  ci.up <- K1*(K2/(2*T2+sqrt(2*(n-1))))^(-1/2)
  out <- cbind(ci.low, ci.up)
  return(out)
}

# Percentiles of the bootstrap statistic T* from B = 1000 resamples
percentile.T.star <- function(x, alpha) {
  B <- 1000
  n <- length(x)
  S2 <- var(x)
  T.star <- numeric(B)
  for (i in 1:B) {
    xs <- sample(x, n, replace = TRUE)
    s2.star <- var(xs)
    T.star[i] <- sqrt((n-1)/2)*((s2.star/S2)-1)
  }
  # (alpha/2)100th and (1-alpha/2)100th percentiles, as defined in Section 3
  T1 <- quantile(T.star, probs = alpha/2)
  T2 <- quantile(T.star, probs = 1-alpha/2)
  out <- cbind(T1, T2)
  return(out)
}

Acknowledgements

The author would like to thank the anonymous referees for their helpful comments, which resulted in an improved paper. The author is also thankful for the support in the form of the research funds awarded by Thammasat University.

References

[1] Bittanti, S., Lovera, M. and Moiraghi, L. (1998): Application of non-normal process capability indices to semiconductor quality control. IEEE Transactions on Semiconductor Manufacturing, 11, 296-303.
[2] Casella, G. and Berger, R.L. (2001): Statistical Inference. Duxbury Press: Pacific Grove.
[3] Chen, K.S. and Pearn, W.L. (1997): An application of non-normal process capability indices. Quality and Reliability Engineering International, 13, 335-360.
[4] Cojbasic, V. and Tomovic, A.
(2007): Nonparametric confidence intervals for population variances of one sample and the difference of variances of two samples. Computational Statistics & Data Analysis, 51, 5562-5578.
[5] Ding, J. (2004): A model of estimating process capability index from the first four moments of non-normal data. Quality and Reliability Engineering International, 20, 787-805.
[6] Efron, B. (1979): Bootstrap methods: Another look at the jackknife. Annals of Statistics, 7, 1-26.
[7] Efron, B. and Tibshirani, R.J. (1993): An Introduction to the Bootstrap. Chapman & Hall: New York.
[8] Ihaka, R. and Gentleman, R. (1996): R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5, 299-314.
[9] Kane, V.E. (1986): Process Capability Indices. Journal of Quality Technology, 18, 41-52.
[10] Kotz, S. and Johnson, N.L. (1993): Process Capability Indices. London: Chapman & Hall.
[11] Kotz, S. and Lovelace, C.R. (1998): Process Capability Indices in Theory and Practice. Arnold: London.
[12] Pearn, W.L. and Kotz, S. (2006): Encyclopedia and Handbook of Process Capability Indices: A Comprehensive Exposition of Quality Control Measures. Singapore: World Scientific.
[13] Smithson, M. (2001): Correct confidence intervals for various regression effect sizes and parameters: the importance of noncentral distributions in computing intervals. Educational and Psychological Measurement, 61, 605-632.
[14] Steiger, J.H. (2004): Beyond the F test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychological Methods, 9, 164-182.
[15] Thompson, B. (2002): What future quantitative social science research could look like: confidence intervals for effect sizes. Educational Researcher, 31, 25-32.
[16] Tosasukul, J., Budsaba, K. and Volodin, A. (2009): Dependent bootstrap confidence intervals for a population mean. Thailand Statistician, 7, 43-51.
[17] Wu, H.-H., Swain, J.J., Farrington, P.A. and Messimer, S.L. (1999): A weighted variance capability index for general non-normal processes. Quality and Reliability Engineering International, 15, 397-402.
[18] Zhang, J. (2010): Conditional confidence intervals of process capability indices following rejection of preliminary tests. Ph.D. Thesis, The University of Texas at Arlington, USA.