CONTENTS
Metodoloski zvezki, Vol. 13, No. 2, 2016
Emilio Gomez-Deniz and Enrique Calderin
The Mixture Poisson Exponential-Inverse Gaussian Regression Model: An application in Health Services	71
Kristina Veljkovic
X bar Control Chart for Non-normal Symmetric Distributions	87
Marta Ferreira
Estimating the Coefficient of Asymptotic Tail Independence: a Comparison of Methods	101
Wararit Panichkitkosolkul
Approximate Confidence Interval for the Reciprocal of a
Normal Mean with a Known Coefficient of Variation	117
Gloria Mateu-Figueras, Josep Daunis-i-Estadella, Germa Coenders, Berta Ferrer-Rosell, Ricard Serlavos and Joan Manuel Batista-Foguet Exploring the Relationship between two
Compositions using Canonical Correlation Analysis	131
Metodološki zvezki, Vol. 13, 2016
Reviewers for Volume Thirteen
Mojca Bavdaž Rok Blagus Gregor Dolinar Anuška Ferligoj Herwig Friedl Andreja Jaklič Borut Jurčič Zlobec Damjana Kastelec Tina Kogovšek Katarina Košmelj Vyacheslav Lyubchich Susana Martins Stanislav Mejza Giovanni Millo Irena Ograjenšek Klemen Pavlic Marko Robnik Sikonja Jože Rovan Tamas Rudas Damjam Skulj Gregor Sočan Janez Stare Jordan Stoyanov Gaj Vidmar Blaz Zupan Aleš Žiberna
Metodoloski zvezki, Vol. 13, No. 2, 71-85
The Mixture Poisson Exponential-Inverse Gaussian Regression Model: An application in Health Services
Emilio Gomez-Deniz,1 Enrique Calderin-Ojeda2
Abstract
In this paper a mixed Poisson regression model for count data is introduced. This model is derived by mixing the Poisson distribution with the one-parameter continuous exponential-inverse Gaussian distribution. The obtained probability mass function is over-dispersed and unimodal with modal value located at zero. Estimation is performed by maximum likelihood. As an application, the demand for health services among people 65 and over is examined using this regression model since empirical evidence has suggested that the over-dispersion and a large portion of non-users are common features of medical care utilization data.
1 Introduction
Counting data are common in many social and biomedical studies to explain differences among cases that generate small counts of events. The Poisson distribution plays an important role in the modeling of count data. In this regard, Poisson regression models have been traditionally used to analyze data with a nonnegative integer response variable in a wide range of different applied areas, for example, biostatistics, epidemiology, accident analysis and prevention, insurance and criminology among other fields. Nevertheless, the rigidity of the Poisson mean-variance relationship makes the Poisson regression models exposed to over-dispersion (i.e. the empirical variance is larger than the empirical mean). This is a crucial modeling issue for count data since inadequate confidence interval coverage is produced when over-dispersed count data are considered. The Poisson model does not allow for heterogeneity among individuals. Often there is additional heterogeneity between individuals that is not accounted for by the predictors in the model which results in over-dispersion. To overcome this difficulty, practitioners usually use more general specifications, e.g. negative binomial regression model (Hilbe (2007) and Greene (2009)). The latter model is an example of mixed Poisson regression model. Mixed Poisson regression models are natural extensions of the Poisson regression model allowing for
1	Department of Quantitative Methods and TiDES Institute, University of Las Palmas de Gran Canaria, Gran Canaria, Spain; emilio.gomez-deniz@ulpgc.es
2	Centre for Actuarial Studies, Department of Economics, The University of Melbourne, Australia; enrique.calderin@unimelb.edu.au
72
E. Gimez-Diniz and E. Calderin-Ojeda
over-dispersion. This feature can be included in the model by assuming that the parameter of the Poisson distribution is not fixed due to the heterogeneity of the population, being likewise considered a random variable. For instance, for over-dispersed count-panel data the negative binomial and Poisson-Inverse Gaussian regression models are well-known in the statistical literature. In this regard, by using a gamma distribution for the unknown parameter d, the former model is obtained. The latter model was proposed by Dean et al. (1989), in this case an inverse Gaussian distribution is used to describe the parameter of the Poisson distribution. These models account for over-dispersion by assuming that there will be unexplained variability among individuals who have the same predicted value. It leads to larger variance in the overall outcome distribution but has no effect on the mean.
Regrettably, other mixed Poisson regression models have not been used since they involve special functions and appropriate numerical methods are required. Nevertheless, due to the fast improvement of mathematical software these models can be handled relatively easily. In this article a new mixed Poisson regression model is proposed. As mixing distribution, a particular case of the continuous Exponential-Inverse Gaussian distribution in Bhattacharya and Kumar (1986) when one of the parameter tends to infinity is considered. Furthermore, as it arises from a mixed Poisson distribution, many of its properties can be derived from the ones of the mixing distribution. In this sense, it displays interesting features such as over-dispersion, unimodality, closed-form expressions for factorial moments of any order among other nice properties. The mixed Poisson regression model introduced in this paper does not belong to the linear exponential family of distributions. However, as Wedderburn (1974) showed, the parameter estimation and inference theory developed for the exponential family (i.e. generalized linear models), can be extended to models where a relation between the mean and variance of the response variable can be specified, even though they were not associated with a known likelihood. In this sense, the unconditional distribution obtained in the Poisson-Inverse Gaussian regression model (Dean et al. (1989)) is not part of the exponential family of distributions.
In this manuscript, the demand for health services among people 65 and over is analyzed by using this new mixed Poisson regression model. In particular, the number of hospital stays among the elderly population is considered as response variable. Moreover, as it will be shown later, the data include two important features a high proportion of zeros and over-dispersion. The use of regression model to explain the demand for health services has been studied by Gurmu and Elder (2000) where bivariate regression model for count data was used and also by Lahiri and Xing (2004) by using two-parts model based on Poisson selection model.
The remainder of the paper is structured as follows. Section 2 introduces the new Poisson distribution together with some properties; additionally parameter estimation is discussed; section 3 describes the mixed Poisson regression model derived from this distribution. Estimation is performed by maximum likelihood. Next, a numerical application to analyze factors explaining medical care of people 65 and over is examined in section 4. Finally, some conclusions are drawn in section 5.
The Mixture Poisson Exponential-Inverse Gaussian Regression Model...
73
2 The discrete model
The continuous Exponential-Inverse Gaussian distribution in Bhattacharya and Kumar (1986) can be simplified by letting one of its parameters tends to infinity. Then a more simple probability density function (pdf) is obtained. Then, the pdf of a random variable Q following an Exponential-Inverse Gaussian distribution with a single scale parameter 0 (henceforward EZQ(0)) is given by
f (U|0) = V 0 exp (■, with U > 0 and 0 > 0. V 2 U V	'
(2.1)
Let us now consider the Poisson distribution (henceforward P(0)) whose probability mass function is given by
uy
Pr{Y = y} = e —, y = 0,1,..., U > 0.
y!
(2.2)
Definition 1. We say that a random variable Y has a Poisson-Exponential-Inverse Gaussian distribution if it admits the stochastic representation:
Y|0 - P(0) 0 - EIG(0),
with 0 > 0. We will denote this distribution by Y ~ PEIG(0).
Then, the unconditional probability mass function (pmf) of Y is given by
-20T(2y+l) (11 0 \ Py =—09,,+T -i— u( o+ v>o>o ) > v = 0,1>
(2.3)
(2.4)
22y+i y!
2
22
(2.5)
where U(a, b, z) represents the Tricomi confluent hypergeometric function given by (a, z > 0):
1
U(a, b, z) = -— e-zssa-1(1 + s)b-a-1ds	(2.6)
r(a) J0
(see Gradshteyn and Ryzhik (1994), page 1085, formula 9211-4). The probability generating function is given by
Gy (s) =
0n 1- s
exp
0
2(1 ■ s)
1 — erf
0
2(1 ■ s)
(2.7)
where erf(z) is the error function given by
2 p	oz
erf(z) = -= e-t2dt = iFi(1/2,3/2, -z2), o	Vn
being 1F1(-, ■) the confluent hypergeometric function.
The factorial moments of order k can be obtained from (2.5). They are provided by
^(Y ) = E [Y (Y - 1)... (Y - k +1)] =
2k r(2k) (20)k :
(2.8)
74
E. Gimez-Diniz and E. Calderin-Ojeda
with k = 1,2,...
From the latter expression it can be seen that (2.5) is over-dispersed, since
var(Y) 5
W=*+1>1
Additionally, as (2.1) has an asymptotic mode at 0, the discrete model (2.5) is unimodal with mode at 0 (see Holgate (1970)). Besides, as (2.1) is log-convex, then (2.5) is infinitely divisible and therefore, it is a compound Poisson distribution (see Propositions 8 and 9 in Karlis and Xekalaki, 2005).
Let us now suppose that Y = (Yx,..., Yn) is a random sample of size n from the VEZQ distribution with pmf (2.5). The log-likelihood function is proportional to
¿(0; Y) a n log ^ + è logU (1 + Y, 2, 0) .
i=i ^ '
(2.9)
Having into account that
d
—U(a, b, z) = -a U(a + 1, b + 1, z),
the maximum likelihood estimate of the parameter 0 can be simply obtained by solving this normal equation
dl(0; Y) _ n d0 0

E
i=i
(1 + Yi) U (f + Yi, f, D
U (2 + Yi, 2 , 2)
0.
(2.10)
The Fisher's information matrix can be approximated from
where
dfl(0; Y) d0f
Mi(Yi,0) Mf(Yi,0) Mf (Yi, 0)
n

E
i=1
(Y - 1) {Mi(Yj,0) + [Mf(Yj,0)]f}
[Mf(Yi,0)]f	:
(2.11)
_ -(3 + Yi) U (5 + Yi, 5,0) U fi + Yi, 1,0 1 2 V V 2 " 2'2/ V 2 " 22
_ |i + Y
U( Yi + 3,3,0
I i -r ^ 2 2
_ U ( 1 + y, 1,0
1 2 2 2
This maximum likelihood estimate can also be calculated by using the EM algorithm. This method is a powerful technique that provides an iterative procedure to compute maximum likelihood estimation when data contain missing information. This methodology is suitable for distributions arising as mixtures since the mixing operation produces missing data. One of the main advantages of the EM algorithm is its numerical stability, increasing the likelihood of the observed data in each iteration. It does not guarantee convergence to the global maximum. It can be usually reached by starting the parameters at the moment estimates. The EM algorithm maximizes •£(*; Y) by iteratively
2
The Mixture Poisson Exponential-Inverse Gaussian Regression Model...
75
maximizing E(l(<$; Y, Z)) where Y = ,..., Yn) denotes the sample observations and Z = (6>i,..., 0n) denotes the missing observations and l(§\ Y, Z) is the complete log-likelihood function.
The EM algorithm is based on two steps, the E-step, or expectation, fills in the missing data. Once the missing data are built-in, the parameters are estimated in the M-step (maximization step).
At the E-step of the (j+l)-th iteration the expected log-likelihood of the complete data model is computed by
E(£(fc Y, Z) | Y, 0(j)).	(2.12)
In the M-step, the updated parameter estimate is computed from maximizing the quantity (2.12) with respect to Then, if some terminating condition is satisfied we stop iterating, otherwise move back to E-step for more iterations.
In mixed Poisson distributions (Karlis, 2005) the unobserved quantities are the realizations of 0i of the unobserved mixing parameter for each data point Yi, i = 1... n. Additionally, we assume that the distribution of Y | 0i is Poisson with 0i following (2.1). On the other hand, when the complete model is from the exponential family then the E-step computes the conditional expectations of its sufficient statistics. As it can be seen below, the continuous distribution given in (2.1) is a member of the exponential family of probability distributions since it can be written as
f (O|0) = h(0) exp (A(0) T(0) - B(0)) where h(0) = , A(0) = -V20, T(0) = V0 and B(0) = - log yffi. Then, T(0) is a
V 20
sufficient statistic of this distribution.
The EM type algorithm for this model can be described as follows. From the current estimates
• E-step: Calculate the pseudo-values
ti = E(70i | Yi,<^)
for i = 1,..., n.
M-step: Find the new estimates (/>(j+1)
n
n Er=i ti
If some convergence condition is satisfied then stop iterating, otherwise move back to the E-step for another iteration.
2
1
3 The regression model
Let us now consider a random variable Yi denoting event counts and a vector of covariates or explanatory variables x; = (xi1,..., xip)f, including an intercept, related to the i-th
76
E. Gimez-Diniz and E. Calderin-Ojeda
observation that denotes a weight of observable features. In this model with fixed effects, it is assumed that
Y|0i - P(0-M-)
e% - eig(0)
Mi = exp(xi^),	(3.1)
where ft = (ft, ft, • • •, ft) a vector of regression coefficients.
The PEIG distribution has mean m =1/0 and variance 1/0 + 5/02. If we parameterize m = 1/0 = exp(xitft), the marginal mean and the marginal variance of the response distribution distribution are given by
E(Y|xi) = exp(xi^) and
v«r(Yi|Xi) = E (Yi|xi) + 5E (Y-|X-)2,
respectively.
Likewise the conditional mean of the response variable is related to the explanatory variables through a link function, g(E(Yi|xi)) = x^ft where g(-) is a monotonic function. The link function determines the function of the conditional mean that is predicted by x^ft As the mean of (2.5) is non-negative, the log-link is the usual choice for PEIG regression model since it guarantees a non-negative value for the conditional mean. Additionally, as var(Yi|xi) > E(Yi|xi), this mixed Poisson regression model is over-dispersed. In addition to this, as the variance is determined by the mean, no additional variance estimate is required. Besides, this model does not nest the Poisson regression model. Maximum likelihood estimation for this fixed effect regression model involves setting the partial derivatives of the log-likelihood function with respect to regression coefficients ft with j = 1,..., p equal to zero.
Let us now suppose that (yi, xi), i = 1,..., n are n independent realizations of the regression model given in (3.1) where yi is the response variable and xi a vector of explanatory variables. Then, the log-likelihood function can be expressed as
.£(ft,...,ft) = ^ ft,...,ft)
i=i

nn - 2 log M + J]logr(2yi + 1) - 2^ yi + ^ log2
i=i \ i=i ) n n /1 11 \
5>g + U (2+y-. 2 • 2^7)-	(3-2)
i=i i=i v 'n/
Then, the normal equations to obtain the maximum likelihood estimates are given by
= n ^ ^ i 1 ^ xis 1U (3 + y- 2 ■ ) = 2hx"+M2+yv 2 M-u(2+y-2.2j.)■
The Mixture Poisson Exponential-Inverse Gaussian Regression Model...
77
with s = 1, 2,... ,p.
Furthermore, the required expressions to approximate the Fisher's information matrix associated with maximum-likelihood estimates are provided by
for s = 1, 2,... ,p and k = 1,2,... ,p.
4 Application to health service data 4.1 Estimation of parameters
In the following, we are going to illustrate the performance of this mixed Poisson regression model. For that reason, let us consider now the number of hospital stays among the elderly population age 65 and over in the U.S. This amount represents a significant portion of the annual expenditures on hospital care since government insurance programs in the U.S. bear the highest financial burden for health care. Moreover, it has been forecasted that the number of elderly will continue to grow in the coming years. This set of data appears originally in Deb and Trivedi (1997) in their analysis of various measures of health-care utilization using a sample of 4406 single-person households in 1987. Data have been obtained from the Journal of Applied Econometrics 1997 Data Archive. Estimation of model and all the data analyses were done using Mathematica 9.0 software package. All the codes used to obtain reported results and all additional information useful to make research reproducible can be found on the journal's website or it will be made available by the authors on request. Our goal is to model the number of hospital stays (HOSP) as the response variable. This measure includes two interesting features, on the one hand over-dispersion, the mean and variance of the empirical distribution are 0.30 and 0.56 respectively, and, on the other hand, a very high proportion of non-users (80.36%). Since the Poisson regression model is not able to capture the the heterogeneity among individuals found in the data, the VEZQ regression model is used to explain the demand for health services.
Let us firstly considered the model without covariates. Parameter estimates, standard errors (in brackets) and the maximum of the log-likelihood (^max) of the distribution of the hospital stays are 0 = 0.296 (0.01) and £max = -3304.51 for Poisson model and 0 = 0.308 (0.01), lrnax = -3021.92 for PEIG model respectively. For the latter model, the estimate can also be obtained by using the EM algorithm after 25 iterations when the relative change of the estimate between two successive iterations is smaller than 1 x 10-10,
78
E. Gimez-Diniz and E. Calderin-Ojeda
after taking initial starting value in the neighborhood of the moment estimate. Therefore, it can be concluded that the VEZQ model provides a better fit to the data than Poisson distribution by considering maximum of the log-likelihood as criterion of comparison. For the standard model given in (2.5) the estimated value of 0 is 3.24783 with a standard error of 0.138. Since the empirical distribution is over-dispersed the Poisson model seems to be inadequate for estimating these count data. Next, in Figure 1 the histogram of the empirical distribution of the number of hospital stays (Observed), together with fitted distribution, obtained from the Poisson distribution and VEZQ distribution has been plotted. As it can be observed, there is a clear spike of extra zeros representing the non-hospitalization of the elderly population with the best fit to the data obtained with the VEZQ model.
Figure 1: Observed and fitted (VEZQ and Poisson) distribution of the number of hospital
stays (HOSP)
Let us now analyze the model with covariates. The explanatory variables are as follows: (1) a dummy variable (EXCLHLTH) which takes the value 1 if self-perceived health is excellent; (2) a dummy variable (POORHLTH) which takes the value 1 if self-perceived health is poor; (3) a count variable (NUMCHRON) giving the number of chronic disease and condition (cancer, heart attack, etc.); (4) age (AGE) divided by 10; (5) a dummy variable (MALE) with value 1 if the patient is male. For the ith patient, the number of hospital stays Y follows a VEZQ whose mean depends on a set of covariates trough the log-link function. The goal is to predict the number of hospital stays Y (response variable) using a vector of explicative variables (covariates).
At first sight, it seems logical that due to the presence of over-dispersion, a relative large long right tail, and a high proportion of zeros as compared to the proportion of other values, a simple Poisson regression model is not adequate to explain the number of hospital stays since it tends to overestimate the probability of lower values and underestimate the probability of larger values. For that reason, it is expected that a a mixed Poisson regression model will describe in a more accurate way the right tail of empirical data and the high proportion of zeros in the sample. As it can be observed in Table 1, the VEZQ and a Poisson (in brackets) regression model have been fitted to data. From left to right parameter estimates, standard errors, t-Wald and p-values are shown for both models. After
The Mixture Poisson Exponential-Inverse Gaussian Regression Model...
79
observing the values of the estimated regressors, there exists some differences between estimated effects of both models. In this sense, the VEZQ regression model predicts a higher use of the health service when self-perceived health is poor, the number of chronic disease and condition and age increases and the patient is male. Furthermore, when self-perceived health is excellent then the predicted change in the number of hospital stays decreases at a lower rate than in the Poisson regression model. The intercept coefficient -3.959 is the predicted logarithm of the number of hospital stays when the values of EX-CLHLTH, POORHLTH, NUMCHR, AGE and MALE are equal to 0. Having said that, it can be concluded, from this numerical application, that the VEZQ regression model predicts a higher use of the health service for this set of explanatory variables. All of parameter estimates are significant at the usual nominal level.
Table 1: Parameter estimates, standard errors, t-Wald and p-values for VEZQ and Poisson (in brackets) regression models for the number of hospital stays.
Parameter	Estimate	S.E.	t-Wald	Pr > |t|
INTERCEPT	-3.959(-3.220)	0.52(0.32)	-7.63(-10.19)	0.00(0.00)
EXCLHLTH	-0.688(-0.720)	0.22(0.18)	-3.15(-4.10)	0.00(0.00)
POORHLTH	0.683(0.613)	0.12(0.07)	5.60(9.18)	0.00(0.00)
NUMCHRON	0.326(0.264)	0.03(0.02)	9.72(14.48)	0.00(0.00)
AGE	0.268(0.183)	0.07(0.04)	3.93(4.39)	0.00(0.00)
MALE	0.196(0.109)	0.10(0.06)	2.17(1.94)	0.03(0.05)
Following the work of Wedderburn (1974), we have also estimated the parameters by using a quasi-likelihood model. In this case, we need only to specify the marginal response variance in terms of the marginal mean, i.e. var(Yi) = + , (i = 1,..., n). Via quasi-likelihood estimation, the estimates are very close to the ones shown in Table 1. Note that they are given in the same order as in Table 1, that is, -3.92958, -0.679321, 0.605773, 0.307492, 0.262405 and 0.187604. The value of the negative of the maximum of the log-likelihood is 2896.79.
4.2 Model assessment
Several measures of model validation to compare the VEZQ and Poisson regression model are shown in Table 2. Firstly, the value of the negative of the maximum of the log-likelihood (NLL) and Akaike Information Criterion (AIC) are given in the first two rows of this Table; as a lower value of these measures is desirable, the VEZQ regression model is preferable. Bozdogan (1987) proposed a corrected version of AIC, the Consistent Akaike Information Criteria (CAIC), in an attempt to overcome the tendency of the AIC to overestimate the complexity of the underlying model. Bozdogan (1987) also observed that AIC does not directly depend on the sample size and, as a result, it lacks certain properties of asymptotic consistency. See also Anderson et al. (1998). When formulating the CAIC, a correction factor based on the sample size is used to compensate for the overestimating nature of AIC. The CAIC is defined as CAIC = 2 NLL + (1 + log n) p,
80
E. Gimez-Diniz and E. Calderin-Ojeda
where p refers to the number of estimated parameters and n is the sample size. Again, a model that minimize the Consistent Akaike Information Criteria is preferable. As it can be observed, the VEZQ regression model also dominates the Poisson regression model in terms of the CAIC.
Table 2: Measures of model selection for the models considered.
Distribution
Criterion	Poisson	vszg
NLL	3047.32	2895.11
AIC	6116.63	5802.22
CAIC	6150.98	5846.57
Pearson statistic, (ef )2	7071.90	4626.74
Deviance residual/df	-0.30183	-0.33572
Now we perform some diagnostic checks based on analysis of residuals. This is a useful method to detect outliers and check the variance assumption in a more general setting (see Cameron and Trivedi (1986), for details). Perhaps the most common choice is Pearson's residuals. They are used to identify discrepancies between models and data, and they are based upon differences between observed data points and fitted values predicted by the model. The i-th Pearson residual for a given model is provided by
ef = y - ji ,	(4.1)
8 a/ var(jli)
where j is the fitted marginal mean and var(j) is the estimated marginal variance under the discussed model. Hence, if the model is correct, the variability of these residuals should appear to be fairly constant, when they are plotted against fitted values or predictors. The Pearson's residuals are often skewed for non-normal data, and this make the interpretation of the residual plots more difficult to interpret. For that reason, other quantifications of the discrepancy between observed and fitted values have been suggested in the literature. In this regard, another choice in the analysis of residual is the signed square root of the contribution to the deviance goodness-of-fit statistic (i.e. deviance residuals). This is given by D = ^"=i di, where
di = sgn(yi - j2i)y/2(£(yi) - l(ji)), i = 1,22,...,n,
and sgn is the function that returns the sign (plus or minus) of the argument. The £(yi) term is the value of the log likelihood when the mean of the conditional distribution for the i-th individual is the individual's actual score of the response variable. The is the log-likelihood when the conditional mean is plugged into the log-likelihood. Usually the deviance divided by its degree of freedom is examined by taking into account that a value much greater than one indicates a poorly fitting model. See for example De Jong and Heller (2008).
The Mixture Poisson Exponential-Inverse Gaussian Regression Model...
81
It is well-known that for the Poisson distribution with parameter Oi the deviance residuals are given by (see Dunteman and Ho 2006))
di = sgn(y - #i)
2( yi log( |) - (yi - Oi)
1/2
i = 1, 2,... ,n.
(4.2)
For the model introduced in this manuscript the deviance residuals are easily obtained
by
di = sgn(yi - Vi) S 2
log
U(0.5 + yi, 0.5, (2y)-1)\ 1 /y± U(0.5 + Oi, 0.5, (2£i)-1)y 2 0HV
1/2
i = 1,2, ...,n.
Note that the deviance does not exist whenever there are zero responses in the data. However, it is usually assumed that di = 0 when yi = 0 (e.g. yi log yi is zero for yi = 0). The Pearson's statistics together with the deviance residual divided by the degree of freedom are shown in Table 2. The VEZQ dominates widely the Poisson distribution in terms of the Pearson's statistics and small differences appear in the value of the deviance residual. Recall that we have taken this value as zero when the observed response variable takes the value zero.
Graphical model diagnostic may also be developed using expression (4.1). In this case, for the Poisson regression model this reduces to ef = (yi — 0)/\fdi, while for the distribution VEZQ regression model, this expression is given by ef = (y—)0i)/ y/)ui(1 + 500 as it can be easily verified. For this example, not much differences are found between these plots and those ones produced by the raw residuals, yi — °i, which are shown in Figure 2. On the other hand, the Pearson's residuals are usually standardized by divid-


Figure 2: Plots of the raw residuals for the Poisson (left) and the VEZQ (right) regression
models.
ing by V1 — hi, where hi are the leverages obtained from the diagonal of the hat matrix W1/2X (X 'WX )-1 X'W1/2, being W equal to the n x n diagonal matrix with i-th entry wi, given by wi = (dOi/dx'P))2 /var(Yi). This results Oi for the Poisson regression model and )i/(1 + 5)2) for the regression based on the new distribution presented here. See Cameron and Trivedi (1986) for details about the construction of the hat matrix. The standardized Pearson's residuals have also been plotted, they are shown in Figure 3. As it can be seen, for the Poisson regression model many of the values of the Pearson's standardized residuals lie outside the range (—2,2), pointing out a poorer fit to data than the
82
E. Gimez-Diniz and E. Calderin-Ojeda
0	1000	2000	3000	4000	0	1000	2000	3000	4000
Figure 3: Standardized Pearson's residuals for the Poisson (left) and the V£IQ (right)
distributions
one obtained for the V£TQ regression model presented in this work. See Hilbe (2007) for details.
In the following, as the regression model introduced in this paper is not nested in the Poisson regression model, the Vuong's test can be used to compare the estimates of the Poisson regression model and PEIG regression model. In this regard, one might be interested in testing the null hypothesis that the two models are equally close to the actual model, against the alternative one that one of the model is closer (see Vuong (1989)). The z-statistic is
Z =
1
where
w = — n
E
i=1
log
wvn
f (ft)
¿(ft) - ¿(¡9)

n
log
i=1
f (ft)
and f and g represent here the V£XQ and Poisson distributions, respectively.
Due to the asymptotic normal behaviour of the Z statistic under the null hypothesis, rejection of null hypothesis in favour of the alternative one that f occurs with significance level a, when Z > z1-a being z1-a the (1 — a) quantile of the standard normal distribution. For the Vuong's test, Z = 3.95754, then the V£XQ model is preferred at the usual nominal levels.
2
2
1
1
4.3 Comparisons with other models
Finally the fit obtained with the PE IG regression model is compared to two other mixed Posisson regression models traditionally used in the statistical literature, the negative binomial and the Poisson-Inverse Gaussian regression models (see Dean et al. (1989)). Furthermore, when the empirical data includes a high presence of zeros it is usual to consider a reparameterization of the parent distribution to capture all zeros in the sample, the zero-inflated (ZI) model. If the parent distribution is p(x), a ZI distribution is built as follows (see Cohen (1966))
p(x) = { (1 — + ^P(0), x = 0,
\ "0p(x),	x > 0,
The Mixture Poisson Exponential-Inverse Gaussian Regression Model...
83
where p(x) is the parent distribution and 0 < 0 < 1 is the inflated parameter. The PEIG, negative binomial and Poisson-Inverse Gaussian distributions have been reparameterized to obtain the maximum likelihood estimates under the ZI model and the results, together with the homogeneous models (without inflation), are displayed in Table 3.
Table 3: Maximum of the log-likelihood and Consistent Akaike Information Criteria (CAIC) for different homogeneous and ZI models.
	Homegeneous	ZI	
Distribution	NLL CAIC	NLL	CAIC
V£XÇ	2895.11 5846.57	2851.90	5769.74
NB	2857.11 5779.95	2853.37	5781.87
PIG	2877.33 5820.40	2847.69	5770.51
As it can be seen in this Table, the (ZI) VETQ regression model provides the best fit to data for this particular dataset when the CAIC is used as a criterion of comparison since the other two mixed Poisson regression models include an additional parameter. Since the global maximum of the log-likelihood surface is not guaranteed, different initial values of the parametric space were considered as a seed point. The calculations have been completed by using the FindMaximum function of Mathematica software package v.9.0 (Wolfram (2003)) (the derivative of the modified Bessel function of the third kind is available in this package). Additionally, by using other different methods such as Newton, PrincipalAxis and QuasiNewton the same results were obtained.
5 Conclusions
In this paper, a new mixed Poisson regression model to explain the demand for health services among people 65 and over to account for a large portion of non-users has been proposed. This model has been derived by mixing the Poisson distribution with a particular case of the continuous Exponential-Inverse Gaussian distribution when one of its parameter tends to infinity. Additionally, it is over-dispersed and unimodal with modal value located at zero. The model might be considered an alternative to Poisson regression model when the empirical data include a high proportion of zeros. In this regard, several measures of model assessment, including the Vuong's test for non-nested model selection, have been provided to support this goal. Apart from that, due to the high proportion of zeros in the empirical data, a zero-inflated version of this model has also been used to explain the demand for health services of elderly people.
Acknowledgment
The authors would like to thank the editor and two anonymous referees for their relevant and useful comments. Research partially funded by grant by grant ECO2013-47092 (Ministerio de Economía y Competitividad, Spain).
84
E. Gimez-Diniz and E. Calderin-Ojeda
References
[1]	Anderson, D.R., Burnham, K.P. and White, G.C. (1998). Comparison of Akaike information criterion and consistent Akaike information criterion for model selection and statistical inference from capture-recapture studies. Journal of Applied Statistics, 25, 2, 263-282.
[2]	Bhattacharya, S.K., Kumar, S. (1986): E-IG model in life testing. Calcutta Statistical Association Bulletin, 35, 85-90.
[3]	Bozdogan, H. (1987): Model selection and Akaike's Information Criterion (AIC): The general theory and its analytical extensions. Psychometrika, 52, 3, 345-370.
[4]	Cameron, A.C. and Trivedi, P.K. (1986): Econometric models based on counts data: comparisons and applications of some estimators and tests. Journal of Applied Econometrics, 1, 1, 29-53.
[5]	Cohen, A. C. (1966): A note on certain discrete mixed distributions. Biometrics, 22, 3, 566-572.
[6]	De Jong, P. and Heller, G.H. (2008): Generalized Linear Models for Insurance Data. Cambridge University Press.
[7]	Dean, C.B., Lawless, J., and Willmot, G.E. (1989): A mixed Poisson-inverse Gaussian regression model. Canadian Journal of Statistics, 17, 2, 171-181.
[8]	Deb, P. and Trivedi, P.K. (1997): Demand for Medical Care by the Elderly: A Finite Mixture Approach. Journal of Applied Econometrics, 12, 3, 313-336.
[9]	Dunteman, G.H. and Ho, M-H.R. (2006): An Introduction to Generalized Linear Models. SAGE Publications.
[10]	Gradshteyn, I.S., Ryzhik, I.M. (1994): Table of Integrals, Series, and Products. Alan Jeffrey, Editor. Fifth Edition. Academic Press, Boston.
[11]	Greene, W. (2009): Models for count data with endogenous participation. Empirical Economics, 36, 133-173.
[12]	Gurmu, S. and Elder, J. (2000): Generalized bivariate count data regression models. Economics Letters, 68, 31-36.
[13]	Hilbe, J.M. (2007): Negative Binomial Regression. New York: Cambridge University Press.
[14]	Holgate, P. (1970): The modality of some compound Poisson distribution.
Biometrika, 57, 666-667.
[15]	Karlis, D. (2005): EM algorithm for mixed Poisson and other discrete distributions.
Astin Bulletin, 35, 3-24.
The Mixture Poisson Exponential-Inverse Gaussian Regression Model...
85
[16]	Karlis, D. and Xekalaki, E. (2005): Mixed Poisson distributions. International Statistical Review, 73, 35-58.
[17]	Lahiri, K. and Xing, G. (2004): An econometric analysis of veterans' health care utilization using two-part models. Empirical Economics, 29, 431-449.
[18]	Vuong, Q. (1989): Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57, 307-333.
[19]	Wedderburn, R.W.M. (1974): Quasi-likelihood functions, generalized linear models and the Gauss-Newton method. Biometrika, 61, 439-447.
[20] Wolfram, S. (2003): The Mathematica Book. Wolfram Media, Inc.
Metodoloski zvezki, Vol. 13, No. 2, 87-100
X bar control chart for non-normal symmetric
distributions
Kristina Veljkovic 1
Abstract
In statistical quality control, X bar control chart is extensively used to monitor a change in the process mean. In this paper, X bar control chart for non-normal symmetric distributions is proposed. For chosen Student, Laplace, logistic and uniform distributions of quality characteristic, we calculated theoretical distribution of standardized sample mean and fitted Pearson type II or type VII distributions. Width of control limits and power of the X bar control chart were established, giving evidence of the goodness of fit of the corresponding Pearson distribution to the theoretical distribution of standardized sample mean. For implementation of X bar control chart in practice, numerical example of construction of a proposed chart is given.
1 Introduction
The X bar chart is extensively used in practice to monitor a change in the process mean. It is usually assumed that measured quality characteristic has normal or approximately normal distribution. On the other hand, occurrence of non-normal data in industry is quite common (see Alloway and Raghavachari, 1991; Janacek and Meikle, 1997). Violation of normality assumption results in incorrect control limits of control charts (Alwan, 1995). Misplaced control limits lead to inappropriate charts that will either fail to detect real changes in the process or which will generate spurious warnings when the process has not changed.
In the case of non-normal symmetric distribution of quality characteristics, no recommendations, except the use of the normal distribution, are given in the quality control literature. Approximation of the distribution of sample mean with normal distribution is based on the central limit theorem, but in practice small sample sizes are usually used.
We will consider four types of non-normal symmetric distributions of quality characteristic: Student, Laplace, logistic and uniform distributions. These distributions are chosen because of their applications in various disciplines (economics, finance, engineering, hydrology, etc., see for instance Ahsanullah, et al., 2014; Balakrishnan, 1992; Kotz et al., 2001). For each of these distributions, we calculated theoretical distribution of the standardized sample mean (or its best approximation) and approximated it with Pearson type II or type VII distributions. Pearson system of distributions is known to provide approximations to a wide variety of observed distributions (Johnson et al., 1994).
1 Department of Probability and Statistics, Faculty of Mathematics, University of Belgrade, Serbia; kristina@matf.bg.ac.rs
88
Kristina Veljkovic
It is presumed that a process begins in in-control state with mean and that single assignable cause of magnitude 5 results in a shift in the process mean from to either
— 5a or + 5a, where a is the process standard deviation (Montgomery, 2005). It is also assumed that the standard deviation remains stable. Center line of the X bar chart is set at and upper and lower control limits, respectively, + ka/^/n and — ka/^/n, where n represents the sample size and k width of control limits. Samples of size n are taken from the process and the sample mean is plotted on the X bar chart. If a sample mean exceeds control limits, it is assumed that some shift in the process mean has occurred and a search for the assignable cause is initiated.
The rest of the paper is organized as follows. In Sections 2, 3 and 4, respectively, descriptions of chosen distributions of quality characteristic, distributions of standardized sample mean and Pearson types II and VII distributions are given. Construction of the X bar control chart and its power are examined in Section 5, along with the comparisons of theoretical distribution of sample mean with the corresponding Pearson distribution. In Section 6, implementation of proposed X bar chart is considered. Finally, conclusions are drawn in Section 7.
2 Distribution of quality characteristic
We considered four types of non-normal symmetric distributions of quality characteristic X: Student distribution t(10), standard Laplace L( 1) distribution and logistic distribution LGS(1) (see Johnson et al. 1994; Johnson et al. 1995) as representatives of symmetric distributions with heavier tails than normal distribution (Figure 1) and uniform U(0,1) distribution as a representative of symmetric distributions with lighter tails than normal distribution. For simplicity, we have chosen standard forms of all four distributions.
in
o
£ °
t o
N(0,1) t(10) L(1) LGS(1)
w C
o
o
o
o
■2
4
Figure 1: Probability density functions of Student t(10), Laplace L( 1), logistic LGS( 1) and standard normal N(0,1) distributions
Control chart for non-normal symmetric distributions.
89
Distribution	fx			ß	a2	a4
t(10)	315 (1 + 256^/I0) V	A 10 y	-5.5 , X G R	0	1.25	4
L(1)	1	e-l 2	e	x| X	G R	0	2	6
LGS (1)	e-x (1+e-	x2 x G R c )2		0	n2 3	4.2
U(0,1)	X,X	G [0,1]		0.5	1 12	1.8
Table 1: Chosen distributions of quality characteristics
Distributions are given in Table 1 by their probability density function fX, mean variance a2 = Var(X) and kurtosis a4 = E(X-E4(X^ . As all chosen distributions are symmetric around the zero, skewness a3 = E(X-E3(X)) = o.
a 2
3 Distribution of standardized sample mean
For chosen distributions of quality characteristic, we will derive the distribution of standardized sample mean Tn = n. As all chosen distributions are symmetric, skewness of standardized sample mean will also be equal to 0.
3.1 Sample from Student's distribution
Witkowsky (2001, 2004) proposed a method for numerical evaluation of the distribution function of a linear combination of independent Student variables. The method is based on the inversion formula which leads to the one-dimensional numerical integration.
Let (Xi, X2,..., Xn) be a sample from Student t(v) distribution. Further, let Y = ELi Xk be sum of these variables and (t) denote the characteristic function of Xk. The characteristic function of Y is
n	n	v
^Y(t) = n ^Xk (t) = JJ	(v2 |t|) 2 Kv/2 (v 1 |
k=1 k=1 2 i( 2)
where Ka(z) denotes modified Bessel function of the second kind.
The cumulative distribution function FY (y) of random variable Y is, according to the inversion formula due to Gil-Pelaez (1951), given by
Fy (y) = 2 +1 i 2 n J 0
1 , 1 r sin (ty)^Y (t)
2 n 0	t
dt	(3.1)
For any chosen y algorithm tdist in R package tdist (Witkowsky and Savin, 2005) evaluates the integral in (3.1) by multiple p-points Gaussian quadrature over the real interval t e (0,10n). The whole interval is divided in m subintervals and the integration over each subinterval is done with p-points Gaussian quadrature which involves base points
90
Kristina Veljkovic
bij, and weight factors Wj, i = 1,2,..., p, j = 1, 2,..., m. So,
FY(y) - 1 + 1 £ £ ^^^Wij(bij).
2 n z—' ^ bij
j=i i=i j
Then, cumulative distribution function of standardized sample mean is equal to
FTn (t) =	e R.
Kurtosis of Tn is equal to a4jT„ = 3 + n
3.2 Sample from Laplace distribution
Let (Xi, X2,..., Xn) be a sample from standard Laplace L(1) distribution. Difference of two independent random variables with standard exponential e(1) distribution has standard Laplace distribution. Further, standard exponential distribution is gamma distribution, r(1,1). Sum of n independent variables with r(1,1) distribution is gamma distribution r(n, 1). In that way, we conclude that sum Y of n independent random variables X1, X2,..., Xn with standard Laplace distribution can be written as the difference of two random variables with gamma distribution r(n, 1) which is called bilateral gamma distribution.
Bilateral gamma distribution is symmetric around 0 (Kuchler and Tappe, 2008), with cumulative distribution function for y > 0
(y) = 2 + 2n ■ è a* Y(k + 1,y)
v ' k=0
where the coefficients (ak)k=0,...,n-1 are given by
(_ 1\ 1 n-2-k
nk j^n—k n (n +1), an-1 = 1' 1=0
and Y(n, y) is incomplete gamma function.
Then, cumulative distribution function of standardized sample mean is equal to
Fn (t) = Fy(v^t) , t £ R.
Kurtosis of standardized sample mean is equal a4jT„ = 3 + n.
3.3 Sample from logistic distribution
Let (X1, X2,..., Xn) be a random sample from logistic LGS(1) distribution. Insofar, the best approximation of the distribution of standardized sample mean Tn is given by Gupta and Han (1992). They considered the Edgeworth series expansions up to order n-3 for
Control chart for non-normal symmetric distributions.
91
the distribution of the standardized sample mean. Cumulative distribution function of Tn is given by
FTn( » *W 1 (4,5H.M)+ if +
35 ( 6 \2„,A\ 1 ( 1 432 „_ 210486
+ ifUJ H'(t) + S3 U-H7(i) + T0TT5H»(t) +
5775 /6N 3
+ i2fU; Hnitv)•t € R
where and $(•) are standard normal pdf and cdf and Hj(x) is the Hermite polynomial.
Kurtosis of standardized sample mean is a4,Tn = 3 + I2 ■
3.4 Sample from uniform distribution
Let (Xi, X2, ■ ■ ■, Xn) be a random sample from uniform U(0,1) distribution. The sum Y = n=i Xk has Irwin-Hall distribution (Johnson et al., 1995) with cumulative distribution function
Fy(y) = 1 + ¿y ¿(-1)^k) Sgn(y - k)(y - ^ x € R ' k=0 ^ '
Then, standardized sample mean has cumulative distribution function equal to
FT' (()= ^ (7S +5) € R-
Kurtosis of standardized sample mean is a4,Tn =3 — I2 ■
4 Symmetric Pearson distributions 4.1 Pearson type II distribution
Pearson type II distribution can be used for approximation of the distribution of random variable with skewness a3 = 0 and kurtosis a4 < 3 (Johnson et al., 1994). Cumulative distribution function of Pearson type II distribution is equal to
F(t) = It-a(a,a), 0 < —- < 1,
s	s
where
- = — l 2®4 s = 2 2a a = 5a4 —9 +1	(4 1)
3 — a4'	V 3 — a4' 2(3 — a4) '	'
It(a, b) = Bab)' B(a, b) is beta function and Bt(a, b) is incomplete beta function. In other words, random variable has beta distribution B(a, a).
92
Kristina Veljkovic
4.2 Pearson type VII distribution
Pearson type VII distribution can be used for approximation of the distribution of random variable with skewness a3 = 0 and kurtosis a4 > 3 (Johnson et al., 1994). Cumulative distribution function of Pearson type VII distribution is equal to
F(t) = 2Ia2/(a2+t2) (m — 1, ^ , t < 0
and
where
F(t) = 1 - 11a2/(a2+t2)	- 1, ^ , t >
5a4 — 9	/ 2a4
m = —-—, a =
2(a — 3):
a4 — 3
(4.2)
5 Design of X bar control chart
For sample sizes n = 3, 4,..., 10, we calculated theoretical distribution of the standardized sample mean of considered distributions, using results from Section 3 and then we approximated it with Pearson type II distribution in the case of uniform distribution and with Pearson type VII distribution in the case of Student, Laplace and logistic distributions. Parameters of the fitted Pearson types II and VII distributions are calculated using formulas (4.1) and (4.2). Code for all calculations was written, by the author, in statistical software R and is available as supplementary code on the web site of the Journal. Width of control limits of the X bar control chart is calculated from
a = 1 - o - k< X < ^a + k= M = 2(1 - Frn(k)), (5.1) nnn
where FTn is cumulative distribution function of standardized sample mean, using Brent's root-finding method (Brent, 1973). Same procedure was followed for both the theoretical distribution of standardized sample mean and corresponding Pearson distribution.
Control limits of the X bar control chart for non-normal symmetric distributions are calculated for specified probability 0.0027 of type I error, in analogy with X bar control chart for normal distribution. When quality characteristics is normally distributed, the probability that sample mean falls outside three standard deviations from the center line is 0.0027, for in-control process. These are so called three-sigma control limits (here sigma refers to the standard deviation of sample mean) and they are frequently used in construction of X bar control chart (Montgomery, 2005).
Calculated widths of control limits, for considered distributions of quality characteristic, sample sizes n = 3,4,..., 10, probability of false alarm a = 0.0027, for theoretical distribution of the standardized sample mean and Pearson types II and VII distributions, are given in Table 2.
As it can be seen in the Table 2, the values of the width of the control limits calculated from theoretical distribution and corresponding Pearson distribution are very close, i.e. corresponding Pearson distribution fits very well to the theoretical distribution of the
Control chart for non-normal symmetric distributions.
93
Sample size	Width of control limits			
	Student t(10) Theor. Pearson	Laplace L(l) Theor. Pearson	Logistic LGS(l) Theor. Pearson	Uniform U (0,1) Theor. Pearson
n = 3 n = 4 n = 5 n = 6 n = 7 n = 8 n = 9 n = 10	3.21966 3.22227 3.16998 3.17156 3.13867 3.13966 3.11712 3.11775 3.10136 3.10178 3.08934 3.08962 3.07987 3.08005 3.07221 3.07233	3.54221 3.53915 3.43224 3.43628 3.36034 3.36606 3.30939 3.31520 3.27130 3.27668 3.24168 3.24652 3.21796 3.22227 3.19852 3.20234	3.25580 3.26074 3.20035 3.20234 3.16405 3.16527 3.13877 3.13966 3.12021 3.12091 3.10602 3.10660 3.09482 3.09531 3.08577 3.08619	2.59834 2.65308 2.72926 2.74902 2.79650 2.80355 2.83511 2.83866 2.86060 2.86314 2.87932 2.88118 2.89366 2.89502 2.90489 2.90597
Table 2: Width of control limits of X bar control chart
standardized sample mean. On the other hand, normal approximation would give value of k = 2.99998, for all n and all distributions of quality characteristics.
Now, we are interested to see what is the power of X bar control charts for detecting shifts 5 = 0.5,1.0,..., 3.0, for calculated width of control limits. Power of X bar control chart for detecting shifts from mean to = ± can be calculated from
1 - ^ = 1 - P- k—^ < X < ^o + k—=k = =
nn
= Ft„(-k - 5Vn)+ Ft„(-k + 5\/n).
We should note that power of proposed X bar control chart for detecting shift 5 = 0 is 0.0027 for all considered distributions and sample sizes, i.e. it maintains probability of type I error.
Mainly, we want to investigate what is the minimum shift that X bar control chart can detect with a power of at least 90%.
Calculated power of X bar control chart, for considered distributions of quality characteristic, sample sizes n = 3,4,..., 10, shifts 5 = 0.5,1.0,..., 3.0 for both theoretical distribution of standardized sample mean and corresponding Pearson distribution, are given in Table 3.
¿From the Table 3, we see that X bar control chart can detect shifts of 5 = 1.5 with power of at least 90% for sample sizes of n = 9 and greater for all considered distributions. In order for the X bar chart to detect shifts of 5 = 2.0 with power of 90% and greater, it is necessary to take samples of size at least n = 4 for Student, Laplace and logistic distributions and sample sizes of n = 5 and greater for uniform distribution of quality characteristic. Also, we can once more notice that the corresponding Pearson distribution approximates the distribution of standardized sample mean rather well. In general, it can be concluded that X bar control chart can detect shifts of at least 5 = 1.5 with power of 90% and greater for non-normal symmetric distribution of quality characteristic.
6 Implementation of proposed X bar control chart
Now we are interested to see how proposed X bar control chart can be implemented in practice, in case when the distribution function of the quality characteristic is non-normal,
Distribution
Power
Theor. Pearson
t(10) n = 3 n = 4 n = 5 n = 6 n = 7 n = 8 n = 9 n = 10 L(l) n = 3 n = 4 n = 5 n = 6 n = 7 n = 8 n = 9 n = 10 LGS(l) n = 3 n = 4 n = 5 n = 6 n = 7 n = 8 n = 9 n = 10
17(0, 1)
n = 3 n = 4 n = 5 n = 6 n = 7 n = 8 n = 9 n = 10
<5 = 0.5
0.0110 0.0162 0.0224 0.0296 0.0377 0.0468 0.0566 0.0674
0.0077 0.0111 0.0156 0.0209 0.0273 0.0347 0.0430 0.0523
0.0106 0.0155 0.0215 0.0285 0.0364 0.0452 0.0550 0.0655
0.0424 0.0419 0.0470 0.0544 0.0630 0.0726 0.0830 0.0942
0.0110 0.0162 0.0224 0.0296 0.0377 0.0468 0.0566 0.0673
0.0071 0.0104 0.0148 0.0201 0.0265 0.0339 0.0423 0.0517
0.0103 0.0153 0.0213 0.0283 0.0363 0.0451 0.0549 0.0655
0.0357 0.0397 0.0461 0.0538 0.0626 0.0723 0.0828 0.0941
Theor. Pearson
<5 = 1.0
0.0665 0.1175 0.1795 0.2488 0.3218 0.3957 0.4678 0.5363
0.0370 0.0718 0.1215 0.1842 0.2561 0.3327 0.4101 0.4850
0.0619 0.1109 0.1718 0.2409 0.3143 0.3887 0.4616 0.5309
0.2022 0.2403 0.2929 0.3533 0.4167 0.4801 0.5416 0.6001
0.06620 0.1171 0.1791 0.2484 0.3216 0.3955 0.4677 0.5363
0.0355 0.0708 0.1212 0.1842 0.2559 0.3321 0.4089 0.4835
0.0613 0.1106 0.1718 0.2408 0.3142 0.3886 0.4614 0.5308
0.1890 0.2348 0.2909 0.3523 0.4159 0.4794 0.5411 0.5996
Theor. Pearson
<5 = 1.5
0.2609 0.4307 0.5871 0.7145 0.8100 0.8775 0.9231 0.9528
0.1541 0.3179 0.4973 0.6511 0.7670 0.8487 0.9039 0.9400
0.2469 0.4178 0.5775 0.7079 0.8056 0.8746 0.9211 0.9515
0.4999 0.6030 0.7062 0.7946 0.8629 0.9120 0.9455 0.9673
0.2597 0.4300 0.5868 0.7144 0.8100 0.8775 0.9231 0.9528
0.1588 0.3207 0.4949 0.6469 0.7638 0.8469 0.90305 0.9397
0.2460 0.4172 0.5769 0.7074 0.8053 0.8744 0.9211 0.9515
0.4794 0.5951 0.7034 0.7934 0.8622 0.9117 0.9453 0.9672
Theor. Pearson
<5 = 2.0
0.5998 0.8016 0.9108 0.9625 0.9850 0.9942 0.9979 0.9992
0.4642 0.7317 0.8761 0.9452 0.9765 0.9901 0.9960 0.9984
0.5864 0.7945 0.9072 0.9607 0.9841 0.9938 0.9977 0.9991
0.7976 0.8937 0.9527 0.9816 0.9936 0.9980 0.9994 0.9999
0.5989 0.8014 0.9107 0.9625 0.9850 0.9942 0.9978 0.9992
0.4674 0.7258 0.8740 0.9451 0.9767 0.9903 0.9960 0.9983
0.5840 0.7935 0.9070 0.9607 0.9842 0.9938 0.9977 0.9991
0.7803 0.8898 0.9521 0.9816 0.9936 0.9980 0.9994 0.9998
Theor. Pearson
<5 = 2.5
0.8715 0.9661 0.9919 0.9982 0.9996 0.9999 1 1
0.8061 0.9433 0.9841 0.9957 0.9989 0.9997 0.9999 1
0.8711 0.9659 0.9918 0.9982 0.9996 0.9999 1 1
0.8014 0.9438 0.9846 0.9958 0.9988 0.9997 0.9999 1
0.8653	0.8638 0.9637 0.9637
0.9911	0.9911 0.9979 0.9979 0.9995 0.9995 0.9999 0.9999
0.9575	0.9531
0.9906	0.9903
0.9986	0.9985
0.9999	0.9998
Theor. Pearson
<5 = 3.0
0.9750 0.9748 0.9967 0.9967 0.9996 0.9996
0.9514	0.9532
0.9916	0.9921
0.9986	0.9986
0.9998	0.9997 1 1 1 1
0.9726	0.9726
0.9962	0.9963
0.9995	0.9995
0.9999	0.9999 1 1 1 1 1 1 1 1
0.9986 0.9976 1 1 1 1 1 1 1
Table 3: Power of X bar control chart
Control chart for non-normal symmetric distributions.
95
symmetric but unknown. For fitting Pearson type II or type VII distributions to data, we need an estimate of kurtosis based on sample of means.
6.1 Measures of sample kurtosis
We have three measures of sample kurtosis
* m4 n*	N - 1 HAT , 1\ , s\ , q	u* m4
g* = ml' = (N - 2)(N - 3) ((N + 1)g2 + 6) + 3'	62 = ^
where mk are sample central moments.
Joanes and Gill (1998) investigated three measures g2 = g* — 3, G2 = G| — 3 and b2 = b* — 3 of sample excess kurtosis. They showed that, generating 100000 samples of different sizes from Student t5 distribution, g2 generally has the smallest mean-squared error. We followed the same procedure for measures g*, G| and b* and generated 100000 samples of different sizes from distributions of standardized sample mean of Student t(10), Laplace L(1), logistic LGS(1) and uniform U(0,1) distributions. We confirm Joanes and Gill's findings. So, we will use, for calculation of the parameters of Pearson types II and VII distributions, measure g* as an estimate of sample kurtosis.
6.2 Empirical power of X bar control chart
In this section, we will calculate the empirical power of proposed X bar control chart in order to investigate its performance in practice. We will take, by Monte Carlo simulations, m = 25, 50,100 samples of sizes 3 to 10 from Student t(10), Laplace L(1), logistic LGS(1) and uniform U(0,1) distributions. Sample means, as well as estimates of mean and standard deviation, are calculated. Further, we estimated kurtosis of the distribution of sample mean with g*. Then, corresponding Pearson type II or type VII distribution is fitted to m sample means and control limits and power of the X bar control chart are calculated. This procedure is repeated 100000 times. The average power of the X bar control chart, for considered distributions, is presented in Table 4 (rounded to four decimal places). It is expected that sample size and number of groups will affect sample estimates, i.e. values of parameters of fitted Pearson distribution and therefore power of proposed X bar control chart.
We compared the values of empirical power for a number of groups m = 25, 50,100 with theoretical power from Table 3, giving accent on the values of theoretical power of 90% and greater. We made the following conclusions for shift sizes of 1.5 and greater. Zero difference is present at sample sizes of at least 7 and 5 = 3. Absolute difference between theoretical and empirical power gets smaller as a number of groups and shift sizes rise. In most of the cases, the difference exists on third to the fourth decimal place. In other words, proposed X bar control chart has quite satisfactory performance. General advice for its use in practice would be to choose preferably more than 25 groups of sample size of 9 and greater, in order to detect shift 5 = 1.5 with the power of at least 90%.
£ £
? =3
S
Distribution	Power																	
t( 10)			m =	25					m =	= 50					m =	100		
	S = 0.5	S = 1.0	<5 = 1.5	<5 = 2.0	S = 2.5	S = 3.0	S = 0.5	S = 1.0	<5 = 1.5	<5 = 2.0	S = 2.5	S = 3.0	S = 0.5	S = 1.0	<5 = 1.5	<5 = 2.0	S = 2.5	S = 3.0
n = 3	0.0522	0.1857	0.4231	0.7150	0.9163	0.9842	0.0302	0.1369	0.3643	0.6701	0.8986	0.9808	0.0201	0.1073	0.3252	0.6423	0.8873	0.9786
n = 4	0.0688	0.2541	0.5670	0.8627	0.9776	0.9972	0.0424	0.2014	0.5161	0.8378	0.9734	0.9968	0.0292	0.1672	0.4822	0.8219	0.9705	0.9967
n = 5	0.0852	0.3225	0.6914	0.9394	0.9937	0.9993	0.0551	0.2688	0.6495	0.9281	0.9928	0.9993	0.0391	0.2328	0.6233	0.9206	0.9924	0.9993
n = 6	0.1019	0.3909	0.7909	0.9742	0.9980	0.9998	0.0682	0.3376	0.7588	0.9699	0.9979	0.9998	0.0499	0.3014	0.7385	0.9667	0.9978	0.9998
n = 7	0.1180	0.4565	0.8629	0.9888	0.9993	0.9999	0.0821	0.4060	0.8405	0.9872	0.9992	0.9999	0.0615	0.3717	0.8264	0.9863	0.9993	0.9999
n = 8	0.1341	0.5199	0.9128	0.9949	0.9997	1.0000	0.0963	0.4728	0.8984	0.9944	0.9997	1.0000	0.0736	0.4405	0.8886	0.9942	0.9997	1.0000
n = 9	0.1517	0.5820	0.9460	0.9976	0.9999	1.0000	0.1105	0.5360	0.9366	0.9974	0.9999	1.0000	0.0865	0.5075	0.9307	0.9975	0.9999	1.0000
n = 10 r/u	0.1679	0.6380	0.9664	0.9988	0.9999	1.0000	0.1253	0.5958	0.9609	0.9987	0.9999	1.0000	0.0996	0.5702	0.9575	0.9988	0.9999	1.0000
L(l) n = 3	0.0364	0.1420	0.3542	0.6481	0.8828	0.9751	0.0188	0.0930	0.2814	0.5841	0.8531	0.9677	0.0119	0.0655	0.2325	0.5401	0.8326	0.9622
n = 4	0.0528	0.2117	0.5102	0.8263	0.9681	0.9956	0.0292	0.1536	0.4427	0.7886	0.9596	0.9944	0.0187	0.1172	0.3953	0.7628	0.9533	0.9936
n = 5	0.0691	0.2828	0.6481	0.9227	0.9911	0.9990	0.0408	0.2205	0.5913	0.9040	0.9889	0.9988	0.0271	0.1795	0.5532	0.8913	0.9873	0.9986
n = 6	0.0861	0.3539	0.7587	0.9667	0.9972	0.9997	0.0535	0.2912	0.7158	0.9589	0.9966	0.9996	0.0366	0.2483	0.6870	0.9532	0.9962	0.9996
n = 7	0.1030	0.4235	0.8412	0.9856	0.9990	0.9999	0.0665	0.3617	0.8101	0.9824	0.9988	0.9999	0.0471	0.3203	0.7903	0.9802	0.9988	0.9999
n = 8	0.1197	0.4902	0.8986	0.9935	0.9996	0.9999	0.0804	0.4316	0.8776	0.9922	0.9995	0.9999	0.0585	0.3927	0.8645	0.9915	0.9995	1.0000
n = 9	0.1364	0.5542	0.9367	0.9969	0.9998	1.0000	0.0947	0.4993	0.9232	0.9964	0.9998	1.0000	0.0708	0.4638	0.9148	0.9962	0.9998	1.0000
n = 10	0.1536	0.6136	0.9605	0.9984	0.9999	1.0000	0.1100	0.5643	0.9527	0.9982	0.9999	1.0000	0.0833	0.5302	0.9470	0.9982	0.9999	1.0000
LGS(1)																		
n = 3	0.0495	0.1787	0.4130	0.7058	0.9120	0.9831	0.0284	0.1301	0.3526	0.6590	0.8932	0.9793	0.0185	0.0998	0.3105	0.6281	0.8803	0.9766
n = 4	0.0664	0.2483	0.5599	0.8584	0.9766	0.9970	0.0402	0.1943	0.5060	0.8314	0.9717	0.9966	0.0275	0.1600	0.4711	0.8151	0.9686	0.9964
n = 5	0.0831	0.3177	0.6864	0.9376	0.9935	0.9993	0.0531	0.2625	0.6425	0.9253	0.9924	0.9992	0.0373	0.2255	0.6147	0.9172	0.9918	0.9993
n = 6	0.0996	0.3862	0.7870	0.9734	0.9979	0.9998	0.0662	0.3315	0.7533	0.9685	0.9977	0.9998	0.0481	0.2953	0.7332	0.9654	0.9977	0.9998
n = 7	0.1160	0.4524	0.8605	0.9884	0.9992	0.9999	0.0801	0.4005	0.8370	0.9867	0.9992	0.9999	0.0594	0.3650	0.8222	0.9857	0.9993	0.9999
n = 8	0.1331	0.5175	0.9117	0.9948	0.9997	1.0000	0.0940	0.4675	0.8959	0.9942	0.9997	1.0000	0.0716	0.4348	0.8859	0.9940	0.9997	1.0000
n = 9	0.1500	0.5792	0.9450	0.9975	0.9999	1.0000	0.1085	0.5314	0.9349	0.9973	0.9999	1.0000	0.0847	0.5028	0.9290	0.9973	0.9999	1.0000
n = 10	0.1669	0.6359	0.9658	0.9987	0.9999	1.0000	0.1239	0.5930	0.9602	0.9987	0.9999	1.0000	0.0976	0.5662	0.9565	0.9988	0.9999	1.0000
C/(0,1)																		
n = 3	0.0788	0.2527	0.5190	0.7998	0.9553	0.9938	0.0573	0.2227	0.4978	0.7874	0.9543	0.9949	0.0463	0.2061	0.4876	0.7826	0.9538	0.9959
n = 4	0.0918	0.3107	0.6377	0.9049	0.9879	0.9988	0.0669	0.2772	0.6157	0.8975	0.9885	0.9991	0.0536	0.2578	0.6054	0.8936	0.9893	0.9994
n = 5	0.1068	0.3731	0.7436	0.9584	0.9965	0.9997	0.0784	0.3372	0.7222	0.9557	0.9969	0.9998	0.0627	0.3157	0.7122	0.9542	0.9975	0.9999
n = 6	0.1219	0.4356	0.8276	0.9823	0.9989	0.9999	0.0914	0.4002	0.8103	0.9819	0.9991	0.9999	0.0733	0.3779	0.8012	0.9818	0.9993	1.0000
n = 7	0.1377	0.4976	0.8886	0.9924	0.9996	1.0000	0.1044	0.4623	0.8759	0.9925	0.9997	1.0000	0.0844	0.4400	0.8685	0.9928	0.9998	1.0000
n = 8	0.1525	0.5554	0.9290	0.9965	0.9998	1.0000	0.1181	0.5226	0.9215	0.9967	0.9999	1.0000	0.0968	0.5023	0.9167	0.9971	0.9999	1.0000
n = 9	0.1690	0.6132	0.9562	0.9983	0.9999	1.0000	0.1318	0.5801	0.9514	0.9985	0.9999	1.0000	0.1094	0.5614	0.9487	0.9987	1.0000	1.0000
n = 10	0.1849	0.6659	0.9729	0.9991	1.0000	1.0000	0.1469	0.6359	0.9706	0.9993	1.0000	1.0000	0.1226	0.6179	0.9691	0.9994	1.0000	1.0000
Table 4: Empirical power of X bar control chart
\D ON
Control chart for non-normal symmetric distributions.
97
6.3 Example
Montgomery (2005) gave data set on thickness of a printed circuit board (in inches), for 25 samples of three boards each.
m
■3-2-10 1 2 3 x
Figure 2: Boxplot of the thickness data (left graph) and empirical cumulative distribution function of standardized sample means with fitted Pearson type II distribution (right graph)
As we can see on boxplot (Figure 2, left graph), sample distribution seems symmetric. We tested symmetry of data distribution using Mira test (Mira, 1999), the Cabilio-Masaro test (Cabilio and Masaro, 1996) and Miao-Gel-Gastwirth (MGG) test (Miao et al., 2006). Based on results of all three tests, we can conclude that data distribution is symmetric (Mira test: Test Statistic = 0.9029, p-value = 0.3666; Cabilio-Masaro test: Test Statistic = 0.8846, p-value = 0.3764; MGG test: Test Statistic = 1.0162, p-value = 0.3095). R function symmetry.test for these tests can be found in R package lawstat (Gastwirth et al., 2015).
Now we will test the normality of the sample distribution using Shapiro-Wilk, Anderson-Darling and Lilliefors normality tests (Razali and Wah, 2011). Based on results of all three tests, we conclude that data distribution is not normal (Shapiro-Wilk test: W = 0.9589, p-value = 0.01584; Anderson-Darling test: A = 1.4759, p-value = 0.00076; Lilliefors test D = 0.1467, p-value = 0.00039). We used R function shapiro.test (package stats) for Shapiro-Wilk test and ad.test, lillie.test from R package nortest (Gross and Ligges, 2015) for Anderson-Darling and Lilliefors normality tests, respectively.
For each of 25 samples, we calculated sample mean. Mean of all sample means is equal to X = 0.06295 and this is the estimate of unknown process mean and center line of X bar control chart. Further, we estimated process standard deviation with mean range, a = R = 0.00092. Now, we can calculate standardized sample means and kurtosis of standardized sample means. We got a4 = g2 = 2.83154 (measures of sample excess kurtosis can be found in R package e1071 (Meyer et al., 2014)). So, as the distribution of standardized sample means is symmetric with kurtosis smaller than 3, we will approximate its distribution with Pearson type II distribution. We calculated parameters of distribution using equation (4.1). Empirical distribution function along with fitted Pearson
98
Kristina Veljkovic
type II distribution of standardized sample means is given on Figure 2, right graph.
For probability of false alarm a = 0.0027, we get, using equation (5.1), that width of control limits is equal to k = 2.83665. Now we may calculate lower and upper control limits of X bar control chart, LCL = X - k—= = 0.06143, UCL = X + k-R = 0.06448 and construct X bar chart (Figure 3). As we can see on Figure 3, all sample means are within the control limits and we can conclude that process is in-control and keep the estimates of unknown process mean, standard deviation, as well as the width of control limits.
Sample number
Figure 3: X bar control chart for the thickness data
7 Conclusions
We considered design of the X bar control chart when quality characteristic has one of the following non-normal symmetric distributions: Student distribution with 10 degrees of freedom, standard Laplace, standard logistic and standard uniform distributions. We calculated theoretical distribution of the standardized sample mean (or its best approximation) and approximated it with Pearson type II or type VII distributions. Then we calculated width of control limits of the X bar chart, which gave evidence of the goodness of fit of the corresponding Pearson distribution to the theoretical distribution of the standardized sample mean. Further, we examined the power of X bar control chart in detecting the shifts. Results suggest that the X bar chart can detect shifts of at least 5 =1.5 with power of 90% and greater. Then we undertook Monte Carlo study in order to calculate empirical power of proposed X bar control chart, confirming its quite satisfactory performance. Finally, we constructed X bar chart for a given data set, when data distribution is non-normal and symmetric, but unknown.
Control chart for non-normal symmetric distributions.
99
References
[1]	Ahsanullah, M., Golam Kibria, B.M. and Shakil, M. (2014): Normal and Student's t Distributions and Their Applications. Atlantis Press, Paris.
[2]	Alloway, J.A. and Raghavachari, M. (1991): Control charts based on Hodges-Lehmann estimator. Journal of Quality Technology, 23, 336-347.
[3]	Alwan, L.C. (1995): The Problem of Misplaced Control Limits, Journal of the Royal Statistical Society, Series C, 44, 269-278.
[4]	Balakrishnan, N. (1992): Handbook of the Logistic Distribution. Marcel Dekker, New York.
[5]	Brent, R.P. (1973). Algorithms for Minimization without Derivatives. Prentice-Hall, New Jersey.
[6]	Cabilio, P. and Masaro, J. (1996): A simple test of symmetry about an unknown median.The Canadian Journal of Statistics, 24, 349-361.
[7]	Gastwirth, J.L., Gel, Y.R., Wallace Hui, W.L., Miao, W. and Noguchi, K. (2015): lawstat: Tools for Biostatistics, Public Policy, and Law. R package version 2.5.
[8]	Gil-Pelaez, J. (1951): Note on the inversion theorem. Biometrika, 38, 481-482.
[9]	Gross, J. and Ligges, U. (2015): nortest: Tests for Normality. R package version 1.0-3.
[10]	Gupta, S.S. and Han, S. (1992): Selection and ranking procedures for logistic populations, In: Order Statistics and Nonparametrics: Theory and Applications (Edited by P.K. Sen and I.A. Salama). Elsevier, Amsterdam, 377-404.
[11]	Janacek, G.J. and Meikle, S.E. (1997): Control charts based on medians. The Statistician, 46, 19-31.
[12]	Joanes, D.N. and Gill, C.A. (1998): Comparing measures of sample skewness and kurtosis. The Statistician, 47, 183-189.
[13]	Johnson, N.L., Kotz, S. and Balakrishnan, N. (1994): Continuous Univariate Distributions Volume 1. Wiley, New York.
[14]	Johnson, N.L., Kotz, S. and Balakrishnan, N. (1995): Continuous Univariate Distributions Volume 2. Wiley, New York.
[15]	Kotz, S., Kozubowski, T.J. and Podgorski, K. (2001): The Laplace Distribution and Generalizations : a Revisit with Applications to Communications, Economics, Engineering, and Finance. Springer, New York.
[16]	Kuchler, U. and Tappe, S. (2008): On the shapes of bilateral Gamma densities. Statistics & Probability Letters, 78, 2478-2484.
100
Kristina Veljkovic
[17]	Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A. and Leisch, F. (2014): e1071: Misc Functions of the Department of Statistics, TU Wien. R package version 1.6-4.
[18]	Miao, W., Gel, Y.R. and Gastwirth, J.L. (2006): A New Test of Symmetry about an Unknown Median, In: Random Walk, Sequential Analysis and Related Topics - A Festschrift in Honor ofYuan-Shih Chow (Edited by A. Hsiung, C.-H. Zhang and Z. Ying). World Scientific Publisher, Singapore.
[19]	Mira, A. (1999): Distribution-free test for symmetry based on Bonferroni's measure. Journal of Applied Statistics, 26, 959-972.
[20]	Montgomery, D.C. (2005): Introduction to Statistical Quality Control. Wiley, New York.
[21]	Razali, N. and Wah, Y.B. (2011): Power comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests. Journal of Statistical Modeling and Analytics, 2, 21-33.
[22]	Witkovsky, V. (2001): On the exact computation of the density and of the quantiles of linear combinations of t and F random variables. Journal of Statistical Planning and Inference, 94, 1-13.
[23]	Witkovsky, V. (2004): Matlab algorithm tdist: the distribution of a linear combination of Student's t random variables. COMPSTAT 2004 Symposium, Prague.
[24]	Witkovsky, V. and Savin, A. (2005): tdist: Distribution of a linear combination of independent Student's t-variables. R package version 0.1.1.
Metodoloski zvezki, Vol. 13, No. 2, 2016, 101-116
Estimating the Coefficient of Asymptotic Tail Independence: a Comparison of Methods
Marta Ferreira1
Abstract
Many multivariate analyses require the account of extreme events. Correlation is an insufficient measure to quantify tail dependence. The most common tail dependence coefficients are based on the probability of simultaneous exceedances. The coefficient of asymptotic tail independence introduced in Ledford and Tawn ([18] 1996) is a bivariate measure often used in the tail modeling of data in finance, environment, insurance, among other fields of applications. It can be estimated as the tail index of the minimum component of a random pair with transformed unit Pareto marginals. The literature regarding the estimation of the tail index is extensive. Semi-parametric inference requires the choice of the number k of the largest order statistics that lead to the best estimate, where there is a tricky trade-off between variance and bias. Many methodologies have been developed to undertake this choice, most of them applied to the Hill estimator (Hill, [16] 1975). We are going to analyze, through simulation, some of these methods within the estimation of the coefficient of asymptotic tail independence. We also compare with a minimum-variance reduced-bias Hill estimator presented in Caeiro et al. ([3] 2005). A pure heuristic procedure adapted from Frahm et al. ([13] 2005), used in a different context but with a resembling framework, will also be implemented. We will see that some of these simple tools should not be discarded in this context. Our study will be complemented by applications to real datasets.
1 Introduction
It is undeniable that extreme events have been occurring in areas like environment (e.g. climate changes due to pollution and global heating), finance (e.g., market crashes due to less regulation and globalization), telecommunications (e.g., growing traffic due to a high expanding technological development), among others. Extreme values are therefore the subject of concern of many analysts and researchers, who have come to realize that they should be dealt with some care, requiring their own treatment. For instance, the classical linear correlation is not a suitable dependence measure if the dependence characteristics in the tail differ from the remaining realizations in the sample. An illustration is addressed in Embrechts et al. ([9] 2002). To this end, the tail dependence coefficient (TDC) defined in
1 Center of Mathematics of University of Minho, Center for Computational and Stochastic Mathematics of University of Lisbon and Center of Statistics and Applications of University of Lisbon, Portugal; msferreira@math.uminho.pt
102
Marta Ferreira
Joe ([17] 1997), usually denoted by A, is more appropriate. More precisely, for a random pair (X, Y) with respective marginal distribution functions (dfs) FX and FY, the TDC is given by
A = lim P(Fy(Y) > 1 - t|Fx(X) > 1 - t),	(1.1)
whenever the limit exists. Roughly speaking, the TDC evaluates the probability of one variable exceeding a large value given that the other also exceeds it. A positive TDC means that X and Y are tail dependent and whenever null we conclude the random pair is tail independent. In this latter case, the rate of convergence towards zero is a kind of residual tail dependence that, once ignored, may lead to an under-estimation of the risk underlying the simultaneous exceedance of a large value. On the other hand, by considering that the random variables (rv's) X and Y are tail dependent when they are actually asymptotically independent, it will result in an over-estimation of such risk. The degree of misspecification depends on the degree of asymptotic independence given by the mentioned rate of convergence, denoted n in Ledford and Tawn ([18] 1996). More precisely, it is assumed that
P (Fx (X) > 1 - t, Fy (Y) > 1 - t) = t1/n L(t), n e (0,1],	(1.2)
where L(t) is a slowly varying function at zero, i.e., L(tx)/L(t) ^ 1 as 11 0 for all x > 0. We call the parameter n the coefficient of asymptotic tail independence. Whenever n < 1, X and Y are asymptotically independent and, if n = 1, asymptotic dependence holds if L(t) ^ c > 0, as 11 0. In case X and Y are exactly independent then n = 1/2 and we can also discern between asymptotically vanishing negative dependence and asymptotically vanishing positive dependence if, respectively, n e (0,1/2) and n e (1/2,1). Observe that we can state (1.2) as
P (mi" (i-FXiX) ■ r-TiW)) > 0 =t-1/n L(1/t)' (13)
and thus n corresponds to the tail index of the minimum of the two marginals standardized as unit Pareto. The tail index, also denoted extreme value index, quantifies the "weight" of the tail of a univariate distribution: whenever negative, null or positive it means that the tail of the underlying model is, respectively, "light", "exponential" or "heavy". In what concerns univariate extreme values, it is the primary parameter as it is implicated in all other extremal parameters, such as, extremal quantiles, right end-point of distributions, probability of exceedance of large levels, as well as return periods, among others. Therefore, the estimation of the tail index is a crucial issue, with numerous contributions in the literature. A survey on this topic can be seen, for instance, in Beirlant et al. ([2] 2004).
Under a semi-parametric framework in the domain of heavy tails, the Hill estimator, introduced in Hill ([16] 1975), have proved to possess good properties, being an essential tool in any application on this topic. For a random sample (T1,..., Tn), the Hill estimator corresponds to the sample mean of the log-excesses of the k + 1 larger order statistics
Tn:n ^ • • • ^ Tn-k:n, i.e.,
1 k T
Hn(k) = H (k) := 1 J>g	, 1 < k < n,	(1.4)
k - -,	Tn—k:n
i=1
Estimating the Coefficient of Asymptotic Tail Independence
103
Consistency requires that k must be intermediate, that is, a sequence of integers k = kn, 1 < k <n, such that
kn —y to and kn/n — 0, as n — to.
There is no definite formula to obtain k and it must be chosen not too small to avoid high variance but also not to large to prevent high bias. Figure 1 illustrates this issue, particularly the dashed line corresponding to a unit Frchet model where the tail index is 1. Observe also that there is a kind of stable area of the sample path around the true value of the tail index, where the variance is no longer high and the bias haven't started to increase. This disadvantage is transversal to the semi-parametric tools concerning extreme values inference. In the particular case of the Hill estimator, many efforts have been made to minimize the problem, ranging from bias-corrected versions to the implementation of procedures to compute k. The minimum-variance reduced-bias (MVRB) Hill estimator presented in Caeiro et al. ([3] 2005; see also Neves et al. [21] 2015) was developed for the Hall-Welsh class (within Generalized Pareto distributions), with reciprocal quantile function
F-1(1 - 1/x) = CxY (1 + yftxp/p + o(xp)) , x — to,	(1.5)
where y > 0 is the tail index of model F, C > 0, and ft = 0 and p < 0 are second order parameters. The MVRB Hill estimator is given by
CHn(k) = CH(k) := H(k) -	) , 1 < k < n,	(1.6)
where ft and p3 are suitable estimators of ft and p, respectively. Details about these latter are addressed in Caeiro et al. ([4] 2009) and references therein. We will denote it "corrected Hill" (CH). Our aim is to compare, through simulation, several methods regarding the Hill and corrected Hill estimators applied to the estimation of n. We also consider the graphical and pure heuristic procedure presented in Frahm et al. ([13] 2005) in the context of estimating the TDC A in (1.1), also relying on the choice of k upper order statistics with the same bias/variance controversy. All the estimation procedures are described in Section 2. The simulation study is conducted in Section 3 and applications to real datasets appear in Section 4. A small discussion ends this work in Section 5.
2 Estimation methods
In this section we describe the procedures that we are going to consider in the estimation of the coefficient of asymptotic tail independence n given in (1.3) and therefore corresponding to the tail index of
T = min((1 - Fx(X))-1, (1 - Fy(Y))-1).	(2.1)
Coefficient n is positive and we can use positive tail index estimators such as Hill. Observe that T is the minimum between two unit Pareto r.v.'s Alternatively, we can also undertake
104
Marta Ferreira
Figure 1: Hill plots of 1000 realizations of a unit Pareto (full line) and a unit Frechet (dashed line), both with tail index equal to 1 (horizontal line).
a unit Frchet marginal transformation since 1 — FX (X) ~ — log FX (X). However, in the sequel, we prosecute with unit Pareto marginals, since the Hill estimator has smaller bias in the Pareto models than in the Frchet ones (see Figure 1; see also Draisma et al. [6] 2004 and references therein). In order to estimate the unknown marginal df's FX and FY we consider their empirical counterparts (ranks of the components), more precisely,
Ti(n) := min((n + 1)/(n + 1 — RX), (n + 1)/(n + 1 — RY)), i = 1,..., n
where RX denotes the rank of Xi among (Xi,..., Xn) and RY denotes the rank of Yi among (Yi,..., Yn).
The estimation of n through the tail index estimators Hill and maximum likelihood (Smith, [24] 1987) was addressed in Draisma et al. ([6] 2004). Other estimators were also considered in Poon et al. ([23] 2003; see also references therein) and more recently in Goegebeur and Guillou ([14] 2013) and Dutang et al. ([8] 2014). However, no method was analyzed in order to attain the best choice of k in estimation.
In the domain of positive tail indexes, the Hill estimator is the most widely studied and many developments have been appearing around it. The main topics concern methods to obtain the value of k related to the number of tail observations to use in estimation and procedures to control the bias without increasing the variance. The corrected Hill version in (1.6), for instance, removes from Hill its dominant bias component estimated by H(k)(/3(n/k)p)/(1 — J5).
In the following, we describe the methods developed in literature for the Hill estimator to compute the value of k, that will be used to estimate n (the tail index of rv T in (2.1)) in our simulation study.
Based on Beirlant et al. ([1] 2002) and little restrictive conditions on the underlying
Estimating the Coefficient of Asymptotic Tail Independence
105
model, we have
T(n) H(i)	/ i \ -p
Y :=(i + l)log T (n) — + = n + b(n/k)( -k) + ei,i =	(2.2)
n-(i+1):n (i + )
where the error term ei is zero-centered and b is a positive function such that b(x) — 0, as x —y <x>. Extensive simulation studies conclude that the results tend to be better when p is considered fixed, even if misspecified. Matthys and Beirlant ([19] 2000) suggest p = -1. From model (2.2), the resulting least squares estimators of n and b(n/k) are given by
C = Yk - C/(1 - p) and = (1-p)p21-2p)k E?=1 ((k)-P - 1-) Yi. (2.3)
Thus, by replacing these estimates in the Hill's asymptotic mean squared error (AMSE)
2
i-p '
amse(h (-)) = + ( ^np1)2
we are able to compute klpt as the value of k that minimizes the obtained estimates of the AMSE and estimate n as H(k1pt).
On the other hand, we can compute the approximate value of k that minimizes the AMSE, given by
-opt - b(n/k)-2/(1-2p)k-2p/(1-2p^1/(1-2p).	(2.4)
See, e.g., Beirlant et al. ([1] 2002). Replacing again n and b(n/k) by the respective least squares estimates in (2.3) with fixed p = -1, we derive koptjk, for k = 3, ...,n, using (2.4). Then compute -2pt = median{koptjk, k = 3,..., |_fJ}, where [xj denotes the largest integer not exceeding x and consider n estimated by H(k2pt).
Further reading of the methods is referred to Beirlant et al. ([1] 2002), Matthys and Beirlant ([19] 2000) and references therein. In the sequel, they are shortly denoted, respectively, AMSE and KOPT.
The adaptive procedure of Drees and Kaufmann ([6] 1998) looks for the optimum k under which the bias starts to dominate the variance. The method is developed for the Hall-Welsh class of models defined in (1.5), for which it is proved that the maximum random fluctuation of \/i(H(i) - n), i = 1,k - 1, with k = kn an intermediate sequence, is of order \Jlog log n. More precisely, for p fixed at -1, we have:
1.	Consider rn = 2.5 x j x n0 25, with j = k2^n,n.
2.	Calculate j(rn) := min{k = 1,...,n - 1 : maxi<k \/i|H(i) - H(k)| > rn}. If \/i|H(i) - H(k)| > rn doesn't hold for any k, consider 0.9 x rn to rn and repeat step 2, otherwise move to step 3.
3.	For e 6 (0,1), usually e = 0.7, obtain
1/(1-0
1 . . „ / KrE ) \
k =
kDK —
l(2jj2)1/31 3	Ur(rn))£
106
Marta Ferreira
This method will be shortly referred DK.
Sousa and Michailidis (2004) method is based on the Hill sum plot, (k, Sk), k = 1,...,n — 1, where Sk = kH(k). We have E(Sk) = kn, an thus the sumplot must be approximately linear for the values of k where H(k) k n, with the respective slope being an estimator of n. The method essentially seeks the breakdown of linearity. Their approach is based on a sequential testing procedure implemented in McGee and Carleton ([20] 1970), leaning over approximately Pareto tail models. More precisely, consider the regression model y = Xn + 5, with y = (Si,..., Sk)', X = [1 i]k=1 and 5 the error term. It is checked the null hypothesis that a new point y0 is adjacent to the left or to the right of the set of points y = (y1,..., yk), through the statistics
k
TS = s-21 (yo - £0)2 + - yn2
i= 1
where * denotes the predictions based on k + 1 and s2 = (k — 2)-1(y/y — r/X'y). Since TS is approximately distributed by F1k-2, the null hypothesis is rejected if TS is larger than the (1 — a)-quantile, F1jk_2;1_Q. The method, shortly denoted SP from now on, is described in the following algorithm:
1.	Fit a least-squares regression line to the initial k = vn upper observations, y = [yi]k=1 (usually v = 0.02).
2.	Using the test statistic TS, determine if a new point y0 = yj for j > k, belongs to the original set of points y. Go adding points until the null hypothesis is rejected.
3.	Consider knew = max(0, {j : TS < F1jk_2;1_Q}). If k^w = 0, no new points are added to y and thus move forward to step 4. Return to step 1. if knew > 0 by considering k = knew.
4.	Estimate n by considering the obtained k.
The heuristic procedure introduced in Gomes et al. ([15] 2013), searches for the supposed stable region encompassing the best k to be estimated. More precisely, we need first to obtain the minimum value j0, such that the rounded values to j decimal places of H(k), 1 < k < n, denoted H(k; j) are not all equal. Identify the set of values of k associated to equal consecutive values of H(k; j0). Consider the set with largest range ^ := kmax — kmin. Take all the estimates H(k; j0 + 2) with kmax < k < kmin, i.e., the estimates with two additional decimal points and calculate the mode. Consider K the set of k-values corresponding to the mode. Take H(k), with k being the maximum of K. Since it was specially designed for reduced-bias estimators, we shortly referred it as RB method hereinafter.
Frahm et al. ([13] 2005) also presented a heuristic procedure that can be applied to all estimators depending on a number k of rv's whose choice bears the mentioned trade-off between bias and variance. Indeed is was developed within the estimation of the TDC A defined in (1.1). It was adapted to the Hill estimator in Ferreira ([11, 12] 2014, 2015) as follows:
Estimating the Coefficient of Asymptotic Tail Independence
107
1.	Smooth the Hill plot (k, H(k)) by taking the means of 2b + 1 successive points, H(1),...,H(n - 2b), with bandwidth b = [w x nJ.
2.	Define the regions pk = (H(k),..., H(k + m — 1)), k =1,..., n — 2b — m + 1, with length m = [Vn — 2bJ. The algorithm stops at the first region satisfying
k+m-1
E |H(i) — H(k) | < 2s,
i=k+1
where s is the empirical standard-deviation of H(1),..., H(n — 2b).
3.	Consider the chosen plateau region pk * and estimate n as the mean of the values of pk* (consider the estimate zero if no plane region fulfills the stopping condition).
The estimation of n through the plateau method was analyzed in Ferreira and Silva ([10] 2014) with respect to the sensibility of the bandwidth. The value w = 0.005 seems a reasonable choice (thus each moving average in step 1. consists in 1% of the data), also suggested in Frahm et al. ([13] 2005). In the sequel it will be referred as plateau method (in short PLAT).
Both RB and PLAT are simultaneously graphical and free-assumption methods since they are based on the search of a plane region of the estimator's plot that presumably contains the best sample fraction k to be estimated through a totally "ad-hoc" procedure. The sumplot is also a graphical method and the remaining procedures are neither graphical nor free-assumption.
3 Simulation study
In this section we compare through simulation the performance of the methods described above within the estimation of n through the under study estimators Hill in (1.4) and corrected Hill in (1.6).
We have generated 100 runs of samples of sizes n = 100,1000, 5000 from the following models:
•	Bivariate Normal distribution (n = (1 + p)/2; see, e.g., Draisma et al. [6] 2004); we consider correlation p = —0.2 (n = 0.4), p = 0.2 (n = 0.6) and p = 0.8 (n = 0.9); we use notation, respectively, N(—0.2), N(0.2) and N(0.8).
•	Bivariate t-Student distribution tv with correlation coefficient given by p = — 1
(A = 2Fiv+1 (—V(v + 1)(1 — p)/(1 + p)), see Embrechts et al. [9] 2002; we have A > 0 and thus n = 1); we consider v = 4 and p = 0.25 (A = 0.1438) and v = 1 and p = 0.75 (A = 0.6464); we use notation, respectively, t4 and t1.
•	Bivariate extreme value distribution with a asymmetric-logistic dependence function	= (1 — a1)x + (1 — a2)y + ((a1x)1/a + (a2y)1/a)a, with > 0,
108
Marta Ferreira
dependence parameter a G (0,1] and asymmetric parameters a^ a2 G (0,1] (A = 2 —1(1,1), see Beirlant et al. [1] 2004; we have A > 0 and thus n = 1); we consider a = 0.7 and ai = 0.4, a2 = 0.2 (A = 0.1010) and a = 0.3 and ai = 0.6, a2 = 0.8 (A = 0.5182); we use notation, respectively, AL(0.7) and AL(0.3).
•	Farlie-Gumbel-Morgenstern distribution with dependence 0.5 (n = 0.5, see Dutang et al. [8] 2014); we use notation FGM(0.5).
•	Frank distribution with dependence 2 (n = 0.5, see Dutang et al. [8] 2014); we use notation Fr(2).
Observe that the case N(0.8) is an asymptotic tail independent model close to tail dependence since n = 0.9 k 1. On the other hand, the cases t4 and AL(0.7) are tail dependent cases (n = 1) near asymptotic tail independence since A = 0.1438 k 0 and A = 0.1010 k 0, respectively. We consider these examples in order to assess the robustness of the methods in border cases.
In Figures 2 and 3 are plotted, respectively, the results of the simulated values of the absolute bias and root mean squared error (rmse), for the Hill and corrected Hill estimators, in the case n = 1000. All the results are presented in Table 1 concerning the Hill estimator and Table 2 with respect to the corrected Hill. Observe that this latter case requires the estimation of additional second order parameters (^ and p). To this end, we have followed the indications in Caeiro et al. ([4] 2009). For the p estimation, there was an overall best performance whenever it was taken fixed at value —1, leading to the reported results.
The largest differences between Hill and corrected Hill can be noticed in the above mentioned border cases, with the corrected one presenting lower absolute bias and rmse. The other models also show this difference but in a small amount. We remark that we are working with the minimum of Pareto rv's and the Hill estimator is unbiased in the Pareto case. The FGM and Frank models behave otherwise with a little lower absolute bias and rmse within the Hill estimator, for either estimated or several fixed values tried for p.
The failure cases in the DK method (column "NF" of Tables 1 and 2) correspond to an estimate of k out of the range {1,..., n — 1}, which were ignored in the results. It sets up the worst performance, which may be justified by the fact that the class of models underlying the scope of application of this method excludes the simple Pareto law.
The corrected Hill exhibits better results in general, particularly for methods KOPT, PLAT and AMSE, followed by SP and RB, in large sample sizes ^¿=1000). The PLAT procedure also performs well with the Hill estimator unlike the SP.
For n = 100, we have good results within RB and SP based on corrected Hill. Once again, the PLAT method behaves well in both estimators.
The border cases of weak tail dependence (t4 and AL(0.7)) are critical throughout all evaluated procedures and estimators. On the other hand, the methods are robust in the border case of tail independence near dependence expressed in model N(0.8).
4 Applications
In this section we illustrate the methods with three datasets analyzed in literature:
Estimating the Coefficient of Asymptotic Tail Independence
109















Figure 2: Simulated results of the absolute bias of Hill (full) and corrected Hill (dashed), for n = 1000, of the models (left-to-right and top-to-down): N(-0.2), N(0.2), N(0.8), tA, ti, AL(0.3), AL(0.7), FGM(0.5) and Fr(2).
•	I: The data consists of closing stock index levels of S&P 500 from the US and FTSE 100 from the UK, over the period 11 December 1989 to 31 May 2000, totalizing 2733 observed pairs (see, e.g., Poon et al. ([23] 2003)).
•	II: The wave-surge data corresponding to 2894 paired observations collected during 1971-77 in Cornwall (England); it was analyzed in Coles and Tawn ([5] 1994) and later also in Ramos and Ledford ([22] 2009) under a parametric view.
•	III: The Loss-ALAE data analyzed in Beirlant et al. ([2] 2004; see also references therein) consisting of 1500 pairs of registered claims (in USD) corresponding to an
110
Marta Ferreira
Figure 3: Simulated results of the rmse of Hill (full) and corrected Hill (dashed), for n = 1000, of the models (left-to-right and top-to-down): N(-0.2), N(0.2), N(0.8), tA, ti, AL(0.3), AL(0.7), FGM(0.5) and Fr(2).
indemnity payment (loss) and an allocated loss adjustment expense (ALAE).
The respective scatter-plots are placed in Figure 4. For the US and UK stock market returns, the largest values in each tail for one variable correspond to reasonably large values of the same sign for the other variable, hinting an asymptotic independence but not exactly independence. In the wave-surge data, the dependence seems a bit more persistent within large values, as well as in Loss-ALAE data. The Hill and corrected Hill sample paths of n estimates are pictured in Figure 5. Table 3 reproduces the estimates obtained with each method and estimators under study. The estimation results found in literature for the financial (I), environmental (II) and insurance datasets (III) are respec-
Estimating the Coefficient of Asymptotic Tail Independence
111
tively approximated by 0.731, 0.85 and 0.9. The results seem to be in accordance with the simulation study.
k k k
Figure 5: From left to right: sample paths of Hill (full;black) corrected Hill (dashed;grey) of
datasets I, II and III.
5 Discussion
In this paper we have analyzed some simple estimation methods for the coefficient of asymptotic tail independence, with some of them revealing promising results. However, the choice of the estimator is not completely straightforward. It can be seen from simulation results that the ordinary Hill estimator may be still preferred over the corrected one in some situations. Also in boundary cases of tail dependence near independence, there are still some worrying errors to correct. These will be topics of a future research.
112
Marta Ferreira
Acknowledgment
The author wishes to thank the reviewers for their constructive and valuable comments that have improved this work. This research was financed by Portuguese Funds through FCT - Fundacao para a Ciencia e a Tecnologia, within the Project UID/MAT/00013/2013 and by the research centre CEMAT (Instituto Superior Tecnico, Universidade de Lisboa) through the Project UID/Multi/04621/2013.
References
[1]	Beirlant, J., Dierckx, G., Guillou, A. and Starica, C. (2002): On Exponential Representation of Log-Spacings of Extreme Order Statistics. Extremes, 5, 157-180.
[2]	Beirlant, J., Goegebeur, Y., Segers, J. and Teugels, J.L. (2004): Statistics of Extremes: Theory and Applications. J. Wiley & Sons.
[3]	Caeiro, F., Gomes, M.I. and Pestana, D.D. (2005): Direct reduction of bias of the classical Hill estimator. Revstat, 3(2), 111-136.
[4]	Caeiro, F., Gomes, M.I. and Henriques-Rodrigues, L. (2009): Reduced-Bias Tail Index Estimators Under a Third-Order Framework. Communications in Statistics -Theory and Methods, 38(7), 1019-1040.
[5]	Coles, S.G. and Tawn, J.A. (1994): Statistical methods for multivariate extremes: an application to structural design (with discussion). Appl. Statist., 43, 1-48.
[6]	Draisma, G., Drees, H., Ferreira, A. and de Haan, L. (2004): Bivariate tail estimation: dependence in asymptotic independence. Bernoulli, 10(2), 251-280.
[7]	Drees, H. and Kaufmann, E. (1998): Selecting the optimal sample fraction in univariate extreme value estimation. Stochastic Process Appl., 75, 149-172.
[8]	Dutang, C., Goegebeur, Y. and Guillou, A. (2014): Robust and bias-corrected estimation of the coefficient of tail dependence. Insurance: Mathematics and Economics, 57, 46-57.
[9]	Embrechts, P., McNeil, A. and Straumann, D. (2002): Correlation and dependency in risk management: properties and pitfalls. In: Risk Management: Value at Risk and Beyond, M.A.H. Dempster, Ed. Cambridge University Press, 176-223.
[10]	Ferreira, M. and Silva, S. (2014): An Analysis of a Heuristic Procedure to Evaluate Tail (in)dependence. Journal of Probability and Statistics, Vol. 2014, Article ID 913621, 15 pages.
[11]	Ferreira, M. (2014): A Heuristic Procedure to Estimate the Tail Index. Proceedings of the 14th International Conference in Computational Science and Its Applications -ICCSA 2014, June 30 - July 3 (2014), Guimares, Portugal, IEEE-Computer Society, 4264a241, 241-245.
Estimating the Coefficient of Asymptotic Tail Independence
113
[12]	Ferreira, M. (2015): Estimating the tail index: Another algorithmic method. ProbStat Forum, 08, 45-53.
[13]	Frahm, G., Junker, M. and Schmidt R. (2005): Estimating the tail-dependence coefficient: properties and pitfalls. Insurance: Mathematics & Economics, 37(1), 80100.
[14]	Goegebeur, Y. and Guillou, A. (2013): Asymptotically unbiased estimation of the coefficient of tail dependence. Scand. J. Stat., 40, 174-189
[15]	Gomes, M.I., Henriques-Rodrigues, L., Fraga Alves, M.I. and Manjunath, B.G. (2013): Adaptive PORT-MVRB estimation: an empirical comparison of two heuristic algorithms. Journal of Statistical Computation and Simulation, 83(6), 11291144.
[16]	Hill, B.M. (1975): A Simple General Approach to Inference About the Tail of a Distribution. Ann. Stat., 3, 1163-1174.
[17]	Joe, H. (1997): Multivariate Models and Dependence Concepts. Harry Joe, Chapman & Hall.
[18]	Ledford, A. and Tawn, J. (1996): Statistics for near independence in multivariate extreme values. Biometrika, 83(1), 169-187.
[19]	Matthys, G. and Beirlant, J. (2000): Adaptive Threshold Selection in Tail Index Estimation. In: Extremes and Integrated Risk Management, (Edited by P. Embrechts), 37-49. Risk Books, London.
[20]	McGee, V.E. and Carleton, W.T. (1970): Piecewise Regression. Journal of the American Statistical Association, 65, 1109-1124.
[21]	Neves, M., Gomes, M.I., Figueiredo, F. and Prata-Gomes, D. (2015): Modeling Extreme Events: Sample Fraction Adaptive Choice in Parameter Estimation. Journal of Statistical Theory and Practice, 9(1), 184-199.
[22]	Ramos, A. and Ledford, A. (2009): A new class of models for bivariate joint tails. Journal of the Royal Statistical Society, Series B, 71, 219-241.
[23]	Poon, S.-H., Rockinger, M. and Tawn, J. (2003): Modelling extreme-value dependence in international stock markets. Statistica Sinica, 13, 929-953.
[24]	Smith, R.L. (1987): Estimating tails of probability distributions. Ann. Statist., 15, 1174-1207.
[25]	Sousa, B. and Michailidis, G. (2004): A Diagnostic Plot for Estimating the Tail Index of a Distribution. Journal of Computational and Graphical Statistics, 13(4), 1-22.
a
•<—I Ë
£
	SP			KOPT			AMSE			RB			DK				PLAT	
n = 100	abias	rmse	k	abias	rmse	k	abias	rmse	k	abias	rmse	k	abias	rmse	k	NF	abias	rmse
N(-0.2)	0.0449	0.0590	90	0.0387	0.1232	12	0.0258	0.0579	68	0.0286	0.0470	69	0.0350	0.2883	3	4	0.0111	0.0780
N( 0.2)	0.0574	0.0698	89	0.1202	0.2002	15	0.0878	0.1224	64	0.0532	0.0714	75	0.0388	0.4878	4	2	0.0384	0.1042
N( 0.8)	0.1372	0.1460	93	0.1881	0.2726	16	0.1935	0.2402	77	0.1323	0.1397	75	0.1320	0.4158	8	7	0.1133	0.1440
ti	0.4187	0.4223	96	0.4121	0.4458	20	0.4309	0.4362	79	0.4155	0.4188	76	0.3007	0.5849	3	5	0.3539	0.3734
h	0.2266	0.2323	96	0.1605	0.2297	14	0.2318	0.2344	95	0.2144	0.2199	76	0.1923	0.3481	12	5	0.1300	0.1507
AL( 0.7)	0.4642	0.4658	94	0.4625	0.4895	18	0.4784	0.4863	92	0.4572	0.4594	78	0.3447	0.6026	4	3	0.4199	0.4342
AL( 0.3)	0.2825	0.2855	98	0.1686	0.2364	17	0.2877	0.3024	73	0.2498	0.2556	74	0.1991	0.3459	14	6	0.1585	0.1864
FGM( 0.5)	0.0383	0.0578	90	0.0507	0.1683	12	0.0163	0.1117	56	0.0362	0.0585	75	0.0508	0.3649	6	8	0.0302	0.1052
Fr( 2)	0.0805	0.0954	88	0.2065	0.1762	13	0.0320	0.1265	61	0.0839	0.0960	77	0.0041	0.3391	5	5	0.0764	0.1293
n = 1000	abias	rmse	k	abias	rmse	k	abias	rmse	k	abias	rmse	k	abias	rmse	k	NF	abias	rmse
JV(—0.2)	0.0425	0.0546	819	0.0059	0.0515	121	0.0378	0.0474	652	0.0437	0.0455	755	0.0242	0.3225	48	2	0.0247	0.0399
N( 0.2)	0.0462	0.0642	826	0.0370	0.0687	171	0.0519	0.0690	111	0.0394	0.0432	754	0.0223	0.3651	39	0	0.0297	0.0452
N( 0.8)	0.1178	0.1266	866	0.0832	0.0907	277	0.1231	0.1239	920	0.0926	0.0940	625	0.0991	0.3588	84	1	0.0716	0.0784
U	0.3921	0.4013	893	0.3303	0.3339	220	0.3703	0.3737	460	0.4056	0.4061	822	0.0431	0.6092	29	1	0.3114	0.3172
h	0.1975	0.2095	933	0.0777	0.0896	238	0.1530	0.1562	509	0.1886	0.1906	779	0.0479	0.1042	78	0	0.0554	0.0664
AL( 0.7)	0.4518	0.4544	941	0.3906	0.3931	197	0.4245	0.4270	592	0.4392	0.4398	643	0.1613	0.6207	45	4	0.3827	0.3864
AL( 0.3)	0.2369	0.2597	885	0.1282	0.1356	303	0.1821	0.1859	496	0.1940	0.1945	580	0.0800	0.1506	108	1	0.0868	0.0961
FG M (0.5)	0.0358	0.0430	846	0.0303	0.0525	178	0.0429	0.0600	630	0.0487	0.0516	762	0.0216	0.3347	50	0	0.0415	0.0532
Fr( 2)	0.0630	0.0859	696	0.0305	0.0791	132	0.0409	0.1136	405	0.0952	0.0963	786	0.0380	0.3451	50	3	0.0691	0.0795
n = 5000	abias	rmse	k	abias	rmse	k	abias	rmse	k	abias	rmse	k	abias	rmse	k	NF	abias	rmse
JV(—0.2)	0.0485	0.0515	4369	0.0217	0.0280	629	0.0424	0.0445	3353	0.0399	0.0406	3135	0.0920	0.3383	572	1	0.0214	0.0271
N(0.2)	0.0486	0.0490	4804	0.0288	0.0346	847	0.0410	0.0422	3684	0.0384	0.0391	3590	0.0601	0.4406	402	1	0.0261	0.0330
N( 0.8)	0.1253	0.1261	4902	0.0725	0.0745	1343	0.1021	0.1043	3357	0.0907	0.0915	3052	0.0696	0.2242	737	0	0.0585	0.0625
U	0.4103	0.4117	4853	0.2709	0.2745	548	0.2746	0.2829	648	0.4106	0.4107	4418	0.0636	0.4472	34	1	0.2653	0.2688
h	0.2075	0.2090	4902	0.0499	0.0543	1062	0.0804	0.0843	1442	0.2039	0.2043	4573	0.0209	0.0393	235	0	0.0201	0.0328
AL( 0.7)	0.4594	0.4595	4999	0.3428	0.3448	457	0.3558	0.3633	1178	0.4411	0.4413	3222	0.1898	0.5659	20	2	0.3511	0.3534
AL(0.3)	0.2694	0.2712	4950	0.0956	0.0989	969	0.1100	0.1137	1101	0.1989	0.1998	3024	0.0499	0.0641	298	0	0.0529	0.0642
FGM (0.5)	0.0391	0.0422	4562	0.0277	0.0387	705	0.0415	0.0460	2053	0.0487	0.0494	3655	0.0421	0.3120	190	0	0.0313	0.0379
Fr(2)	0.0831	0.0842	4854	0.0620	0.0684	617	0.0862	0.0926	1590	0.1027	0.1030	3650	0.0035	0.2501	286	0	0.0692	0.0738
-xt"
Table 1: Simulation results from Hill estimator, where abias denotes the absolute bias, NF the number of fails and k correspond to the mean of
the k values obtained in the 100 runs.
	SP			KOPT			AMSE			RB			DK				PLAT	
n = 100	abias	rmse	k	abias	rmse	k	abias	rmse	k	abias	rmse	k	abias	rmse	k	NF	abias	rmse
N(-0.2)	0.0186	0.0653	91	0.0427	0.1287	12	0.0032	0.0738	57	0.0137	0.0603	74	0.0416	0.2880	2	5	0.0076	0.0795
JV( 0.2)	0.0164	0.0977	90	0.1085	0.2044	15	0.0458	0.1295	58	0.0202	0.0961	74	0.0514	0.4604	3	1	0.0241	0.1130
N( 0.8)	0.0594	0.1066	93	0.1717	0.2675	17	0.1014	0.1860	66	0.0658	0.1050	77	0.1025	0.4099	7	6	0.0959	0.1436
ti	0.3446	0.3618	96	0.3846	0.4268	20	0.3649	0.3810	66	0.3566	0.3704	71	0.2871	0.6015	3	2	0.3361	0.3610
k	0.0952	0.1261	96	0.1369	0.2112	15	0.1104	0.1337	78	0.1118	0.1387	75	0.1437	0.3297	5	0	0.0850	0.1215
AL{ 0.7)	0.3995	0.4123	93	0.4528	0.4846	18	0.4245	0.4410	60	0.4122	0.4227	76	0.3313	0.5980	4	4	0.4046	0.4237
AL( 0.3)	0.0437	0.1355	96	0.1187	0.2105	21	0.0781	0.1698	66	0.0609	0.1418	71	0.1537	0.3491	7	3	0.0865	0.1519
FGM{ 0.5)	0.0659	0.1121	89	0.0439	0.1749	13	0.0199	0.1345	55	0.0565	0.1036	72	0.0468	0.3775	3	8	0.0393	0.1170
Fr( 2)	0.1237	0.1549	88	0.0199	0.1794	13	0.0733	0.1718	58	0.1210	0.1482	73	0.0048	0.3401	4	7	0.0912	0.1499
n = 1000	abias	rmse	k	abias	rmse	k	abias	rmse	k	abias	rmse	k	abias	rmse	k	NF	abias	rmse
JV(—0.2)	0.0165	0.0357	819	0.0008	0.0514	120	0.0119	0.0495	463	0.0204	0.0286	948	0.0103	0.3473	9	2	0.0206	0.0367
N( 0.2)	0.0200	0.0539	808	0.0305	0.0662	169	0.0273	0.0608	515	0.0179	0.0342	913	0.0432	0.3660	18	1	0.0222	0.0442
JV( 0.8)	0.0353	0.0552	848	0.0545	0.0674	253	0.0450	0.0505	527	0.0359	0.0438	837	0.1318	0.4158	23	7	0.0514	0.0622
ti	0.3255	0.3343	893	0.3061	0.3109	197	0.3275	0.3317	296	0.3471	0.3489	838	0.0806	0.6097	23	0	0.3042	0.3100
k	0.0514	0.0680	924	0.0278	0.0474	238	0.0525	0.0617	331	0.0667	0.0731	827	0.1303	0.2741	54	2	0.0276	0.0439
AL{ 0.7)	0.3937	0.3969	941	0.3751	0.3786	183	0.3920	0.3948	365	0.4009	0.4023	935	0.1170	0.6324	14	5	0.3781	0.3817
AL{ 0.3)	0.0063	0.0538	857	0.0371	0.0572	210	0.0610	0.1185	241	0.0239	0.0409	797	0.1388	0.2870	42	2	0.0413	0.0559
FGM(O.S)	0.0547	0.0649	846	0.0356	0.0572	180	0.0617	0.0671	600	0.0657	0.0698	904	0.0288	0.3346	42	0	0.0446	0.0585
Fr{ 2)	0.0854	0.1104	668	0.0371	0.0841	140	0.0845	0.1253	516	0.1172	0.1200	814	0.0442	0.3355	62	1	0.0729	0.0843
n = 5000	abias	rmse	k	abias	rmse	k	abias	rmse	k	abias	rmse	k	abias	rmse	k	NF	abias	rmse
JV(—0.2)	0.0199	0.0240	4368	0.0156	0.0248	584	0.0206	0.0254	1823	0.0208	0.0225	4686	0.0520	0.3914	59	0	0.0200	0.0248
JV(0.2)	0.0173	0.0223	4804	0.0223	0.0304	865	0.0193	0.0232	2720	0.0159	0.0205	4766	0.0292	0.4502	71	1	0.0212	0.0289
JV(0.8)	0.0324	0.0346	4902	0.0458	0.0495	1110	0.0481	0.0502	1291	0.0308	0.0337	4466	0.0992	0.4592	78	3	0.0475	0.0521
ti	0.3349	0.3360	4853	0.2495	0.2535	454	0.2549	0.2620	487	0.3447	0.3451	3940	0.1093	0.5322	39	2	0.2666	0.2696
ti	0.0446	0.0473	4901	0.0008	0.0191	747	0.0127	0.0216	979	0.0535	0.0566	3782	0.0849	0.2497	311	2	0.0032	0.0214
AL{ 0.7)	0.3967	0.3971	4999	0.3303	0.3325	397	0.3210	0.3279	465	0.4003	0.4007	4203	0.1226	0.6323	26	3	0.3509	0.3529
AL(0.3)	0.0157	0.0260	4902	0.0315	0.0401	550	0.0414	0.0485	633	0.0194	0.0292	4520	0.1619	0.3958	210	1	0.0355	0.0427
FGM(0.5)	0.0561	0.0602	4562	0.0316	0.0431	727	0.0511	0.0567	2440	0.0619	0.0634	4196	0.0484	0.2880	218	0	0.0328	0.0399
Fr(2)	0.1186	0.1208	4854	0.0693	0.0758	691	0.1225	0.1162	2431	0.1264	0.1270	4100	0.0333	0.2429	241	1	0.0703	0.0757
Table 2: Simulation results from corrected Hill estimator, where abias denotes the absolute bias, NF the number of fails and k correspond to the
mean of the k values obtained in the 100 runs.
116
Marta Ferreira
H (k)	I	k	II	k	III	k
DK	0.6510	21	0.8255	83	0.7827	78
SP	0.6025	2592	0.5922	2893	0.6584	1499
KOPT	0.6733	744	0.9137	738	0.8444	135
AMSE	0.6494	955	0.7076	1244	0.6850	1172
RB	0.6041	2477	0.5967	2772	0.7428	708
PLAT	0.7148	-	0.8755	-	0.8110	-
CH (k)	I	k	II	k	III	k
DK	0.7654	5	0.4521	1	0.7044	27
SP	0.6725	2592	0.8581	2893	0.8671	1499
KOPT	0.7070	585	0.8991	412	0.8661	176
AMSE	0.6925	726	0.8997	596	0.8386	678
RB	0.6652	2264	0.8300	2040	0.8671	1499
PLAT	0.7261	-	0.8908	-	0.8524	-
Table 3: Estimates of n and respective values k, of datasets I, II and III.
Metodoloski zvezki, Vol. 13, No. 2, 2016, 117-130
Approximate Confidence Interval for the Reciprocal of a Normal Mean with a Known Coefficient of Variation
Wararit Panichkitkosolkul1
Abstract
An approximate confidence interval for the reciprocal of a normal population mean with a known coefficient of variation is proposed. This has applications in the area of nuclear physics, agriculture and economic when the researcher knows the coefficient of variation. The proposed confidence interval is based on the approximate expectation and variance of the estimator by Taylor series expansion. A Monte Carlo simulation study was conducted to compare the performance of the proposed confidence interval with the existing confidence interval. Simulation results show that the proposed confidence interval performs as well as the existing one in terms of coverage probability. However, the approximate confidence interval is very easy to calculate compared with the exact confidence interval.
1 Introduction
The reciprocal of a normal mean is applied in the area of nuclear physics, agriculture and economic. For example, Lamanna et al. (1981) studied a charged particle momentum, p = / where / is the track curvature of a particle. The reciprocal of a normal mean is defined by 6 = / ', where / is the population mean. Many researchers studied the reciprocal of a normal mean. For instance, Zaman (1981) discussed the estimators without moments in the case of the reciprocal of a normal mean. The maximum likelihood estimate of the reciprocal of a normal mean with a class of zero-one loss functions was proposed by Zaman (1985). Withers and
1 Department of Mathematics and Statistics, Faculty of Science and Technology, Thammasat University, Thailand; wararit@mathstat.sci.tu.ac.th
118
Wararit Panichkitkosolkul
Nadarajah (2013) presented the theorem to construct the point estimators for the inverse powers of a normal mean.
Wongkhao et al. (2013) proposed two confidence intervals for the reciprocal of a normal mean with a known coefficient of variation. Their confidence intervals can be applied when the coefficient of variation of a control group is known. One of their confidence intervals is developed based on an exact method in which this confidence interval is constructed from the pivotal statistics Z, where Z follows the standard normal distribution. The other confidence interval is constructed based on the generalized confidence interval (Weerahandi, 1993). Simulation results show that the coverage probabilities of the two confidence intervals are not significantly different. However, the confidence interval based on the exact method is shorter than the generalized confidence interval. The exact method uses Taylor series expansion to find the expectation and variance of the estimator of 6 and uses these results for constructing the confidence interval for 6. The lower and upper limits of the confidence interval based on the exact method are difficult to compute since they depend on an infinite summation. Therefore, our main aim in this paper is to propose an approximate confidence interval for the reciprocal of a normal mean with a known coefficient of variation. The computation of the new proposed confidence interval is easier than the exact confidence interval proposed by Wongkhao et al. (2013). In addition, we also compare the estimated coverage probabilities of the new proposed confidence interval and existing confidence interval using a Monte Carlo simulation.
The paper is organized as follows. In Section 2, the theoretical background of the existing confidence interval for 6 is discussed. We provide the theorem for constructing the approximate confidence interval for 6 in Section 3. In Section 4, the performance of the confidence intervals for 6 is investigated through a Monte Carlo simulation study. Conclusions are provided in the final section.
2 Existing Confidence Interval
In this section, we review the theorem and corollary proposed by Wongkhao et al. (2013) and use these to construct the exact confidence interval for 6.
Approximate Confidence Interval for the Reciprocal of a Normal Mean	119
Theorem 1. (Wongkhao et al., 2013) Let Yj,...,Yn be a random sample of size n from a normal distribution with mean / and variance a2. The estimator of 6 is
n
6 = Y- where Y = n~l £Yj. The expectation of 6 and 6)2 when a coefficient of
j=1
variation, t = a is known, are respectively
E(6) = 6
1+v
(2k)!
k
k =1 2kk!
(1)
and
E(92) = 62 £
(2k +1)!(T ^k
2kk!
Proof of Theorem 1 See Wongkhao et al. (2013)
From (1), lim E(<9) = 6 and E
estimator of 6 is — = ^L.
w wY
( 9 ^
w
\
= 6, where w = 1 + £
(2k)!
k
1 2kk!
■
. Thus, the unbiased
6 2
Corollary 1. From Theorem 1, var( 9) « —t2.
Proof of Corollary 1 See Wongkhao et al. (2013)
■
Now we will use the fact that, from the central limit theorem,
9 - 6
Z =
Vvar(6?)
Based on Theorem 1 and Corollary 1, we get
N (0,1).
Z =
— - 6 w_
6rT
N (0,1).
(2)
Therefore, the 100(1 -a)% exact confidence interval for 6 based on Equation (2) is
CL,
"" (2k)!(t2 ^
9 . 192 2
' exact	± zi-a/2V T
w	\ n
where w = 1 + V
t!2kk!
normal distribution.
and z1-a/2 is the 100(1 -a/2) percentile of the standard
n
120
Wararit Panichkitkosolkul
3 Proposed Confidence Interval
To find a simple approximate expression for the expectation of 0, we use a Taylor series expansion of — around m:
1 « 1
y y
+ (y -m)—
dy
y
+2( y -m)2 j~r
2	dy
^ i ^ y
+ O
( y my )—
dy.
y
(3)
Theorem 2. Let Y1,...,Yn be a random sample of size n from a normal distribution
n
with mean m and variance a2. The estimator of 6 is 9 = Y- where Y = n-^Yt.
i=i
The approximate expectation and variance of Q when a coefficient of variation,
a
t = — is known, are respectively M
E6)

and
2
var( 9) « — t2.
(3)
(4)
Proof of Theorem 2.
Consider random variable Y where Y has support (0,<»). Let Q = Y~l. Find
approximations for E(6) and var(0) using Taylor series expansion of d around p
as in Equation (3). The mean of § can be found by applying the expectation operator to the individual terms (ignoring all terms higher than two),
E(0) =
"F
+ E
£ ( Y )(f"E (Y))
— + 0 +1 (-L-var(Y )|
M 2 ^ (E(Y))3 V ')
+ var(Y )
M M3
M
1 +
a
2 A
nM

M
+1E 2
-Y-[Y} (Y - E(Y))2
+ O(n_1)
(4)
1
1
Approximate Confidence Interval for the Reciprocal of a Normal Mean	121
An approximation of the variance of 9 is obtained by using the first-order terms of the Taylor series expansion:
var(6) =
var| J
E
Y E1 Y
_L _ L J
d4 ( L ) (YY _ E (YY)) _!
2
d_( L dY 1Y
^ 17 I I var(Y )
var(Y ) a2
4
n/d
62
T2.
(5)
n
■
It is clear from Equation (4) that 0 is asymptotically unbiased (limE(9) = 9)
(9 \	T
and E
v /
T	9 1
= 9, where v = 1 +—. Therefore, the unbiased estimator of 0 is — = —.
n	v vY
From Equation (5), 9 is consistent (limvar09) = 0).
We then apply the central limit theorem and Theorem 2,
9
_6
Z =
62 .
• N (0,1).
Therefore, it is easily seen that the (1 -a)100% approximate confidence interval for
9 is
rr = - + 6L 2
approx	— Z1_a/2\l T ,
v	V n
n
122
Wararit Panichkitkosolkul
T
where v = 1 + — and z1-a/2 is the 100(1 -a/2) percentile of the standard normal
n
distribution.
4 Simulation Study
A Monte Carlo simulation was conducted using the R statistical software [16] version 3.2.1 to compare the estimated coverage probabilities of the new proposed confidence interval and the exact confidence interval. Source code is available in Appendix. The estimated coverage probability (based on M replicates) are given by 1-a = #(L <9< U)/M, where #(L <9< U) denotes the number of simulation runs for which the true reciprocal of a normal mean 9 lies within the confidence interval. From two previous sections, we found that the lengths of both confidence
intervals are equal to 2z1-a/2V9V /n which the expected lengths are not considered in simulation study. The sets of normal data were generated with 9 = 0.1, 0.2, 0.5, 1, 5 and 10, and the coefficient of variation t = 0.05, 0.1, 0.2, 0.33, 0.5 and 0.67. The sample sizes were set at n = 10, 20, 30, 50, 100 and 500. The number of simulation runs was 10,000 and the nominal confidence level 1 -a was fixed at 0.95.
The results are demonstrated in Figure 1 and Tables 1-4. Both confidence intervals have estimated coverage probabilities close to the nominal confidence level for almost situations. However, the estimated coverage probabilities of the exact confidence interval are very poor when the coefficient of variation t is close to 1 and small sample sizes. Additionally, the estimated coverage probabilities of the confidence intervals do not increase or decrease according to the values of t and n. The estimated coverage probabilities of the proposed confidence interval are not significantly different from these of the exact confidence interval in any situation. However, the approximate confidence interval is very easy to calculate compared with the exact confidence interval because the exact confidence interval is based on an infinite summation.
Approximate Confidence Interval for the Reciprocal of a Normal Mean	123
□ □ 0.1
□ □ 0.2
~i-1-1-1-1-r
0.1 0.2 0.3 0.4 0.5 0.6
0.1
~i-1-1-r
0.2 0.3 0.4 0.5
0.6
□ □ 0.5
□□1
•	
	-.
/ N *--	
*	
^ Approx	
0.1 0.2 0.3 0.4 0.5 0.6
"i-1-1-1-1-r
0.1 0.2 0.3 0.4 0.5 0.6
□ □ 5
□ □ 10
*		
		
■ Exact		
^ Approx		
1 0.1	1 1 1 0.2 0.3 0.4	1 1 0.5 0.6
	□	
~i-1-1-1-1-r
0.1 0.2 0.3 0.4 0.5 0.6
□
□
□
□
□
Figure 1: Estimated coverage probabilities of confidence intervals for the reciprocal of a normal mean with a known coefficient of variation when n = 30 (solid line) and n = 100
(dash line)
124
Wararit Panichkitkosolkul
Table 1: Estimated coverage probabilities of confidence intervals for the reciprocal of a normal mean with a known coefficient of variation when 9 = 0.1 and 0.2.
n	T	9 =	0.1	9 =	0.2
		Exact	Approx.	Exact	Approx.
10	0.05	0.9475	0.9475	0.9489	0.9489
	0.10	0.9471	0.9471	0.9493	0.9493
	0.20	0.9498	0.9499	0.9500	0.9500
	0.33	0.9482	0.9483	0.9480	0.9486
	0.50	0.9325	0.9469	0.9334	0.9502
	0.67	0.0019	0.9456	0.0030	0.9455
20	0.05	0.9543	0.9543	0.9489	0.9489
	0.10	0.9489	0.9489	0.9529	0.9529
	0.20	0.9519	0.9519	0.9480	0.9479
	0.33	0.9514	0.9514	0.9492	0.9491
	0.50	0.9500	0.9505	0.9447	0.9452
	0.67	0.9475	0.9480	0.9457	0.9459
30	0.05	0.9481	0.9481	0.9484	0.9484
	0.10	0.9526	0.9526	0.9476	0.9476
	0.20	0.9498	0.9498	0.9501	0.9501
	0.33	0.9474	0.9475	0.9542	0.9541
	0.50	0.9489	0.9489	0.9479	0.9479
	0.67	0.9459	0.9464	0.9490	0.9492
50	0.05	0.9474	0.9474	0.9492	0.9492
	0.10	0.9485	0.9485	0.9500	0.9500
	0.20	0.9494	0.9494	0.9496	0.9496
	0.33	0.9499	0.9499	0.9476	0.9475
	0.50	0.9514	0.9517	0.9496	0.9497
	0.67	0.9485	0.9486	0.9476	0.9475
100	0.05	0.9536	0.9536	0.9495	0.9495
	0.10	0.9494	0.9494	0.9500	0.9500
	0.20	0.9509	0.9509	0.9494	0.9494
	0.33	0.9489	0.9489	0.9486	0.9486
	0.50	0.9509	0.9509	0.9481	0.9481
	0.67	0.9511	0.9510	0.9511	0.9511
500	0.05	0.9479	0.9479	0.9467	0.9467
	0.10	0.9488	0.9488	0.9511	0.9511
	0.20	0.9517	0.9517	0.9511	0.9511
	0.33	0.9519	0.9519	0.9481	0.9481
	0.50	0.9469	0.9469	0.9476	0.9476
	0.67	0.9484	0.9484	0.9480	0.9479
Approximate Confidence Interval for the Reciprocal of a Normal Mean	125
Table 2: Estimated coverage probabilities of confidence intervals for the reciprocal of a normal mean with a known coefficient of variation when 9 = 0.5 and 1.
n	T	9 =	0.5	9	= 1
		Exact	Approx.	Exact	Approx.
10	0.05	0.9489	0.9489	0.9475	0.9475
	0.10	0.9482	0.9482	0.9471	0.9471
	0.20	0.9491	0.9491	0.9498	0.9499
	0.33	0.9462	0.9463	0.9482	0.9483
	0.50	0.9357	0.9501	0.9325	0.9469
	0.67	0.0032	0.9471	0.0019	0.9456
20	0.05	0.9515	0.9515	0.9543	0.9543
	0.10	0.9482	0.9481	0.9489	0.9489
	0.20	0.9502	0.9502	0.9519	0.9519
	0.33	0.9518	0.9520	0.9514	0.9514
	0.50	0.9515	0.9518	0.9500	0.9505
	0.67	0.9445	0.9453	0.9475	0.9480
30	0.05	0.9444	0.9444	0.9481	0.9481
	0.10	0.9486	0.9486	0.9526	0.9526
	0.20	0.9517	0.9517	0.9498	0.9498
	0.33	0.9469	0.9470	0.9474	0.9475
	0.50	0.9499	0.9505	0.9489	0.9489
	0.67	0.9500	0.9498	0.9459	0.9464
50	0.05	0.9474	0.9474	0.9474	0.9474
	0.10	0.9520	0.9520	0.9485	0.9485
	0.20	0.9490	0.9490	0.9494	0.9494
	0.33	0.9485	0.9485	0.9499	0.9499
	0.50	0.9475	0.9475	0.9514	0.9517
	0.67	0.9503	0.9502	0.9485	0.9486
100	0.05	0.9531	0.9531	0.9536	0.9536
	0.10	0.9496	0.9496	0.9494	0.9494
	0.20	0.9438	0.9438	0.9509	0.9509
	0.33	0.9530	0.9530	0.9489	0.9489
	0.50	0.9510	0.9510	0.9509	0.9509
	0.67	0.9454	0.9454	0.9511	0.9510
500	0.05	0.9527	0.9527	0.9515	0.9515
	0.10	0.9469	0.9469	0.9507	0.9507
	0.20	0.9520	0.9520	0.9442	0.9442
	0.33	0.9500	0.9500	0.9495	0.9495
	0.50	0.9507	0.9507	0.9500	0.9500
	0.67	0.9507	0.9507	0.9519	0.9519
126
Wararit Panichkitkosolkul
Table 3: Estimated coverage probabilities of confidence intervals for the reciprocal of a normal mean with a known coefficient of variation when 0 = 5 and 10.
n	T	0	= 5	0 =	10
		Exact	Approx.	Exact	Approx.
10	0.05	0.9489	0.9489	0.9508	0.9508
	0.10	0.9476	0.9476	0.9473	0.9473
	0.20	0.9516	0.9515	0.9482	0.9482
	0.33	0.9500	0.9501	0.9497	0.9497
	0.50	0.9326	0.9475	0.9335	0.9481
	0.67	0.0020	0.9457	0.0028	0.9505
20	0.05	0.9490	0.9490	0.9514	0.9514
	0.10	0.9490	0.9490	0.9478	0.9478
	0.20	0.9522	0.9521	0.9440	0.9440
	0.33	0.9497	0.9497	0.9504	0.9504
	0.50	0.9474	0.9479	0.9475	0.9479
	0.67	0.9454	0.9462	0.9472	0.9478
30	0.05	0.9499	0.9499	0.9469	0.9469
	0.10	0.9511	0.9511	0.9483	0.9483
	0.20	0.9495	0.9495	0.9479	0.9479
	0.33	0.9485	0.9482	0.9486	0.9487
	0.50	0.9498	0.9498	0.9489	0.9488
	0.67	0.9494	0.9494	0.9461	0.9465
50	0.05	0.9516	0.9516	0.9512	0.9512
	0.10	0.9521	0.9521	0.9496	0.9496
	0.20	0.9510	0.9510	0.9480	0.9480
	0.33	0.9496	0.9496	0.9481	0.9481
	0.50	0.9498	0.9497	0.9506	0.9505
	0.67	0.9513	0.9512	0.9471	0.9471
100	0.05	0.9531	0.9531	0.9500	0.9500
	0.10	0.9473	0.9473	0.9517	0.9517
	0.20	0.9501	0.9501	0.9483	0.9483
	0.33	0.9493	0.9493	0.9556	0.9556
	0.50	0.9509	0.9509	0.9512	0.9512
	0.67	0.9469	0.9469	0.9475	0.9476
500	0.05	0.9497	0.9497	0.9516	0.9516
	0.10	0.9510	0.9510	0.9505	0.9505
	0.20	0.9502	0.9502	0.9528	0.9528
	0.33	0.9486	0.9486	0.9521	0.9521
	0.50	0.9484	0.9484	0.9525	0.9525
	0.67	0.9518	0.9518	0.9493	0.9493
5 An Illustrative Example
To illustrate an example of two confidence interval for the reciprocal of a normal mean proposed in the previous section, we used the weights (in kilograms) of 61 one-month old infants listed as follows:
4.960 5.130 4.260 5.160 4.050 5.240 4.350 4.360 3.930 4.410
Approximate Confidence Interval for the Reciprocal of a Normal Mean	127
4.610 4.550 4.460 2.940 4.160 4.110 4.410 4.800 5.130 3.670 4.550 4.290 4.950 5.210 3.210 4.030 3.580 4.360 4.360 3.920 4.050 4.630 3.756 4.586 5.336 2.828 4.172 4.256 4.594 4.866 4.784 4.520 5.238 4.320 5.330 3.836 5.916 5.010 4.344 3.496 4.148 4.044 5.192 4.368 4.180 4.102 5.210 4.382 5.070 5.044 3.530
The data were taken from the study by Ziegler et al. (2007) (cited in Ledolter and Hogg, 2010, p.287). From past experience, we assume that the coefficient of variation of the weights of 61 one-month old infants is about 0.14. The histogram, density plot, Box-and-Whisker plot and normal quantile-quantile plot are displayed in Figure 2. Algorithm 1 shows the result of the Shapiro-Wilk normality test.
(a)
(b)
m o
m o
izzn
i i i i i i i i 2.5 3.5 4.5 5.5
CD
o
CD
CD
o
weight
weight
2
3
4
5
6
(c)
(d)
o
CD
O
in
o
o
CO
ra
o
<u
ra CO
CD CD
CD UD
CD
CD 00
0 1 2 Theoretical Quantiles
Figure 2: (a) Histogram, (b) density plot, (c) Box-and-Whisker plot and (d) normal quantile-quantile plot of the weight of a one-month old infant
128
Wararit Panichkitkosolkul
Shapiro-Wilk normality test data: weight
W = 0.978, p-value = 0.3383 Algorithm 1: Shapiro-Wilk test for normality of the weight of a one-month old infant
The 95% exact and approximate confidence intervals for the reciprocal of a normal mean are calculated and reported in Table 4. The lower and upper limits of the both confidence intervals are not different.
Table 4: The 95% confidence intervals for the reciprocal of a normal mean of the weight
of a one-month old infant.
Methods	Confidence Intervals	Lengths
	Lower Limit Upper Limit	
Exact Approximate	0.2176837 0.2335416 0.2176838 0.2335416	0.0158579 0.0158578
6 Conclusions
In this paper, we proposed an approximate confidence interval for the reciprocal of a normal population mean with a known coefficient of variation. Normally, this arises when the coefficient of variation of the control group is known. The approximate confidence interval proposed uses the approximation of the expectation and variance of the estimator. The proposed new confidence interval is compared with the exact confidence interval constructed by Wongkhao et al. (2013) through a Monte Carlo simulation study. The approximate confidence interval performs as efficiently as the exact confidence interval in terms of coverage probability. Moreover, approximate confidence interval also is easy to compute compared with the exact confidence interval.
Appendix: Source R code for all confidence intervals
ci.exact <- function(y,tao,alpha) { n <- length(y) ybar <- mean(y) zeta.hat <- 1/ybar w <- cal.w(tao,n)
Approximate Confidence Interval for the Reciprocal of a Normal Mean	129
z <- qnorm(1-alpha/2) T1 <- (taoA2)/(n*(ybarA2)) lower <- (zeta.hat/w)-z*sqrt(T1) upper <- (zeta.hat/w)+z*sqrt(T1) out <- cbind(lower,upper) return(out)
}
ci.approx <- function(y,tao,alpha) { n <- length(y) ybar <- mean(y) zeta.hat <- 1/ybar v <- 1+(taoA2)/n z <- qnorm(1-alpha/2) T1 <- ((zeta.hatA2)*(taoA2))/n lower <- (zeta.hat/v)-z*sqrt(T1) upper <- (zeta.hat/v)+z*sqrt(T1) out <- cbind(lower,upper) return(out)
}
cal.w <- function(tao,n) { temp <- rep(0,50) for (k in 1:50) {
temp[k] <- (factorial(2*k)/((2Ak)*factorial(k)))*(((taoA2)/n)Ak)
}
w <- 1+sum(temp) return(w)
}
Acknowledgements
The author is grateful to two anonymous referees for their valuable comments and comments, which have significantly enhanced the quality and presentation of this paper. The author is also thankful for the support in the form of the research funds awarded by Thammasat University.
References
[1] Ihaka, R. and Gentleman, R. (1996): R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5, 299-314.
130
Wararit Panichkitkosolkul
[2]	Lamanna, E., Romano, G. and Sgrbi, C. (1981): Curvature measurements in nuclear emulsions. Nuclear Instruments and Methods, 187, 387-391.
[3]	Ledolter, L., Hogg, R.V. (2010): Applied Statistics for Engineers and Physical Scientists, Pearson, New Jersey.
[4]	Weerahandi, S. (1993): Generalized confidence intervals, Journal of the American Statistical Association, 88, 899-905.
[5]	Withers, C.S. and Nadarajah, S. (2013): Estimators for the inverse powers of a normal mean, Journal of Statistical Planning and Inference, 143, 441-455.
[6]	Wongkhao, A., Niwitpong, S. and Niwitpong, S. (2013): Confidence interval for the inverse of a normal mean with a known coefficient of variation.
International Journal of Mathematical, Computational, Statistical, Natural and Physical Engineer, 7, 877-880.
[7]	Zaman, A. (1981): Estimators without moments: the case of the reciprocal of a normal mean. Journal of Econometrics, 15, 289-298.
[8]	Zaman, A. (1985): Admissibility of the maximum likelihood estimate of the reciprocal of a normal mean with a class of zero-one loss functions. Sankhya: The Indian Journal of Statistics, Series A, 47, 239-246.
[9]	Ziegler, E., Nelson, S.E., Jeter, J.M. (2007): Early iron supplementation of breastfed infants, Department of Pediatrics, University of Iowa, Iowa City, USA.
Metodoloski zvezki, Vol. 13, No. 2, 2016, 131-150
Exploring the Relationship between two Compositions using Canonical Correlation
Analysis
1	2	3
Gloria Mateu-Figueras , Josep Daunis-i-Estadella , Germà Coenders , Berta Ferrer-Rosell4, Ricard Serlavos5, Joan Manuel Batista-Foguet6
Abstract
The aim of this article is to describe a method for relating two compositions which combines compositional data analysis and canonical correlation analysis (CCA), and to examine its main statistical properties. We use additive log-ratio (alr) transformation on both compositions and apply standard CCA to the transformed data. We show that canonical variates are themselves log-ratios and log-contrasts. The first pair of canonical variates can be interpreted as the log-contrast of a composition that has the maximum correlation with a log-contrast of the other composition. The second pair can be interpreted as the log-contrast of a composition that has the maximum correlation with a log-contrast of the other composition, under the restriction that they are uncorrelated with the first pair, and so on.
Using properties from changes of basis, we prove that both canonical correlations and canonical variates are invariant to the choice of divisors in alr transformation. We show how to implement the analysis and interpret the results by means of an illustration from the social sciences field using data from Kolb's Learning Style Inventory and Boyatzis' Philosophical Orientation Questionnaire, which distribute a fixed total score among several learning modes and philosophical orientations.
1	Department of Computer Science, Applied Mathematics and Statistics, University of Girona, Campus Montilivi, 17003 Girona, Spain; gloria.mateu@udg.edu
2	Department of Computer Science, Applied Mathematics and Statistics, University of Girona, Campus Montilivi, 17003 Girona, Spain; pepus.daunis@udg.edu
3	Department of Economics, University of Girona, Campus Montilivi, 17003 Girona, Spain; germa.coenders@udg.edu
4	Department of Economics, University of Girona, Edifici Sant Domènec, Plaça Ferrater Mora 1, 17004 Girona, Spain; berta.ferrer@udg.edu
5	Department of People Management and Organization, ESADE, University Ramon Llull, Av. Pedralbes, 60-62, 08034 Barcelona, Spain; ricard.serlavos@esade.edu
6	Department of People Management and Organization, ESADE, University Ramon Llull, Av. Pedralbes, 60-62, 08034 Barcelona, Spain; joanm.batista@esade.edu
Exploring the Relationship between two Compositions...
132
1 Introduction
Compositional data lie in a constrained positive space with a fixed sum and convey information on the relative importance of components. Typical examples are chemical and geological compositions (adding to 100% in weight or volume), genotype frequencies (adding to 1), time use (adding to 24 hours), voting (adding to 100% of votes), or household budget allocation (adding to 100% of the budget). The fixed sum is typically normalized to one, and a D-term composition (x¡, x2,..., xD) is thus constrained as follows:
D
0<xd<1 and £ xd = 1	(1.1)
d=1
Serious problems arise when using standard statistical analysis tools on compositional data (Aitchison, 1986, 2001; Pawlowsky-Glahn & Buccianti, 2011):
1.	Compositional data have a bounded distribution. This implies at least non-normality and heteroscedasticity (lower variance close to the boundary).
2.	One component can only increase if some others decrease. This results in negative spurious correlations among the components (Pearson, 1897) and prevents interpreting effects of linear models in the usual way "keeping everything else constant".
3.	The true dimensionality of a set of compositional variables is D-1. Analysis of all D dimensions leads to perfect collinearity.
4.	Compositional data lie in a (D-1)-dimensional Euclidean space called the simplex, with different operations and distance from real space (Billheimer et al., 2001; Pawlowsky-Glahn & Egozcue, 2001).
The compositional data analysis (CoDa) tradition started with Aitchison's seminal work (1986) on treating chemical and biological compositions. Nowadays, however, it spans almost all of the hard sciences and has started to be used in the social sciences, which often face similar problems (Batista-Foguet et al., 2015; Coenders et al., 2011; van Eijnatten et al., 2015; Ferrer-Rosell & Coenders, 2016; Ferrer-Rosell et al., 2015, 2016a, 2016b; Fry, 2011; Hlebec et al., 2012; Kogovsek et al., 2013; Vives-Mestres et at., 2016).
The literature on CoDa has extensively dealt with relating one composition to non-compositional data (Egozcue et al., 2012; Hron et al., 2012; Martín-Fernández et al., 2015) and with analyzing one single composition. As far as the exploratory data analysis of one
133
Gloria Mateu-Figueras et al.
single composition is concerned (Egozcue & Pawlowsky-Glahn, 2011), available methods include the variation array (Aitchison, 1986), principal component analysis (Aitchison, 1983; Aitchison & Greenacre, 2012), the CoDa-dendrogram (Pawlowsky-Glahn & Egozcue, 2011), and the CoDa-biplot (Aitchison & Greenacre, 2012). As regards exploratory tools to relate two compositions, the natural choice is canonical correlation analysis - CCA (Aitchison, 1986). Typical problems relating two compositions include the relationship between the composition of species and the chemical composition of the environment (ter Braak, 1996); between the composition of foods and the composition of their energy and nutrients; or between the composition of materials and the composition of spectral curves in image processing. The use of CCA for compositional data was foreshadowed in Aitchison (1986), without much mention of its properties or interpretation. At a later date, van den Boogaart and Tolosana-Delgado (2013) devised an advanced procedure for compositional CCA requiring software designed for this purpose.
Drawing from Aitchison (1986), in this article we develop and illustrate a simple procedure for carrying out CCA of two compositional vectors and examine its interpretation and main statistical properties. Even if specialized techniques for compositional data have appeared (van den Boogaart & Tolosana-Delgado, 2013; Pawlowsky-Glahn & Buccianti, 2011; Pawlowsky Glahn et al., 2015; Thió-Henestrosa & Martín-Fernández, 2005), compositional data can also be transformed so that they can be subject to standard and well-understood statistical techniques carried out using standard software. This is the approach we take in this article.
Given the fact that only information on the relative size of components is available in a compositional data context, logarithms of ratios between component values are a meaningful way of expressing the data and guaranteeing the principles of CoDa (Aitchison, 2001). A logarithm of a ratio is scale invariant, meaning that it does not change if the values involved are multiplied by an arbitrary constant. Adding or dropping components from a composition does not modify the log-ratios computed from the remaining components. This is related to the principles of scale invariance and subcompositional coherence. For full details on CoDa principles, see Pawlowsky Glahn et al. (2015).
Several log-ratio transformations have been suggested in the literature (Egozcue et al., 2003). Additive log-ratio transformation (alr) is the easiest to compute since it is simply the log-ratio between each component and the last:
Exploring the Relationship between two Compositions...
134
>d=ln(xyxD)=ln(xj)-ln(xD) with d=1,2,...,D-1	(1.2)
Alr-transformed yd variables recover the full unconstrained real space. It must be noted that one dimension is lost. Although alr transformation is used in this article due to its simplicity, there are alternatives (see Egozcue et al., 2003 for a general background on the transformations and Section 3.3. for a discussion of their applicability to CCA).
Since the decision on which component to leave in last place and serve as a reference in the alr transformation is often arbitrary, there is concern regarding whether the results of a statistical analysis are invariant to this arbitrary choice. Of course, different log-ratios constitute different variables and the raw results will never be invariant. However, it is considered desirable that overall goodness of fit measures be invariant to this choice. Once results are reexpressed as a function of the log components ln(xd), they should ideally also be invariant.
The structure of the article is as follows. First, we review the basics of CCA. We then come to the particular case in which CCA is applied to compositions that have been subjected to alr transformation, showing how to interpret the key results, proving that they are invariant to the choice of reference component, and discussing alternative transformations. Following this, we present an illustration from the field of education using data from Kolb's Learning Style Inventory (Batista-Foguet et al., 2015; Kolb, 1984, 1999) and Boyatzis' Philosophical Orientation Questionnaire (Boyatzis et al., 2000). The final section discusses the strengths and weaknesses of the method.
2 Canonical Correlation Analysis
Canonical correlation analysis (CCA) is a multivariate analysis technique which studies the relationships between two sets of variables Ya=(ya1, ya2, ..., yap) and Yb=(yb1, yb2, ..., ybq) usually defined in the real space. The method was first introduced in Hotelling (1936) and a non-technical description can be found in Hair et al. (2009).
CCA builds pairs of linear combinations of each set of variables called canonical variates. The first canonical variate cva1 for set Ya is derived so that it is maximally correlated with the first canonical variate cvb1 for set Yb. The second canonical variate cva2 for set Ya is derived so that it is maximally correlated with the second canonical variate cvb2 for set Yb under the restriction that both new canonical variates are uncorrelated with cva1 and cvbi. The following
135
Gloria Mateu-Figueras et al.
pairs are extracted in a similar manner and have the maximum mutual correlation, while being uncorrelated with the previous pairs. The process may be continued up to min{p,q} times.
The raw canonical coefficients waij and wbij are the weights used to compute the i-th pair of canonical variates from the j-th original variables:
CVa1=Wa1iya1+ Wa12 ya2+-" + Walp yap CVbl=Wbllybl+ Wbi2 yb2+-"+ Wbiq ybq
CVa2=Wa2iya1 + Wa22 ya2+-" + Wa2p yap	(2.1)
CVb2=Wb2iyb1+ Wb22 yb2+-"+ Wb2qybq
In practice, the canonical coefficients are computed from three covariance matrices: the square matrix Saa contains covariances in the first variable set, the square matrix Sbb covariances in the second set, and the rectangular matrix Sab covariances between variables of one set and the other. Canonical variates are obtained from an eigenvalue analysis of matrix:
S buS abS bbS ba	(2.2)
The correlation between cva1 and cvb1 is the first canonical correlation px, the correlation between cva2 and cvb2 is the second canonical correlation p2, and so on. These canonical correlations are obtained as the square root of the eigenvalues of the matrix in Equation (2.2).
The maximum number of canonical variates that can be extracted is the smallest dimension of the two sets of variables. For instance, if p=5 and q=8, then a maximum of 5 pairs of variates can be obtained. As with many other multivariate analysis techniques, the researcher is interested in a parsimonious solution and in interpreting only the most relevant variates. The relevance of a pair of canonical variates can be assessed by the sheer size of the canonical correlation, the interpretability of the canonical variates from the canonical weights, or the statistical significance of the canonical correlations according to Wilks' A tests, which are also a function of the eigenvalues. Since, y31>yo2>...>/3min{pq}, a common strategy is to sequentially test the following hypotheses:
Exploring the Relationship between two Compositions...
136
Hoi! Pl-p2-p3~ '-Pmin{p,q}-0
H02: P2=P3="'=Pmin{p,q}=0
..........(2.3)
H0min{p,q} -1 : Pmin{p,q} -1 Pmin{p,q} 0 H0min{p,q}: Pmin{p,q} 0
The rejection of H01 to H0i and the failure to reject H0i+1 to H0min{p,q} shows the first i canonical correlations to be statistically significant.
Other common results of a CCA which provide a useful aid to interpreting the canonical variates require standardization in some form or other (Hair et al., 2009) and are:
1.	Standardized canonical coefficients (coefficients used to compute canonical variates from standardized y variables).
2.	Canonical loadings (correlations between the canonical variates and the y variables they are computed from).
3.	Canonical cross-loadings (correlations between canonical variates and the other set of y variables).
4.	Redundancy analysis (percentages of variance for the y variables explained by their own canonical variates and from the canonical variates computed from the other set of y variables).
3 Canonical Correlation Analysis of Compositional Data Transformed by Means of alr
3.1 Interpretation
Given two compositions with Da and Db components, Xa—(xai, Xa2,-,XaDa) and Xb—(xbi,Xb2,-,XbDb), following Aitchison (1986) we first apply alr transformation with the last component in the denominator. The results are the following two real vectors withp=Da-l and q=Db-l elements:
137
Gloria Mateu-Figueras et al.
Y„ =
Y =
r r x ^
ln x
x
V V aDa J
f v
ln
V xaDa J
...,ln
r x ^
ap
x
V aDa J J
r r ^ >
ln ^
V V XbDb J
ln
r x ^
lb2
V XbDb J
,.,ln
r x ^
^bq
V xbDb JJ
(3.1)
We can rewrite Equation (3.1) as:
Ya = (ln(Xa1 ) " ln(XaDa ), ln(Xa2 ) - ln(XaDa ), • - ,ln(Xap ) - ln(XaDa )) Yb = (ln(xb1 )" ln(xbDb ),ln(xb2 )" ln(xbDb ),.,ln(xbq )-ln(xbDb ))
(3.2)
Ya and Yb are two sets of real variables to which we can apply the standard CCA procedure from the covariance matrices of each set of transformed variables and the covariance matrix between the transformed variables of one set and the other in Equation (2.2).
The first pair of canonical variates in Equation (2.1), when expressed in terms of logarithms of components, becomes:
CVal=Wallln(Xal)+ Wal2 ln(Xa2)+ ••• + Walp ln(Xap)-( Wall + Wal2+ + Walp) ln(XaDa) cvbi=wbnln(xbi)+ Wbl2 ln(xb2)+ •••+ Wblq ln(Xbq)-( Wbll+ Wbl2+ •••+ Wblq) ln(XbDb) (3.3)
Since the raw canonical coefficients are applied from ln(xal) to ln(xap) and again to ln(xaDa) with reversed signs, the weights of all Da logarithms add up to zero, and the same occurs with the weights of the Db logarithms of the xb variables. This would also hold for the remaining canonical variates.
This is the same as saying that the canonical variates are log ratios of the product of components with a positive weight raised to a power equal to that weight, over the product of components with a negative weight raised to a power equal to the absolute weight. Let us show an example of the former for a canonical variate of a 5-dimensional composition with:
CVal = 1yal + 1.5ya2+ 0.5ya3 -0.5 ya4 The reexpression of this canonical variate as a log-ratio is:
CVal = 1ln(Xal)+ 1.5ln(Xa2)+ 0.5ln(xa3) -0.5 ln(Xa4)-2.5ln(Xaj)=ln
(3.4)
1.5 _0.5 \
X 05 X 25 V Aa 4a a5 J
(3.5)
a2
Exploring the Relationship between two Compositions...
138
The cva1 log-ratio in this example is high mainly when xa1 and xa2 are high and xa5 is low.
Since the sum of positive exponents equals the sum of negative exponents, the log-ratio is also a log-contrast, that is, a log-linear combination where the sum of the coefficients is 0 (Aitchison, 1986: 84).
The first pair of canonical variates can thus be interpreted as the log-contrast of one of the compositions that has the maximum correlation with a log-contrast of the other composition. The second pair can be interpreted as the log-contrast of one of the compositions that has the maximum correlation with a log-contrast of the other composition, under the restriction that they are uncorrelated with the first pair of canonical variates. A similar interpretation would hold for the third pair, subject to zero correlation with the first two pairs, and so on.
3.2 Invariance of the Results to the Choice of Reference Component in alr
Although the last component in each composition was chosen as the common divisor in our alr transformation, this could equally have been any other component. Consequently, for any analysis involving alr vectors, it is important to check the invariance of the key results to component permutations, or in other words, their invariance with respect to the choice of divisor in alr transformation. In this section we show specifically that Wilks' A tests, canonical correlations, and canonical variates as functions of log components -Equation (3.3)- are invariant to this choice.
It is easy to see how two alr-transformed vectors using different components as a divisor are related using a change-of-basis matrix. Following Mateu-Figueras et al. (2011), the elements of an alr vector are the coefficients of the original composition with respect to a particular non-orthonormal basis on the simplex, the sample space of compositional data. The effect of changing the common divisor is to obtain the coefficients with respect to another particular basis, which is analogous to performing an oblique rotation of the data.
Let Ya and Yb be the alr transformed vectors using the last components as common
* *
divisors and let Ya and Yb be the alr-transformed vectors using other components as
**
denominators. Then, Ya =QYa and Yb =PYb. We can obtain the exact expression of matrices Q and P (see Aitchison, 1986: 94), but the important point here is that matrices Q and P are change-of-basis matrices. From the usual properties of covariance matrices we know that:
139
Gloria Mateu-Figueras et al.
Si = QS aaQ'	(3.6)
S bb = PS bb P'	(3.7)
S lb = QS ab P' and S ba = PS b, Q'	(3.8)
When using different common divisors in alr transformation, the analyzed matrix in Equation (2.2) becomes:
(Sla) Slb (Sbb ) Sbi	(3.9)
By using the relationships in Equations (3.6)-(3.8), Equation (3.9) becomes:
(Si, )-1Slb (Sbb )X = (QSaaQ')- (QS,bP^SbbP') (PSbaQ') =	(3 1 Q)
(Q')-1S -lS abS -1S baQ'
From linear algebra properties, we know that the eigenvalues of a matrix are invariant under changes of basis. Consequently, both the canonical correlations and Wilks' A tests are invariant under change of common divisor in alr transformation.
It is easy to see how the normalized eigenvectors of matrices in Equations (3.9) and (2.2), denoted as w"ai and wai respectively, must be related by Q'wl = wai or
w *ai = (Q')-1 w ai. Then we obtain the invariance of the corresponding canonical variates as:
cv". = (w*.) y* =((Q')_1 w .)qY = w' Q_1QY = w'.Y = cv .	(3.11)
ai \ a. / a	Z a^^^-a	a^^^a	a. a	a.	V	^
Conversely, all results that imply standardization, like standardized canonical coefficients, canonical loadings/cross-loadings and redundancy analysis, are not invariant to the choice of reference component in alr transformation. In the case of CoDa, however, given the facts that canonical variates can be readily interpreted as log-ratios and log-contrasts on their own, and that standardization is extremely uncommon for log-contrasts, standardized information is not needed to enhance interpretation and is not considered in this article.
Exploring the Relationship between two Compositions...
140
3.3 Appropriateness of Alternative Log-ratio Transformations for Canonical Correlation Analysis
One key issue when working with CoDa is the choice of the log-ratio transformation, since different possibilities are available. Additive log-ratio (alr) and centered log-ratio (clr) transformations were introduced in Aitchison (1986), while isometric log-ratio transformation (ilr) was introduced in Egozcue et al. (2003).
Aitchison's (1986) proposal for compositional CCA involved alr transformation. Although alr transformation is simple and easy to interpret, it is asymmetric in its parts. By changing the part in the denominator, a different alr-transformed vector is obtained. For this reason, when alr transformation is used, it is important to check the invariance of the results with respect to the choice of common denominator, as we have done in Section 3.2. However, as Egozcue et al. (2003) noted, the main drawback of alr transformation is that it is not an isometric transformation from the simplex to the real space. It was later shown that an alr vector can be viewed as the coefficients of a composition with respect to a non-orthonormal basis on the simplex (Mateu-Figueras et al., 2011). Consequently, it is not suitable for statistical techniques that use distances or angles between alr vectors, such as cluster analysis. Note that these problems do not occur when using CCA because eigenvalues and eigenvectors of a product of covariance matrices are involved. Due to the non-orthonormality of the basis, the equality Q'w*ai = wai is only true if the vector product Q'w*ai is normalized, although this does not affect the analyses considered in this article.
Clr transformation is defined as the logarithm of the ratio of each part over the geometric mean. It is a symmetric transformation with respect to the compositional parts and also an isometric transformation. Nevertheless, clr transformation has the disadvantage that the clr covariance matrix is singular. In our case, clr transformation would not be a good choice because CCA uses covariance matrices and their inverses. Conversely, it would be a good choice for cluster analysis or other statistical techniques in which distances are crucial and covariances do not need to be inverted.
Ilr transformation is isometric and consequently makes it possible to associate distances in the simplex with distances in the transformed space. Additionally, an ilr vector can be viewed as the coordinates of a composition with respect to an orthonormal basis on the simplex. Finally, covariance matrices can be inverted. It can thus be used in virtually all statistical analyses. The expression of the ilr using a particular orthonormal basis is given in Egozcue et
141
Gloria Mateu-Figueras et al.
al. (2003). Nevertheless, in inner product spaces, an orthonormal basis is not uniquely determined and in some cases it is not straightforward to determine which basis is the most appropriate to solve a specific problem and how it can be interpreted. Faced with the problem of interpreting CCA on ilr transformation, van den Boogaart and Tolosana-Delgado (2013) devise a graphical back-transformation of the canonical coefficients. In any case, the invariance of the ilr results with respect to the choice of the orthonormal basis also holds.
In this article, although alr transformation was used due to its simplicity, ilr transformation could also have been used, and we actually did rerun the illustration analysis with ilr transformation. The final canonical variates expressed in terms of the log components and as log-contrasts are invariant, because alr and ilr vectors are also related through a change-of-basis matrix.
4 Illustration
4.1 Background
In this illustration of compositional CCA, our aim is to relate students' learning styles to their philosophical orientations. Philosophical orientation is a good means of understanding the relationship between people's values and beliefs, and their behavior and approach to learning (Boyatzis et al., 2000). Since a person's behavior is related to his or her values and beliefs, philosophy is important for comprehending and predicting behavior, with the added advantage that a person's philosophy goes beyond social context. Philosophical orientation is useful for answering questions such as how individuals 'act across various social settings' or 'think about establishing the value of things, activities and others' (Boyatzis et al., 2000: 50). Three major clusters of philosophical systems have traditionally been proposed. These clusters define the extent to which a person is pragmatic (PR), intellectual (IN) or humanistic (HU).
A person with a predominantly PR philosophical orientation will make decisions based on the benefits of the action, measured in terms of utility or comparing input and output. If the objectives to be achieved are not clear or measuring utility is difficult, then an activity will be less valuable to a person with this orientation.
Exploring the Relationship between two Compositions...
142
Someone with a predominantly IN philosophical orientation will be rational, logical and focus on comprehending everything. The central concern underlying this philosophical orientation is analytical.
Someone with a predominantly HU orientation is thought to be committed to human values. This kind of person will tend to determine whether an activity is worthy in terms of its impact on other people and the quality of the relationship with these people. The central issue underlying HU orientation is a concern for close and personal relationships.
According to Experiential Learning Theory, learning is a process whereby knowledge is created through the transformation of experience (Kolb, 1984). Learning requires abilities to grasp and transform knowledge that are polar opposites. In grasping knowledge, some people perceive new information through experiencing the concrete, tangible, and felt qualities of the world, which is referred to as concrete experience (CE), while others tend to take hold of new information through symbolic representation or abstract conceptualization (AC). In transforming knowledge, some people tend to carefully watch others who are involved in the experience and reflect on what happens (reflective observation - RO), while others choose to start doing things (active experimentation - AE). Learning can also be conceived as a four-stage cycle, where each stage is represented by a learning mode.
At the CE stage, one tends to rely more on intuition than on a systematic focus. Moreover, in this stage, a learner relies on the ability to be open, receptive and adaptive to changes. At the RO stage, one comprehends situations by taking into account different perspectives. In this stage, a learner relies on patience and objectivity, as well as thoughts and feelings. At the AC stage, logic and ideas are needed to understand a problem, rather than feelings. A learner in this stage relies on systematic planning and the theoretical development of ideas. Finally, at the AE stage, one learns by experimenting with changing situations. In this stage, a learner will find it more useful to put ideas into practice and see what really works than to simply observe.
4.2 Data and Measures
Multidimensional forced-choice questionnaires to measure philosophical orientations and learning modes were designed in Boyatzis et al. (2000) and Kolb (1999). In these questionnaires, each question consists of a set of D statements, and each statement is an indicator of a different dimension, in our case, of a philosophical orientation (D=3) or a
143
Gloria Mateu-Figueras et al.
learning mode (D=4). Respondents are instructed to rank these statements. In this article, we assume that ranks are coded as D-l for the most preferred statement, D-2 to the second most preferred, down to 0 for the least preferred. The Philosophical Orientation Questionnaire consists of k=20 questions designed as in this example:
"I think of my value, or worth, in terms of:
(a)	My relationships (e.g. family, friends).
(b)	My ideas or ability to invent new concepts or ability to analyse things.
(c)	My financial net worth or income."
Statement (a) reflects the HU orientation, (b) the IN orientation, and (c) the PR orientation.
The Learning Style Inventory includes k=12 questions designed as in this example:
"When I am learning:
(a)	I like to experience sensations.
(b)	I like to observe and listen.
(c)	I like to think about ideas.
(d)	I like to do things."
Statement (a) reflects the CE mode, (b) the RO mode, (c) the AC mode, and (d) the AE mode.
The ranks of each dimension are summed across the k questions to produce D global scores, one for each dimension. These D scores have a fixed sum for all respondents, equal to kD(D-1)/2. Once the global scores have been computed, forced-choice instruments can be understood as compositions, in which the kD(D-1)/2 total is allocated to the D dimensions (components), so that data only convey information about the relative importance of dimensions (learning modes and philosophical orientations) for a given individual. Under this coding scheme, the dimension score is the number of times the dimension has been preferred over other dimensions in all possible pair-wise comparisons over the k questions. For instance, if a component is always ranked as the lowest, it has never been preferred to any other mode and receives a 0 score. If a component is always ranked as the highest, it is preferred k times to the other D-1 modes and receives a k(D-1) score. Scores can thus be understood as having ratio scale properties: a component with a score of 6 has been preferred
Exploring the Relationship between two Compositions...
144
to other components twice as many times across the k items than a mode with score of 3 (Batista-Foguet et al., 2015). Alternative ways of coding these questionnaires are discussed in de Vries and van der Ark (2008).
In this illustration, we use the same data as those used by Batista-Foguet et al. (2015), which cover 7 consecutive years (2006-2013) of candidates on an international MBA program at a leading European business school. The sample size was 1,194 full time participants from 86 countries, of which the most common were Spain (15.9%), the US (13.7%), India (9.6%), and Germany (5.6%). 69.7% were male and 30.3% female. Average age was 31.4 years (SD 2.8 years). Previous student background was heterogeneous, including not only economics (11%) and management studies (32%), but also engineering (36.4%), social sciences (9.3%), arts (5.7%) and hard sciences (5.5%).
The philosophical orientation components were labeled xp1=pragmatic (PR), xp2=intellectual (IN), and xp3=humanistic (HU); while the learning mode components were labeled x/1=abstract conceptualization (AC), xl2=concrete experience (CE), x/3=active experimentation (AE), and x/4=reflective observation (RO). The final two components, HU and RO, were used as a reference for the alr transformation:
f x 1
log - ratio of AC over RO yl1 = ln — = ln(xn ) - ln(xz4 )
V Xl 4 J
(4.1)
f X 1
log - ratio of CE over RO yt2 = ln — = ln(2 ) - ln(4 )
V Xl 4 J
log - ratio of AE over RO yl3 = ln = ln(3 ) - ln(4 )
V Xl 4 J
145
Gloria Mateu-Figueras et al.
4.3 Results
After submitting the sets (yp1, yp2) and (yli, yl2, y3) to a CCA using SPSS v.23, the resulting canonical correlations are p1 =0.246 and /32 =0.163. Their significance tests are in Table 1. The raw (unstandardized) canonical coefficients are in Table 2.
Table 1: Significance Tests for the Canonical Correlations
H0		Wilk's A	x2	DF	p-value
p1=	=P2=0	0.914	93.854	6	0.000
P2=	=0	0.973	28.295	2	0.000
Table 2: Raw Canonical Coefficients as a Function of the Log-ratios
	Variate 1	Variate 2
Philosophical orientations		
ypi (log-ratio of PR over HU)	-0.524	1.730
yp2 (log-ratio of IN over HU)	2.085	-0.274
Learning modes		
yn (log-ratio of AC over RO)	1.720	-0.177
yi2 (log-ratio of CE over RO)	-0.447	-1.347
yl3 (log-ratio of AE over RO)	-1.032	1.311
The original canonical variates are functions of the log ratios and are easily re-expressed by hand as a function of the log-components as in Equation (3.3). For instance, in the philosophical orientation composition the first canonical variate is:
cvpi =-0.524ypi + 2.085yp 2 =
-	0.524 ln(xp1 ) + 0.524 ln(xp3 ) + 2.085 ln(xp2 ) - 2.085 ln(xp3 ) =	(4.2)
-	0.524 ln(xp1 ) + 2.085ln(xp 2 ) - 1.561ln(xp3 )
Table 3: Raw Canonical Coefficients as a Function of the Log-components
	Variate 1	Variate 2
Philosophical orientations		
ln(xpl) (PR)	-0.524	1.730
ln(xP2) (IN)	2.085	-0.274
ln(xPs) (HU)	-1.561	-1.456
Learning modes		
ln(xi) (AC)	1.720	-0.177
ln(xi2) (CE)	-0.447	-1.347
ln(xl3) (AE)	-1.032	1.311
ln(xu) (RO)	-0.241	0.213
Exploring the Relationship between two Compositions...
146
Canonical variates as a function of log components are shown in Table 3. As in Equation (4.2), the coefficients in Table 2 apply to all rows in Table 3 but the last one of each composition, which receives their sum with reversed sign.
The canonical variates in Table 3 correspond to the following log-contrasts:
cvpi = In
•V 2
x0.524 1 561 VX pi p3
CVp2 = ln
p1
x0.274 1456 \Xp 2 p3 )
CVi = in
x0.447 x1.032 x0.241 V Xl 2 Xl3 Xl 4
( 1.311 0.213 A
CVl 2 = ln
x0.177 x1.347
V Xl1 Xl 2 y
(4.3)
The first pair of canonical variates can therefore be interpreted as follows: when the IN (xp2) orientation is high and the HU (xp3) orientation is low, then the AC (xu) mode is high and the AE (xl3) mode is low. The second pair of canonical variates can be interpreted as follows: when the PR (xpl) orientation is high and the HU (xp3) orientation is low, then the AE (xjs) mode is high and the CE (xl2) is low. Our results are similar to those of Boyatzis et al. (2000), who reported the PR orientation as correlating positively with AE and negatively with CE; and the IN orientation as correlating positively with AC and negatively with AE.
2.085
1.730
1.720
X
5 Discussion
The increasing awareness of CoDa leads to an increasing interest in problems involving more than one composition. Standard statistical analysis includes many tools for relating two sets of variables, and one of the most popular in multivariate exploratory analysis is CCA. Within CoDa, tools for relating several compositions are still underdeveloped. In this article we have shown how to adapt CCA to compositional data in order to explore the relationship between two compositions. In our illustration we have found learning styles to be related to philosophical orientations in an interpretable manner in accordance with the literature, which supports the practical usefulness of the method as an exploratory tool.
The appeal of the CoDa log-ratio approach for applied researchers lies in the fact that once the data have been transformed using appropriate log-ratios, standard and well-understood statistical techniques such as CCA can be used. Once log-ratios have been computed, a compositional CCA is no more complicated than a standard CCA and standard statistical software dealing with CCA can be used. In order to be used with compositional
147
Gloria Mateu-Figueras et al.
data, software must be able to derive the canonical variates from the covariance product in Equation (2.2) and include raw canonical coefficients as a part of the output. We recommend either SPSS, the cca function in theyacca R library (setting xscale=FALSE, yscale=FALSE), or the cc function in the CCA R library. It must be taken into account that some software for CCA either analyzes correlation matrices rather than covariance matrices (like the canocor function in the R library of the same name) or reports only standardized coefficients (like the CCorA function in the vegan R library). For the computation of canonical correlations and their significance tests, standardization or the use of correlations are irrelevant.
In some cases, the interpretation of the results of a statistical method on compositional data differs to some extent from its interpretation on unconstrained data. In the case of CCA, standardized results are neither usable nor needed, because unstandardized canonical variates can be interpreted as log-contrasts in a straightforward manner. This way of interpreting the results as log-contrasts fits well with the CoDa way of thinking and increases the attractiveness of the approach within an exploratory CoDa. CCA can also be applied to relate one composition to a set of numeric variables defined in the real space. In this case, the canonical variates are log-contrasts in the composition and linear combinations of the set of numeric variables with maximum mutual correlation.
The CoDa approach focuses on relative rather than absolute differences in the data. Treating compositional data directly without the log-ratio transformation implies assuming that the difference between scores 1 and 2 is the same as the difference between scores 10 and 11, while in the former case they differ by 100% and in the second by only 10%. A commonly mentioned limitation of the CoDa approach is the presence of zeros in the xd variables, which prevents the analyst from computing log-ratios. Details on methods available for treating zeros prior to analysis, which perform well if the percentage of cases with zeros is not large, can be found in Martín-Fernández et al. (2011).
Further research could include adapting other multivariate techniques that relate sets of variables to compositional data, such as redundancy analysis, in order to derive a specified number of new latent variables from a composition that explains as much variance as possible from the other compositions. Related methods in the statistical modeling arena include simultaneous regression systems in which both explanatory and dependent variables are compositional (Tolosana-Delgado and van den Boogaart, 2013) and compositional partial least squares (Kalivodová et al., 2015).
Exploring the Relationship between two Compositions...
148
Acknowledgements
The authors would like to acknowledge the support provided by Spanish Health Ministry Grant CB06/02/1002 funding the research group "Epidemiology and Public Health (CIBERESP)"; Catalan Autonomous Government Consolidated Research Group Grants 2014SGR551 and 2014SGR582 funding the research groups "Compositional and Spatial Data Analysis (COSDA)" and "Leadership Development Research Centre (GLEAD)"; Spanish Economy and Competitiveness Ministry grants MINECO/FEDER-EU MTM2015-65016-C2-1-R and EDU2015-68610-R funding the projects "COmpositional Data Analysis and RElated meThOdS (CODA-RETOS)" and "Assessing Individual and Team Entrepreneurial Potential"; and University of Girona grants MPCUdG2016/069 and MPCUdG2016/098.
References
[1]	Aitchison, J. (1983): Principal component analysis of compositional data. Biometrika, 70, 57-65.
[2]	Aitchison, J. (1986): The Statistical Analysis of Compositional Data. Monographs on Statistics and Applied Probability. London: Chapman and Hall.
[3]	Aitchison, J. (2001): Simplicial inference. In M.A. Viana and D.S. Richards (Eds):
Algebraic Methods in Statistics and Probability. Contemporary Mathematics Series of the American Mathematical Society, vol. 287, 1-22. Providence, RI: American Mathematical Society.
[4]	Aitchison, J. and Greenacre, M. (2002): Biplots of compositional data. Journal of the Royal Statistical Society: Series C (Applied Statistics), 51, 375-392.
[5]	Batista-Foguet, J.M., Ferrer-Rosell, B., Serlavos, R., Coenders, G. and Boyatzis, R.E. (2015): An alternative approach to analyze ipsative data. Revisiting Experiential Learning Theory. Frontiers in Psychology, 6, 1742.
[6]	Billheimer, D., Guttorp, P. and Fagan, W. (2001): Statistical interpretation of species composition. Journal of the American Statistical Association, 96, 1205-1214.
[7]	Van den Boogaart, K. G. and Tolosana-Delgado, R. (2013): Analyzing Compositional Data with R. Berlin: Springer.
[8]	Boyatzis, R.E., Murphy, A.J. and Wheeler, J.V. (2000): Philosophy as a missing link between values and behaviour. Psychological Reports, 86, 47-64.
[9]	ter Braak, C.J.F. (1996). Unimodal Models to Relate Species to Environment. Wageningen, NL: DLO-Agricultural Mathematics Group.
[10]	Coenders, G., Hlebec, V. and Kogovsek, T. (2011): Measurement quality in indicators of compositions. A compositional multitrait-multimethod approach. Survey Research Methods, 5, 63-74.
149
Gloria Mateu-Figueras et al.
[11]	Egozcue, J.J., Daunis-i-Estadella, J., Pawlowsky-Glahn, V., Hron, K. and Filzmoser, P. (2012): Simplicial regression. The normal model. Journal of Applied Probability and Statistics, 6, 87-108.
[12]	Egozcue, J.J., and Pawlowsky-Glahn, V. (2011): Basic concepts and procedures. In V. Pawlowsky-Glahn and A. Buccianti (Eds): Compositional Data Analysis. Theory and Applications, 12-28. New York: Wiley.
[13]	Egozcue, J.J., Pawlowsky-Glahn, V., Mateu-Figueras, G. and Barceló-Vidal, C. (2003): Isometric logratio transformations for compositional data analysis. Mathematical Geology, 35, 279-300.
[14]	van Eijnatten, F.M., van der Ark, L.A. and Holloway, S.S. (2015): Ipsative measurement and the analysis of organizational values: an alternative approach for data analysis. Quality & Quantity, 49, 559-579.
[15]	Ferrer-Rosell, B. and Coenders, G. (2016): Destinations and crisis. Profiling tourists' budget share from 2006 to 2012. Journal of Destination Marketing & Management. doi: 10.1016/j.jdmm.2016.07.002
[16]	Ferrer-Rosell, B., Coenders, G. and Martínez-Garcia, E. (2015): Determinants in tourist expenditure composition- the role of airline types. Tourism Economics, 21, 9-32.
[17]	Ferrer-Rosell, B., Coenders, G. and Martínez-Garcia, E. (2016a): Segmentation by tourist expenditure composition. An approach with compositional data analysis and latent classes. Tourism Analysis, 21, 589-602.
[18]	Ferrer-Rosell, B., Coenders, G., Mateu-Figueras, G. and Pawlowsky-Glahn, V. (2016b): Understanding low cost airline users' expenditure pattern and volume.
Tourism Economics, 22, 269-291.
[19]	Fry, T. (2011): Applications in economics. In V. Pawlowsky-Glahn and A. Buccianti (Eds): Compositional Data Analysis. Theory and Applications, 318-326. New York: Wiley.
[20]	Hair, J.F, Black, W.C., Babin, B.J. and Anderson, RE. (2009): Multivariate Data Analysis. A Global Perspective (7th ed.). Upper Saddle River, NJ: Pearson/Prentice Hall.
[21]	Hlebec, V., Kogovšek, T. and Coenders, G. (2012): The measurement quality of social support survey measurement instruments. Metodološki Zvezki, 9, 1-24.
[22]	Hotelling, H. (1936): Relations between two sets of variates. Biometrika, 28, 321-377.
[23]	Hron, K., Filzmoser, P. and Thompson, K. (2012): Linear regression with compositional explanatory variables. Journal of Applied Statistics, 39,. 1115-1128.
[24]	Kalivodová, A., Hron, K., Filzmoser, P., Najdekr, L., Janecková, H. and Adam, T. (2015): PLS-DA for compositional data with application to metabolomics. Journal of Chemometrics, 29, 21-28.
[25]	Kogovšek, T., Coenders, G. and Hlebec, V. (2013): Predictors and outcomes of social network compositions. A compositional structural equation modeling approach. Social Networks, 35, 1-10.
[26]	Kolb, D.A. (1984): Experiential Learning: Experience as the Source of Learning and Development. Englewood Cliffs, NJ: Prentice Hall.
Exploring the Relationship between two Compositions...
150
[27]	Kolb, D.A. (1999): Learning Style Inventory, Version 3. Boston, MA: Hay Resources Direct.
[28]	Martín-Fernández, J.A., Daunis-i-Estadella, J. and Mateu-Figueras, G. (2015): On the interpretation of differences between groups for compositional data. SORT- Statistics and Operations Research Transactions, 39, 231-252.
[29]	Martín-Fernández, J.A., Palarea-Albaladejo, J. and Olea, R.A. (2011): Dealing with zeros. In V. Pawlowsky-Glahn and A. Buccianti (Eds): Compositional Data Analysis. Theory and Applications, 47-62. New York: Wiley.
[30]	Mateu-Figueras, G., Pawlowsky-Glahn, V. and Egozcue, J.J. (2011). The principle of working on coordinates. In V. Pawlowsky-Glahn and A. Buccianti (Eds):
Compositional Data Analysis. Theory and Applications, 31-42. New York: Wiley.
[31]	Pawlowsky-Glahn, V. and Buccianti, A. (2011): Compositional Data Analysis. Theory and Applications. New York: Wiley.
[32]	Pawlowsky-Glahn, V. and Egozcue, J.J. (2001): Geometric approach to statistical analysis on the simplex. Stochastic Environmental Research and Risk Assessment, 15, 384-398.
[33]	Pawlowsky-Glahn, V. and Egozcue, J.J. (2011): Exploring compositional data with the CoDa-dendrogram. Austrian Journal of Statistics, 40, 103-113.
[34]	Pawlowsky-Glahn, V., Egozcue, J.J. and Tolosana-Delgado, R. (2015): Modeling and Analysis of Compositional Data. Chichester: Wiley.
[35]	Pearson, K. (1897): Mathematical contributions to the theory of evolution. On a form of spurious correlations which may arise when indices are used in the measurements of organs. Proceedings of the Royal Society, 60, 489-498.
[36]	Thió-Henestrosa, S. and Martín-Fernández, J.A. (2005): Dealing with compositional data: The freeware CoDaPack. Mathematical Geology, 37, 773-793.
[37]	Tolosana-Delgado, R. and van den Boogaart, K.G. (2013): Regression between compositional data sets. In K. Hron, P. Filzmoser and M. Templ (Eds): Proceedings of the 5th International Workshop on Compositional Data Analysis (CoDaWork 2013), 163-176. Vienna: Vienna University of Technology.
[38]	Vives-Mestres, M., Martín-Fernández, J.A. and Kenett, R. (2016): Compositional data methods in customer survey analysis. Quality and Reliability Engineering International, 32, 2115-2125.
[39]	de Vries, A.L.M. and van der Ark, L.A. (2008): Scoring methods for ordinal multidimensional forced-choice items. In J. Daunis-i-Estadella and J.A. Martín-Fernández (Eds): Proceedings of the 3rd International Workshop on Compositional DataAnalysis (CoDaWork 2008), 1-18. Girona: University of Girona.
INSTRUCTIONS TO AUTHORS
Language: Metodološki zvezki - Advances in Methodology and Statistics is published in English.
Submission of papers: Authors are requested to submit their articles (complete in all respects) to the Editor by e-mail (MZ@stat-d.si). Contributions are accepted on the understanding that the authors have obtained the necessary authority for publication. Submission of a paper will be held to imply that it contains original unpublished work and is not being submitted for publication elsewhere. Articles must be prepared in LaTeX or Word. Appropriate styles and example files can be downloaded from the Journal's web page (http://www.stat-d. si/mz/).
Review procedure: Manuscripts are reviewed by two referees. The editor reserves the right to reject any unsuitable manuscript without requesting an external review.
Preparation of manuscripts
Tables and figures: Tables and figures must appear in the text (not at the end of the text). They are numbered in the following way: Table 1, Table 2,..., Figure 1, Figure 2,...
References within the text: The basic reference format is (Smith, 1999). To cite a specific page or pages use (Smith, 1999: 10-12). Use "et al." when citing a work by more than three authors (Smith et al., 1999). The letters a, b, c etc. should be used to distinguish different citations by the same author(s) in the same year (Smith, 1999a; Smith, 1999b).
Notes: Essential notes, or citations of unusual sources, should be indicated by superscript number in the text and corresponding text under line at the bottom of the same page.
Equations: Equations should be centered and labeled with two numbers separated by a dot enclosed by parentheses. The first number is the current section number and the second a sequential equation number within the section, e.g., (2.1)
Author notes and acknowledgements: Author notes identify authors by complete name, affiliation and his/her e-mail address. Acknowledgements may include information about financial support and other assistance in preparing the manuscript.
Reference list: All references cited in the text should be listed alphabetically and in full after the notes at the end of the article.
References to books, part of books or proceedings:
[1]	Smith, J.B. (1999): Title of the Book. Place: Publisher.
[2]	Smith, J.B. and White A.B. (2000): Title of the Book. Place: Publisher.
[3]	Smith, J. (2001): Title of the chapter. In A.B. White (Ed): Title of the Proceedings, 14-39. Place: Publisher.
Reference to journals:
[4]	Smith, J.B. (2002): Title of the article. Name of Journal, 2, 46-76.
Metodološki zvezki
Advances in Methodology and Statistics
Published by Faculty of Social Sciences University of Ljubljana, for Statistical Society of Slovenia
Editors
Izdajatelj
Fakulteta za družbene vede Univerze v Ljubljani za Statistično društvo Slovenije
Urednika
Valentina Hlebec Lara Lusa
Founding Editors
Cover Design
Anuška Ferligoj Andrej Mrvar
Bojan Senjur Gregor Petrič
Prva urednika
Oblikovanje naslovnice
Typesetting
Lara Lusa
Računalniški prelom
Printing
Tisk
DEMAT, d.o.o. Ljubljana, Slovenia
MZ
is indexed and abstracted in
je indeksirana in abstrahirana v
SCOPUS EBSCO ECONIS STMA-Z ProQuest
Home page URL
Spletna stran
http://www.stat-d.si/mz/
ISSN 1854 - 0023