CONTENTS
Metodoloski zvezki, Vol. 13, No. 1, 2016
Morteza Amini and S.M.T.K. MirMostafaee
Interval Prediction of Order Statistics Based on Records by Employing Inter-Record Times: A Study Under Two Parameter Exponential Distribution
Anindita Datta, Seema Jaggi, Cini Varghese and Eldho Varghese
Series of Incomplete Row-Column Designs with Two Units per Cell
Jose Pina-Sanchez
Adjustment of Recall Errors in Duration Data Using SIMEX
Janez Stare and Delphine Maucort-Boulch
Odds Ratio, Hazard Ratio and Relative Risk
Metodoloski zvezki, Vol. 13, No. 1, 2016, 1-15
Interval Prediction of Order Statistics Based on Records by Employing Inter-Record Times: A Study Under Two Parameter Exponential Distribution
Morteza Amini1 and S.M.T.K. MirMostafaee2
Abstract
In this note, we propose a parametric inferential procedure for predicting future order statistics, based on record values, which takes inter-record times into account. We utilize the additional information contained in inter-record times for predicting future order statistics on the basis of observed record values from an independent sample. The two parameter exponential distribution is assumed to be the underlying distribution.
1 Introduction
Suppose $Y_1, \ldots, Y_m$ are independent and identically distributed (iid) observations from an absolutely continuous cumulative distribution function (cdf) $F$, possessing probability density function (pdf) $f$. The order statistics of the sample $Y_1, \ldots, Y_m$, represented by $Y_{1:m} \le \cdots \le Y_{m:m}$, are obtained by arranging the sample in increasing order. Order statistics have been used in a wide range of applications, including robust statistical estimation, detection of outliers, characterization of probability distributions, goodness-of-fit tests, entropy estimation, analysis of censored samples, reliability analysis, quality control and strength of materials. A useful survey of available results until 2003 is given in the book of David and Nagaraja (2003).
1 Department of Statistics, School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, P.O. Box 14155-6455, Tehran, Iran; morteza.amini@ut.ac.ir
2 Department of Statistics, Faculty of Mathematical Sciences, University of Mazandaran, P.O. Box 47416-1467, Babolsar, Iran; m.mirmostafaee@umz.ac.ir
Let $X_1, X_2, \ldots$ be a sequence of iid random variables, independent of and identically distributed to $Y_1$. An observation $X_j$ is called an upper (lower) record value if its value exceeds (resp. falls below) those of all the previous observations. That is, the $n$th upper (resp. lower) record value, $U_n$ (resp. $L_n$), is defined as $X_{T_n}$, where $T_1 = 1$ with probability 1 and, for $n > 1$, $T_n = \min\{j : j > T_{n-1}, X_j > X_{T_{n-1}}\}$ (resp. $T_n = \min\{j : j > T_{n-1}, X_j < X_{T_{n-1}}\}$). Throughout this paper we deal with upper record values for predictive inference. Similar results can be obtained for the case of lower record values. The inter-record time statistic, defined as
$$\Delta_s = T_{s+1} - T_s, \quad s \ge 1,$$
is the number of observations taken after the $s$th record value up to and including the $(s+1)$th record value. For more details we refer the reader to Arnold et al. (1998). Record data arise in a wide variety of practical situations including industrial stress testing, finance, meteorological analysis, hydrology, seismology, sporting and athletic events, and mining surveys.
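The definitions of record values, record times and inter-record times can be sketched in a few lines of R. This is only an illustrative sketch: the helper name `upper_records` is hypothetical (it is not part of the paper's appendix), and the input is the rock-crushing sequence quoted in Section 5.

```r
# Sketch: extract upper records, record times and inter-record times
# from a numeric sequence. `upper_records` is a hypothetical helper name.
upper_records <- function(x) {
  stopifnot(is.numeric(x), length(x) >= 1)
  # The first observation is a record by convention (T_1 = 1); a later
  # observation is an upper record if it strictly exceeds all earlier ones.
  running_max <- cummax(x)
  is_record <- c(TRUE, x[-1] > running_max[-length(x)])
  times <- which(is_record)
  list(values = x[times],          # U_1, ..., U_n
       times = times,              # T_1, ..., T_n
       inter_times = diff(times))  # Delta_s = T_{s+1} - T_s
}

# Rock-crushing data quoted in Section 5 (Dunsmore, 1983)
x <- c(9.3, 0.6, 24.4, 18.1, 6.6, 9.0, 14.3, 6.6, 13.0, 2.4, 5.6, 33.8)
rec <- upper_records(x)
print(rec$values)       # record values
print(rec$times)        # record times
print(rec$inter_times)  # inter-record times
```

Only the record values and the inter-record times are needed by the prediction procedures; the non-record observations are discarded.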
The problem of predicting future observations has been extensively studied in the literature and several parametric and non-parametric procedures are developed for prediction. In many practical data-analytic situations, one is interested in constructing a prediction interval on the basis of available observations. There are situations in which the available observations and the predictable future observation are of the same type. The prediction of future records on the basis of observed records from the same distribution and prediction of order statistics based on order statistics are studied, among others, by Dunsmore (1983), Nagaraja (1984), Chou (1988), Awad and Raqab (2000), Raqab and Balakrishnan (2008) and the references therein.
Recently, Ahmadi and Balakrishnan (2010), Ahmadi and MirMostafaee (2009), Ahmadi et al. (2010) and MirMostafaee and Ahmadi (2011) discussed the prediction of future records from a Y-sequence based on the order statistics observed from an independent X-sequence, and vice versa.
In predicting future order statistics on the basis of observed record statistics, sometimes the available observations also include inter-record times, which can be utilized as additional information to improve the predictive inference. In other words, when both record values and inter-record times are available, it is desirable to employ the information contained in both. Feuerverger and Hall (1998) emphasized that "However, the record times and record values jointly contain considerably more information about F than the record values alone." Using the additional information in record times is not a new subject, and several authors have focused on inference based on both record values and record times; see for example Samaniego and Whitaker (1986), Lin et al. (2003), Doostparast (2009), Doostparast and Balakrishnan (2013), Kizilaslan and Nadar (2015) and MirMostafaee et al. (2016).
In this paper, a two parameter exponential distribution, $\mathrm{Exp}(\mu, \sigma)$, with pdf
$$f(x; \mu, \sigma) = \frac{1}{\sigma}\, e^{-(x-\mu)/\sigma}, \quad x > \mu, \ \mu \in \mathbb{R}, \ \sigma > 0, \tag{1.1}$$
is considered as the underlying distribution. We write $Z \sim \mathrm{Exp}(\mu, \sigma)$ if the pdf of $Z$ can be expressed as (1.1). Note that $\mu$ and $\sigma$ are the location and scale parameters, respectively. Throughout this paper we assume that both parameters, $\mu$ and $\sigma$, are unknown.
Now, suppose that $Y_1, \ldots, Y_m$ constitute a future random sample from a two parameter exponential distribution, i.e. $Y_1, \ldots, Y_m \sim \mathrm{Exp}(\mu, \sigma)$, and $Y_{1:m} \le \cdots \le Y_{m:m}$ are the corresponding order statistics of this sample. In addition, $\bar{Y}_m = m^{-1} \sum_{i=1}^{m} Y_{i:m}$ denotes the mean of this future sample. If $Y_1, \ldots, Y_m$ denote the times to failure of $m$ independent units in a lifetime test, then $\bar{Y}_m$ can be interpreted as the mean time on test of these failed units. We assume that the available data include the observed upper record values, $U_1, \ldots, U_n$, given the inter-record times, $(\Delta_1, \ldots, \Delta_{n-1})$. We emphasize that these record values are assumed to be extracted from a sequence of iid random variables $\{X_j, j = 1, 2, \ldots\}$ where $X_j \sim \mathrm{Exp}(\mu, \sigma)$ for $j = 1, 2, \ldots$. Moreover, the sequence $\{X_j, j = 1, 2, \ldots\}$ and the sample $\{Y_i, i = 1, \ldots, m\}$ are statistically independent. Note that $n$ is the number of observed record values and depends on the experiment, whereas $m$ is the sample size of the future observations and can be chosen arbitrarily. In addition, $n$ and $m$ are unrelated. The problem of interest is to obtain conditional prediction intervals for the $j$th future order statistic, $Y_{j:m}$, as well as for the mean, $\bar{Y}_m$, of a future sample on the basis of the available data. We compare our conditional prediction intervals with the unconditional ones proposed by Ahmadi and MirMostafaee (2009) and observe an improvement over the predictive inference without inter-record times. Therefore, we consider two cases: (a) the informative data contain only the upper record values, (b) the informative data contain the upper record values and the inter-record times; we then observe that case (b) yields some improvement in predictive inference compared with case (a).
The rest of the paper is organized as follows. Some general preliminaries are presented in Section 2. Conditional prediction intervals for the future $j$th order statistic, $Y_{j:m}$, and for the mean of the future sample, $\bar{Y}_m$, based on record values given inter-record times for the two parameter exponential distribution are studied in Sections 3 and 4. An illustrative example and some concluding remarks are given in Sections 5 and 6. The R codes for computing some results of the paper are given in the appendix.
2 Preliminaries
In this section, we present some general preliminary results used in later sections. Given upper record values $u_1, \ldots, u_{n-1}$, which are observed and extracted from the sequence $\{X_j; j \ge 1\}$, the inter-record times $\Delta_1, \ldots, \Delta_{n-1}$ are independent geometrically distributed random variables with success probabilities $\bar{F}(u_i) = 1 - F(u_i)$, $i = 1, \ldots, n-1$. Furthermore, the record values $U_1, \ldots, U_n$ form a Markov chain with adjacent transition pdf equal to the left truncated pdf of the underlying distribution, see Arnold et al. (1998). Thus, the joint distribution of $\mathbf{U}_n = (U_1, \ldots, U_n)$ and $\boldsymbol{\Delta}_n = (\Delta_1, \ldots, \Delta_{n-1})$ is
$$f_{\mathbf{U}_n, \boldsymbol{\Delta}_n}(\mathbf{u}_n, \boldsymbol{\delta}_n) = \prod_{i=1}^{n-1} f(u_i)\,[F(u_i)]^{\delta_i - 1}\, f(u_n), \tag{2.1}$$
where $\mathbf{u}_n = (u_1, \ldots, u_n) \in \mathcal{X}^n$, in which $\mathcal{X}$ is the support of $X$, and $\boldsymbol{\delta}_n = (\delta_1, \ldots, \delta_{n-1}) \in \mathbb{N}^{n-1}$; see Samaniego and Whitaker (1986) and Arnold et al. (1998), page 169. We emphasize that $\boldsymbol{\Delta}_n$ contains $n - 1$ positive integer-valued discrete random variables and $\boldsymbol{\delta}_n$ is the observed vector of $\boldsymbol{\Delta}_n$. By integrating (2.1) with respect to (w.r.t.) $u_1, \ldots, u_n$, we can easily prove the following result.
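Lemma 1 The joint probability mass function of $\Delta_1, \ldots, \Delta_{n-1}$ is
$$P_{\boldsymbol{\Delta}_n}(\boldsymbol{\delta}_n) = \Pr(\boldsymbol{\Delta}_n = \boldsymbol{\delta}_n) = \sum_{j=1}^{n-1} c_j(n, \boldsymbol{\delta}_n)\,\big[(a_1(n,j,\boldsymbol{\delta}_n) + 1)(a_1(n,j,\boldsymbol{\delta}_n) + a_n(n,j,\boldsymbol{\delta}_n) + 2)\big]^{-1},$$
where
$$c_j(n, \boldsymbol{\delta}_n) = (-1)^{n-j-1} \prod_{j_1=0}^{j-2} \Big( \sum_{t=n-j+1}^{n-j_1-1} \delta_t \Big)^{-1} \prod_{j_2=0}^{n-j-2} \Big( \sum_{t=j_2+2}^{n-j} \delta_t \Big)^{-1},$$
$$a_1(n,j,\boldsymbol{\delta}_n) = \sum_{t=1}^{n-j} \delta_t - 1, \qquad a_n(n,j,\boldsymbol{\delta}_n) = \sum_{t=n-j+1}^{n-1} \delta_t,$$
in which we assume, for $a > b$, $\sum_{t=a}^{b} \delta_t = 0$ and $\prod_{t=a}^{b} \delta_t = 1$.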
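As a small worked check of Lemma 1, take $n = 3$ and $\boldsymbol{\delta}_3 = (\delta_1, \delta_2) = (1, 2)$. The coefficients below follow the reciprocal form of $c_j$ used by the appendix function cjn, with empty products equal to 1.

```latex
% Worked instance of Lemma 1 for n = 3, (\delta_1, \delta_2) = (1, 2)
\begin{align*}
j = 1:&\quad c_1 = (-1)^{1}\,\frac{1}{\delta_2} = -\tfrac{1}{2}, \qquad
        a_1 = \delta_1 + \delta_2 - 1 = 2, \qquad a_n = 0, \\
j = 2:&\quad c_2 = (-1)^{0}\,\frac{1}{\delta_2} = \tfrac{1}{2}, \qquad
        a_1 = \delta_1 - 1 = 0, \qquad a_n = \delta_2 = 2, \\
P_{\boldsymbol{\Delta}_3}(1, 2)
  &= \frac{-1/2}{(2+1)(2+0+2)} + \frac{1/2}{(0+1)(0+2+2)}
   = -\frac{1}{24} + \frac{1}{8} = \frac{1}{12} \approx 0.0833 .
\end{align*}
```

The value agrees with the entry 0.0833 reported for $\boldsymbol{\delta}_n = (1, 2)$ in Table 1. It also agrees with a direct count: the event $\{\Delta_1 = 1, \Delta_2 = 2\}$ is $\{X_2 > X_1,\ X_3 < X_2,\ X_4 > X_2\}$, which holds for 2 of the 24 equally likely orderings of four iid continuous observations.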
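In this paper, we need the conditional distribution of $U_1$ and $U_n$ given $\boldsymbol{\Delta}_n = \boldsymbol{\delta}_n$, as follows.
Lemma 2 The conditional pdf of $U_1$ and $U_n$ given $\boldsymbol{\Delta}_n = \boldsymbol{\delta}_n$ is
$$f_{U_1, U_n | \boldsymbol{\Delta}_n}(u_1, u_n | \boldsymbol{\delta}_n) = [P_{\boldsymbol{\Delta}_n}(\boldsymbol{\delta}_n)]^{-1} \sum_{j=1}^{n-1} c_j(n, \boldsymbol{\delta}_n)\,[F(u_1)]^{a_1(n,j,\boldsymbol{\delta}_n)}\,[F(u_n)]^{a_n(n,j,\boldsymbol{\delta}_n)}\, f(u_1)\, f(u_n), \quad u_1 < u_n,$$
where $c_j(n, \boldsymbol{\delta}_n)$, $a_1(n,j,\boldsymbol{\delta}_n)$, $a_n(n,j,\boldsymbol{\delta}_n)$ and $P_{\boldsymbol{\Delta}_n}(\boldsymbol{\delta}_n)$ are as in Lemma 1.
The proof of Lemma 2 is straightforward by integrating (2.1) w.r.t. $u_2, \ldots, u_{n-1}$ and dividing the obtained equation by $P_{\boldsymbol{\Delta}_n}(\boldsymbol{\delta}_n)$.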
3 Conditional prediction intervals for order statistics
In this section, the goal is to find a conditional prediction interval for $Y_{j:m}$ when the observed $U_1, \ldots, U_n$ are available given $\boldsymbol{\Delta}_n = \boldsymbol{\delta}_n$ for the two parameter exponential distribution.
To this end, we consider the pivotal quantity
$$W_j = \frac{Y_{j:m} - U_1}{U_n - U_1}. \tag{3.1}$$
Note that the pivotal quantity $W_j$ is the same as the one considered by Ahmadi and MirMostafaee (2009). This quantity is location and scale invariant, namely it is free of both unknown parameters, i.e. the location parameter $\mu$ and the scale parameter $\sigma$. It is also a simple function of both observed and future statistics, so that the future statistic can be derived from it easily. Ahmadi and MirMostafaee (2009) found the unconditional distribution of $W_j$, while we present the conditional distribution of $W_j$ given $\boldsymbol{\Delta}_n = \boldsymbol{\delta}_n$ (i.e. the inter-record times are assumed to be known and fixed) in the following theorem.
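Theorem 1 The conditional cdf of $W_j$ in (3.1) given $\boldsymbol{\Delta}_n = \boldsymbol{\delta}_n$ is, for $w \ge 0$,
$$F_{W_j | \boldsymbol{\Delta}_n}(w | \boldsymbol{\delta}_n) = \sum_{l=j}^{m} \sum_{j_1=1}^{n-1} \sum_{j_2=0}^{l} \sum_{j_3=0}^{a_1(n,j_1,\boldsymbol{\delta}_n)} \sum_{j_4=0}^{a_n(n,j_1,\boldsymbol{\delta}_n)} \binom{m}{l} \binom{l}{j_2} \binom{a_1(n,j_1,\boldsymbol{\delta}_n)}{j_3} \binom{a_n(n,j_1,\boldsymbol{\delta}_n)}{j_4} \frac{(-1)^{j_2+j_3+j_4}}{P_{\boldsymbol{\Delta}_n}(\boldsymbol{\delta}_n)}$$
$$\times\; c_{j_1}(n, \boldsymbol{\delta}_n)\,\big[(j_2 + m - l + j_3 + j_4 + 2)\,((j_2 + m - l)w + j_4 + 1)\big]^{-1},$$
and for $w < 0$
$$F_{W_j | \boldsymbol{\Delta}_n}(w | \boldsymbol{\delta}_n) = \sum_{l=j}^{m} \sum_{j_1=1}^{n-1} \sum_{j_2=0}^{l} \sum_{j_3=0}^{a_1(n,j_1,\boldsymbol{\delta}_n)} \sum_{j_4=0}^{a_n(n,j_1,\boldsymbol{\delta}_n)} \binom{m}{l} \binom{l}{j_2} \binom{a_1(n,j_1,\boldsymbol{\delta}_n)}{j_3} \binom{a_n(n,j_1,\boldsymbol{\delta}_n)}{j_4} \frac{(-1)^{j_2+j_3+j_4}}{P_{\boldsymbol{\Delta}_n}(\boldsymbol{\delta}_n)}$$
$$\times\; c_{j_1}(n, \boldsymbol{\delta}_n)\,\big[(j_2 + m - l + j_3 + j_4 + 2)\,(j_4 + 1 - w(j_3 + j_4 + 2))\big]^{-1},$$
where $a_1(n,j_1,\boldsymbol{\delta}_n)$ and $a_n(n,j_1,\boldsymbol{\delta}_n)$ are defined in Lemma 1 and $P_{\boldsymbol{\Delta}_n}(\boldsymbol{\delta}_n)$ is the joint mass function of $\Delta_1, \ldots, \Delta_{n-1}$, which is also given in Lemma 1.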
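Proof: Letting $J_n^* = (U_n - U_1)/\sigma$, $U_1^* = (U_1 - \mu)/\sigma$ and $Y_{j:m}^* = (Y_{j:m} - \mu)/\sigma$, we may write
$$F_{W_j | \boldsymbol{\Delta}_n}(w | \boldsymbol{\delta}_n) = \int_0^\infty \int_0^\infty F_{Y_{j:m}^*}(vw + u)\, f_{U_1^*, J_n^* | \boldsymbol{\Delta}_n}(u, v | \boldsymbol{\delta}_n)\, du\, dv. \tag{3.2}$$
For $t > 0$, we have
$$F_{Y_{j:m}^*}(t) = \sum_{l=j}^{m} \binom{m}{l} (1 - e^{-t})^{l}\, e^{-(m-l)t}. \tag{3.3}$$
Also, from Lemma 2, we obtain
$$f_{U_1^*, J_n^* | \boldsymbol{\Delta}_n}(u, v | \boldsymbol{\delta}_n) = [P_{\boldsymbol{\Delta}_n}(\boldsymbol{\delta}_n)]^{-1} \sum_{j=1}^{n-1} c_j(n, \boldsymbol{\delta}_n)\,[1 - e^{-u}]^{a_1(n,j,\boldsymbol{\delta}_n)}\,[1 - e^{-(u+v)}]^{a_n(n,j,\boldsymbol{\delta}_n)}\, e^{-(2u+v)}, \quad u > 0,\ v > 0. \tag{3.4}$$
Hence, by substituting (3.4) and (3.3) in (3.2) and using the binomial expansions, we have for $w > 0$,
$$F_{W_j | \boldsymbol{\Delta}_n}(w | \boldsymbol{\delta}_n) = \sum_{l=j}^{m} \sum_{j_1=1}^{n-1} \sum_{j_2=0}^{l} \sum_{j_3=0}^{a_1(n,j_1,\boldsymbol{\delta}_n)} \sum_{j_4=0}^{a_n(n,j_1,\boldsymbol{\delta}_n)} \binom{m}{l} \binom{l}{j_2} \binom{a_1(n,j_1,\boldsymbol{\delta}_n)}{j_3} \binom{a_n(n,j_1,\boldsymbol{\delta}_n)}{j_4} \frac{(-1)^{j_2+j_3+j_4}\, c_{j_1}(n, \boldsymbol{\delta}_n)}{P_{\boldsymbol{\Delta}_n}(\boldsymbol{\delta}_n)}$$
$$\times \int_0^\infty \int_0^\infty e^{-(j_2 + m - l + j_3 + j_4 + 2)u}\, e^{-((j_2 + m - l)w + j_4 + 1)v}\, du\, dv,$$
and therefore we arrive at the desired expression. Similarly, we may obtain the expression for $F_{W_j | \boldsymbol{\Delta}_n}(w | \boldsymbol{\delta}_n)$ when $w < 0$ after substituting (3.4) and (3.3) in (3.2), by noting that the integral w.r.t. $u$ must be taken from $-vw$ to $\infty$. □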
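The final integration step of the proof can be written out explicitly. In the sketch below, $B$ and $C$ are shorthand introduced only for this illustration (they mirror the variable names used in the appendix function Fw) and are not notation from the theorem itself.

```latex
% Shorthand (illustration only):
%   B = j_2 + m - l + j_3 + j_4 + 2,   C = (j_2 + m - l) w + j_4 + 1
\begin{align*}
w > 0:&\quad \int_0^\infty\!\!\int_0^\infty e^{-Bu}\, e^{-Cv}\, du\, dv = \frac{1}{BC}, \\
w < 0:&\quad \int_0^\infty\!\!\int_{-vw}^\infty e^{-Bu}\, e^{-Cv}\, du\, dv
        = \frac{1}{B}\int_0^\infty e^{-(C - Bw)v}\, dv = \frac{1}{B\,(C - Bw)}, \\
&\quad C - Bw = j_4 + 1 - w\,(j_3 + j_4 + 2).
\end{align*}
```

The lower limit $-vw$ arises for $w < 0$ because $F_{Y_{j:m}^*}(vw + u)$ vanishes unless $vw + u > 0$. The two denominators are exactly the bracketed factors appearing in the two cases of Theorem 1.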
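Let $w_\gamma(n, m, j; \boldsymbol{\delta}_n)$ be the $\gamma$th conditional quantile of $W_j$ given $\boldsymbol{\Delta}_n = \boldsymbol{\delta}_n$, i.e.
$$\Pr(W_j \le w_\gamma(n, m, j; \boldsymbol{\delta}_n) \,|\, \boldsymbol{\Delta}_n = \boldsymbol{\delta}_n) = \gamma.$$
To find $100(1 - \alpha)\%$ two-sided conditional prediction intervals for $Y_{j:m}$ based on record values given $\boldsymbol{\Delta}_n = \boldsymbol{\delta}_n$, we have to find the conditional quantiles $w_{\alpha_1}(n, m, j; \boldsymbol{\delta}_n)$ and $w_{1-\alpha_2}(n, m, j; \boldsymbol{\delta}_n)$, for $\alpha_1 + \alpha_2 = \alpha$, $0 < \alpha_i < 1$, $i = 1, 2$, numerically.
Now, a $100(1 - \alpha)\%$ conditional prediction interval for $Y_{j:m}$ based on record values given $\boldsymbol{\Delta}_n = \boldsymbol{\delta}_n$ is given by
$$\big(U_1 + w_{\alpha_1}(n, m, j; \boldsymbol{\delta}_n)(U_n - U_1),\ U_1 + w_{1-\alpha_2}(n, m, j; \boldsymbol{\delta}_n)(U_n - U_1)\big). \tag{3.5}$$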
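The quantiles have to be found numerically, which can be sketched with a root finder. The sketch below is an assumption-laden illustration, not the authors' procedure: it uses the unconditional cdf of $W_j$ (transcribed from the appendix function FwU) only because it is short and self-contained, and the helper name `quantile_by_root` is hypothetical. The conditional cdf of Theorem 1 could be passed in the same way.

```r
# Unconditional cdf of W_j, transcribed from the appendix function FwU
# (Ahmadi and MirMostafaee, 2009); used here only as a short stand-in cdf.
FwU <- function(n, j, m, w) {
  if (w < 0) {
    return((m - j + 1) * (1 - w)^(1 - n) / (m + 1))
  }
  i <- 0:(j - 1)
  ss <- sum(choose(j - 1, i) * (-1)^i * (1 + w * (m - j + i + 1))^(1 - n) /
              ((m - j + i + 1) * (m - j + i + 2)))
  1 - j * choose(m, j) * ss
}

# Hypothetical helper: solve cdf(w) = gamma on a bracketing interval.
quantile_by_root <- function(cdf, gamma, lower, upper) {
  uniroot(function(w) cdf(w) - gamma, lower = lower, upper = upper,
          tol = 1e-10)$root
}

# Equi-tailed 95% quantiles for n = 3 records, m = 10, j = 5
cdf <- function(w) FwU(n = 3, j = 5, m = 10, w = w)
w_lo <- quantile_by_root(cdf, 0.025, lower = -100, upper = 0)
w_hi <- quantile_by_root(cdf, 0.975, lower = 0, upper = 100)
print(c(w_lo, w_hi))
```

The two roots should agree, up to rounding, with the unconditional entries for $m = 10$, $j = 5$ in Table 1. The brackets $[-100, 0]$ and $[0, 100]$ are an ad hoc choice that happens to contain the quantiles for this configuration; other settings may need wider brackets.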
Table 1: The values of $w_{0.025}(3, m, j)$, $w_{0.975}(3, m, j)$, $w_{0.975}(3, m, j) - w_{0.025}(3, m, j)$, $w_{0.025}(3, m, j; \boldsymbol{\delta}_n)$, $w_{0.975}(3, m, j; \boldsymbol{\delta}_n)$, $w_{0.975}(3, m, j; \boldsymbol{\delta}_n) - w_{0.025}(3, m, j; \boldsymbol{\delta}_n)$, for $m = 10, 20$, $j = 5, 7, 10$ (for $m = 10$), $j = 12, 17, 20$ (for $m = 20$) and different values of $\boldsymbol{\delta}_n$.
	m	10	10	10	20	20	20
	j	5	7	10	12	17	20
Unconditional	$w_{0.025}$	-3.671	-2.814	-0.907	-3.140	-1.760	-0.380
	$w_{0.975}$	1.278	2.635	9.761	1.827	4.767	12.186
	$w_{0.975} - w_{0.025}$	4.949	5.449	10.668	4.967	6.527	12.566
$\boldsymbol{\delta}_n = (1, 2)$	$w_{0.025}$	-1.097	-0.500	0.249	-0.651	0.055	0.464
$P_{\boldsymbol{\Delta}_n}(\boldsymbol{\delta}_n) = 0.0833$	$w_{0.975}$	1.766	3.502	11.868	2.459	6.108	14.586
	$w_{0.975} - w_{0.025}$	2.863	4.002	11.619	3.110	6.053	14.122
$\boldsymbol{\delta}_n = (1, 3)$	$w_{0.025}$	-1.288	-0.652	0.201	-0.827	-0.025	0.420
$P_{\boldsymbol{\Delta}_n}(\boldsymbol{\delta}_n) = 0.05$	$w_{0.975}$	1.290	2.627	9.481	1.786	4.675	11.690
	$w_{0.975} - w_{0.025}$	2.578	3.279	9.280	2.613	4.700	11.270
$\boldsymbol{\delta}_n = (1, 4)$	$w_{0.025}$	-1.427	-0.774	0.160	-0.965	-0.098	0.386
$P_{\boldsymbol{\Delta}_n}(\boldsymbol{\delta}_n) = 0.0333$	$w_{0.975}$	1.022	2.106	7.984	1.398	3.793	9.872
	$w_{0.975} - w_{0.025}$	2.449	2.880	7.824	2.363	3.891	9.486
$\boldsymbol{\delta}_n = (2, 3)$	$w_{0.025}$	-2.181	-1.267	0.045	-1.538	-0.320	0.324
$P_{\boldsymbol{\Delta}_n}(\boldsymbol{\delta}_n) = 0.0167$	$w_{0.975}$	1.027	2.413	10.212	1.502	4.669	12.787
	$w_{0.975} - w_{0.025}$	3.208	3.680	10.167	3.040	4.989	12.463
$\boldsymbol{\delta}_n = (2, 4)$	$w_{0.025}$	-2.330	-1.409	-0.008	-1.697	-0.415	0.289
$P_{\boldsymbol{\Delta}_n}(\boldsymbol{\delta}_n) = 0.0119$	$w_{0.975}$	0.823	1.976	8.880	1.193	3.896	11.163
	$w_{0.975} - w_{0.025}$	3.153	3.385	8.888	2.890	4.311	10.874
Conditioning on $\boldsymbol{\delta}_n$ provides more information about the unknown parameters $\mu$ and $\sigma$, or more generally about $F$, which leads to better prediction intervals for $Y_{j:m}$. Conditioning on inter-record times does not necessarily decrease the length of the prediction interval; whether the location and scale of the interval increase or decrease depends on the values of $\boldsymbol{\delta}_n$. For the purpose of illustration, consider the conditional quantiles of $W_j$, which are computed and tabulated in Table 1, for $\alpha = 0.05$, $n = 3$, $m = 10, 20$, $j = 5, 7, 10$ ($m = 10$), $j = 12, 17, 20$ ($m = 20$) and some values of $\boldsymbol{\delta}_n$. The values of the unconditional quantiles of $W_j$ in Table 1 are taken from Ahmadi and MirMostafaee (2009), Tables 3 and 4. By comparing the entries of Table 1, one can observe that in a few cases the conditional prediction intervals are longer, especially when we predict the largest future order statistic, i.e. $Y_{m:m}$. In most cases, however, the conditional intervals are shorter than the unconditional ones for the different values of $\boldsymbol{\delta}_n$, so we may conclude that the conditional prediction approach generally leads to shorter (and hence better) prediction intervals on average, and this can be considered an improvement.
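To make the mapping from tabulated quantiles to an interval concrete, the following worked computation applies (3.5) with $\alpha_1 = \alpha_2 = 0.025$ to the unconditional quantiles of Table 1 for $m = 20$, $j = 12$, together with the record values $U_1 = 9.3$ and $U_3 = 33.8$ of the example in Section 5. It is an illustration assembled from values already in the paper, not an additional result.

```latex
\begin{align*}
U_3 - U_1 &= 33.8 - 9.3 = 24.5, \\
\text{lower bound} &= 9.3 + (-3.140)(24.5) = -67.63, \\
\text{upper bound} &= 9.3 + (1.827)(24.5) = 54.06 .
\end{align*}
```

Since rock sizes are non-negative, the negative lower bound is replaced by zero, giving approximately $(0, 54.06)$. This matches the unconditional interval for $Y_{12:20}$ reported in Table 2, up to the rounding of the tabulated quantile.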
4 Conditional Prediction Intervals for the mean of future sample
The problem of constructing a conditional prediction interval for $\bar{Y}_m$ on the basis of observed $U_1, \ldots, U_n$, given $\boldsymbol{\Delta}_n = \boldsymbol{\delta}_n$, using the pivotal quantity
$$V_m = \frac{\bar{Y}_m - U_1}{U_n - U_1}, \tag{4.1}$$
is considered for the two parameter exponential distribution in this section. Note that the pivotal quantity $V_m$ has also been considered by Ahmadi and MirMostafaee (2009), who obtained its unconditional distribution. Moreover, $V_m$ is also location and scale invariant and is therefore free of the unknown location and scale parameters. The following theorem presents the conditional distribution function of $V_m$ given $\boldsymbol{\Delta}_n = \boldsymbol{\delta}_n$.
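Theorem 2 The conditional distribution function of $V_m$ in (4.1) given $\boldsymbol{\Delta}_n = \boldsymbol{\delta}_n$ is
$$F_{V_m | \boldsymbol{\Delta}_n}(x | \boldsymbol{\delta}_n) = 1 - \sum_{l=0}^{m-1} \sum_{j_1=1}^{n-1} \sum_{j_2=0}^{l} \sum_{j_3=0}^{a_1(n,j_1,\boldsymbol{\delta}_n)} \sum_{j_4=0}^{a_n(n,j_1,\boldsymbol{\delta}_n)} \binom{a_1(n,j_1,\boldsymbol{\delta}_n)}{j_3} \binom{a_n(n,j_1,\boldsymbol{\delta}_n)}{j_4} \binom{l}{j_2} \frac{(-1)^{j_3+j_4}}{P_{\boldsymbol{\Delta}_n}(\boldsymbol{\delta}_n)\, l!}$$
$$\times\; \frac{c_{j_1}(n, \boldsymbol{\delta}_n)\, x^{j_2}\, m^{l}\, \Gamma(l - j_2 + 1)\, \Gamma(j_2 + 1)}{(m + j_3 + j_4 + 2)^{l-j_2+1}\, (mx + j_4 + 1)^{j_2+1}},$$
for $x \ge 0$, and
$$F_{V_m | \boldsymbol{\Delta}_n}(x | \boldsymbol{\delta}_n) = \sum_{j_1=1}^{n-1} \sum_{j_3=0}^{a_1(n,j_1,\boldsymbol{\delta}_n)} \sum_{j_4=0}^{a_n(n,j_1,\boldsymbol{\delta}_n)} \frac{(-1)^{j_3+j_4} \binom{a_1(n,j_1,\boldsymbol{\delta}_n)}{j_3} \binom{a_n(n,j_1,\boldsymbol{\delta}_n)}{j_4}\, c_{j_1}(n, \boldsymbol{\delta}_n)}{P_{\boldsymbol{\Delta}_n}(\boldsymbol{\delta}_n)\,(2 + j_3 + j_4)\,[j_4 + 1 - (2 + j_3 + j_4)x]}$$
$$-\; \sum_{l=0}^{m-1} \sum_{j_1=1}^{n-1} \sum_{j_2=0}^{l} \sum_{j_3=0}^{a_1(n,j_1,\boldsymbol{\delta}_n)} \sum_{j_4=0}^{a_n(n,j_1,\boldsymbol{\delta}_n)} \sum_{j_5=0}^{l-j_2} \binom{a_1(n,j_1,\boldsymbol{\delta}_n)}{j_3} \binom{a_n(n,j_1,\boldsymbol{\delta}_n)}{j_4} \binom{l}{j_2} \frac{(-1)^{j_3+j_4+j_5}}{P_{\boldsymbol{\Delta}_n}(\boldsymbol{\delta}_n)\, l!}$$
$$\times\; \frac{c_{j_1}(n, \boldsymbol{\delta}_n)\, x^{j_2+j_5}\, m^{l}\, \Gamma(l - j_2 + 1)\, \Gamma(j_2 + j_5 + 1)}{j_5!\,(m + j_3 + j_4 + 2)^{l-j_2-j_5+1}\,[j_4 + 1 - (j_3 + j_4 + 2)x]^{j_2+j_5+1}},$$
for $x < 0$, where $a_1(n, j_1, \boldsymbol{\delta}_n)$ and $a_n(n, j_1, \boldsymbol{\delta}_n)$ are given in Lemma 1.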
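Proof: Let $J_n^* = (U_n - U_1)/\sigma$, $U_1^* = (U_1 - \mu)/\sigma$ and $\bar{Y}_m^* = (\bar{Y}_m - \mu)/\sigma$. Note that
$$F_{V_m | \boldsymbol{\Delta}_n}(x | \boldsymbol{\delta}_n) = \int_0^\infty \int_0^\infty F_{\bar{Y}_m^*}(vx + u)\, f_{U_1^*, J_n^* | \boldsymbol{\Delta}_n}(u, v | \boldsymbol{\delta}_n)\, du\, dv, \tag{4.2}$$
where $f_{U_1^*, J_n^* | \boldsymbol{\Delta}_n}(u, v | \boldsymbol{\delta}_n)$ is given in (3.4). Since $m\bar{Y}_m^* \sim \Gamma(m, 1)$, we have for $t > 0$
$$F_{\bar{Y}_m^*}(t) = 1 - \sum_{l=0}^{m-1} \frac{e^{-mt}\,(mt)^{l}}{l!}, \tag{4.3}$$
so by substituting (3.4) and (4.3) in (4.2) and using the binomial expansions, we get for $x < 0$
$$F_{V_m | \boldsymbol{\Delta}_n}(x | \boldsymbol{\delta}_n) = \sum_{j_1=1}^{n-1} \sum_{j_3=0}^{a_1(n,j_1,\boldsymbol{\delta}_n)} \sum_{j_4=0}^{a_n(n,j_1,\boldsymbol{\delta}_n)} \binom{a_1(n,j_1,\boldsymbol{\delta}_n)}{j_3} \binom{a_n(n,j_1,\boldsymbol{\delta}_n)}{j_4} \frac{(-1)^{j_3+j_4}\, c_{j_1}(n, \boldsymbol{\delta}_n)}{P_{\boldsymbol{\Delta}_n}(\boldsymbol{\delta}_n)} \int_0^\infty \int_{-vx}^\infty e^{-(j_3+j_4+2)u}\, e^{-(j_4+1)v}\, du\, dv$$
$$-\; \sum_{l=0}^{m-1} \sum_{j_1=1}^{n-1} \sum_{j_2=0}^{l} \sum_{j_3=0}^{a_1(n,j_1,\boldsymbol{\delta}_n)} \sum_{j_4=0}^{a_n(n,j_1,\boldsymbol{\delta}_n)} \binom{a_1(n,j_1,\boldsymbol{\delta}_n)}{j_3} \binom{a_n(n,j_1,\boldsymbol{\delta}_n)}{j_4} \binom{l}{j_2} \frac{(-1)^{j_3+j_4}}{P_{\boldsymbol{\Delta}_n}(\boldsymbol{\delta}_n)\, l!}$$
$$\times\; c_{j_1}(n, \boldsymbol{\delta}_n)\, x^{j_2}\, m^{l} \int_0^\infty \int_{-vx}^\infty e^{-(m+j_3+j_4+2)u}\, e^{-(mx+j_4+1)v}\, u^{l-j_2}\, v^{j_2}\, du\, dv$$
$$= \sum_{j_1=1}^{n-1} \sum_{j_3=0}^{a_1(n,j_1,\boldsymbol{\delta}_n)} \sum_{j_4=0}^{a_n(n,j_1,\boldsymbol{\delta}_n)} \frac{(-1)^{j_3+j_4} \binom{a_1(n,j_1,\boldsymbol{\delta}_n)}{j_3} \binom{a_n(n,j_1,\boldsymbol{\delta}_n)}{j_4}\, c_{j_1}(n, \boldsymbol{\delta}_n)}{P_{\boldsymbol{\Delta}_n}(\boldsymbol{\delta}_n)\,(2 + j_3 + j_4)\,[j_4 + 1 - (2 + j_3 + j_4)x]}$$
$$-\; \sum_{l=0}^{m-1} \sum_{j_1=1}^{n-1} \sum_{j_2=0}^{l} \sum_{j_3=0}^{a_1(n,j_1,\boldsymbol{\delta}_n)} \sum_{j_4=0}^{a_n(n,j_1,\boldsymbol{\delta}_n)} \sum_{j_5=0}^{l-j_2} \binom{a_1(n,j_1,\boldsymbol{\delta}_n)}{j_3} \binom{a_n(n,j_1,\boldsymbol{\delta}_n)}{j_4} \binom{l}{j_2} \frac{(-1)^{j_3+j_4+j_5}}{P_{\boldsymbol{\Delta}_n}(\boldsymbol{\delta}_n)\, l!}$$
$$\times\; \frac{c_{j_1}(n, \boldsymbol{\delta}_n)\, x^{j_2+j_5}\, m^{l}\, \Gamma(l - j_2 + 1)}{j_5!\,(m + j_3 + j_4 + 2)^{l-j_2-j_5+1}} \int_0^\infty e^{-(j_4 + 1 - (j_3 + j_4 + 2)x)v}\, v^{j_2+j_5}\, dv,$$
and therefore we obtain the desired result. Similarly, we may deduce the desired expression for $F_{V_m | \boldsymbol{\Delta}_n}(x | \boldsymbol{\delta}_n)$ when $x > 0$. □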
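The series form (4.3) used in the proof can be checked numerically against R's built-in gamma cdf. The sketch below is only a sanity check under the stated fact that $m\bar{Y}_m^* \sim \Gamma(m, 1)$; the function name `cdf_mean_series` is hypothetical and the grid of $t$ values is arbitrary.

```r
# Series form of the cdf of the standardized sample mean, as in (4.3):
# m * Ybar* ~ Gamma(m, 1), so
# P(Ybar* <= t) = 1 - sum_{l = 0}^{m - 1} exp(-m t) (m t)^l / l!
cdf_mean_series <- function(t, m) {
  l <- 0:(m - 1)
  1 - sum(exp(-m * t) * (m * t)^l / factorial(l))
}

m <- 20
t_grid <- c(0.25, 0.5, 1, 1.5, 2)
series_vals <- sapply(t_grid, cdf_mean_series, m = m)
gamma_vals <- pgamma(m * t_grid, shape = m, rate = 1)
print(cbind(t = t_grid, series = series_vals, pgamma = gamma_vals))
```

The two columns should coincide to numerical precision, which confirms that the finite sum in (4.3) is the Erlang form of the gamma cdf with integer shape $m$.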
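To find a conditional prediction interval for $\bar{Y}_m$ based on records given $\boldsymbol{\Delta}_n = \boldsymbol{\delta}_n$, we have to find the conditional quantiles of $V_m$ given $\boldsymbol{\Delta}_n = \boldsymbol{\delta}_n$, $v_{\alpha_1}(n, m; \boldsymbol{\delta}_n)$ and $v_{1-\alpha_2}(n, m; \boldsymbol{\delta}_n)$, for $\alpha_1 + \alpha_2 = \alpha$, $0 < \alpha_i < 1$, $i = 1, 2$, numerically, where
$$\Pr(V_m \le v_\gamma(n, m; \boldsymbol{\delta}_n) \,|\, \boldsymbol{\Delta}_n = \boldsymbol{\delta}_n) = \gamma.$$
A $100(1 - \alpha)\%$ conditional prediction interval for $\bar{Y}_m$ based on record values given $\boldsymbol{\Delta}_n = \boldsymbol{\delta}_n$ is then
$$\big(U_1 + v_{\alpha_1}(n, m; \boldsymbol{\delta}_n)(U_n - U_1),\ U_1 + v_{1-\alpha_2}(n, m; \boldsymbol{\delta}_n)(U_n - U_1)\big). \tag{4.4}$$
An illustrative example is presented in Section 5.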
5 An illustrative example
In this section, we illustrate the proposed procedures by considering a real data set. A rock crushing machine has to be reset if, at any operation, the size of rock being crushed is larger than any that has been crushed before. The following data given by Dunsmore (1983) are the sizes dealt with up to the third time that the machine has been reset:
9.3, 0.6, 24.4, 18.1, 6.6, 9.0, 14.3, 6.6, 13.0, 2.4, 5.6, 33.8.
The record values were the sizes at the operation when resetting was necessary. Dunsmore (1983) assumed that these data follow an $\mathrm{Exp}(0, \sigma)$ distribution. Clearly, we have
$$U_1 = 9.3,\ U_2 = 24.4,\ U_3 = 33.8, \qquad T_1 = 1,\ T_2 = 3,\ T_3 = 12, \qquad \Delta_1 = 2 \ \text{and} \ \Delta_2 = 9.$$
Consider a future sample of size $m = 20$. We want to find equi-tailed 95% conditional prediction intervals (CPIs) for $Y_{12:20}$, $Y_{20:20}$ and $\bar{Y}_{20}$ using (3.5) and (4.4) and compare these intervals with the unconditional ones (UPIs). The results are given in Table 2. Note that some lower bounds were negative and have been replaced by zero. We can see that the conditional prediction intervals are shorter than the corresponding unconditional ones.
Table 2: 95% CPIs and UPIs for $Y_{12:20}$, $Y_{20:20}$ and $\bar{Y}_{20}$ for Example 1.
	CPI	UPI
$Y_{12:20}$	(0, 24.17836)	(0, 54.061745)
$Y_{20:20}$	(13.290315, 183.67385)	(0, 307.85602)
$\bar{Y}_{20}$	(0, 26.233175)	(0, 61.32183)
6 Concluding remarks
In this paper, we found prediction intervals for future order statistics based on record values, given record time statistics, when the underlying distribution is two parameter exponential. Compared with the corresponding unconditional intervals obtained by Ahmadi and MirMostafaee (2009), these intervals have the advantage of utilizing more of the information embedded in the observed sequence. These ideas can be extended to the non-parametric and the Bayesian contexts. The conditional point predictors are also of interest. Work on these problems is currently in progress and we hope to report the findings in future papers.
Acknowledgement
We are very grateful to the respected editor and the respected referees for their insightful comments and suggestions which have led to this improved version.
References
[1]	Ahmadi, J. and Balakrishnan, N. (2010): Prediction of order statistics and record values from two independent sequences, Statistics, 44, 417-430.
[2]	Ahmadi, J. and MirMostafaee, S.M.T.K. (2009): Prediction intervals for future records and order statistics coming from two parameter exponential distribution, Statistics and Probability Letters, 79, 977-983.
[3]	Ahmadi, J., MirMostafaee, S.M.T.K. and Balakrishnan, N. (2010): Nonparametric prediction intervals for future record intervals based on order statistics, Statistics and Probability Letters, 80, 1663-1672.
[4]	Arnold, B.C., Balakrishnan, N., and Nagaraja, H.N. (1998): Records, John Wiley & Sons, New York.
[5]	Awad, A.M. and Raqab, M.Z. (2000): Prediction intervals for the future record values from exponential distribution: comparative study. Journal of Statistical Computation and Simulation, 65, 325-340.
[6]	Chou, Youn-Min. (1988): One-sided simultaneous prediction intervals for the order statistics of l future samples from an exponential distribution. Communications in Statistics-Theory and Methods, 17, 3995-4003.
[7]	David, H.A. and Nagaraja, H.N. (2003): Order Statistics, Third edition, John Wiley & Sons, New York.
[8]	Dunsmore, I.R. (1983): The future occurrence of records. Annals of the Institute of Statistical Mathematics, 35, 276-277.
[9]	Doostparast, M. (2009): A note on estimation based on record data. Metrika, 69, 69-80.
[10]	Doostparast, M. and Balakrishnan, N. (2013): Pareto analysis based on records. Statistics, 47, 1075-1089.
[11]	Feuerverger, A. and Hall, P. (1998): On statistical inference based on record values. Extremes, 1, 169-190.
[12]	Kizilaslan, F. and Nadar, M. (2015): Estimation with the generalized exponential distribution based on record values and inter-record times. Journal of Statistical Computation and Simulation, 85, 978-999.
[13]	Lin, C.T., Wu, S.J.S. and Balakrishnan, N. (2003): Parameter estimation for the linear hazard rate distribution based on records and inter-record times. Communications in Statistics-Theory and Methods, 32, 729-748.
[14]	MirMostafaee, S.M.T.K. and Ahmadi, J. (2011): Point prediction of future order statistics from exponential distribution, Statistics and Probability Letters, 81, 360-370.
[15]	MirMostafaee, S.M.T.K., Amini, M. and Balakrishnan, N. (2016): Exact nonparametric conditional inference based on k-records, given inter k-record times. Journal of the Korean Statistical Society, Accepted.
[16]	Nagaraja, H.N. (1984): Asymptotic linear prediction of extreme order statistics. Annals of the Institute of Statistical Mathematics, 36, 289-299.
[17]	Raqab, M.Z. and Balakrishnan, N. (2008): Prediction intervals for future records. Statistics and Probability Letters, 78, 1955-1963.
[18]	Samaniego, F.J. and Whitaker, L.R. (1986): On estimating population characteristics from record-breaking observations. I. parametric results. Naval Research Logistics Quarterly, 33, 531-543.
Appendix
Here, we present the R codes for computing the conditional cumulative distribution functions of Wj, (see Theorem 1) and Vm (see Theorem 2). R functions for computing the unconditional cumulative distribution functions of Wj and Vm (see Ahmadi and MirMostafaee, 2009) are also given.
# ---------------------------------------------------------
# cjn function: coefficient c_j(n, delta) of Lemma 1
# ---------------------------------------------------------
cjn=function(n,j,delta){
z=(-1)^(n-j-1)
z1=n-j+1
z2=j-2
z4=n-j-2
z5=n-j
# first product: sums of delta[t] for t = n-j+1, ..., n-j1-1
s=1
if(z2>=0 & z1>=0){
for(j1 in 0:z2){
z3=n-j1-1
ss=ifelse(z3>=z1,sum(delta[z1:z3]),0)
s=s*ss }}
# second product: sums of delta[t] for t = j2+2, ..., n-j
t=1
if(z4>=0){
for(j2 in 0:z4){
z6=j2+2
tt=ifelse(z5>=z6,sum(delta[z6:z5]),0)
t=t*tt }}
return(z/t/s)
}
# ---------------------------------------------------------
# Mass probability of Delta (Lemma 1)
# ---------------------------------------------------------
pdelta=function(n,delta){
n1=n-1
pdel=0
for(jj in 1:n1){
nj=n-jj
nj1=n-jj+1
A=cjn(n,jj,delta)
a1=ifelse(nj>=1,sum(delta[1:nj]),0)-1
an=ifelse(n1>=nj1,sum(delta[nj1:n1]),0)
C=(a1+1)*(a1+an+2)
pdel=pdel+A/C }
return(pdel) }
# ---------------------------------------------------------
# Conditional cdf of W (Theorem 1)
# ---------------------------------------------------------
Fw=function(n,j,m,w,delta){
n1=n-1
pw=0
# the mass probability does not depend on the loop indices
pd=pdelta(n,delta)
for(l in j:m){
for(j1 in 1:n1){
for(j2 in 0:l){
nj1=n-j1+1
nj=n-j1
a1=ifelse(nj>=1,sum(delta[1:nj]),0)-1
an=ifelse(n1>=nj1,sum(delta[nj1:n1]),0)
for(j3 in 0:a1){
for(j4 in 0:an){
# operators are kept at the end of each line so that R continues the expression
A=choose(m,l)*choose(a1,j3)*choose(an,j4)*choose(l,j2)*
((-1)^(j2+j3+j4))*cjn(n,j1,delta)/pd
B=j2+m-l+j3+j4+2
if(w<0) C=B*(j4+1-w*(j3+j4+2))
if(w>=0) C=B*(w*(j2+m-l)+j4+1)
pw=pw+A/C }}}}}
return(pw) }
# ---------------------------------------------------------
# Conditional cdf of V (Theorem 2)
# ---------------------------------------------------------
Fv=function(n,m,v,delta){
pv=0
n1=n-1
m1=m-1
# the mass probability does not depend on the loop indices
pd=pdelta(n,delta)
if(v>=0){
for(l in 0:m1){
for(j1 in 1:n1){
for(j2 in 0:l){
nj1=n-j1+1
nj=n-j1
a1=ifelse(nj>=1,sum(delta[1:nj]),0)-1
an=ifelse(n1>=nj1,sum(delta[nj1:n1]),0)
for(j3 in 0:a1){
for(j4 in 0:an){
A=choose(a1,j3)*choose(an,j4)*choose(l,j2)/factorial(l)/
pd*((-1)^(j3+j4))
B=cjn(n,j1,delta)*(v^j2)*(m^l)*gamma(l-j2+1)*gamma(j2+1)/
((m+j3+j4+2)^(l-j2+1))/((m*v+j4+1)^(j2+1))
pv=pv+A*B }}}}}}
if(v>=0) pv=1-pv
pv1=0
pv2=0
if(v<0){
# first sum of Theorem 2 for x < 0
for(j1 in 1:n1){
nj1=n-j1+1
nj=n-j1
a1=ifelse(nj>=1,sum(delta[1:nj]),0)-1
an=ifelse(n1>=nj1,sum(delta[nj1:n1]),0)
for(j3 in 0:a1){
for(j4 in 0:an){
A=((-1)^(j3+j4))*choose(a1,j3)*choose(an,j4)*cjn(n,j1,delta)/
pd/(2+j3+j4)/(j4+1-v*(2+j3+j4))
pv1=pv1+A }}}
# second sum of Theorem 2 for x < 0
for(l in 0:m1){
for(j1 in 1:n1){
for(j2 in 0:l){
nj1=n-j1+1
nj=n-j1
a1=ifelse(nj>=1,sum(delta[1:nj]),0)-1
an=ifelse(n1>=nj1,sum(delta[nj1:n1]),0)
for(j3 in 0:a1){
for(j4 in 0:an){
lj2=l-j2
for(j5 in 0:lj2){
A=choose(a1,j3)*choose(an,j4)*choose(l,j2)/factorial(l)/
pd*((-1)^(j3+j4+j5))
B=cjn(n,j1,delta)*(v^(j2+j5))*(m^l)*gamma(l-j2+1)*
gamma(j2+j5+1)/factorial(j5)/((m+j3+j4+2)^(l-j2-j5+1))/
((j4+1-v*(j3+j4+2))^(j2+j5+1))
pv2=pv2+A*B }}}}}}
pv=pv1-pv2 }
return(pv) }
# ---------------------------------------------------------
# Unconditional cdf of W
# ---------------------------------------------------------
FwU=function(n,j,m,w){
pw=0
if(w<0) pw=(m-j+1)*((1-w)^(1-n))/(m+1)
if(w>=0){
ss=0
j1=j-1
for(i in 0:j1){
ss=ss+choose(j1,i)*((-1)^i)*((1+w*(m-j+i+1))^(1-n))/
(m-j+i+1)/(m-j+i+2) }
pw=1-j*choose(m,j)*ss }
return(pw) }
# ---------------------------------------------------------
# Unconditional cdf of V
# ---------------------------------------------------------
FvU=function(n,m,v){
pv=0
if(v<0) pv=((1-v)^(1-n))/((1+1/m)^m)
if(v>=0){
m1=m-1
s1=0
s2=0
for(i in 0:m1){
nn=n+i-2
s1=s1+choose(nn,i)*((1-1/(m*v+1))^i)*((1/(m*v+1))^(n-1))*
((m/(m+1))^(m-i))
s2=s2+choose(nn,i)*((1-1/(m*v+1))^i)*((1/(m*v+1))^(n-1)) }
pv=s1+1-s2 }
return(pv) }
Metodoloski zvezki, Vol. 13, No. 1, 17-25
Series of Incomplete Row-Column Designs with Two Units per Cell
Anindita Datta1, Seema Jaggi1, Cini Varghese1 and Eldho Varghese1
Abstract
Here, two series of incomplete row-column designs with two units per cell have been developed that are structurally complete, i.e. every cell at the intersection of a row and a column receives two distinct treatments. Properties of these classes of designs have been studied, and the methods result in designs in which the elementary contrasts of treatment effects are estimated with the same variance.
1. Introduction
Row-column designs are used for controlling heterogeneity in the experimental material in two directions. Most of the row-column designs developed in the literature have one unit corresponding to the intersection of a row and a column. Row-column designs with more than one unit per cell are used when the number of treatments is substantially large and the number of replicates is limited. For example, to conduct an experiment comparing 4 treatments using 4 plants with leaves at 2 different heights, the following row-column design with complete rows and columns and two units per cell can be used (Bailey and Monod, 2001):
	Plants			
Leaf Height	1 2	2 3	3 4	4 1
	3 4	4 1	1 2	2 3
1 ICAR-Indian Agricultural Statistics Research Institute, Library Avenue, New Delhi-110 012. India. Tel.: +9111-25847284, E-mail address: seema.iasri@outlook.com, seema@iasri.res.in
These designs are termed semi-Latin squares in the literature. An (n x n)/k semi-Latin square is an arrangement of nk symbols (treatments) in an (n x n) square array such that each row-column intersection contains k symbols and each symbol occurs once in each row and each column. Trojan squares are a special class of semi-Latin squares based on sets of mutually orthogonal superimposed Latin squares and have been shown to be maximally efficient for pair-wise treatment comparisons in the plots-within-blocks stratum (Bailey, 1992). Following is an example of a Trojan square design of size (4 x 4)/2 for 8 treatments constructed by superimposing two mutually orthogonal Latin squares of size 4, one with treatments 1, 2, 3 and 4 and the other with treatments 5, 6, 7 and 8.
	Columns			
Rows	1 5	2 6	3 7	4 8
	2 7	1 8	4 5	3 6
	3 8	4 7	1 6	2 5
	4 6	3 5	2 8	1 7
This arrangement could be extended to a design of size (4 x 4)/3 for 12 treatments by superimposing the third orthogonal Latin square of size 4, but no further Trojan extension is possible, there being only three mutually orthogonal Latin squares of size 4.
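The defining property of a semi-Latin square can be checked mechanically for the (4 x 4)/2 example above. The following R sketch is illustrative only: the array layout and the helper name `is_semi_latin` are assumptions made for this example, and the pair count is a necessary consequence of superimposing orthogonal Latin squares rather than a full test of the Trojan property.

```r
# Cells of the (4 x 4)/2 Trojan square quoted in the text, listed row by row.
cells <- c(1,5, 2,6, 3,7, 4,8,
           2,7, 1,8, 4,5, 3,6,
           3,8, 4,7, 1,6, 2,5,
           4,6, 3,5, 2,8, 1,7)
# design[r, c, ] holds the k = 2 treatments in row r, column c.
design <- aperm(array(cells, dim = c(2, 4, 4)), c(3, 2, 1))

# Hypothetical helper: every treatment 1..nk occurs exactly once
# in each row and in each column of an n x n x k array.
is_semi_latin <- function(design) {
  n <- dim(design)[1]
  k <- dim(design)[3]
  treatments <- seq_len(n * k)
  complete <- function(v) all(sort(as.vector(v)) == treatments)
  rows_ok <- all(sapply(seq_len(n), function(r) complete(design[r, , ])))
  cols_ok <- all(sapply(seq_len(n), function(cc) complete(design[, cc, ])))
  rows_ok && cols_ok
}

# Treatment pairs sharing a cell; with orthogonal Latin squares
# superimposed, no pair should appear in more than one cell.
pairs <- apply(design, c(1, 2), function(u) paste(sort(u), collapse = "-"))
n_distinct_pairs <- length(unique(as.vector(pairs)))

print(is_semi_latin(design))
print(n_distinct_pairs)
```

Storing the design as a three-way array keeps the row, column and within-cell positions separate, so the same check applies unchanged to the (4 x 4)/3 extension mentioned above.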
Complete Trojan squares of size (n x n)/k have n2 blocks of size k and require n replicates of nk treatments. Sometimes, design or cost constraints make complete Trojan squares impossible and then Incomplete Trojan squares of size [(n-1) x n]/ k or of size [n x (n-1)]/ k can be useful. Such incomplete Trojan squares can be constructed by omitting any complete row or any complete column from any Trojan design of size (n x n)/k. Trojan squares were first discussed by Harshbarger and Davis (1952) but then it was named as Latinized Near Balanced Rectangular Lattices having k = n-1. Later, Darby and Gilbert (1958) discussed the general case for k < n and introduced the name Trojan square designs where k > 2. However, all designs of the Latinized Rectangular Lattice type are now commonly described as Trojan squares for any 1 < k < n. Williams (1986) generalized the notion and called semi-Latin squares as Latinized incomplete-block designs. Andersen and Hilton (1980) called semi-Latin squares as (1, 1, k) Latin rectangles.
Preece and Freeman (1983) discussed the combinatorial properties of semi-Latin squares and related designs. Bailey (1988) discussed further constructions for a range of semi-Latin and Trojan square designs. Bailey (1992) gave methods of constructing a range of semi-Latin and Trojan square designs, studied their efficiencies and showed that the Trojan squares are the optimal choice of semi-Latin squares for pair-wise comparisons of treatment means. These are particularly suitable for crop research experiments either in the field or in the glasshouse. Trojan squares are normally the best choice of semi-Latin squares for crop research (Edmondson, 1998). Bedford and Whitaker (2001) have given several methods of construction of semi-Latin squares.
Dharmalingam (2002) gave an application of Trojan square designs and used them to obtain partial triallel crosses. Edmondson (2002) constructed generalized incomplete Trojan square designs, denoted by (m x n)/k where m denotes the number of replicates of nk treatments, based on a set of k cyclic generators.
There existed three optimal (4 x 4)/4 semi-Latin squares (Bailey and Chigbu, 1997) for sixteen treatments in blocks of size four. Since these squares do not have the same concurrences, there was a need for distinguishing one square from the others and determining the most preferred square in a given context. Chigbu (2003) obtained the best of the three optimal (4 x 4)/4 semi-Latin squares by finding and comparing the variances of elementary contrasts of treatments for the squares.
Jaggi et al. (2010) defined Generalized Incomplete Trojan-Type Designs and developed a method of constructing these designs. Varghese and Jaggi (2011) obtained generalized row-column designs and showed their application in obtaining mating plans. Datta et al. (2014) obtained some methods of constructing row-column designs with multiple units per cell that are structurally incomplete, i.e. corresponding to the intersection of any row and column, there is at least one cell which does not contain any treatment. Datta et al. (2015) developed row-column designs with multiple units per cell with equal/unequal cell sizes. Jaggi et al. (2016) obtained another series of generalized incomplete Trojan-type designs for v = sm+1 treatments.
Most of the methods available in the literature are for complete rows and complete columns. Here, two methods of constructing row-column designs with two units per cell in incomplete rows or columns are obtained that are balanced for estimating elementary contrasts of treatment effects.
2. Experimental Setup and Information Matrix
We consider a row-column design with v treatments arranged in m rows, n columns and in each row-column intersection (i.e. cells) there are k units or plots resulting in total mnk experimental units or observations. The following four-way classified model with treatments, rows, columns and cells as the four classifications, is considered:
$$Y = X_1\theta_1 + X_2\theta_2 + e,$$
where
$$X_1 = [\,\Delta'\,], \qquad X_2 = [\,1 \;\; D_1' \;\; D_2' \;\; D_3'\,],$$
$\theta_1 = (\tau)$ is the vector of parameters of interest and $\theta_2 = (\mu \;\; \beta' \;\; \gamma' \;\; \eta')'$ is the vector of nuisance parameters. Y is an mnk x 1 vector of observations, $\mu$ is the grand mean, 1 is the mnk x 1 vector of ones, $\Delta'$ is the mnk x v matrix of observations versus treatments, $\tau$ is a v x 1 vector of treatment effects, $D_1'$ is the mnk x m matrix of observations versus rows, $\beta$ is the m x 1 vector of row effects, $D_2'$ is the mnk x n matrix of observations versus columns, $\gamma$ is the n x 1 vector of column effects, $D_3'$ is the mnk x mn matrix of observations versus cells, $\eta$ is the mn x 1 vector of cell effects and e is the mnk x 1 vector of random errors with $E(e) = 0$ and $D(e) = \sigma^2 I$.
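The information matrix of a row-column design with multiple units per cell for treatment effects is obtained as
$$C = R_\tau - \{(N_1K_{11}N_1' + N_2K_{21}N_1' + N_3K_{31}N_1' + N_1K_{12}N_2' + N_2K_{22}N_2') + (N_3K_{23}'N_2' + N_1K_{13}N_3' + N_2K_{23}N_3' + N_3K_{33}N_3')\} \qquad (2.1)$$
where
$$K_{11} = K_\beta^- + K_\beta^-(M_1K_{22}M_1' + M_2K_{23}'M_1' + M_1K_{23}M_2' + M_2K_{33}M_2')K_\beta^-$$
$$K_{12} = -K_\beta^-(M_1K_{22} + M_2K_{23}')$$
$$K_{13} = -K_\beta^-(M_1K_{23} + M_2K_{33})$$
$$K_{22} = K_\gamma^- + K_\gamma^-M_3(K_\eta - M_3'K_\gamma^-M_3)^-M_3'K_\gamma^-$$
$$K_{23} = -K_\gamma^-M_3(K_\eta - M_3'K_\gamma^-M_3)^-$$
$$K_{33} = (K_\eta - M_3'K_\gamma^-M_3)^-$$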
$R_\tau$ is the diagonal matrix of replications of treatments, $K_\beta$ is the diagonal matrix of row sizes, $K_\gamma$ is the diagonal matrix of column sizes and $K_\eta$ is the diagonal matrix of cell sizes. $N_1$ is the incidence matrix of treatments versus rows, $N_2$ is the incidence matrix of treatments versus columns, $N_3$ is the incidence matrix of treatments versus cells, $M_1$ is the incidence matrix of rows versus columns, $M_2$ is the incidence matrix of rows versus cells and $M_3$ is the incidence matrix of columns versus cells.
The v x v matrix C is symmetric, non-negative definite with zero row and column sums.
3. Methods of Construction
We present here two methods of constructing row-column designs with two units per cell. The first method is for odd number of treatments and the second is for even number.
Method 3.1: For v = 2t + 1 (t > 1), obtain the following initial column having two units per cell:
1	2t + 1
2	2t
3	2t - 1
...	...
t	2t - (t - 2)
Develop 2t more columns horizontally from the initial column by adding 1,2,...,2t consecutively reducing mod (2t+1). The resulting design is a row-column design with two units per cell and with m = t rows of size 2(2t+1), n = (2t+1) columns of size 2t, k = 2 and r = 2t replications. The design is complete row-wise and column-wise it is incomplete. The information matrix of the design for treatment effects is obtained from (2.1) as
$$C = (t + 0.5)\,I - 0.5\,J.$$
It is seen that all the elementary treatment contrasts are estimated with same variance. This method would give designs for all odd number of treatments.
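As a check on Method 3.1, the following Python sketch builds the design for a given t and computes a treatment information matrix. It is an illustration only: the function names are hypothetical, and instead of evaluating (2.1) term by term it uses the shortcut C = R - (1/k) N3 N3'. That shortcut assumes that, because every cell lies in exactly one row and one column, adjusting for cell effects also absorbs the row and column effects.

```python
def method_3_1(t):
    """Rows of cells for v = 2t + 1 treatments (labels 1..v), two units per cell."""
    v = 2 * t + 1
    initial = [(i, v + 1 - i) for i in range(1, t + 1)]  # the initial column

    def shift(x, s):
        return (x - 1 + s) % v + 1  # add s, reduce mod v, keep labels in 1..v

    return [[(shift(a, s), shift(b, s)) for s in range(v)] for a, b in initial]


def information_matrix(design, v, k=2):
    """C = R - (1/k) N3 N3', treating cells as blocks (assumption, see above)."""
    C = [[0.0] * v for _ in range(v)]
    for row in design:
        for cell in row:
            for a in cell:
                C[a - 1][a - 1] += 1.0          # one more replication of a
                for b in cell:
                    C[a - 1][b - 1] -= 1.0 / k  # concurrence of a and b in the cell
    return C


design = method_3_1(3)
for row in design:
    print(row)
C = information_matrix(design, v=7)
print(C[0])
```

For t = 3 the rows returned agree with the design of Example 3.1.1, and each diagonal entry of C equals 3 = (t + 0.5) - 0.5 while each off-diagonal entry equals -0.5.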
Example 3.1.1: Let t = 3, so v = 7. The contents of the initial column are obtained as follows:
1	7
2	6
3	5
Developing this column by adding 1,2,...,6 reducing mod 7 would result in the following row-column design in three rows of size 14, 7 columns of size 6 with 2 units per cell and replication of each treatment being 6:
	Columns						
Rows	1 7	2 1	3 2	4 3	5 4	6 5	7 6
	2 6	3 7	4 1	5 2	6 3	7 4	1 5
	3 5	4 6	5 7	6 1	7 2	1 3	2 4
The information matrix for estimating treatment effects is $C = 3.5\,I - 0.5\,J$.
Example 3.1.2: For t = 4, the contents of the initial column for v = 9 are obtained as follows:
1	9
2	8
3	7
4	6
The row-column design in 4 rows of size 18, 9 columns of size 8 and 8 replications is constructed as given.
	Columns								
Rows	1 9	2 1	3 2	4 3	5 4	6 5	7 6	8 7	9 8
	2 8	3 9	4 1	5 2	6 3	7 4	8 5	9 6	1 7
	3 7	4 8	5 9	6 1	7 2	8 3	9 4	1 5	2 6
	4 6	5 7	6 8	7 9	8 1	9 2	1 3	2 4	3 5
Here, the design is complete row-wise with each treatment occurring twice and column-wise it is incomplete.
Method 3.2: For v even, obtain the following initial column with 2 units per cell:
1	v
v	2
2	v-1
v-1	3
...	...
v - (v/2 - 2)	v/2
v/2	v/2 + 1
Develop $\frac{v}{2} - 1$ more columns horizontally from the initial column by adding 1, 2, ..., $(\frac{v}{2} - 1)$ consecutively, reducing mod v. The resulting design is a row-column design with 2 units per cell in incomplete rows and complete columns. The parameters of the design are v, m = (v-1) rows of size v, n = $\frac{v}{2}$ columns of size 2(v-1), k = 2 and r = (v-1). From (2.1), the information matrix for treatment effects is obtained as
$$C = \frac{v}{2}\,I - 0.5\,J.$$
Thus, all the elementary contrasts of treatment effects are estimated with same variance.
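The construction in Method 3.2 can be sketched in Python in the same spirit. The function name and the zigzag helper are illustrative choices, not the authors' notation. The check at the end counts replications and the number of cells in which each pair of treatments occurs together, which are the quantities that determine the diagonal and off-diagonal entries of C when cells are treated as blocks of size 2.

```python
from collections import Counter


def method_3_2(v):
    """Rows of cells for an even number v of treatments, two units per cell."""
    if v % 2:
        raise ValueError("v must be even")
    # Zigzag sequence 1, v, 2, v-1, ..., v/2, v/2 + 1; neighbours share a cell.
    zigzag = []
    for i in range(1, v // 2 + 1):
        zigzag += [i, v + 1 - i]
    initial = list(zip(zigzag, zigzag[1:]))  # the v - 1 cells of the initial column

    def shift(x, s):
        return (x - 1 + s) % v + 1  # add s, reduce mod v, keep labels in 1..v

    return [[(shift(a, s), shift(b, s)) for s in range(v // 2)] for a, b in initial]


design = method_3_2(8)
cells = [cell for row in design for cell in row]
replications = Counter(x for cell in cells for x in cell)
concurrences = Counter(frozenset(cell) for cell in cells)
print(sorted(replications.items()))
print(len(concurrences), set(concurrences.values()))
```

For v = 8 every treatment is replicated 7 times and each of the 28 treatment pairs shares exactly one cell, which is consistent with a diagonal of (v-1)/2 and off-diagonal entries of -0.5 in C.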
Example 3.2.1: For v = 8, following is a row-column design with cells containing 2 units in 7 rows of size 8 each and 4 columns of size 14 each:
	Columns			
Rows	1 8	2 1	3 2	4 3
	8 2	1 3	2 4	3 5
	2 7	3 8	4 1	5 2
	7 3	8 4	1 5	2 6
	3 6	4 7	5 8	6 1
	6 4	7 5	8 6	1 7
	4 5	5 6	6 7	7 8
The canonical efficiency factor of the above two series of designs is obtained as
$$E = \frac{H}{r}, \quad \text{where} \quad H = \left[\frac{1}{v-1}\sum_{i=1}^{v-1}\lambda_i^{-1}\right]^{-1},$$
$\lambda_i$ are the non-zero eigenvalues of the C-matrix of the designs obtained and r is the number of replications of the treatments. It is assumed that $\sigma^2$ is the same for the developed design and the orthogonal design to which it is compared. The canonical efficiency factor of the developed designs was worked out and was found to be fairly good.
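A small worked computation may help fix ideas. The Python sketch below assumes the usual definition E = H / r, with H the harmonic mean of the non-zero eigenvalues of C, and uses the fact that a matrix of the form aI - 0.5J with zero row sums has eigenvalue a with multiplicity v - 1 and eigenvalue 0 once. The function name is illustrative.

```python
from fractions import Fraction


def canonical_efficiency(eigenvalues, r):
    """E = H / r, with H the harmonic mean of the non-zero eigenvalues of C."""
    nonzero = [Fraction(x) for x in eigenvalues if x != 0]
    harmonic_mean = len(nonzero) / sum(1 / x for x in nonzero)
    return harmonic_mean / r


# Method 3.1 with t = 3 (v = 7, r = 6): C = 3.5 I - 0.5 J.
E1 = canonical_efficiency([Fraction(7, 2)] * 6 + [0], r=6)
# Method 3.2 with v = 8 (r = 7): C = 4 I - 0.5 J.
E2 = canonical_efficiency([4] * 7 + [0], r=7)
print(E1, E2)
```

Under these assumptions the two examples give 7/12 and 4/7 respectively.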
Acknowledgements
The authors are grateful to the Editor and the reviewers for the constructive comments that have led to considerable improvement in the paper.
References
[1]	Andersen, L.D. and Hilton, A.J.W. (1980): Generalized Latin rectangles I: Constructions and decomposition, Discrete Maths., 31, 125-152.
[2]	Bailey, R.A. (1988): Semi Latin squares, J. Statist. Plan. Inf., 18, 299-312.
[3]	Bailey, R.A. (1992): Efficient semi-Latin squares, Statistica Sinica, 2, 413-437.
[4]	Bailey, R.A. and Chigbu, P.E. (1997): Enumeration of semi-Latin squares, Discrete Maths., 167/168, 73-84.
[5]	Bailey, R.A. and Monod, H. (2001): Efficient semi-Latin rectangles: Designs for plant disease experiments, Scand. J. Statist., 28, 257-270.
[6]	Bedford, D. and Whitaker, R.M. (2001): A new construction for efficient semi-Latin squares, J. Statist. Plan. Inf., 98, 287-292.
[7]	Chigbu, P.E. (2003): The "best" of the three optimal (4 x 4)/4 semi-Latin squares, Sankhya: The Indian Journal of Statistics, 65(3), 641-648.
[8]	Darby, L.A. and Gilbert, N. (1958): The Trojan Square, Euphytica, 7, 183-188.
[9]	Datta, A., Jaggi, S., Varghese, C. and Varghese, E. (2014): Structurally incomplete row-column designs with multiple units per cell, Statistics and Applications, 12(1&2), 71-79.
[10]	Datta, A., Jaggi, S., Varghese, C. and Varghese, E. (2015): Some series of row-column designs with multiple units per cell, Calcutta Statistical Association Bulletin, 67(265-266), 89-99.
[11]	Dharmalingam, M. (2002): Construction of partial triallel crosses based on Trojan square design, J. App. Statist., 29(5), 675-702.
[12]	Edmondson, R.N. (1998): Trojan square and incomplete Trojan square design for crop research, J. Agric. Sci. 131, 135-142.
[13]	Edmondson, R.N. (2002): Generalized incomplete Trojan designs, Biometrika, 89(4), 877-891.
[14]	Harshbarger, B. and Davis, L.L. (1952): Latinized rectangular lattices, Biometrics, 8, 73-84.
[15]	Jaggi, Seema, Varghese, Cini, Varghese, Eldho and Sharma, V.K. (2010): Generalized incomplete Trojan-type designs, Statistics and Probability Letters, 80, 706-710.
[16]	Jaggi, Seema, Varghese, Cini and Varghese, Eldho (2016): A series of generalized incomplete Trojan-type designs, Journal of Combinatorics, Information and System Sciences: American Journal, 40(1-4), 53-60.
[17]	Preece, D.A. and Freeman, G.H. (1983): Semi-Latin squares and related designs, J. R. Statist. Soc. B, 45, 267-277.
[18]	Varghese, Cini and Jaggi, Seema (2011): Mating designs using generalized incomplete Trojan-type designs, Journal of Statistics and Applications, 6(3-4), 85-93.
[19]	Williams, E.R. (1986): Row and column designs with contiguous replicates, Australian J. Statist., 28, 154-163.
Metodoloski zvezki, Vol. 13, No. 1, 2016, 27-58
Adjustment of Recall Errors in Duration Data
Using SIMEX
Jose Pina-Sánchez1
Abstract
It is widely accepted that due to memory failures retrospective survey questions tend to be prone to measurement error. However, the proportion of studies using such data that attempt to adjust for the measurement problem is shockingly low. Arguably, to a great extent this is due to both the complexity of the methods available and the need to access a subsample containing either a gold standard or replicated values. Here I suggest the implementation of a version of SIMEX capable of adjusting for the types of multiplicative measurement errors associated with memory failures in the retrospective report of durations of life-course events. SIMEX is a method relatively simple to implement and it does not require the use of replicated or validation data so long as the error process can be adequately specified. To assess the effectiveness of the method I use simulated data. I create twelve scenarios based on the combinations of three outcome models (linear, logit and Poisson) and four types of multiplicative errors (non-systematic, systematic negative, systematic positive and heteroscedastic) affecting one of the explanatory variables. I show that SIMEX can be satisfactorily implemented in each of these scenarios. Furthermore, the method can also achieve partial adjustments even in scenarios where the actual distribution and prevalence of the measurement error differs substantially from what is assumed in the adjustment, which makes it an interesting sensitivity tool in those cases where all that is known about the error process is reduced to an educated guess.
1 Introduction
Applied quantitative researchers commonly assume that variables included in their models are measured perfectly. This often implicit assumption is, however, difficult to maintain when using survey data as interviewer effects, interviewee fatigue, social desirability bias, lack of cooperation, or plain deceit inevitably introduce measurement error (ME). This is especially true for surveys using a retrospective design, which collect information about past events
1 School of Law, University of Leeds, J.PinaSanchez@leeds.ac.uk
from a single contact with respondents. The advantages of retrospective designs, in comparison with prospective studies2, are well known: a) immune to problems of attrition; b) cheaper to administer; and c) more capable of detecting transitions occurring in short periods. Retrospective questions are however prone to ME as they require respondents to both interpret the question correctly and recall events that took place in the past.
The consequences of using data affected by ME are both difficult to estimate and potentially disastrous (Nugent, Graycheck & Basham, 2000; and Vardeman et al., 2010)3. Unfortunately, the latter is rarely acknowledged, and in certain cases it is directly misunderstood. For example, Carroll et al. (2006) point to the widespread belief that ME affecting an explanatory variable will only attenuate the regression estimate of that variable4. Even amongst researchers that acknowledge the potential consequences of ME very little is done to tackle the problem besides mentioning it as a caveat. There are two reasons for this: the requirement of additional data and the complexity of the adjustment methods available.
Generally, methods for the adjustment of ME need to be informed about the true unobserved values using additional data. For example, multiple imputation (Rubin, 1987, and Cole, Chu & Greenland, 2006) requires access to a validation subsample where the true values are observed. Regression calibration (Carroll & Stefanski, 1990; and Glesjer, 1990) needs at least repeated measurements, while two stage least squares (Theil, 1953) requires instrumental variables. However, researchers' access to this type of data tends to be the exception rather than the norm. In addition, these three methods belong to the family of functional methods - i.e. those that do not make any assumptions about the distribution of the true values. A second group of methods known as structural methods are technically more complex, amongst other things because they require specifying the probability function of the unobserved true values. Examples of structural methods are likelihood based adjustments, either Bayesian or Frequentist. These methods account directly for the ME mechanism in place, which tends to involve ad hoc specifications, in turn increasing the complexity of the adjustment.
2	See Solga (2001) for a comparison of data quality derived from prospective and retrospective questions.
3	"Measurement error is, to borrow a metaphor, a gremlin hiding in the details of our research that can contaminate the entire set of estimated regression parameters. " (Nugent, et al. 2000: 60). "Even the most elementary statistical methods have their practical effectiveness limited by measurement variation. " (Vardeman et al., 2010: 46).
4	"Despite admonitions of Fuller (1987) and others to the contrary, it is a common perception that the effect of ME is always to attenuate the line. In fact, attenuation depends critically on the classical additive ME model. " (Carroll et al., 2006: 46).
In this paper I will use simulated data to study the effectiveness of the multiplicative Simulation Extrapolation method (SIMEX) (Carroll et al. 2006; and Biewen, Nolte & Rosemann, 2008). This is an extension of the standard SIMEX method (Cook & Stefanski, 1994) capable of adjusting for the recall errors that are typically observed in the retrospective reports of life-course events. SIMEX implementation is relatively simple in that it only requires an estimate of the reliability of the variable affected by ME. This is normally obtained using a subsample of replicated data. Here I will assume that such information is not available to the researcher, as is often the case. Instead I will use this technique to show its potential to carry out sensitivity analysis when the reliability ratios have to be assumed.
That is, I will demonstrate how the problem of recall errors so ubiquitous in retrospective data can be effectively dealt with by researchers who have neither the technical background to carry out complex adjustments, nor access to additional sources of data. In so doing my ultimate goal is to encourage a wider audience of survey researchers both to reflect on the implications of relying on variables affected by ME and to consider the possibility of assessing the robustness of their findings.
In the following section I review the theory regarding the types of errors that can be expected from retrospective questions and the models that have been normally used to specify them. In Section 3, I present the simulated data that will be used in the analysis and illustrate the implications of using an explanatory variable affected by multiplicative errors in different outcome models. Section 4 lays out the functioning of the standard SIMEX and the extension considered to accommodate multiplicative ME. In Section 5 the results of the analysis are presented, and in Section 6 I conclude with a discussion of the relevance of the main findings.
2 Modelling Memory Failures in Retrospective Questions on Life-Course Events
Most studies aiming to assess the implications of ME or to adjust for them assume a simple error mechanism known as the classical ME model. This model was first formally defined by Novick (1966) as follows,
$$X^* = X + V \qquad (2.1)$$
where X* is the observed variable, equal to the true variable X plus the ME term V, which fulfils six important assumptions:
1.	Null expectancy refers to the assumption that the error term is non-systematic, or in other words, the expected value of the error term is zero, $E(V) = 0$.
2.	The assumption of homoscedasticity indicates that the variance of the error term is assumed to remain constant across subjects, $Var(V_i) = Var(V) = \sigma_V^2$.
3.	In addition to having an expectation of zero and constant variance the error term is normally distributed, $V \sim N(0, \sigma_V^2)$.
4.	The correlation between the true value and the error term is assumed to be zero, $Cov(X, V) = 0$.
5.	Furthermore, the correlation between different values of the error term is also assumed to be zero, $Cov(V_i, V_j) = 0$, where $V_i$ and $V_j$ represent any two values of the error term, for subjects i and j.
6.	The last assumption, non-differentiality, only becomes relevant when X* is used in a regression model to specify a response variable, Y. It indicates that, given the true value, the ME is not associated with the residual term from the regression model, $\varepsilon$. That is, $E(Y \mid X, X^*) = E(Y \mid X)$, or alternatively, $Cov(\varepsilon, V) = 0$.
$$\text{Classical model:} \quad
\begin{cases}
E(V) = 0 & \text{null expectancy} \\
Var(V_i) = Var(V) & \text{homoscedasticity} \\
V \sim N(0, Var(V)) & \text{normally distributed} \\
Cov(X, V) = 0 & \text{independence of error and true value} \\
Cov(V_i, V_j) = 0 & \text{independence between errors} \\
Cov(\varepsilon, V) = 0 & \text{non-differentiality}
\end{cases} \qquad (2.2)$$
The second and sixth assumptions were not originally established by Novick (1966), but they have been included here because they are often required in the application of the adjustment methods. Assumptions 1 and 4 (null expectancy and independence of the error and true value) can be used to define the expected value and the variance of the observed variable as follows,
$$E(X^*) = E(X) + E(V) = E(X) \qquad (2.3)$$
and,
$$Var(X^*) = Var(X) + 2\,Cov(X, V) + Var(V) = Var(X) + Var(V) \qquad (2.4)$$
which can in turn be used to define the reliability of an observed variable affected by classical ME, $\rho_{X^*}$, as the ratio of the true to observed variance,
$$\rho_{X^*} = \frac{Var(X)}{Var(X^*)} = \frac{Var(X)}{Var(X) + Var(V)} \qquad (2.5)$$
Notice that in order to calculate the reliability ratio either one of the unobserved variances (Var(X) or Var(V)) needs to have been previously estimated.
The classical model reflects nicely the type of ME that we can expect to find in measurement processes that are prone to random errors. However, it is not the most appropriate model that we can use to reflect the memory failures associated with retrospective reports of developmental milestones, transitions of events, or ages at onset (Pickles, Pickering & Taylor, 1996), such as the recall of age at menarche, or the date since last employed. Those reports convey duration (or time-to-event) data, which by definition cannot be negative, a possible outcome in the classical model when $V_i$ is negative and larger in absolute value than $X_i$.
Furthermore, although determining the prevalence of ME in these types of questions is not always straightforward, it makes sense to think that the distance from the interview has something to do with the magnitude of the ME. Pickles et al. (1998) point to different views on this issue. On the one hand, different authors (Golub, Johnson & Labouvie, 2000; Johnson & Schultz, 2005) have detected telescoping effects in reports of dates of onset of a particular event. The term telescoping was coined by Neter and Waksberg (1964) to refer to the temporal displacement of an event whereby people perceive recent events as being more remote than they are (known as backward telescoping or time expansion) and distant events as being more recent than they are (forward telescoping or time compression).5
Conversely, another group of researchers (Huttenlocher, Hedges & Prohaska, 1988; Rubin and Baddeley, 1989; or Bradburn, Huttenlocher & Schwarz, 1994) have argued that rather than distorted time perceptions recall errors take the form of non-systematic ME around the reported date with its size being proportional to the distance between the day of the interview
5 For a review of the cognitive processes resulting in telescoping see Janssen and Chessa (2006).
and the reported date. That is, the further away the date of the event to be reported the harder its recall and therefore the bigger the ME. Both these findings question the appropriateness of applying the additive ME model to retrospective designs.
ME induced by memory failures can instead be better represented as a multiplicative ME model (Holt, McDonald & Skinner, 1991; Pickles et al., 1996 and 1998; Skinner & Humphreys, 1999; Augustin, 1999; Glewwe, 2007; and Dumangane, 2007) that builds upon the classical ME model as follows,
$$X_i^* = X_i \cdot V_i \qquad (2.6)$$
Here the multiplicative relation between X and V reflects that the effect of V on the observed variable X* is proportional to the value of X. In addition, for the specific case of non-systematic multiplicative ME, most of the assumptions about the error term described in equation 2.2 apply. I will refer to this type of error as classical multiplicative from here on. The only exceptions are items 1 and 3. Here, V follows a log-normal distribution bounded from 0 to $\infty$ and with mean equal to 1. This way the ME has a relatively symmetric effect across the true values and maintains the scale used in duration data. Note as well that this same model can also be used to account for backward and forward telescoping effects by shifting the distribution of V to the left or right, so its mean goes below or above 1.
3 The Impact of Classical Multiplicative Measurement Error in Regression Analyses
Gustafson (2003) traced out analytically the impact of classical multiplicative ME affecting an explanatory variable, X*, in a linear model where a second explanatory variable, Z, is measured without error,
$$Y = \beta_0 + \beta_1 X^* + \beta_2 Z + \varepsilon \qquad (3.1)$$
Classical multiplicative ME produces attenuation in the $\beta_1$ regression coefficient directly proportional to the variance of the ME term, increasing the more skewed to the right X is, and as the correlation between the true variable X and Z grows stronger. Gustafson (2003) does not evaluate, however, the impact on the $\beta_2$ error-free regression coefficient or consider nonlinear outcome models.
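The attenuation produced by classical multiplicative ME can be illustrated with a short simulation. The Python sketch below is not the code used in this paper. It assumes a single predictor, illustrative coefficient values, a plain exponential X with mean 8.75, and log-normal errors parameterised so that their mean is exactly 1. In that simplified setting the naive slope is expected to shrink by roughly the factor Var(X)/Var(X*).

```python
import random


def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def ols_slope(predictor, response):
    """Slope of a simple least-squares regression of response on predictor."""
    return covariance(predictor, response) / covariance(predictor, predictor)


def simulate_attenuation(n=20000, sigma=0.25, beta0=-1.0, beta1=0.15, seed=1):
    # All default values are illustrative assumptions, not the paper's settings.
    rng = random.Random(seed)
    x = [rng.expovariate(1 / 8.75) for _ in range(n)]  # true durations
    # Log-normal errors with E[V] = exp(mu + sigma^2 / 2) = 1, so mu = -sigma^2 / 2.
    v = [rng.lognormvariate(-sigma ** 2 / 2, sigma) for _ in range(n)]
    x_star = [xi * vi for xi, vi in zip(x, v)]  # X* = X * V
    y = [beta0 + beta1 * xi + rng.gauss(0, 1) for xi in x]
    return {
        "true_slope": ols_slope(x, y),
        "naive_slope": ols_slope(x_star, y),
        "variance_ratio": covariance(x, x) / covariance(x_star, x_star),
    }


result = simulate_attenuation()
print(result)
```

With a second covariate Z in the model, as in equation 3.1, the attenuation also depends on the correlation between X and Z, which this one-predictor sketch does not capture.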
To address these issues, this section explores the impact of different types of multiplicative ME on three different generalised linear model specifications - linear, logit and Poisson regressions - containing error-prone and perfectly measured explanatory variables, X* and Z. A simulated dataset of 1000 observations allows sufficient statistical power to detect moderate parameter estimates while keeping a low computational burden given the number of scenarios that will be explored. To reflect the features of duration data, the true variable X is taken to be exponentially distributed with mean 8.75 and range (1.15, 44.45); Z follows a standard normal distribution, and both of them are associated with the response variable of the linear model, Y, which is also shaped as a standard normal distribution. In addition, to generate the response variables for the logit and Poisson models, Y is recoded as a binary variable, Yca,
$$Y_{ca} = \begin{cases} 1, & Y > 0 \\ 0, & Y \le 0 \end{cases}$$
and as a count variable, Yco,
The four simulated ME scenarios are represented by the variables $X_1^*$, $X_2^*$, $X_3^*$ and $X_4^*$. In each of these scenarios X is subject to normally distributed classical multiplicative ME. I choose to simulate normal instead of log-normal errors (as explained in equation 2.6) to ensure that they are perfectly symmetric around their mean (the latter are skewed to the right to a certain extent). Figure 1 shows the probability density and mass functions for each of the variables simulated, while the specific code used in R is shown in Appendix I.
Figure 1: Probability Density and Mass Functions of the Simulated Variables
In the first ME scenario I simulate non-systematic errors distributed as N(1, .25). The multiplicative effect of these errors results in a new variable X* with a reliability ratio (RR) of .816. In the second scenario I explore the effect of heteroscedastic ME by changing the distribution of the errors from N(1, .15) to N(1, .35) when Z > 0. This is a type of ME that could take place when different survey modes are used. For example, Roberts (2007) - after reviewing the literature - concluded that telephone interviews place a higher cognitive demand on the interviewee than face-to-face interviews, which tends to make them more prone to measurement error. In the third scenario I study the effect of systematically underreported durations by simulating errors distributed as N(.9, .25). These are the types of errors that could be expected in the presence of forward telescoping bias (e.g. Golub et al., 2000, and Johnson and Schultz, 2005, found evidence of these types of errors in reports of onset of drug usage and smoking, respectively), but also in the report of durations of socially undesirable events (e.g. Pina-Sánchez, Koskinen & Plewis, 2013 and Pina-Sánchez, Koskinen & Plewis, 2014, found an increased tendency to underreport the longer spells of unemployment). Lastly, I explore the opposite scenario, one where the errors are distributed
as N(1.1, .25) to reflect overreported durations, which could be expected in the presence of backward telescoping or in reports of socially desirable events.
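The four error scenarios can be sketched in Python as follows. This is not the R code of Appendix I. It assumes that the second parameter of each normal distribution is a standard deviation, draws X from a plain exponential distribution with mean 8.75 without enforcing the reported range, and uses an arbitrary seed. The dictionary keys are illustrative labels.

```python
import random


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)


def simulate_scenarios(n=1000, seed=2016):
    """Return the true durations and four error-prone versions of them."""
    rng = random.Random(seed)
    x = [rng.expovariate(1 / 8.75) for _ in range(n)]  # true durations, mean 8.75
    z = [rng.gauss(0, 1) for _ in range(n)]            # error-free covariate
    errors = {
        "non-systematic": [rng.gauss(1.0, 0.25) for _ in range(n)],
        # Larger error spread when Z > 0, smaller otherwise.
        "heteroscedastic": [rng.gauss(1.0, 0.35 if zi > 0 else 0.15) for zi in z],
        "underreported": [rng.gauss(0.9, 0.25) for _ in range(n)],
        "overreported": [rng.gauss(1.1, 0.25) for _ in range(n)],
    }
    observed = {name: [xi * vi for xi, vi in zip(x, v)] for name, v in errors.items()}
    return x, observed


x, observed = simulate_scenarios()
for name, x_star in observed.items():
    ratio = variance(x) / variance(x_star)
    print(f"{name:16s} mean X* = {mean(x_star):6.2f}  Var(X)/Var(X*) = {ratio:.3f}")
```

The variance ratio printed for the non-systematic scenario plays the role of the reliability ratio; its value depends on the simulated X and need not reproduce the .816 reported above.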
The effects of these four types of simulated ME are shown using scatter-plots in Figure 2. Notice that the top-right plot uses Z instead of X on the y-axis.
Figure 2: Scatterplots of the effect of the different types of measurement error considered
To assess the impact that these types of errors have on the regression coefficients of a linear, a logit, and a Poisson model, I compare the results from each of these models when X* is used (the naïve model) instead of X (the true model). Specifically, I focus on the bias in the regression coefficients,
$$BIAS = \beta_n - \beta_t \qquad (3.2)$$
where the subscript n stands for the naïve model and t for the true model. In addition, to compare the impact of ME across models and across regression coefficients using different scales, I calculate a relative measure of the bias as follows,
$$R.BIAS = \left|\frac{BIAS}{\beta_t}\right| \times 100 \qquad (3.3)$$
Results for the different models studied and the impact generated by the different types of ME are presented in Table 1. In all of the scenarios studied the effect of ME was reflected in a downward bias for $\beta_1$ (the coefficient for the variable X*), and in upward biases for $\beta_0$ and $\beta_2$ (the coefficients for the constant and Z). In addition to the observed differences in the direction of the biases across coefficients there are also strong differences in their intensity. The size of the bias for $\beta_2$ is about twice as large as the bias for $\beta_0$ and $\beta_1$, reaching levels as alarming as 94.8% for the logit model with heteroscedastic ME, although the average size of the bias across all the scenarios is 39.5%.
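The two bias measures can be computed directly from the coefficient estimates. The Python sketch below uses the linear-model estimates reported in Table 1 for the true model and for the naïve model under non-systematic multiplicative ME. The relative bias is assumed here to be the absolute bias expressed as a percentage of the true-model estimate; the function and dictionary names are illustrative.

```python
def bias(naive, true):
    """Bias of a naive estimate relative to the true-model estimate."""
    return naive - true


def relative_bias(naive, true):
    """Absolute bias as a percentage of the true-model estimate (assumed form)."""
    return abs(naive - true) / abs(true) * 100


# Linear model: true-model estimates and naive estimates under
# non-systematic multiplicative error, as reported in Table 1.
true_linear = {"b0": -1.297, "b1": 0.150, "b2": 0.111}
naive_linear = {"b0": -1.013, "b1": 0.118, "b2": 0.156}

for name in true_linear:
    b = bias(naive_linear[name], true_linear[name])
    rb = relative_bias(naive_linear[name], true_linear[name])
    print(f"{name}: bias = {b:+.3f}, relative bias = {rb:.1f}%")
```

Because the inputs are rounded to three decimals, the percentages obtained this way can differ slightly from those printed in the table.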
Table 1: Impact of Measurement Error in the Regression Estimates
Model	Coefficient	Linear				Logit				Poisson			
		Coef	SE	Bias	R.Bias	Coef	SE	Bias	R.Bias	Coef	SE	Bias	R.Bias
True model	β0	-1.297	.035			-5.997	.388			-1.362	.069		
	β1	.150	.003			.768	.050			.092	.004		
	β2	.111	.016			.210	.099			.082	.038		
Naïve: multi.	β0	-1.013	.038	.284	21.9%	-3.810	.258	2.187	36.5%	-1.198	.065	.284	20.8%
	β1	.118	.004	-.032	21.2%	.494	.034	-.275	37.5%	.075	.004	-.032	34.6%
	β2	.156	.019	.045	40.9%	.275	.084	.065	30.9%	.157	.037	.045	55.1%
Naïve: hetero.	β0	-.998	.039	.372	28.7%	-3.649	.248	2.372	39.5%	-1.178	.065	.372	27.4%
	β1	.116	.004	-.043	28.8%	.471	.032	-.299	38.9%	.074	.004	-.043	47.0%
	β2	.169	.020	.067	60.7%	.377	.085	.199	94.8%	.141	.037	.067	81.7%
Naïve: under.	β0	-.937	.038	.325	25.1%	-3.441	.233	1.962	32.7%	-1.054	.059	.325	23.9%
	β1	.121	.004	-.024	15.9%	.490	.033	-.183	23.8%	.069	.004	-.024	26.1%
	β2	.166	.020	.052	46.9%	.321	.084	.124	58.9%	.149	.037	.052	63.2%
Naïve: over.	β0	-1.028	.038	.245	18.9%	-4.029	.266	1.842	30.7%	-1.110	.061	.245	18.0%
	β1	.108	.003	-.039	26.1%	.470	.031	-.284	37.0%	.061	.003	-.039	42.7%
	β2	.150	.019	.053	48.0%	.284	.087	.152	72.4%	.134	.037	.053	64.6%
While the different ME scenarios clearly show attenuated coefficients, none of the coefficients actually became statistically non-significant or changed their sign in comparison to the true model. This is partly due to the small effect that ME had on the standard errors, which were underestimated by a third of their size in the true logit model, and only slightly underestimated and overestimated when using a Poisson and a linear model, respectively.
These results are consistent with Biewen et al. (2008) who, in a simulated probit model with one predictor, find an upward bias in the constant and a downward bias in the slope induced by classical multiplicative ME. The results obtained here serve to reinforce these findings. In the presence of a type of ME other than classical additive, or for a model other than simple linear regression, the direction of the bias is not always towards the null. The difficulty of anticipating the direction and size of these biases, even in scenarios with moderate prevalence of ME, makes the implementation of adjustment methods an indispensable part of analysing survey data prone to these types of ME.
4 Standard SIMEX and Extensions to Account for Classical Multiplicative Measurement Error
The study of the adjustment of multiplicative errors dates back to the 1980s. Fuller (1984) and Hwang (1986) developed a method-of-moments correction for multiplicative ME in the explanatory variables of a linear model. This method assumes that the value of the ME variance is known, or that it can be estimated, and is limited to applications where the ME mechanism affects only one of the explanatory variables in the context of a linear model. Lyles and Kupper (1997) compared the effectiveness of this method with others such as regression calibration and a quasi-likelihood approach, which could be applied to other non-linear outcome models.
These methods, as mentioned above, are however of limited use to applied researchers in that they either require additional data or are complex to implement. Regression calibration requires additional data in the form of replicated measures or a validation subsample. Quasi-likelihood approaches only need an estimate of the variance of the ME and, much like those relying on Bayesian statistics, can be applied when a full likelihood approach is not feasible due to computational intractability. However, their implementation is relatively complex, starting from the need to use specialised software (such as WinBUGS when considering Bayesian adjustments), which discourages many analysts from attempting the necessary adjustment.
Because it only requires an estimate of the variance of the ME, is simple to apply, and generalizes to any other outcome model regardless of its complexity6, SIMEX represents a very convenient alternative. SIMEX was first presented by Cook and Stefanski (1994) and refined in the following years by Stefanski and Cook (1995) and Carroll, Küchenhoff, Lombard, and Stefanski (1996). "The key idea underlying SIMEX is the fact that the effect of measurement error on an estimator can be determined experimentally via simulation" (Carroll et al., 2006: 98).
6 See for example He et al. (2007), who applied SIMEX to an Accelerated Failure Time model with one of the explanatory variables affected by classical ME, or Battauz et al. (2008), who adjusted for a similar type of ME problem but with an ordinal probit model as the outcome model.
In particular, SIMEX exploits the relationship between the size of the ME affecting a variable and the size of the bias in the regression estimates in the outcome model. Following Fuller (1987) we know that the unadjusted estimator of the slope, $\hat{\beta}_1$, does not converge asymptotically to the parameter $\beta_1$ but to:
$$\hat{\beta}_1 \xrightarrow{p} \beta_1 \frac{\sigma_X^2}{\sigma_X^2 + \sigma_V^2} \qquad (4.1)$$
where $\sigma_X^2$ and $\sigma_V^2$ represent the variance of the true explanatory variable and the error term. In other words, the estimator of the slope is biased downwards in absolute terms by a factor equal to the reliability ratio, $\rho_{X^*}$ (defined in equation 2.5), of the observed variable, $X^*$. In this situation, and if $\rho_{X^*}$ or $\sigma_V^2$ is known, it would be neither practical nor efficient to use SIMEX, since the adjustment would simply be achieved by substituting the variance terms in equation 4.1. However, I will use this simple setting for illustrative purposes.
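As a small worked illustration of this attenuation, consider the following arithmetic. The numbers are hypothetical values chosen for convenience (they are not taken from the study), and the reliability ratio is taken in its usual form as the share of the observed variance that is due to the true variable:

$$\beta_1 = 0.5,\quad \sigma_X^2 = 4,\quad \sigma_V^2 = 1 \;\Rightarrow\; \rho_{X^*} = \frac{\sigma_X^2}{\sigma_X^2 + \sigma_V^2} = \frac{4}{5} = 0.8$$

$$\hat{\beta}_1 \xrightarrow{p} \rho_{X^*}\,\beta_1 = 0.8 \times 0.5 = 0.4, \qquad \frac{0.4}{\rho_{X^*}} = \frac{0.4}{0.8} = 0.5 = \beta_1$$

Under these assumed values the naïve slope understates the true slope by 20%, and dividing by the known reliability ratio recovers it exactly, which is why SIMEX is unnecessary in this simple case.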
To facilitate the understanding of the method, the steps involved in its implementation are outlined below using a simple example of bias in the slope of a simple linear regression, where the explanatory variable, $X^*$, is prone to classical additive ME (equations 2.1 and 2.2). The implementation of SIMEX is divided into six steps:
1)	The first step involves simulating additional explanatory variables with increasing levels of ME. These new variables are generated in a way that emulates the classical ME model, but with successively larger values of $\sigma_V^2$ affecting $X$. Specifically, $K$ new explanatory variables ($X^*_{ik}$) are generated by the rule:
$$X^*_{ik} = X^*_i + \sqrt{\lambda_k}\, V_{ik} \qquad (4.2)$$
with $k = 0, 1, \ldots, K$, the simulated error normally distributed, $V \sim N(0, \sigma_V^2)$, and $\lambda_0 < \lambda_1 < \ldots < \lambda_K$ a set of parameters used to amplify the ME variance (often these are (.5, 1, 1.5, 2)).
2)	Once the different variables with added ME have been generated, the outcome model is re-estimated using these new data, and the values of the estimator of interest (i.e. $\hat{\beta}_{1k}$) for the different levels of ME ($\lambda_k$) are saved. In particular, for the case of a simple linear model with the explanatory variable affected by classical ME, and using the data-generating rule described in equation 4.2, the estimator of the slope will now converge to:
$$\hat{\beta}_{1k} \xrightarrow{p} \beta_1 \frac{\sigma_X^2}{\sigma_X^2 + (1 + \lambda_k)\sigma_V^2} \qquad (4.3)$$
where the bias increases monotonically as $\lambda_k$ increases.
3)	In order to reduce the Monte Carlo error associated with the simulation procedure, steps 1 and 2 are repeated $B$ times so that a mean estimate of $\hat{\beta}_{1k}$ for $b = 1, \ldots, B$ can be computed, where the rule of thumb7 is to use $B = 100$ iterations.
4)	At this stage the $\hat{\beta}_{1k}$ and $\lambda_k$ values can be paired, considering the former as a function of the latter, $G(\hat{\beta}_{1k}, \lambda_k)$, known as the extrapolation function, which should be plotted in order to obtain a first insight into its shape.
5)	The extrapolation function is estimated using a regression model, with data $(\hat{\beta}_{1k}, \lambda_k)$. Carroll et al. (2006) recommend the use of one of three types of simple functional forms.
a)	linear,	$G(\hat{\beta}_{1k}, \lambda_k) = \zeta_1 + \zeta_2 \lambda_k$
b)	quadratic,	$G(\hat{\beta}_{1k}, \lambda_k) = \zeta_1 + \zeta_2 \lambda_k + \zeta_3 \lambda_k^2$
c)	non-linear or ratio-linear,	$G(\hat{\beta}_{1k}, \lambda_k) = \zeta_1 + \zeta_2 / (\zeta_3 + \lambda_k)$
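For the example presented here, and if the extrapolation function is well approximated by the chosen functional form, we would find the following function,
$$G(\hat{\beta}_{1k}, \lambda_k) = \beta_1 \frac{\sigma_X^2}{\sigma_X^2 + (1 + \lambda_k)\sigma_V^2} \qquad (4.4)$$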
6)	From here, the SIMEX estimate, $\hat{\beta}_{simex}$, can be calculated by extrapolating $G(\hat{\beta}_{1k}, \lambda_k)$ to $G(\hat{\beta}_{1k}, \lambda_k = -1)$. Note that from equation 4.4 when $\lambda_k = -1$ the bias is cancelled out.
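The six steps above can be sketched in a few lines of Python. This is a minimal illustration for the simple linear case with classical additive ME, not the routine used in the paper (which refers to the SIMEX packages in STATA and R and to its own Appendix II). The function name `simex_slope`, its arguments and the simulated data are assumptions introduced here, and a quadratic extrapolant is used by default only because it is one of the three forms listed in step 5.

```python
import numpy as np


def simex_slope(x_obs, y, sigma_v, lambdas=(0.5, 1.0, 1.5, 2.0),
                B=100, degree=2, seed=0):
    """Return (simex_estimate, naive_estimate) for the slope of y on x_obs.

    Assumes x_obs = x + V with V ~ N(0, sigma_v**2) (classical additive ME).
    """
    rng = np.random.default_rng(seed)
    x_obs = np.asarray(x_obs, dtype=float)
    y = np.asarray(y, dtype=float)

    # lambda = 0 is the naive fit on the observed data (no extra noise).
    lam_grid = np.concatenate(([0.0], np.asarray(lambdas, dtype=float)))
    mean_slopes = []
    for lam in lam_grid:
        reps = B if lam > 0 else 1
        slopes = []
        for _ in range(reps):
            # Step 1: add simulated error with variance lam * sigma_v**2.
            noise = rng.normal(0.0, sigma_v, size=x_obs.shape)
            x_sim = x_obs + np.sqrt(lam) * noise
            # Step 2: re-estimate the outcome model (OLS slope here).
            slopes.append(np.polyfit(x_sim, y, 1)[0])
        # Step 3: average over the B replications.
        mean_slopes.append(np.mean(slopes))

    # Steps 4-5: fit a polynomial extrapolation function G(lambda).
    coefs = np.polyfit(lam_grid, mean_slopes, degree)
    # Step 6: extrapolate to lambda = -1, the case of no ME.
    return float(np.polyval(coefs, -1.0)), float(mean_slopes[0])


# Hypothetical data: true slope 1, sigma_X = 1, sigma_V = 0.5.
rng = np.random.default_rng(123)
x_true = rng.normal(0.0, 1.0, 2000)
y = 1.0 + 1.0 * x_true + rng.normal(0.0, 0.5, 2000)
x_star = x_true + rng.normal(0.0, 0.5, 2000)
simex_est, naive_est = simex_slope(x_star, y, sigma_v=0.5)
print("naive:", round(naive_est, 3), "SIMEX:", round(simex_est, 3))
```

Because a polynomial only approximates the ratio-linear shape implied by equation 4.3, the extrapolated value would be expected to move the estimate most of the way towards the true slope rather than recover it exactly.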
Figure 3 represents the SIMEX process graphically. The solid line denotes the part of the extrapolation function that can be approximately observed through the regression estimates obtained when the outcome model is estimated using simulated predictors with increasing levels of ME, and the dashed line represents the extrapolation to the case of no ME, which gives the adjusted estimate.
7 This is the number of iterations used by default in the SIMEX packages in STATA and R.
Figure 3: Extrapolation function
[Plot of the extrapolation function against $(1 + \lambda_k)$, with the horizontal axis running from 0.0 to 3.0.]
Figure 3 also shows some of the limitations of SIMEX. The entire extrapolation function cannot be observed, hence, it is hard to assess the quality of the adjustment. In addition, the extrapolation function needs to be approximated using a simple functional form. Therefore adjustments are only approximated, and their effectiveness depends on how well the extrapolation function is estimated, for which the choice of the right functional form is crucial. In the case depicted by Figure 3 it makes sense to think of the quadratic function as the better approximation, but it might not always be so clear.
Another cause of concern stems from the accuracy of the estimate of $\sigma_V^2$ that is used in the simulations. For example, considering the case depicted in Figure 3, if $\sigma_V^2$ is underestimated, the extrapolation function will have a flatter slope and the adjustment would only be partial. That is, for an underestimated $\sigma_V^2$, lower levels of simulated error would have been generated for the new explanatory variables, which would have made the estimated extrapolation function shallower, and produced a bigger - and still biased - adjusted estimate when extrapolating to $\lambda_k = -1$. Such suboptimal adjustment is illustrated in Figure 4, where I compare the extrapolation function shown in Figure 3 with a similar one that would be obtained if $\sigma_V^2$ had been underestimated.
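A short derivation may make the size of this partial adjustment concrete. It is only a sketch for the simple linear case with classical additive ME, assuming that the extrapolation function is fitted exactly. Here $\sigma_X^2$ and $\sigma_V^2$ are the variances of the true variable and of the ME, $\lambda_k$ are the amplification parameters, and $c$ is a hypothetical constant introduced for this illustration: the ratio of the assumed to the true ME variance, so that $c < 1$ when the variance is underestimated.

$$\hat{\beta}_{1k} \xrightarrow{p} \beta_1 \frac{\sigma_X^2}{\sigma_X^2 + (1 + c\,\lambda_k)\sigma_V^2} \quad\Rightarrow\quad G(\lambda_k = -1) = \beta_1 \frac{\sigma_X^2}{\sigma_X^2 + (1 - c)\sigma_V^2}$$

With $c = 1$ the bias is cancelled out. With $0 < c < 1$ a fraction of the attenuation remains, so the adjusted estimate lies between the naïve limit and $\beta_1$.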
Figure 4: Comparison of Extrapolation Functions
[Plot of the two extrapolation functions against $(1 + \lambda_k)$, with the horizontal axis running from 0.0 to 3.0.]
Interestingly, the application of SIMEX to any other regression model affected by an errors-in-variables problem would follow the same logic as the example of a simple linear regression model presented here, regardless of the complexity of the outcome model under study. Even more interesting, at least for this paper's topic of study, is the fact that SIMEX can be applied to ME problems different from the standard classical additive model so long as the ME-generating process can be simulated via Monte Carlo methods (Carroll et al., 2006). Two remarkable extensions that have proven to be robust in the literature are: 1) MC-SIMEX (Küchenhoff et al., 2006), the application of the SIMEX methodology to problems of misclassification of either the response or an explanatory variable in the outcome model; and 2) SIMEX for classical multiplicative ME (Carroll et al., 2006, and Biewen et al., 2008).
In order to adapt the standard SIMEX method to the classical multiplicative ME setting, Carroll et al. (2006) propose a change in the way the simulated variables are increasingly affected by ME in step 1. In particular, equation 4.2 is substituted by
to represent the multiplicative relationship between the observed durations and the simulated noise, while the rest of the six-step algorithm is implemented as before. However, the expression in equation 4.5 cannot be used to generate negative errors, which in a setting like the one assumed here, where errors are normally (instead of lognormally) distributed, could create complications. To avoid this problem I will use the following error-generating rule suggested by Biewen et al. (2008),
Lastly, to estimate the standard errors of $\hat{\beta}_{simex}$ I use the bootstrapping pairs algorithm8, where entire cases covering the response and explanatory variables are resampled with replacement, and for each new sample the SIMEX process is rerun. Bootstrap is only one of the different options available to estimate the variance of the SIMEX estimator. Carroll et al. (1996) suggest using a method based on the sandwich estimator and on the theory of M-estimators to obtain an asymptotic covariance estimator. Specifically, this method is based on the asymptotic equivalence of $\hat{\beta}(\lambda_k)$ and an M-estimator, producing a closed form equation from which the standard errors of $\hat{\beta}_{simex}$ can be directly derived, which avoids the computationally intensive process of replicating the SIMEX procedure for a number of new samples. However, this method has been developed for the specific case of classical additive ME. For extensions of SIMEX to different types of ME processes, non-parametric methods such as bootstrap or jackknife become a natural alternative.
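The pairs bootstrap described above can be sketched as follows. This is an illustrative outline rather than the paper's implementation: the helper names `pairs_bootstrap_se` and `ols_slope` and the simulated data are assumptions made here, and an OLS slope stands in for the estimator so that the example runs quickly. In the setting of the paper the `estimator` argument would instead rerun the whole SIMEX procedure on each resample.

```python
import numpy as np


def ols_slope(x, y):
    """Slope of a simple linear regression of y on x (stand-in estimator)."""
    return np.polyfit(x, y, 1)[0]


def pairs_bootstrap_se(x, y, estimator, n_boot=100, seed=0):
    """Bootstrap standard error of estimator(x, y), resampling whole cases."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        # Draw n case indices with replacement, keeping each (x, y) pair intact.
        idx = rng.integers(0, n, size=n)
        estimates[b] = estimator(x[idx], y[idx])
    # The spread of the re-estimates approximates the sampling variability.
    return float(estimates.std(ddof=1))


# Hypothetical data, for illustration only.
rng = np.random.default_rng(7)
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(size=500)
print("bootstrap SE of the slope:", pairs_bootstrap_se(x, y, ols_slope))
```

Resampling complete cases, rather than residuals, keeps the joint distribution of the response and the error-prone predictor intact, which matters when the error process is not the classical additive one.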
5 Effectiveness of the Adjustments for Different Types of Errors and Estimates of their Variance
SIMEX requires an estimate of the variance of the ME, $\sigma_V^2$. Ideally the actual parameter is known9, although in the presence of replicated measures $\sigma_V^2$ can also be estimated. However, access to such data tends to be the exception rather than the norm. Thus, here I study the effectiveness of SIMEX as a sensitivity tool. That is, I assume that the researcher suspects the presence of ME in one of the variables being used but can only provide an educated guess10 of the distribution and prevalence of that ME.
For each of the different ME scenarios presented in Section 3, I explore the effectiveness of SIMEX in reducing the bias found in the naïve models when different values of $\sigma_V$ are tried. In particular I review the extent of the adjustments assuming that $E(V) = 0$, and $\sigma_V$ is equal to .25, .176, .336, and .412, which are equivalent to assuming reliability ratios of $X^*$ equal to .816, .9, .7, and .6, respectively. I refer to each of these scenarios as assuming a "correct", "overestimated", "underestimated", and "highly underestimated" reliability ratio. In addition, to assess the effectiveness of the adjustment in the presence of forward or backward telescoping effects when the ME process is correctly estimated, I take the "correct" adjustments for the systematic negative and positive ME scenarios to use the right ME process, that is $V \sim N(.9, .25)$ and $V \sim N(\ldots, .25)$, respectively.
8	See Keele (2008) for a description of the differences between bootstrapping algorithms.
9	For example, Biewen et al. (2008) suggested using multiplicative ME as a strategy to anonymise data, while offering the value of $\sigma_V$ to data users so they can adjust for the implications of the artificially created ME in their analyses.
10	Ideally based on validation studies looking at similar types of questions available in the literature.
For consistency's sake I use the linear extrapolation function across all of the adjustments11. This extrapolation function was chosen instead of the more commonly used quadratic extrapolation to generate more conservative adjustments in scenarios using an overestimated ME variance, which in my analysis would be the cases of "underestimated" and "very underestimated" reliability ratios. For the estimation of the standard errors of $\hat{\beta}_{simex}$ I run the bootstrapping pairs algorithm for 100 iterations (just like in Biewen et al., 2008)12, and within each of these iterations I run the six steps of the SIMEX process another 100 times13. Lastly, to calculate the effectiveness of the adjustments I use measures of absolute and relative bias as in Section 3. The only differences are in terms of notation: I now substitute $\hat{\beta}_s$ by $\hat{\beta}_{simex}$ in equations 3.2 and 3.3, and for the R.BIAS I also substitute the denominator by the BIAS in the survey. Results are shown in Table 2.
11	The adequacy of the linear extrapolation can be assessed in Figure A2 (Appendix III), where I show the extrapolation functions for the adjustment of non-systematic ME using the correct $\sigma_V$.
12	This is less than what would be recommendable to obtain precise estimates of the standard errors but it is sufficiently good considering that the SIMEX process is also computationally intensive and that a compromise needs to be reached.
13	Figure A1 (Appendix III) includes scatterplots that reflect the simulation of increasing levels of the ME generation process (step 1 of the SIMEX algorithm) when the correct $\sigma_V$ is used.
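The relative measure used for the adjustments can be illustrated with a short calculation. This is a sketch of my reading of the description above: the bias remaining after SIMEX is expressed as a share of the bias of the naïve estimate. The function names are hypothetical, and the numbers in the example are the linear-model intercept under classical ME from Tables 1 and 2.

```python
def bias(estimate, true_value):
    """Absolute bias: estimate minus the coefficient of the true model."""
    return estimate - true_value


def relative_bias_after_adjustment(simex_estimate, naive_estimate, true_value):
    """Remaining bias after SIMEX as a share of the naive model's bias."""
    remaining = abs(bias(simex_estimate, true_value))
    original = abs(bias(naive_estimate, true_value))
    return remaining / original


# Linear model, intercept, classical multiplicative ME, "correct" reliability:
# true -1.297, naive -1.013, SIMEX -1.283 (values as reported in the tables).
share = relative_bias_after_adjustment(-1.283, -1.013, -1.297)
print(f"remaining bias: {100 * share:.1f}% of the naive bias")
```

The result agrees with the corresponding R.Bias entry in Table 2 up to the rounding of the published coefficients.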
Table 2: Results of the Adjustments
			Linear				Logit				Poisson			
ME type	RR	Param.	Coef.	SE	Bias	R.Bias	Coef.	SE	Bias	R.Bias	Coef.	SE	Bias	R.Bias
Classical	Correct	β0	-1.283	.017	.014	4.8%	-5.030	.100	.967	44.2%	-1.400	.026	-.038	23.1%
		β1	.152	.002	.002	7.5%	.662	.014	-.106	38.6%	.097	.003	.005	31.6%
		β2	.105	.006	-.006	12.9%	.184	.018	-.026	39.8%	.104	.007	.022	29.9%
	Overestimated	β0	-1.198	.013	.098	34.7%	-4.714	.087	1.282	58.6%	-1.343	.021	.019	11.7%
		β1	.141	.002	-.009	26.8%	.617	.012	-.151	54.9%	.090	.002	-.001	7.3%
		β2	.121	.004	.010	23.2%	.207	.016	-.003	4.4%	.119	.005	.037	48.9%
	Underestimated	β0	-1.339	.018	-.043	15.1%	-5.148	.095	.849	38.8%	-1.436	.027	-.074	44.9%
		β1	.160	.002	.010	32.1%	.681	.014	-.087	31.6%	.101	.003	.010	58.1%
		β2	.095	.006	-.016	35.5%	.174	.020	-.035	54.6%	.096	.009	.014	18.2%
	Very underestimated	β0	-1.361	.018	-.064	22.7%	-5.134	.076	.863	39.4%	-1.451	.030	-.089	54.4%
		β1	.163	.002	.014	42.7%	.681	.011	-.087	31.6%	.103	.003	.012	71.0%
		β2	.090	.007	-.020	44.7%	.175	.020	-.034	53.1%	.092	.009	.010	13.2%
Heteroscedastic	Correct	β0	-1.265	.019	.032	10.6%	-4.790	.111	1.207	51.4%	-1.378	.034	-.016	8.6%
		β1	.149	.002	<.001	1.1%	.629	.016	-.140	46.9%	.096	.003	.004	22.8%
		β2	.122	.005	.011	18.8%	.324	.020	.114	68.1%	.082	.009	<.001	0.8%
	Overestimated	β0	-1.182	.014	.114	38.3%	-4.481	.093	1.515	64.5%	-1.317	.027	.045	24.3%
		β1	.139	.002	-.011	32.3%	.585	.013	-.183	61.6%	.089	.003	-.003	15.4%
		β2	.137	.004	.026	44.4%	.340	.014	.130	77.4%	.098	.006	.016	26.3%
	Underestimated	β0	-1.320	.019	-.023	7.8%	-4.925	.090	1.071	45.6%	-1.414	.034	-.052	28.5%
		β1	.157	.002	.007	21.2%	.650	.013	-.118	39.8%	.100	.003	.008	47.7%
		β2	.112	.007	.002	3.0%	.315	.020	.105	62.8%	.073	.010	-.009	15.9%
	Very underestimated	β0	-1.342	.018	-.045	15.1%	-4.923	.083	1.073	45.7%	-1.427	.035	-.065	35.5%
		β1	.160	.002	.011	31.1%	.651	.011	-.117	39.3%	.102	.003	.010	58.3%
		β2	.108	.008	-.002	3.5%	.314	.019	.104	62.1%	.069	.009	-.013	22.6%
Systematic negative	Correct	β0	-1.201	.019	.096	26.6%	-4.565	.085	1.432	56.0%	-1.214	.031	.148	47.9%
		β1	.153	.003	.003	11.4%	.649	.013	-.119	42.9%	.086	.003	-.006	25.2%
		β2	.115	.005	.004	7.7%	.243	.017	.033	29.9%	.093	.009	.011	16.5%
	Overestimated	β0	-1.106	.013	.191	53.0%	-4.207	.072	1.790	70.1%	-1.157	.020	.205	66.6%
		β1	.144	.002	-.005	18.6%	.606	.011	-.163	58.5%	.081	.002	-.010	44.5%
		β2	.134	.004	.023	41.5%	.265	.012	.055	49.5%	.111	.007	.029	43.3%
	Underestimated	β0	-1.236	.022	.060	16.7%	-4.640	.094	1.357	53.1%	-1.235	.031	.127	41.3%
		β1	.164	.003	.014	48.1%	.677	.015	-.091	32.9%	.092	.003	<.001	0.9%
		β2	.109	.006	-.002	3.6%	.239	.020	.029	26.1%	.087	.010	.005	7.2%
	Very underestimated	β0	-1.257	.018	.040	11.1%	-4.644	.091	1.352	52.9%	-1.247	.031	.115	37.3%
		β1	.167	.003	.017	60.5%	.680	.015	-.088	31.8%	.094	.004	.002	9.6%
		β2	.105	.007	-.006	10.7%	.239	.017	.029	26.1%	.083	.010	.001	1.0%
Systematic positive	Correct	β0	-1.278	.019	.018	6.9%	-5.252	.097	.744	37.8%	-1.263	.033	.099	39.2%
		β1	.141	.002	-.009	21.9%	.632	.012	-.136	45.7%	.079	.003	-.013	41.4%
		β2	.102	.005	-.008	20.3%	.199	.018	-.011	14.4%	.080	.009	-.002	4.0%
	Overestimated	β0	-1.209	.014	.087	32.4%	-4.959	.087	1.038	52.7%	-1.221	.017	.141	56.1%
		β1	.129	.002	-.021	50.4%	.584	.011	-.185	61.9%	.072	.002	-.019	63.4%
		β2	.116	.004	.005	13.7%	.217	.014	.007	8.9%	.093	.007	.011	21.8%
	Underestimated	β0	-1.355	.023	-.058	21.5%	-5.450	.110	.547	27.8%	-1.307	.038	.055	21.7%
		β1	.146	.003	-.004	8.7%	.649	.014	-.119	40.0%	.082	.003	-.010	32.7%
		β2	.089	.006	-.022	55.2%	.188	.022	-.022	29.2%	.066	.010	-.016	30.3%
	Very underestimated	β0	-1.378	.020	-.082	30.4%	-5.444	.102	.553	28.1%	-1.322	.034	.040	16.1%
		β1	.149	.002	.000	0.7%	.650	.013	-.118	39.6%	.084	.003	-.008	26.6%
		β2	.084	.008	-.027	66.5%	.189	.019	-.021	27.6%	.062	.011	-.021	39.7%
A first point to notice is that, compared to results from the true models presented in Table 1, the standard errors are underestimated by a half. This might be due to the small size of the true standard errors (expressed in the second or third decimal point), but it also illustrates that the variance of $\hat{\beta}_{simex}$ using bootstrap can only be approximated.
Regarding the adjustment in terms of the reduction of the biases found in the naïve analyses we can observe varying levels of success. The effectiveness of the adjustment ranged from being able to reduce the bias to .8% of its size (for $\beta_2$ in the Poisson model affected by heteroscedastic ME and using the correct estimate of $\sigma_V$) to a less impressive figure of 77.4% (for $\beta_2$ in the logit model affected by heteroscedastic ME and using an overestimated reliability ratio). In spite of this variability, the adjustments explored could be considered quite successful since on average they managed to reduce the biases found in the naïve models to 32.8% of their original size.
Figure 5: Adjustments in Terms of R.BIAS*,**
*The category of barplots "under +" represents a very underestimated reliability ratio of .6. **The flat lines indicate the average R.BIAS for the three types of models.
To elucidate some trends about the relative effectiveness of SIMEX for the different scenarios explored I have grouped some of the results in Figure 5. Each bar represents the average R.BIAS for the three regression coefficients comprised in each of the models and for each of the scenarios studied, while the black horizontal lines represent the averages of the bars over the three outcome models studied.
On average the adjustments are most effective on the linear model, reducing the R.BIAS to 24.3% of its original size, whereas for the logit models the average adjustment is 43.7%, and for the Poisson is 30.3%. This better performance for the linear outcome model might be related to the linear extrapolation function used across all of the adjustments, which would be the most appropriate function when the changes in bias are proportional to the increased levels of ME. Directly from Figure 5 we can also see that considering the three coefficients of each model, the most successful adjustment was for the linear model when the correct ME process is used, which reduced the bias to just 8.4% of its original size. But even in the worst scenario where heteroscedastic ME is simulated and the reliability ratio of the measure is overestimated the bias can still be reduced to 38.3% of its size. On the other hand, the least promising results were obtained for the adjustments of the heteroscedastic ME when a logit outcome model is used. Here, regardless of the assumed reliability ratio no adjustment could reduce the bias by more than 50.6%.
6 Conclusion
The presence of ME in retrospective questions is widely acknowledged; however, little is done to tackle this problem. A majority of studies using this type of data deal with the ME problem by adding caveats to their findings, whereas those attempting to implement the necessary adjustments are very uncommon. The implications of this problem are truly daunting. Here I have simulated moderate levels of different types of multiplicative errors, which could be expected to arise as a result of memory failures in the report of dates of onset or end of spells, to explore the impact that such ME could have on the regression coefficients of different models. Across the scenarios studied I found an average bias of about 40% of the size of the true coefficients, reaching up to 95% in certain cases.
I pointed at two fundamental barriers limiting the implementation of adjustment methods. First, most of these methods require access to additional sources of data in the form of replicated measures, validation subsamples, or instrumental variables, which are rarely available to a majority of researchers. Second, most methods are relatively complex to implement, discouraging researchers from using them. A very illustrative case is that of adjustments relying on Bayesian statistics; their reliance on MCMC and prior probabilities offers remarkable flexibility to deal with complex ME processes, even in the absence of replicated or validation data. However, it is this complexity, together with other practical matters such as the need to use specialised statistical packages, that tends to dissuade researchers from using them.
To deal with recall errors in the report of dates of onset when no data to inform about the ME process is available I have suggested the implementation of the more practical SIMEX method. SIMEX is relatively simple to implement, it can be easily applied to different outcome models regardless of their complexity, and it only requires, in its standard form, knowledge about the variance of the error term. Furthermore, although SIMEX was initially created to adjust for classical additive ME, it can also be extended to account for different types of ME processes so long as these can be simulated using Monte Carlo methods. I have used this feature to assess the effectiveness of SIMEX in the presence of multiplicative errors. In particular, I have explored the application of SIMEX to classical, heteroscedastic, systematic positive and systematic negative multiplicative errors. These are the types of errors that could be expected from general memory failures, from the use of different survey modes, and from backward and forward telescoping effects, respectively.
In the presence of these types of errors, SIMEX adjustments where the distribution of the ME is known have shown satisfactory results, managing to reduce the size of the biases found in the estimates of the naïve models to less than one third of their size on average. But perhaps more interesting is the fact that SIMEX also achieved reasonably good results even when the ME process is assumed to be non-systematic and its variance is only approximated. The quality of the adjustments varied substantially, as could only be expected given the range of scenarios explored, but in each of the 144 estimates studied SIMEX managed to produce positive adjustments, with the worst of them all achieving a reduction of 22.6% of the bias found in the naïve model.
This capacity to obtain partial adjustments even when the type of multiplicative error is only approximated, together with the relative simplicity with which it is implemented, makes SIMEX an ideal method to be used as a sensitivity tool. Researchers concerned about using duration data affected by recall errors could obtain an estimate of the magnitude of the impact, which would allow them to provide more informative caveats regarding the degree to which the validity of their findings is affected. To do that they can use the multiplicative SIMEX process presented in Appendix II. The only two alterations needed would be the re-specification of the outcome model and the choice of the size of the variance of the error term.
When the latter is not known the method could be run using an educated guess for the reliability ratio of the variable prone to ME. Alternatively, for those questions where previous studies of the validity and reliability of responses are available, the researcher could use an average of the estimates obtained in the literature. The opportunity to use such sensitivity analyses even when no replicated measures or a validation subsample is available also illustrates the importance of studies aiming to assess the prevalence of ME in different types of survey questions. The more we know about the ME processes affecting survey responses the better adjustments could be achieved and the higher the validity of studies using survey data will be.
Acknowledgements
I thank my colleague and friend Albert Varela for his useful comments, which have substantially improved the quality of this manuscript.
References
[1]	Augustin, T. (1999): Correcting for Measurement Error in Parametric Duration Models by Quasi-likelihood. Munchen Institut fur Statistik, from: http://epub.ub.uni-muenchen.de/1546/1/paper 157.pdf.
[2]	Battauz, M., Bellio, R. & Gori, E. (2008): Reducing Measurement Error in Student Achievement Estimation. Psychometrika, 73, 289-302.
[3]	Biewen, E., Nolte, S. & Rosemann, M. (2008): Perturbation by Multiplicative Noise and The Simulation Extrapolation Method. Advances in Statistical Analysis, 92, 375-389.
[4]	Bradburn, N. M., Huttenlocher, J. & Hedges, L. (1994): Telescoping and Temporal Memory. In N. Schwarz et al. (Ed): Autobiographical Memory and The Validity of Retrospective Reports, 203-215. New York: Springer.
[5]	Carroll, R. J. & Stefanski, L. A. (1990): Approximate Quasilikelihood Estimation in Models with Surrogate Predictors. Journal of the American Statistical Association, 85, 652-663.
[6]	Carroll, R., Küchenhoff, H., Lombard, F. & Stefanski, L. (1996): Asymptotics for the SIMEX Estimator in Nonlinear Measurement Error Models. Journal of the American Statistical Association, 91, 242-250.
[7]	Carroll, R., Ruppert, D., Stefanski, L. & Crainiceanu, C. (2006): Measurement Error in Nonlinear Models; a Modern Perspective, Boca Raton: Chapman and Hall.
[8]	Cook, J. & Stefanski, L. (1994): A Simulation Extrapolation Method for Parametric Measurement Error Models. Journal of the American Statistical Association, 89, 1314-1328.
[9]	Cole, S., Chu, H. & Greenland, S. (2006): Multiple-Imputation for Measurement-Error Correction. International Journal of Epidemiology, 35, 1074-1081.
[10]	Da Silva, D. & Skinner, Ch. (2014): The Use of Accuracy Indicators to Correct for Survey Measurement Error. Journal of the Royal Statistical Society: Series C, 62, 303-319.
[11]	Dumangane, M. (2007): Measurement Error Bias Reduction in Unemployment Durations. Centre for Microdata Methods and Practice, 3, from: http://www.cemmap.ac.uk/wps/cwp0603.pdf.
[12]	Efron, B. & Tibshirani, R. J. (1993): An Introduction to the Bootstrap. Boca Raton: CRC press.
[13]	Fuller, W. (1987): Measurement Error Models. New York: John Wiley and Sons.
[14]	Glesjer, L. (1990): Improvements of the Naive Approach to Estimation in Nonlinear Errors-in-Variables Regression Models. In P. Brown & W. Fuller (Ed): Statistical Analysis of Error Measurement Models and Application, 99-114. Providence: American Mathematics Society.
[15]	Glewwe, P. (2007): Measurement Error Bias in Estimates of Income and Income Growth among the Poor: Analytical Results and a Correction Formula. Economic Development and Cultural Change, 56, 163-189.
[16]	Golub, A., Johnson, B. D. & Labouvie, E. (2000): On Correcting Biases in Self-Reports of Age at First Substance Use with Repeated Cross-Section Analysis. Journal of Quantitative Criminology, 16, 45-68.
[17]	Gustafson, P. (2003): Measurement Error and Misclassification in Statistics and Epidemiology. Boca Raton: Chapman and Hall.
[18]	He, W., Yi, G. & Xiong, J. (2007): Accelerated Failure Time Models with Covariates Subject to Measurement Error. Statistics in Medicine, 26, 4817-4832.
[19]	Holt, D., McDonald, J.W. & Skinner, C.J. (1991): The Effect of Measurement Error on Event History Analysis. In P. Biemer (Ed): Measurement Error in Surveys, 665-685. New York: John Wiley.
[20]	Huber, P. J. (1964): Robust Estimation of a Location Parameter. Annals of Mathematical Statistics, 35, 73-101.
[21]	Huttenlocher, J., Hedges, L. & Prohaska, V. (1988): Hierarchical Organization in Ordered Domains: Estimating the Dates of Events. Psychological Review, 95, 471-484.
[22]	Hwang, J. T. (1986): Multiplicative Errors-in-Variables Models with Applications to Recent Data Released by the US Department of Energy. Journal of the American Statistical Association, 81, 680-688.
[23]	Janssen, S. M., Chessa, A. G. & Murre, J. M. (2006): Memory for Time: How People Date Events. Memory & Cognition, 34, 138-147.
[24]	Johnson, E. O. & Schultz, L. (2005): Forward Telescoping Bias in Reported Age of Onset: An Example from Cigarette Smoking. International Journal of Methods in Psychiatric Research, 14, 119-129.
[25]	Küchenhoff, H., Mwalili, S.M. & Lesaffre, E. (2006): A General Model for Dealing with Misclassification in Regression: The Misclassification SIMEX. Biometrics, 62, 85-96.
[26]	Lyles, R. H. & Kupper, L. L. (1997): A Detailed Evaluation of Adjustment Methods for Multiplicative Measurement Error in Linear Regression with Applications in Occupational Epidemiology. Biometrics, 1008-1025.
[27]	Neter, J. & Waksberg, J. (1964): A Study of Response Errors in Expenditures Data from Household Interviews. Journal of the American Statistical Association, 59, 18-55.
[28]	Novick, M.R. (1966): The Axioms and Principal Results of Classical Test Theory. Journal of Mathematical Psychology, 3, 1-18.
[29]	Nugent, W., Graycheck, L. & Basham, R. (2000): A Devil Hidden in the Details: The Effects of Measurement Error in Regression Analysis. Journal of Social Service Research, 27, 53-75.
[30]	Pickles, A., Pickering, K. & Taylor, C. (1996): Reconciling Recalled Dates of Developmental Milestones, Events and Transitions: A Mixed Generalized Linear Model with Random Mean and Variance Functions. Journal of the Royal Statistical Society. Series A, 225-234.
[31]	Pickles, A., Pickering, K., Simonoff, E., Silberg, J., Meyer, J. & Maes, H. (1998): Genetic "Clocks" and "Soft" Events: A Twin Model for Pubertal Development and Other Recalled Sequences of Developmental Milestones, Transitions, or Ages at Onset. Behavior Genetics, 28, 243-253.
[32]	Pina-Sánchez, J., Koskinen, J. & Plewis, I. (2013): Implications of Retrospective Measurement Error in Event History Analysis. Metodología de Encuestas, 15, 5-25.
[33]	Pina-Sánchez, J., Koskinen, J. & Plewis, I. (2014): Measurement Error in Retrospective Work Histories. Survey Research Methods, 8, 43-55.
[34]	Poterba, J. M. & Summers, L. H. (1984): Response variation in the CPS: Caveats for the unemployment analyst. Monthly Labor Review, 107, 37-43.
[35]	Prentice, R. (1982): Covariate Measurement Errors and Parameter Estimation in a Failure Time Regression Model. Biometrika, 69, 331-342.
[36]	Rappaport, S. M. (1991): Assessment of Long-Term Exposures to Toxic Substances in Air. Annals of Occupational Hygiene, 35, 61-122.
[37]	Rappaport, S. M., Kromhout, H. & Symanski, E. (1993): Variation of Exposure Between Workers in Homogeneous Exposure Groups. The American Industrial Hygiene Association Journal, 54, 654-662.
[38]	Roberts, C. (2007): Mixing modes of data collection in surveys: A methodological review. Economic and Social Research Council - National Centre for Research Methods, from: http://eprints.ncrm.ac.uk/418/.
[39]	Rubin, D. B. (1987): Multiple Imputation for Nonresponse in Surveys. New York: John Wiley and Sons.
[40]	Rubin, D. C. & Baddeley, A. D. (1989): Telescoping Is Not Time Compression: A Model. Memory & Cognition, 17, 653-661.
[41]	Skinner, C. & Humphreys, K. (1999): Weibull Regression for Lifetimes Measured with Error. Lifetime Data Analysis, 5, 23-37.
[42]	Skinner, C. (2000): Dealing with Measurement Error in Panel Analysis. In D. Rose (Ed): Researching Social and Economic Change, 113-125. New York: Routledge.
[43]	Solga, H. (2001): Longitudinal Survey and the Study of Occupational Mobility: Panel and Retrospective Design in Comparison. Quality and Quantity, 35, 291-309.
[44]	Stefanski, L. & Cook, J. (1995): Simulation-Extrapolation: The Measurement Error Jackknife. Journal of the American Statistical Association, 90, 1247-1256.
[45]	Theil, H. (1953): Repeated Least Squares Applied to Complete Equation Systems. The Hague: Central Planning Bureau.
[46]	Valaste, M., Lehtonen, R. & Vehkalahti, K. (2010): Multiple Imputation for Measurement Error Correction in Survey Data. Q2010 European Conference on Quality in Official Statistics, 89.
[47]	Vardeman, S. B., Wendelberger, J. R., Burr, T., Hamada, M. S., Moore, L. M., Jobe, J. M., Morris, M. D. & Wu, H. (2010): Elementary statistical methods and measurement error. The American Statistician, 64, 46-51.
Appendix I. R Script
Data Simulations
set.seed(10)
#The simulated variables
Y = rnorm(1000,0,1)
Yca = ifelse(Y>=0,1,0)
Yco = ifelse(Y<0,0,ifelse(Y>=0&Y<1,1,ifelse(Y>=1&Y<2,2,ifelse(Y>=2&Y<3,3,4))))
X1 = exp((Y+4)*.5 + rnorm(1000,0,.25))
X2 = Y*.4 + rnorm(1000,0,1)
#Each X1star definition below is an alternative scenario and overwrites the previous one;
#run only the one that is needed
#The classical multiplicative measurement error
X1star = X1*rnorm(1000,1,.25)
#The heteroscedastic measurement error
U = seq(1:1000)
for(i in 1:1000) {
  U[i] = ifelse(X2[i]*rnorm(1,1,.5)<0, rnorm(1,1,15), rnorm(1,1,.30))
}
U = ifelse(U<=0, .001, U)
plot(X2,U)
X1star = X1*U
#The systematic negative measurement error
X1star = X1*rnorm(1000,.9,.25)
#The systematic positive measurement error
X1star = X1*rnorm(1000,1.1,.25)
The SIMEX Process
##Example assuming classical multiplicative measurement error, a linear outcome model, and known variance of the error term
#The outcome models.
lm.true = lm(Y ~ X1 + X2)
summary(lm.true)
lm.naive = lm(Y ~ X1star + X2)
summary(lm.naive)
#Estimates of the impact of measurement error
sum.n = summary(lm.naive)
biases = coef(lm.naive) - coef(lm.true)
#Matrices to save results from the SIMEX process.
#Three columns, matching the three column names assigned below
results = matrix(c(0),nrow=12,ncol=3,byrow=TRUE)
colnames(results) = c("lin.coef", "lin.bias", "lin.rbias")
rownames(results) = c("right.cons", "right.X1", "right.X2", "over.cons", "over.X1", "over.X2", "under.cons", "under.X1", "under.X2", "very.cons", "very.X1", "very.X2")
SE = matrix(c(0),nrow=100,ncol=3,byrow=TRUE)
##The SIMEX process
#noise.5
noise.5 = matrix(c(0),nrow=1000,ncol=3,byrow=TRUE)
for(i in 1:1000) {
  ME.5 = rnorm(1000,1,.25)^.5
  X1star.5 = X1star * ME.5
  lm.noise.5 = lm(Y ~ X1star.5 + X2)
  noise.5[i,1] = coef(lm.noise.5)[1]
  noise.5[i,2] = coef(lm.noise.5)[2]
  noise.5[i,3] = coef(lm.noise.5)[3]
}
avg.5_cons = mean(noise.5[,1])
avg.5_X1 = mean(noise.5[,2])
avg.5_X2 = mean(noise.5[,3])
#noise1
noise1 = matrix(c(0),nrow=1000,ncol=3,byrow=TRUE)
for(i in 1:1000) {
  ME1 = rnorm(1000,1,.25)^1
  X1star1 = X1star * ME1
  lm.noise1 = lm(Y ~ X1star1 + X2)
  noise1[i,1] = coef(lm.noise1)[1]
  noise1[i,2] = coef(lm.noise1)[2]
  noise1[i,3] = coef(lm.noise1)[3]
}
avg1_cons = mean(noise1[,1])
avg1_X1 = mean(noise1[,2])
avg1_X2 = mean(noise1[,3])
#noise1.5
noise1.5 = matrix(c(0),nrow=1000,ncol=3,byrow=TRUE)
for(i in 1:1000) {
  ME1.5 = rnorm(1000,1,.25)^1.5
  X1star1.5 = X1star * ME1.5
  lm.noise1.5 = lm(Y ~ X1star1.5 + X2)
  noise1.5[i,1] = coef(lm.noise1.5)[1]
  noise1.5[i,2] = coef(lm.noise1.5)[2]
  noise1.5[i,3] = coef(lm.noise1.5)[3]
}
avg1.5_cons = mean(noise1.5[,1])
avg1.5_X1 = mean(noise1.5[,2])
avg1.5_X2 = mean(noise1.5[,3])
#noise2
noise2 = matrix(c(0),nrow=1000,ncol=3,byrow=TRUE)
for(i in 1:1000) {
  ME2 = rnorm(1000,1,.25)^2
  X1star2 = X1star * ME2
  lm.noise2 = lm(Y ~ X1star2 + X2)
  noise2[i,1] = coef(lm.noise2)[1]
  noise2[i,2] = coef(lm.noise2)[2]
  noise2[i,3] = coef(lm.noise2)[3]
}
avg2_cons = mean(noise2[,1])
avg2_X1 = mean(noise2[,2])
avg2_X2 = mean(noise2[,3])
#I put the mean regression estimates from each level of simulated measurement error in a dataset
avg_noiseADJ_cons = NA
avg_noiseADJ_X1 = NA
avg_noiseADJ_X2 = NA
lambda = c(-1, 0, .5, 1, 1.5, 2)
addi1 = c(avg_noiseADJ_cons, coef(lm.naive)[1], avg.5_cons, avg1_cons, avg1.5_cons, avg2_cons)
addi2 = c(avg_noiseADJ_X1, coef(lm.naive)[2], avg.5_X1, avg1_X1, avg1.5_X1, avg2_X1)
addi3 = c(avg_noiseADJ_X2, coef(lm.naive)[3], avg.5_X2, avg1_X2, avg1.5_X2, avg2_X2)
SIMEX = data.frame(lambda, addi1, addi2, addi3)
names(SIMEX) = c("lambda","cons","X1","X2")
#I obtain the adjusted SIMEX estimates using a linear extrapolation function
SIMEXna = SIMEX[-1,]
SIMEX_cons = lm(SIMEXna$cons ~ SIMEXna$lambda)
SIMEX[1,2] = coef(SIMEX_cons)[1] + (-1)*coef(SIMEX_cons)[2]
SIMEX_X1 = lm(SIMEXna$X1 ~ SIMEXna$lambda)
SIMEX[1,3] = coef(SIMEX_X1)[1] + (-1)*coef(SIMEX_X1)[2]
SIMEX_X2 = lm(SIMEXna$X2 ~ SIMEXna$lambda)
SIMEX[1,4] = coef(SIMEX_X2)[1] + (-1)*coef(SIMEX_X2)[2]
#I save the adjusted estimates and the remaining bias
results[1,1] = SIMEX[1,2]
results[2,1] = SIMEX[1,3]
results[3,1] = SIMEX[1,4]
bias1 = SIMEX[1,2]-coef(lm.true)[1]
bias2 = SIMEX[1,3]-coef(lm.true)[2]
bias3 = SIMEX[1,4]-coef(lm.true)[3]
results[1,2] = bias1
results[2,2] = bias2
results[3,2] = bias3
#I calculate the R.BIAS
R.BIAS.1 = (abs(coef(lm.naive)[1]-coef(lm.true)[1])*100)/abs(coef(lm.true)[1])
R.BIAS.adj.1 = (abs(SIMEX[1,2]-coef(lm.true)[1])*100)/abs(coef(lm.true)[1])
results[1,3] = R.BIAS.adj.1 / R.BIAS.1
R.BIAS.2 = (abs(coef(lm.naive)[2]-coef(lm.true)[2])*100)/abs(coef(lm.true)[2])
R.BIAS.adj.2 = (abs(SIMEX[1,3]-coef(lm.true)[2])*100)/abs(coef(lm.true)[2])
results[2,3] = R.BIAS.adj.2 / R.BIAS.2
R.BIAS.3 = (abs(coef(lm.naive)[3]-coef(lm.true)[3])*100)/abs(coef(lm.true)[3])
R.BIAS.adj.3 = (abs(SIMEX[1,4]-coef(lm.true)[3])*100)/abs(coef(lm.true)[3])
results[3,3] = R.BIAS.adj.3 / R.BIAS.3
##The bootstrap process to calculate the standard errors obtained from the SIMEX adjustment
#The observed variables are collected in a data frame so that rows can be resampled
#(this step is not shown in the original script and is assumed here)
data = data.frame(Y, X1star, X2)
#The double-loop
for(l in 1:100){
  boot = data[sample(1:nrow(data), 1000, replace=TRUE),]
  #noise.5
  noise.5 = matrix(c(0),nrow=100,ncol=3,byrow=TRUE)
  for(i in 1:100){
    boot$ME.5 = rnorm(1000,1,.25)^.5
    boot$X1star.5 = boot$X1star * boot$ME.5
    lm.noise.5 = lm(Y ~ X1star.5 + X2, data=boot)
    noise.5[i,1] = coef(lm.noise.5)[1]
    noise.5[i,2] = coef(lm.noise.5)[2]
    noise.5[i,3] = coef(lm.noise.5)[3]
  }
  avg.5_cons = mean(noise.5[,1])
  avg.5_X1 = mean(noise.5[,2])
  avg.5_X2 = mean(noise.5[,3])
  #noise1
  noise1 = matrix(c(0),nrow=100,ncol=3,byrow=TRUE)
  for(i in 1:100){
    boot$ME1 = rnorm(1000,1,.25)^1
    boot$X1star1 = boot$X1star * boot$ME1
    lm.noise1 = lm(Y ~ X1star1 + X2, data=boot)
    noise1[i,1] = coef(lm.noise1)[1]
    noise1[i,2] = coef(lm.noise1)[2]
    noise1[i,3] = coef(lm.noise1)[3]
  }
  avg1_cons = mean(noise1[,1])
  avg1_X1 = mean(noise1[,2])
  avg1_X2 = mean(noise1[,3])
  #noise1.5
  noise1.5 = matrix(c(0),nrow=100,ncol=3,byrow=TRUE)
  for(i in 1:100){
    boot$ME1.5 = rnorm(1000,1,.25)^1.5
    boot$X1star1.5 = boot$X1star * boot$ME1.5
    lm.noise1.5 = lm(Y ~ X1star1.5 + X2, data=boot)
    noise1.5[i,1] = coef(lm.noise1.5)[1]
    noise1.5[i,2] = coef(lm.noise1.5)[2]
    noise1.5[i,3] = coef(lm.noise1.5)[3]
  }
  avg1.5_cons = mean(noise1.5[,1])
  avg1.5_X1 = mean(noise1.5[,2])
  avg1.5_X2 = mean(noise1.5[,3])
  #noise2
  noise2 = matrix(c(0),nrow=100,ncol=3,byrow=TRUE)
  for(i in 1:100){
    boot$ME2 = rnorm(1000,1,.25)^2
    boot$X1star2 = boot$X1star * boot$ME2
    lm.noise2 = lm(Y ~ X1star2 + X2, data=boot)
    noise2[i,1] = coef(lm.noise2)[1]
    noise2[i,2] = coef(lm.noise2)[2]
    noise2[i,3] = coef(lm.noise2)[3]
  }
  avg2_cons = mean(noise2[,1])
  avg2_X1 = mean(noise2[,2])
  avg2_X2 = mean(noise2[,3])
  #I save the adjusted estimates and the remaining bias
  avg_noiseADJ_cons = NA
  avg_noiseADJ_X1 = NA
  avg_noiseADJ_X2 = NA
  lambda = c(-1, 0, .5, 1, 1.5, 2)
  addi1 = c(avg_noiseADJ_cons, coef(lm.naive)[1], avg.5_cons, avg1_cons, avg1.5_cons, avg2_cons)
  addi2 = c(avg_noiseADJ_X1, coef(lm.naive)[2], avg.5_X1, avg1_X1, avg1.5_X1, avg2_X1)
  addi3 = c(avg_noiseADJ_X2, coef(lm.naive)[3], avg.5_X2, avg1_X2, avg1.5_X2, avg2_X2)
  SIMEX = data.frame(lambda, addi1, addi2, addi3)
  names(SIMEX) = c("lambda","cons","X1","X2")
  #I obtain the adjusted estimate using a linear extrapolation function
  SIMEXna = SIMEX[-1,]
  SIMEX_cons = lm(SIMEXna$cons ~ SIMEXna$lambda)
  SIMEX[1,2] = coef(SIMEX_cons)[1] + (-1)*coef(SIMEX_cons)[2]
  SIMEX_X1 = lm(SIMEXna$X1 ~ SIMEXna$lambda)
  SIMEX[1,3] = coef(SIMEX_X1)[1] + (-1)*coef(SIMEX_X1)[2]
  SIMEX_X2 = lm(SIMEXna$X2 ~ SIMEXna$lambda)
  SIMEX[1,4] = coef(SIMEX_X2)[1] + (-1)*coef(SIMEX_X2)[2]
  #I save the SIMEX adjustment for each of the 100 bootstrap iterations
  SE[l,1] = SIMEX[1,2]
  SE[l,2] = SIMEX[1,3]
  SE[l,3] = SIMEX[1,4]
}
#I obtain the standard errors
SE1 = sd(SE[,1])
SE2 = sd(SE[,2])
SE3 = sd(SE[,3])
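The four near-identical noise loops in the script above could be condensed into a single function. The following is only a sketch under assumptions: the helper name simex_linear and its arguments (sd_u, lambdas, B) are hypothetical and do not appear in the original script, and it reproduces the same choices made there (multiplicative noise raised to the power lambda, linear extrapolation to lambda = -1).

```r
# simex_linear() is an illustrative helper, not part of the original script.
simex_linear <- function(Y, X1star, X2, sd_u = 0.25,
                         lambdas = c(0.5, 1, 1.5, 2), B = 100) {
  n <- length(Y)
  naive <- coef(lm(Y ~ X1star + X2))          # estimates at lambda = 0
  # Simulation step: for each lambda, average the coefficients over B noisy copies.
  # Rare negative draws raised to a fractional power give NaN; lm() drops those rows.
  avg <- sapply(lambdas, function(lam) {
    rowMeans(replicate(B, {
      X1noisy <- X1star * rnorm(n, 1, sd_u)^lam
      coef(lm(Y ~ X1noisy + X2))
    }))
  })
  est <- cbind(naive, avg)                    # one column per lambda value
  lam_all <- c(0, lambdas)
  # Extrapolation step: linear fit in lambda, evaluated at lambda = -1.
  adj <- apply(est, 1, function(b) {
    fit <- coef(lm(b ~ lam_all))
    unname(fit[1] - fit[2])
  })
  names(adj) <- c("cons", "X1", "X2")
  adj
}

# Usage on data simulated as in the "Data Simulations" part (fewer replications for speed).
set.seed(10)
Y  <- rnorm(1000, 0, 1)
X1 <- exp((Y + 4) * .5 + rnorm(1000, 0, .25))
X2 <- Y * .4 + rnorm(1000, 0, 1)
X1star <- X1 * rnorm(1000, 1, .25)
res <- simex_linear(Y, X1star, X2, B = 20)
print(res)
```

Wrapping the steps in a function also makes the bootstrap simpler to write, since the same call can be applied to each resampled data set.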
Appendix II. Illustrations of the SIMEX Process
Figure A1 shows the effect of the increased levels of simulated measurement error on X using $\sigma_U \sim (0, .25)$ and $\lambda_k = (0.5, 1, 1.5, 2)$.
Figure A1: Scatterplots of X1 and increasing levels of measurement error
Figure A2 shows the extrapolation functions for when the outcome model is linear and X1 is affected by classical multiplicative measurement error. Each of the plots represents one of the four scenarios where different reliability ratios were assumed.
Figure A2: Extrapolation functions for the linear model
Metodološki zvezki, Vol. 13, No. 1, 2016, 59-67
Odds Ratio, Hazard Ratio and Relative Risk
Janez Stare1 and Delphine Maucort-Boulch2
Abstract
Odds ratio (OR) is a statistic commonly encountered in professional or scientific medical literature. Most readers perceive it as relative risk (RR), although most of them do not know why that would be true. But since such perception is mostly correct, there is nothing (or almost nothing) wrong with that. It is nevertheless useful to be reminded now and then of the relation between the relative risk and the odds ratio, and of the cases in which, by equating the two statistics, we force OR to be something it is not. Another statistic, which is often also perceived as a relative risk, is the hazard ratio (HR). We encounter it, for example, when we fit the Cox model to survival data. Under proportional hazards it is probably "natural" to think in the following way: if the probability of death in one group is at every time point k times as high as the probability of death in another group, then the relative risk must be k, regardless of where in time we are. This could hardly be further from the truth, and in this paper we try to dispel this misconception.
1 Introduction
1.1 Relative risk
In medical studies, the probability of seeing a certain event in some group is usually called risk, while epidemiologists might prefer the term incidence (Savitz, 1992). For comparison of risks between groups, the ratio of risks, or the relative risk, is a statistic of choice. Formally, if $\pi_1$ is the probability of the event in group 1, and $\pi_2$ is the probability of the event in group 2, then the relative risk is
$$RR = \frac{\pi_1}{\pi_2}.$$
The reason for preferring the relative risk over the difference of risks
$$RD = \pi_1 - \pi_2$$
lies in the fact that the population risks of most diseases are rather small, so differences are less dramatic (Walter, 2000). For example, if the probability of some cancer in one group is 0.001, and in the other 0.009, the difference is 0.008 (same as between 0.419 and 0.411), but the relative risk is 9!
1	Department of Biostatistics and Medical Informatics, University of Ljubljana, Slovenia; janez.stare at mf.uni-lj.si
2	Service de Biostatistique, Hospices Civils de Lyon, Lyon, France; delphine.maucort-boulch at chu-lyon.fr
Table 1: Probability of death among men and women on the Titanic.
Sex	Died	Survived	Risk
men	1364	367	1364/1731 = 0.79
women	126	344	126/470 = 0.27
Table 1 provides an example where the event, unfortunately, was not rare. The relative risk of death of men compared to women is
$$RR = \frac{0.79}{0.27} = 2.93.$$
1.2 Odds ratio
The other statistic commonly encountered in medical literature is the odds ratio (Bland and Altman, 2000). Odds are the ratio of the probability of an event occurring in a group to the probability of that event not occurring:
$$\text{odds} = \frac{\pi}{1 - \pi}.$$
For example, if probability of death in a group is 0.75, the odds are equal to 3, since the probability of death is three times higher than the probability of surviving. Table 2 gives the odds among men and women on the Titanic.
Table 2: Odds for death among men and women on the Titanic; $\pi$ denotes the probability of death.
Sex	Death ($\pi$)	Survival ($1 - \pi$)	Odds
men	0.79	0.21	3.76
women	0.27	0.73	0.37
If risk was the same in both groups, the odds would be equal. A comparison of odds, the odds ratio, might then make sense.
$$OR = \frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)}.$$
The odds ratio for the Titanic example is
$$OR = \frac{3.76}{0.37} = 10.16.$$
This is very different from the relative risk calculated on the same data and may come as a surprise to some readers who are accustomed to thinking of the odds ratio as a relative risk (Greenland, 1987).
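The two Titanic calculations can be reproduced directly from the counts in Table 1. The short R sketch below is an illustration only; the variable names are ours. Because it works with unrounded counts, its results (about 2.94 and 10.15) differ slightly from the 2.93 and 10.16 obtained above from the rounded risks and odds.

```r
# Counts taken from Table 1 (Titanic): deaths and survivors by sex.
died     <- c(men = 1364, women = 126)
survived <- c(men = 367,  women = 344)

risk <- died / (died + survived)   # probability of death in each group
odds <- risk / (1 - risk)          # algebraically the same as died / survived

rr <- unname(risk["men"] / risk["women"])   # relative risk, men vs. women
or <- unname(odds["men"] / odds["women"])   # odds ratio, men vs. women

print(round(c(RR = rr, OR = or), 2))
```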
Since we already have relative risk, why would we want to calculate the odds ratio? The answer is not obvious and it is best explained via an example (Nurminen, 1995).
Case-control studies are quite common in medical research. In these we select a sample of patients and a sample of controls, and study the occurrence of some factor, hopefully predictive, in the two groups. The reason for collecting data in such a way is that it takes a long time and big sample sizes to do a follow-up study, that is, a study in which two groups, with and without a factor, are followed long enough for a disease to appear in numbers large enough to do statistical tests with acceptable power.
Table 3: Prostate cancer and baldness
	Case	Control	Total
bald	72	82	154
not bald	55	57	112
total	127	139	266
Table 3 shows fictional data on prostate cancer and baldness. We see that of the 127 cases, 72 were bald and 55 were not, while among the 139 controls 82 were bald. Let us remind ourselves that in order to calculate the relative risk between the two groups we would need the probabilities of cancer occurring, that is, the probability of having cancer for bald and for not bald people. It may seem natural to estimate these probabilities as 72/154 and 55/112, and so RR as
$$RR = \frac{72/154}{55/112} = 0.95,$$
but is this correct?
It is very important to understand that this is not correct. Since we randomly chose cases and controls, we can estimate probabilities of observing baldness (or not) among them; but NOT the probabilities of observing cancer among the bald (and not bald) people. This means that in a study like this we CANNOT calculate the relative risk.
2 Relative risk and odds ratio (RR and OR)
The literature dealing with the relation between relative risk and odds ratio is quite extensive (some examples are (Davies et al., 1998; Deeks, 1998; Newman, 2001; Nurminen, 1995; Pearce, 1993; Savitz, 1992; Zhang and Yu, 1998)). We still hope that the derivation below will be useful.
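Table 4 gives a 2x2 table in general notation. Using this notation we have
$$RR = \frac{n_{11}/(n_{11}+n_{12})}{n_{21}/(n_{21}+n_{22})} = \frac{n_{11}}{n_{21}} \cdot \frac{n_{21}+n_{22}}{n_{11}+n_{12}}$$
and
$$OR = \frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)} = \frac{\dfrac{n_{11}/(n_{11}+n_{12})}{n_{12}/(n_{11}+n_{12})}}{\dfrac{n_{21}/(n_{21}+n_{22})}{n_{22}/(n_{21}+n_{22})}} = \frac{n_{11}n_{22}}{n_{12}n_{21}}.$$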
Table 4: A 2x2 table in general notation.
Factor	Outcome: Death (Case)	Outcome: Survival (Control)	Total
yes	$n_{11}$	$n_{12}$	$n_{11} + n_{12}$
no	$n_{21}$	$n_{22}$	$n_{21} + n_{22}$
total	$n_{11} + n_{21}$	$n_{12} + n_{22}$	$n$
Let us now multiply one column, say cases, by k. Then we have
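$$RR = \frac{kn_{11}}{kn_{21}} \cdot \frac{kn_{21}+n_{22}}{kn_{11}+n_{12}} = \frac{n_{11}}{n_{21}} \cdot \frac{kn_{21}+n_{22}}{kn_{11}+n_{12}}$$
and
$$OR = \frac{kn_{11}n_{22}}{n_{12}kn_{21}} = \frac{n_{11}n_{22}}{n_{12}n_{21}}.$$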
We see that the 'relative risk' is now different, but the odds ratio does not change if we change the ratio of cases versus controls. Until now we have learned the following:
1.	we can calculate relative risk IF we can estimate probabilities of an outcome in EACH group.
2.	we can't do that in case-control studies.
3.	we can calculate the odds ratio even if we don't know the probabilities in the groups.
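The invariance of the odds ratio under a change in the case/control sampling ratio can be checked numerically. The sketch below uses the counts of Table 3 and multiplies the case column by k; the helper names rr_table and or_table are illustrative, not taken from the paper.

```r
# rr_table() and or_table() are illustrative helpers.
# 'tab' is a 2x2 matrix: rows = factor yes/no, columns = cases/controls.
rr_table <- function(tab) {
  (tab[1, 1] / sum(tab[1, ])) / (tab[2, 1] / sum(tab[2, ]))
}
or_table <- function(tab) {
  (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
}

# Counts from Table 3 (prostate cancer and baldness).
tab <- matrix(c(72, 82,
                55, 57),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("bald", "not bald"), c("case", "control")))

for (k in c(1, 2, 10)) {
  scaled <- tab
  scaled[, "case"] <- k * scaled[, "case"]   # as if k times as many cases were sampled
  cat(sprintf("k = %2.0f  'RR' = %.3f  OR = %.3f\n",
              k, rr_table(scaled), or_table(scaled)))
}
```

The 'RR' column changes with k, which is another way of seeing that it does not estimate any population quantity in a case-control design, while the OR column stays the same.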
It would then be nice if the odds ratio were close to the relative risk.
Let us now look at the relation between the relative risk and the odds ratio (Zhang and Yu, 1998).
$$OR = \frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)} = \frac{\pi_1}{\pi_2} \cdot \frac{1-\pi_2}{1-\pi_1} = RR \cdot \frac{1-\pi_2}{1-\pi_1} \qquad (2.1)$$
From this we see that OR is always further away from 1 than RR. But, more importantly, we see that the odds ratio is close to the relative risk if probabilities of the outcome are small (Davies et al., 1998). And it is this fact that enables us, most of the time, to approximate the relative risk with the odds ratio. Table 5 below illustrates the relationship between RR and OR for some probabilities of the outcome.
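Formula (2.1) is easy to evaluate for any pair of probabilities. As a minimal sketch (variable names are ours), the following R lines recompute the entries of Table 5 and confirm that the formula agrees with the direct definition of the odds ratio.

```r
# Probabilities of the outcome in the two groups, as in Table 5.
pi1 <- c(0.4, 0.2, 0.04, 0.02)
pi2 <- c(0.1, 0.3, 0.01, 0.03)

rr        <- pi1 / pi2
or_direct <- (pi1 / (1 - pi1)) / (pi2 / (1 - pi2))   # definition of the odds ratio
or_via_rr <- rr * (1 - pi2) / (1 - pi1)              # formula (2.1)

print(data.frame(pi1, pi2, RR = round(rr, 2), OR = round(or_via_rr, 3)))
```

In every row the odds ratio lies at least as far from 1 as the relative risk, and the two nearly coincide when both probabilities are small.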
3 Relation between RR and HR
If one searches the Internet for the relation between the hazard ratio and the relative risk, one will predominantly find statements that tell us that these two statistics are more or less equal (Nurminen, 1995). For example, the Glossary at the British Medical Journal site http://clinicalevidence.bmj.com/ceweb/resources/glossary.jsp says
Table 5: Examples of RR and OR for different probabilities.
$\pi_1$	$\pi_2$	RR	OR
.4	.1	4	6
.2	.3	.67	.58
.04	.01	4	4.125
.02	.03	.67	.66
Hazard ratio (HR)
Broadly equivalent to relative risk (RR); useful when the risk is not constant with respect to time. It uses information collected at different times. The term is typically used in the context of survival over time. If the HR is 0.5 then the relative risk of dying in one group is half the risk of dying in the other group.
The same site has the following definition for relative risk:
Relative risk (RR)
The number of times more likely (RR > 1) or less likely (RR < 1) an event is to happen in one group compared with another. It is the ratio of the absolute risk (AR) for each group. It is analogous to the odds ratio (OR) when events are rare.
Relative risk is calculated as the absolute risk (AR) in the intervention group divided by the AR in the control group.
It would seem that the claim above about HR and RR is generally accepted as correct, although we couldn't find any derivation supporting it. Some of the confusion might be caused by esteemed authors who, in trying to avoid the somewhat unfortunate name proportional hazards model, call such models relative risk models (Kalbfleisch and Prentice, 2002). It is of course obvious that by risk they are referring to the conditional probability of dying in a small interval, so $r(t) = P(t \le T < t + \Delta t \mid T \ge t)$, but the ratio of such risks is not what people usually understand under the term relative risk, since relative risk is about absolute and not conditional probabilities.
So most of the confusion, or wrong perception, probably comes from this 'natural' line of thought: if hazard ratio is k at all times, then the relative risk must be k at all times. And this is of course wrong.
Relative risk (RR) is a ratio of two probabilities: probability of an event in one group divided by the probability of the same event in the other group. When studying survival, we have to explicitly state in which time interval we are calculating this probability. So, for a given time t, the relative risk is
$$RR(t) = \frac{P(T \le t \mid X = x_1)}{P(T \le t \mid X = x_2)}$$
where x1 and x2 are values of the covariate X defining the two groups (male, female for example).
Hazard ratio is a ratio of two hazard functions
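$$HR(t) = \frac{\lambda(t, x_1)}{\lambda(t, x_2)} \qquad (3.1)$$
and we remind the reader that the hazard function is defined as
$$\lambda(t, x) = \lim_{\Delta t \to 0^+} \frac{P(t \le T < t + \Delta t \mid T \ge t, X = x)}{\Delta t}$$
and that hazard is connected to the survival function via the following formula
$$S(t, x) = e^{-\int_0^t \lambda(u, x)\,du}.$$
Since
$$S(t, x) = P(T > t \mid X = x) = 1 - P(T \le t \mid X = x)$$
we can write
$$RR(t) = \frac{1 - S(t, x_1)}{1 - S(t, x_2)} = \frac{1 - e^{-\int_0^t \lambda(u, x_1)\,du}}{1 - e^{-\int_0^t \lambda(u, x_2)\,du}}. \qquad (3.2)$$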
It is difficult to argue that equations (3.1) and (3.2) are similar, but let's try. Sometimes, like in the comparison between the Kaplan-Meier and the Nelson-Aalen estimate of the survival function, the following argument is brought into play
$$e^{-x} \approx 1 - x. \qquad (3.3)$$
This comes from the Taylor series expansion of the function $e^{-x}$ around the value 0
$$e^{-x} = 1 - x + \frac{x^2}{2} - \cdots$$
Obviously, approximation (3.3) makes sense only for very small values of x. Note that our values of x are $\int_0^t \lambda(u, x_1)\,du$ and $\int_0^t \lambda(u, x_2)\,du$, which are cumulative hazards, increasing without limit as t increases. Such an approximation will never hold, except for early times in a survival study. Applying it (wrongly!) to formula (3.2) in the case of proportional hazards, that is when $\lambda(u, x_1) = k\lambda(u, x_2)$, would make formulas (3.1) and (3.2) equal.
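For completeness, the (invalid) chain of steps described above can be written out. The shorthand $\Lambda(t, x)$ for the cumulative hazard is introduced here for brevity and is not notation used elsewhere in the paper.

```latex
\begin{align*}
\Lambda(t,x) &= \int_0^t \lambda(u,x)\,du ,
\qquad \lambda(u,x_1) = k\,\lambda(u,x_2)
\;\Longrightarrow\; \Lambda(t,x_1) = k\,\Lambda(t,x_2), \\
RR(t) &= \frac{1 - e^{-\Lambda(t,x_1)}}{1 - e^{-\Lambda(t,x_2)}}
\;\approx\; \frac{\Lambda(t,x_1)}{\Lambda(t,x_2)} = k = HR .
\end{align*}
```

The approximation step replaces each $e^{-\Lambda}$ by $1 - \Lambda$, which is only defensible while both cumulative hazards are close to zero.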
4 Illustration
For easier understanding in Table 6 we give detailed calculations for two groups for the first two times in a possible series of discrete event times and with proportional hazards.
So we have
$$RR(t_2) = \frac{k(p_1 + p_2 - kp_1p_2)}{p_1 + p_2 - p_1p_2},$$
which is NOT equal to k, but can be close for small probabilities and small k. As time passes, RR is further and further away from HR.
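The calculation in Table 6 extends naturally to more than two time points. The sketch below is an illustration under assumed values (a per-period event probability of 0.05 in group 1 and k = 3, neither taken from the paper); the function name rr_discrete is ours.

```r
# rr_discrete() is an illustrative helper, not code from the paper.
# p: per-period conditional event probabilities in group 1 (as in Table 6)
# k: constant ratio of the per-period probabilities (group 2 relative to group 1)
rr_discrete <- function(p, k) {
  stopifnot(all(k * p <= 1))
  cum1 <- 1 - cumprod(1 - p)       # probability of event up to each time, group 1
  cumk <- 1 - cumprod(1 - k * p)   # probability of event up to each time, group 2
  cumk / cum1
}

p <- rep(0.05, 10)   # assumed: 5% event probability per period in group 1
k <- 3               # assumed ratio of hazards
rr_path <- rr_discrete(p, k)
print(round(rr_path, 3))
```

The first value equals k, and the values then decrease steadily, which is the drift away from HR described above.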
Another example is illustrated in Figure 1 and Table 7. Data were simulated from two exponential distributions with HR = 3 and with 500 cases in each group. We see that only at the first point, close to t = 0, the estimate is around 3. Later it quickly diminishes and is already halved at t = 1.
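For exponential lifetimes the relative risk at time t is available in closed form as the ratio of the two distribution functions. The following sketch compares this exact value with a simulated estimate. The baseline rate of 1 and the seed are assumptions made here for illustration; the paper states only HR = 3 and 500 subjects per group. Under this assumed rate the exact value at t = 1 is about 1.50, in line with the 1.51 reported in Table 7.

```r
# Assumed for illustration: baseline rate 1 in the low-hazard group.
rate0 <- 1
hr    <- 3
times <- c(0.1, 0.5, 1, 3, 8)

# Exact relative risk for exponential lifetimes: ratio of the two cdfs.
rr_exact <- pexp(times, rate = hr * rate0) / pexp(times, rate = rate0)

# Simulated counterpart: proportions with an event by each time point.
set.seed(1)   # arbitrary seed
n  <- 500
t1 <- rexp(n, rate = hr * rate0)
t0 <- rexp(n, rate = rate0)
rr_sim <- sapply(times, function(t) mean(t1 <= t) / mean(t0 <= t))

print(data.frame(time = times,
                 RR_exact = round(rr_exact, 2),
                 RR_simulated = round(rr_sim, 2)))
```

The exact values always lie strictly between 1 and the hazard ratio and fall towards 1 as time increases, whereas simulated estimates at early times are noisy because few events have occurred yet.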
Table 6: Calculation of relative risk at two discrete time points. Hazards are proportional, with ratio k.
	$t_1$	$t_2$
probability of event in group 1	$p_1$	$p_2$
probability of event in group 2	$kp_1$	$kp_2$
probability of survival in group 1	$1 - p_1$	$(1 - p_1)(1 - p_2)$
probability of survival in group 2	$1 - kp_1$	$(1 - kp_1)(1 - kp_2)$
probability of event up to t in group 1	$p_1$	$1 - (1 - p_1)(1 - p_2)$
probability of event up to t in group 2	$kp_1$	$1 - (1 - kp_1)(1 - kp_2)$
RR up to given time	$kp_1/p_1 = k$	$\frac{1 - (1 - kp_1)(1 - kp_2)}{1 - (1 - p_1)(1 - p_2)}$
Table 7: Calculation of RR at three different time points for the situation illustrated in Figure 1
time	RR
0.1	3.15
0.5	2.12
1.0	1.51
Figure 1: Two exponential curves with HR = 3. (Horizontal axis: Time.)
5 Discussion
In our experience, equating odds ratios with relative risk has become too common, and results, even when probabilities of events are not small, are always interpreted as relative risks (Deeks, 1998; Greenland, 1987; Nurminen, 1995). Having odds ratios as a result of logistic regression fits of course adds to this. We believe that, in case the assumption of a rare event cannot be supported, an effort should be made to estimate relative risk correctly (if possible), or to at least give some estimates, using formula (2.1), for different values of $\pi_1$ and $\pi_2$.
It is of course possible that many of the claims about the similarity between HR and RR are made with small intervals in mind. If so, then this should be made very clear when such a statement is made (still, the question why that would be of interest, would remain). The above example from the British Medical Journal site certainly isn't clear about this.
Of course, simply stating that one has small intervals in mind is still not enough. One has to explicitly say that he/she has conditional probabilities in mind, as the definition
$$RR(t) = \frac{P(t \le T < t + \Delta t \mid X = x_1)}{P(t \le T < t + \Delta t \mid X = x_2)}$$
is still NOT the hazard ratio, as it is not a ratio of conditional probabilities.
Maybe the easiest way to understand that a hazard ratio cannot be equal to the relative risk for any time t is to realize that eventually everybody dies, so the relative risk will approach 1 with time, even though the hazard ratio is constant.
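The two limiting cases can be stated compactly. This is a sketch that assumes $S(t, x) \to 0$ as $t \to \infty$ in both groups, and hazards that are continuous and strictly positive at $t = 0$; the second limit follows from l'Hôpital's rule.

```latex
\begin{align*}
\lim_{t \to \infty} RR(t)
  &= \lim_{t \to \infty} \frac{1 - S(t,x_1)}{1 - S(t,x_2)} = \frac{1 - 0}{1 - 0} = 1, \\
\lim_{t \to 0^+} RR(t)
  &= \lim_{t \to 0^+} \frac{\lambda(t,x_1)\,S(t,x_1)}{\lambda(t,x_2)\,S(t,x_2)}
   = \frac{\lambda(0,x_1)}{\lambda(0,x_2)} = HR(0).
\end{align*}
```

So under these assumptions the relative risk starts at the hazard ratio and ends at 1, whatever happens in between.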
References
[1]	Beaudoin, G. (2014): Meeting the information needs of news media to increase citizens' understanding of statistical findings. Paper presented at Work Session on the Communication of Statistics. UNECE.
[2]	Bland, J.M. and Altman, D.G. (2000): The odds ratio. British Medical Journal, 320, 1468.
[3]	Davies, H.T., Crombie, I.K., and Tavakoli, M. (1998): When can odds ratios mislead? British Medical Journal, 316, 989-991.
[4]	Deeks, J. (1998): When can odds ratios mislead? British Medical Journal, 317, 1155-1156.
[5]	Greenland, S. (1987): Interpretation and choice of effect measures in epidemiologic analyses. American Journal of Epidemiology, 125, 761-768.
[6]	Kalbfleisch, J.D. and Prentice, R.L. (2002): The Statistical Analysis of Failure Time Data. John Wiley & Sons.
[7]	Newman, S.C. (2001): Biostatistical Methods in Epidemiology. John Wiley & Sons.
[8]	Nurminen, M. (1995): To use or not to use the odds ratio in epidemiologic studies? European Journal of Epidemiology, 11, 365-371.
[9]	Pearce, N. (1993): What does the odds ratio estimate in a case-control study? International Journal of Epidemiology, 22, 1189-1192.
[10]	Savitz, D.A. (1992): Measurements, estimates, and inferences in reporting epidemiologic study results. American Journal of Epidemiology, 135, 223-224.
[11]	Walter, S.D. (2000): Choice of effect measure for epidemiological data. Journal of Clinical Epidemiology, 53, 931-939.
[12]	Zhang, J. and Yu, K.F. (1998): What's the relative risk? A method of correcting the odds ratio in cohort studies of common outcomes. The Journal of the American Medical Association, 280, 1690-1691.
INSTRUCTIONS TO AUTHORS
Language: Metodološki zvezki - Advances in Methodology and Statistics is published in English.
Submission of papers: Authors are requested to submit their articles (complete in all respects) to the Editor by e-mail (MZ@stat-d.si). Contributions are accepted on the understanding that the authors have obtained the necessary authority for publication. Submission of a paper will be held to imply that it contains original unpublished work and is not being submitted for publication elsewhere. Articles must be prepared in LaTeX or Word. Appropriate styles and example files can be downloaded from the Journal's web page (http://www.stat-d.si/mz/).
Review procedure: Manuscripts are reviewed by two referees. The editor reserves the right to reject any unsuitable manuscript without requesting an external review.
Preparation of manuscripts
Tables and figures: Tables and figures must appear in the text (not at the end of the text). They are numbered in the following way: Table 1, Table 2,..., Figure 1, Figure 2,...
References within the text: The basic reference format is (Smith, 1999). To cite a specific page or pages use (Smith, 1999: 10-12). Use "et al." when citing a work by more than three authors (Smith et al., 1999). The letters a, b, c etc. should be used to distinguish different citations by the same author(s) in the same year (Smith, 1999a; Smith, 1999b).
Notes: Essential notes, or citations of unusual sources, should be indicated by superscript number in the text and corresponding text under line at the bottom of the same page.
Equations: Equations should be centered and labeled with two numbers separated by a dot enclosed by parentheses. The first number is the current section number and the second a sequential equation number within the section, e.g., (2.1)
Author notes and acknowledgements: Author notes identify authors by complete name, affiliation and his/her e-mail address. Acknowledgements may include information about financial support and other assistance in preparing the manuscript.
Reference list: All references cited in the text should be listed alphabetically and in full after the notes at the end of the article.
References to books, part of books or proceedings:
[1]	Smith, J.B. (1999): Title of the Book. Place: Publisher.
[2]	Smith, J.B. and White A.B. (2000): Title of the Book. Place: Publisher.
[3]	Smith, J. (2001): Title of the chapter. In A.B. White (Ed): Title of the Proceedings, 14-39. Place: Publisher.
Reference to journals:
[4]	Smith, J.B. (2002): Title of the article. Name of Journal, 2, 46-76.
Metodološki zvezki
Advances in Methodology and Statistics
Published by: Faculty of Social Sciences, University of Ljubljana, for the Statistical Society of Slovenia
Editors: Valentina Hlebec, Lara Lusa
Founding Editors: Anuška Ferligoj, Andrej Mrvar
Cover Design: Bojan Senjur, Gregor Petrič
Typesetting: Lara Lusa
Printing: Littera Picta d.o.o., Ljubljana, Slovenia
MZ is indexed and abstracted in: SCOPUS, EBSCO, ECONIS, STMA-Z, ProQuest
Home page URL: http://www.stat-d.si/mz/
ISSN 1854-0023