Metodološki zvezki, Vol. 5, No. 2, 2008, 95-111 On Covariance Estimation when Nonrespondents are Subsampled Wojciech Gamrot1 Abstract The phenomenon of nonresponse in a sample survey reduces the precision of parameter estimates and introduces the bias. Several procedures have been developed to compensate for these effects. An important technique is the two-phase (or double) sampling scheme which relies on subsampling the nonrespondents and re-approaching them in order to obtain the missing data. This paper focuses on the application of double sampling to estimate the finite population covariance. Two covariance estimators using combined data from the initial sample and the subsample are considered. Their properties are derived. Two special cases of the general procedure are discussed. 1 Introduction The estimation of covariances between individual population characteristics and of the covariance matrix as a whole in the nonresponse situation has enjoyed vivid interest for years. The principle of maximum likelihood (ML) plays a prominent role in constructing estimators for covariance matrices as well as for individual covariances. Finkbeiner (1979) applies Fletcher-Powell optimum-seeking algorithm for obtaining ML estimates of covariance matrix for factor analysis. Lee(1986) proposes the estimation procedure under which missing values are viewed as latent variables and estimators are obtained under normality assumptions via generalized least squares and ML using Fisher scoring (iteratively reweighted Gauss-Newton algorithm). A coherent theory on the use of maximum likelihood principle to estimate the covariance matrix under some model describing the distribution of population values is given by Little and Rubin (1987). The Expectation-Maximization (EM) algorithm by Dempster, Laird and Rubin (1977) is contemplated as a standard method for dealing with nonresponse in a wide range of cases with special emphasis on the normal distribution. This approach is further developed in subsequent papers. Woodruff (1990) considers 1 Wojciech Gamrot, Department of Statistics, University of Economics, Bogucicka 14, 40-226 Katowice, Poland; gamrot@ae.katowice.pl. alternative estimator based on the EM algorithm with additional restrictions expressed by the regression superpopulation model. Schneider (2001) proposes a regularized EM algorithm utilizing ridge regression, for the case where number of variables exceeds the sample size. Bentler and Liang (2003) discuss the application of an EM-type gradient algorithm for maximum likelihood estimation in the context of two-level structural equation modelling. Jamshidian (1997) applies EM for confirmatory factor analysis. Jamshidian and Bentler (1999) explore the possibilities of using several complete-data algorithms (EM, Fletcher-Powell and Fisher's scoring) to provide maximum likelihood estimates of the covariance matrix with missing data. Under a competing approach Van Praag, Dijkstra and VanVelzen (1985) construct an asymptotically distribution-free (ADF) estimator of the covariance matrix based on linear regression. Arminger and Sobel (1990) abandon normality constraints and use pseudo-maximum likelihood (PML) method by Arminger and Schoenberg (1989) to construct estimators of covariance matrices while Yuan and Bentler (1995) provide theoretical justification for the use of PML in the context of non-normal distributions. Gold, Bentler and Kim (2003) compare ADF estimators with maximum likelihood estimators in the context of structural equation modelling. Significant attention is also devoted to the use of various imputation methods in covariance estimation. Brown (1994) compares estimators for listwise deletion, pairwise deletion and mean imputation. Kline (1998) compares the properties of estimators computed under mean imputation, regression imputation and pattern matching. Schafer and Olsen (1998) employ a method of multiple imputation invented by Rubin (1987). They develop data augmentation algorithm and provide justification for its use on the grounds of Bayesian theory. Data augmentation is also considered in the paper of Graham and Hofer (2000) and compared with the EM method. The comparison of estimators for listwise and pairwise deletion, mean imputation and full-information maximum likelihood is given by Wothke (2000). In this paper the problem of estimating individual covariance between two population characteristics is discussed under the quasi-randomization approach (Oh and Scheuren (1983)). It is assumed that the nonresponse is a random phenomenon but population values are fixed and they are not subject to modeling as opposed to most of the papers referenced above. Let us consider some characteristics X and Y in finite population U of size N. Fixed values of these characteristics are respectively denoted by x1,...,xN and y1,...,yN. The aim of the survey is to estimate some functions of these values called population parameters. Let the population be surveyed according to the following general two-phase sampling procedure. In the first phase a random sample s of size n is drawn from U according to some arbitrary sampling design p(s) determining individual inclusion probabilities of the first order pi = ^ s3i p(s) and of the second order n^ = ^ s3ij p(s), for i^jeU. We assume that nonresponse appears in the survey, and consequently some units respond while others do not. The sample s may consequently be divided Into two disjoint subsets s1 and s2 such that units from si respond and units from s2 do not. Following Cassel et al. (1983) we assume that nonresponse is a random event governed by some probability distribution q(s1 | s) usually referred to as response distribution (see e.g. Särndal et al. 1992). In general, it is defined conditionally with respect to s in order to reflect the interactions between sampled units. The response distribution determines individual response probability of any i-th unit Pi|s = X si 'iq(s1|s) and joint response probability of i-th and j-th unit Pij|s = Xsi'ijq(s1|s) for i^jeU. The distribution q(s1 | s) also determines the behaviour of the random set s2 which may be expressed as a function of s and s1, so for any s2=s-s1 we have q(s11 s) = q(s21 s) = q(s1,s21 s). The second phase of the survey is then carried out and in this second phase a subsample s' of size n' is drawn from s2 according to the sampling design p'(s'|s,s2) which is characterized by another set of inclusion probabilities of the first order ni|s = ^s,3i p'(s'|s,s2) and second order rc^ = ^s,3ij p'(s'|s,s2) i^jeU. We assume that necessary efforts are undertaken in the second phase that guarantee obtaining complete responses from all subsampled units. In the setup described above three sources of sample randomness were defined, each of them respectively associated with probability distribution p(s), q(s1|s) and p'(s'|s,s2). All expectations will be computed jointly with respect to these three probability distributions unless otherwise stated. 2 Estimation of the population total Let us consider the population total of X (the same may be defined for Y or any other characteristic): tx = Z Xi (2.1) X ieU Under complete response it is unbiasedly estimated by the Horvitz-Thompson (1952) statistic: i X, tx = (2.2) X - ies n In particular, by putting xi=1 for ieU we obtain an unbiased estimator of the population size N in the form: N = X - (2.3) Both estimators above are generally biased under nonresponse. As an example consider deterministic nonresponse model, according to which the population is divided into two strata: U1 and U2, of sizes N1 and N2 such that pi=1 for ieU1 and pi=0 otherwise. If the sample is drawn using simple random sampling without replacement (SRSWOR), and hence inclusion probabilities of the first and second order are respectively equal to ni=n/N and nij=n(n-1)/(N(N-1)) for i^jeU then the estimator turns out to be biased and its bias is equal to: B(tx) = txu -tx = -£iîU2Xi (2.4) where txU = Z iîUxi . The bias does not depend on the sample size and hence it does not tend to zero with growing n. However, using the data from both phases we can construct an unbiased estimator of tx in the form (Särndal et al 1992): t x = Z ^ + Z (2.5) ies! ni ies' ninis,s2 Putting xi=1 for ieU we again obtain an unbiased estimator of N: N •= Z - + Z —— (2.6) ies! ni ies' ninis,s2 These estimators will be used further as building blocks for more complicated estimation procedures. 3 Estimation of the population covariance Let us consider the population covariance between X and Y defined by the expression: CT i ( i Y i Ï (X, Y) = — V xi - - y x, yi - - y y, N -1 ieU v i NjeU jyv i NjeU jy (3.1) or by the equivalent formula: 1 Cu(X,Y) = —-1 tt (3.2) N - 1 xy N(N - 1) x y where txy = Z ieU^^ tx = Z ieUxi and ty = Z ieU y>. Under complete response the covariance is often estimated by respective statistics: C1(X,Y) = N-r 'xy- N(N~-1) Vty (3.3) 1 and C2(X,Y) 1 NN -1 -t„, - 1 N(N -1) t t (3.4) where unknown population totals are replaced with corresponding Horvitz-Thompson estimators. As indicated by Särndal et al (1992), the latter is usually preferred to the former due to better variance properties. These covariance estimators are however biased under nonresponse, As a special case let us consider SRSWOR and deterministic nonresponse. For large sample size approximate biases of both estimators respectively take the form: where AB(C (X, Y)) = CU (X, Y) - CU(X, Y) AB(C2(X, Y)) = CUi#(X, Y) - Cu (X, Y) 11 Cu (X, Y) =-1 U1 N -1 CuJX,Y): xyU1 N(N - 1)txU1tyU1 1 N1 -1 txyU1 AT /-AT ntxU1tyU1 N1(N -1) (3.5) (3.6) (3.7) (3.8) while txU1 = Z i^^ tyU1 = Z isU^i and txyU1 = Z ^ ^i . Hence the bias d°es not tend to zero when n grows in an apparent analogy to the expression (2.4). In order to correct for the bias we propose to replace Horvitz-Thompson estimators with their unbiased double sampling counterparts. This leads to two alternative estimators of the population covariance: and C .1(X, y) = —^t Xy —-1—-t xt y • N -1 xy N(N -1) x y C2(X,Y) = ^—t X --—J-1 *t * ; NN •-1 xy N • (N •-1) xy (3.9) (3.10) Using Taylor linearization we obtain the approximate variance of C.1(X, Y) : AV(C .1 (X, Y)) = where (N -1)2 Z uiuj i, jîU n, n, V J j + E„ u uj n n ij s,s2 V71 is,s^js,s2 ))) u, = (x, - X)(y, - Y) - XY (3.11) (3.12) while X = tx / N and Y = ty /N. The approximate bias may be expressed in the form: 1 1 1 1 AB(C .1 (X, Y)) = 1 N2(N -1)2 Z xiyj i,jîU ^ -1 n j % , V 1 j + E„ Z x y, % % \W lj s,s2 - 1 V 1 s,s2 js,s2 J JJ (3.13) It is worth noting, that this approximation of the bias is obtained by expanding the estimator in Taylor series including terms up to the second order, as opposed to more crude approximations based only on first-order terms. The symbols AB and AV are used to distinguish linearization-based approximations from the exact bias and variance. Using the same method we obtain the approximate variance of C,2(X, Y) in the form: c / \ / « « / \\\ AV(C .2(X,Y)) = 1 (N -1)2 E* * u,u* i,JîU -, -1 V i j + E_ y u^u, i,jîs2 ninj ijs,s2 V^ is,s^js,s2 J J 1 (3.14) where u* = (xi - X)(yi - Y) - Cu(X,Y) and its second-order bias: 1 (3.15) AB(C .2(X,Y)) = -where N2(N -1)2 T i,JîU u -îiL -1 V i j + E„ T u V i,JÎs2 i j ij|s,s2 -1 V^ ils,s^jls,s2 J Jy (3.16) u* = 1n2(ui + u,) + -2n(N- 1)((xi -x,)(yi -y,) + u + u, + 2Cu(X,Y)) (3.17) The symbol Epq() appearing in expressions above represents the expectation with respect to first-phase sampling design p(s) and to the response distribution q(s1|s). On assumed level of generality it is impossible to eliminate them from AV formulas but we may achieve this by making additional assumptions concerning the second phase sampling design and response distribution. The example is shown in the next section. On the other hand, the approximate variances may be estimated, without any assumptions, by employing the approach of Särndal et al (1992) which leads to respective statistics: V(C(X, Y) = 1 (N -1)2 u-u- II * V i,jes1usl nij v ninj y + I u-u- jîs' ninjnijs,s2 n \\ ij s,s2 V^ is,s^js,s2 )) (3.18) * 1 1 and V(C .2(X,Y)) = where and (N -1)2 ~ * / u- u- Z1 * ^ijesj us' n ij v ninj y + Z u- u- j jÎs i j ijs,s2 n \\ ij s,s2 f u- t • \ x;-- v i N y t yi • A y N u = . \ x- i NN * f v y N2 tx yi - -^r v n* C * 2(X,Y) / n ■ nij nijs,s2 nij nis,s2 for i, j î s2 for i î s2, j î s1 for i î s1, j î s2 for i, j î s1 vn is,s^js,s2 y y (3.19) (3.20) (3.21) (3.22) If the statistics ui and u* estimated constants ui and u* without error, then variance estimators above would respectively be unbiased for AV(C*1(X,Y)) and AV(C * 2(X, Y)). Obviously they do not and some bias appears, but we may hope that it remains modest and tends to zero for large samples. We will now present two important special cases of the general procedure presented above. 1 1 1 x * 4 Equal probability sampling Let us assume as in papers of Srinath (1971) and Rao (1986) that SRSWOR is used in both phases. Hence inclusion probabilities of the first and second order are respectively equal to ni=n/N and nij=n(n-1)/(N(N-1)) for i^jeU in the first phase while ni|s = n'/n2 and rc^ = n'(n'-1)/(n2(n2 -1)) for i^jeU in the second phase. We also assume that the subsample size is a linear function of the nonrespondent subset size according to formula: n' = cn2 where 0 1 i<=TT. 2 1 ieU2 ^ 2 jeU2 J (4.6) (4.7) The second-order approximate bias is 1 AB(C+(X, Y)) = --_ n 2 N2(N -1)2 ^ Cu(X, Y) + CU2 (X, Y) ' Nn n c 2 where ct(x,y)=—L_ y xi -— z x, yi -— Z y, N - 1 ^ i N ^ j N ^ j i J->2 jeU2 J\ ^2 jeU2 2 1 ieU2 V (4.8) (4.9) and W2=N2/N. Both approximate bias and approximate variance decrease when initial sample size n grows. From (3.18) we also obtain the variance estimator: V(C + (X, Y)) = N N - n (N -1)2 n(n -1) (n1 - 1)S2 (u+) + +N(n2-1) - cn2(n -1)+n1 s2'(u+)+- /"v t\ sv/ vs s y c(N - n) n (4.10) ies 2 S U 2 2 S U 2 where and —+ 1 v^ ~ + us1 =—L u+ ies1 u+= n z u+ n ^ 1(u+) = -J-r I (u+-U+) 1 ies1 (u+) = -1- S (u+- u+)2 i2 /■ + ~ + u = X; 1 N yi N txty N2 (4.11) (4.12) (4.13) (4.14) (4.15) + + t t y X 5 Unequal probability sampling We will now focus on another special case of the general two-phase procedure. The deterministic nonresponse model assuming that response probabilities are either equal to zero or equal to unity is seldom realistic. In practice they are more likely to take any value from the <0,1> interval and depend on the auxiliary variables as well as the variable under study. In particular it may be more reasonable to assume that response probabilities are described by logistic model given by expression: Pi = (1 + exp(ßxi))-1 (5.1) where xi is some vector of auxiliary variables corresponding to the i-th population unit while ß is the parameter vector. Also, more sophisticated sampling designs that make use of auxiliary information to improve the efficiency of parameter estimates are often preferred to the SRSWOR. One of such designs, known as Pareto sampling (Rosén 1997), allows to draw a fixed-size sample with first order inclusion probabilities approximately proportional to the values z1,...,zN of the auxiliary characteristic Z. According to this procedure the sample s of the size n is drawn in following two steps (Särndal and Lundström 2005): 1) For any ieU a realization ui of a random variable having uniform distribution on the <0,1> interval is generated and the following expression is evaluated: q, = (5.2) p.(1 - u,) where the desired inclusion probability n, is given by expression P, = (5.3) i zi ieU If n,>0 then i-th unit is automatically included in the sample and inclusion probabilities for remaining units are recomputed accordingly. 2) The population subset consisting of n units having the largest values of qk is included in the sample. Proposed covariance estimators are not equivalent now. Computation of their properties in such a situation using general formulas for approximate variance and approximate bias derived above is possible when some reasonable assumptions are made about response distribution. However, to evaluate exact inclusion probabilities it is necessary to use numerical procedures developed by Aires (2000). This makes the analytical comparison of covariance estimators difficult. Hence, a simulation study was carried out in order to compare both covariance estimators. During simulation experiments, the population under study was represented by the data obtained from the Polish'1996 agricultural census representing 2420 farms in three boroughs (Boleslaw, Grçboszôw, Radgoszcz) of the D^browa Tarnowska district. Three variables were used including farm sales (X), farm cattle stock (Y) and farm area (Z). The covariance between X and Y was the estimated parameter. A logistic nonresponse model was arbitrarily assumed stating that population units respond independently with response probabilities respectively equal to pi=(1+exp(ß 0+ß 1xi+ß 2yi))-1 for ieU with the parameter vector ß=[ß0,ß 1,ß2] chosen arbitrarily. Two simulation experiments were respectively executed for ß=[0,0,0] (pi independent on X,Y) and ß=[0,1,1] (p, dependent on X,Y). The simulations were independently repeated for Pareto sampling utilizing Z as the auxiliary variable in both phases, and for simple random sampling without replacement in both phases. The same sample size n=100, 150, ... , 600 was always assumed for both designs. For each sample size the drawing of 40000 sample-subsample pairs was simulated, and the properties of estimators were assessed on the basis of their empirical distributions. Hence, each point on following graphs corresponds to 40000 estimates. Estimators C ,j(X,Y), C .2(X,Y) and C + (X,Y) were compared. In the graphs they are respectively denoted by abbreviations (Cov1, Cov2 and Cov_srs). The mean square error (MSE), the bias and the share of bias in MSE observed in the first experiment are respectively shown on Figures 1, 2 and 3. 4, 3, 3, 2, m w 2, S 1, 1, 5, 0, 0E+07 5E+07 0E+07 5E+07 0E+07 5E+07 0E+07 0E+06 0E+00 ■ Cov1 ■ CoV2 ■ Cov srs 100 150 200 250 300 350 400 450 500 550 600 n Figure 1: MSE as a function of n for ß=[0,0,0], V) < CÛ ■ Cov1 ■ Cov2 ■ Cov srs n Figure 2: Bias as a function of n for ß=[0,0,0]. The MSE of all three estimators decreases with growing sample size. This observation is consistent with asymptotic results obtained for the SRSWOR case. For any value of n the estimator C.1(X,Y) is much less accurate than others. The lowest mean square error is observed for C,2(X,Y), which is significantly better than other two in terms of MSE. The bias of all three estimators is negative and its absolute value decreases with growing n. This is also consistent with asymptotic results for SRSWOR. The share of bias in the MSE is approximately constant for each estimator. It does not exceed 1,4% for C .2(X,Y) and 0,4% for C .1(X,Y) and C + (X,Y). Hence, each estimator may be treated as approximately unbiased. m V) 1, 1, 1, 1, i o, 0Q DQ 0, 0, 0, 0, 6% 4% 2% 0% 8% 6% 4% 2% 0% —■ùr ■ Cov1 ■ CoV2 ■ Cov srs 100 150 200 250 300 350 400 450 500 550 600 n Figure 3: Share of bias in MSE as a function of n for ß=[0,0,0]. The mean square error (MSE), the bias and the share of bias in MSE observed in the second experiment are respectively shown on Figures 4, 5 and 6. Again, the MSE of all three estimators tends to decrease with growing initial sample size. For any value of n observed MSE of the estimator C.2(X,Y) is the lowest and observed MSE of the estimator C.1(X,Y) is the highest. The bias is negative for C.1(X,Y) and C + (X,Y) while oscillating around zero for C.2(X,Y). Its absolute value is lowest for C.2(X,Y) and highest for C.1(X,Y). The observed share of bias in the MSE is again negligible, not exceeding 0,05% for C.2(X,Y) and 0,5% for the other two estimators. iu W 4,5E+07 4,0E+07 3,5E+07 3,0E+07 2,5E+07 2,0E+07 1,5E+07 1,0E+07 5,0E+06 0,0E+00 • covl • cov2 • cov srs 100 150 200 250 300 350 400 450 500 550 600 n Figure 4: MSE as a function of n for ß=[0,1,1]. 100 -100 <2 < -200 m -300 -400 -500 • cov1 ■ cov2 • cov srs 0 n Figure 5: Bias as a function of n for ß=[0,1,1]. 0,6% 0,5% m 0,4% w si 0,3% o3 m 0,2% 0,1% 0,0% 100 150 200 250 300 350 400 450 500 550 600 n Figure 6: Share of bias in MSE as a function of n for ß=[0,1,1]. 6 Conclusions In this paper two nonresponse-corrected estimators of the finite population covariance are considered under quasi-randomization approach. They are computed using the data obtained using a general two-phase sampling procedure involving arbitrary sampling designs in both phases. Their approximate variances and biases are derived under stochastic nonresponse characterized by arbitrary response distribution. No model assumptions on population distribution are made for their derivation, which alleviates the risk of model misspecification. For the important special case of simple random sampling without replacement and deterministic nonresponse derived formulas suggest that proposed estimators are nearly unbiased. The results of simulation experiments carried out for another special case of stochastic nonresponse and Pareto sampling also seem to support this hypothesis for estimators C + (X,Y) and C.2(X,Y). It is worth noting that the properties of proposed estimators have been derived and simulations were executed under the assumption of complete response in the second phase. Further research is needed on their properties if this assumption is not satisfied. Acknowledgements The research has been supported by the grant No: 1H02B 022 30 from the Polish Ministry of Education and Science. ■ cov_1 ■ cov_2 ■ cov srs References [1] Aires, N. (2000): Techniques to Calculate Exact Inclusion Probabilities for Conditional Poisson Sampling and Pareto nps Sampling Designs, PhD Thesis, Chalmers, Göteborg University. Göteborg. [2] Arminger, G. and Schoenberg, R. (1989): Pseudo-maximum likelihood estimation and a test for misspecification in mean and covariance structure models. Psychometrika, 54, 409-425. [3] Arminger, G. and Sobel, M.E. (1990): Pseudo-maximum likelihood estimation of mean and covariance structures with missing data. Journal of the American Statistical Association, 85, 195-203. [4] Bentler, P.M. and Liang, J. (2003): Two-Level mean and covariance structures: Maximum Likelihood via the EM algorithm. In Reise, S.P., Duan, N. (Eds.): Multilevel Modelling - Methodological Advances, Issues and Applications. London: Psychology Press. [5] Brown, R.L. (1994): Efficacy of the indirect approach for estimating structural equation models with missing data: A comparison of five methods. Structural Equation Modelling, 1, 287-316. [6] Cassel, C.M., Särdal, C.E., and Wretman, J. (1983): Some uses of statistical models in connection with the nonresponse problem. In Madow, W.G. and Olkin, I. (Eds.): Incomplete Data in Sample Surveys. Vol II, New York: Academic Press,. [7] Dempster, A.P., Laird, N.M., and Rubin, D. (1977): Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39, 1-38. [8] Finkbeiner, C. (1979): Estimation for the multiple factor model when data are missing. Psychometrika, 44, 409-420. [9] Gold, M.S., Bentler, P.M., and Kim, K.H. (2003): A comparison of maximum likelihood and asymptotically distribution free methods of treating incomplete nonnormal data. Structural Equation Modelling, 10, 47-79. [10] Graham, J.W. and Hofer, S.M. (2000): Multiple imputation in multivariate research. In Little, T.D., Schnabel, K.U., and Baumert, J. (Eds.): Modelling Longitudinal and Multilevel Data: Practical Issues, Applied Approaches and Specific Examples. Mahwah, NJ: Lawrence Erlbaum. [11] Horvitz, D.G. and Thompson, D.J. (1952): A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663-685. [12] Jamshidian, M. (1997): An EM algorithm for ML factor analysis with missing data. In Berkane, M. (Ed.): Latent Variable Modelling and Applications to Causality. New York: Springer, 247-258. [13] Jamshidian, M. and Bentler, P.M. (1999): ML estimation of mean and covariance structures with missing data using complete data routines. Journal of Educational and Behavioral Statistics, 24, 21-24. [14] Kline, R.B. (1998): Principles and Practices of Structural Equation Modelling. New York: Guildford. [15] Lee, S.-Y. (1986): Estimation for structural equation models with missing Data. Psychometrika, 51, 93-99. [16] Lessler, J.T. and Kalsbeek, W.D. (1992): Nonsampling Errors in Surveys. New York: Wiley & Sons. [17] Little, R.J.A. and Rubin, D.B. (1987): Statistical Analysis with Missing Data. New York: Wiley & Sons. [18] Oh, H.L. and Scheuren, F.S. (1983): Weighting adjustments for unit nonresponse. In Madow, W.G. and Olkin, I. (Ed.): Incomplete Data in Sample Surveys. Vol. 2, New York: Academic Press. [19] Rao, P.S.R.S. (1986): Ratio estimation with subsampling the nonrespondents. Survey Methodology, 12, 217-230. [20] Rosén, B. (1997): On sampling with probability proportional to size. Journal of Statistical Planning and Inference, 62, 159-191. [21] Rubin, D.B. (1987): Multiple Imputation for Nonresponse in Surveys, New York: Wiley. [22] Särndal, C.E., Swensson, B., and Wretman, J. (1992): Model Assisted Survey Sampling, New York: Springer-Verlag. [23] Särndal, C.E. and Lundström, S. (2005): Estimation in Surveys with Nonresponse, New York: Wiley. [24] Schafer, J.L. and Olsen, M.K. (1998): Multiple imputation for multivariate missing data problems: A data analyst's perspective. Multivariate Behavioral Research, 33, 545-571. [25] Schneider, T. (2001): Analysis of incomplete climate data: Estimation of mean values and covariance matrices and imputation of missing values. Journal of Climate, 14, 853-871. [26] Srinath, K.P. (1971): Multiphase sampling in nonresponse problems. Journal of the American Statistical Association, 335, 583-586. [27] Van Praag, B.M.S., Dijkstra, T.K., and Van Velzen, J. (1985): Least squares theory based on general distributional assumptions with an application to the incomplete observations problem. Psychometrika, 50, 25-36. [28] Woodruff, S.M. (1990): Model-based estimation of covariance matrices with applications to the EM algorithm. Proceedings of the Survey Research Methods Section. American Statistical Association, 425-428. [29] Wothke, W.(2000): Longitudinal and multi-group modelling with missing data. In Little T.D., Schnabel K.U., and Baumert, J. (Eds.): Modelling Longitudinal and Multilevel Data: Practical Issues, Applied Approaches and Specific Examples. Mahwah, NJ: Lawrence Erlbaum. [30] Yuan, K.-H. and Bentler, P.M. (1995): Mean and covariance structure analysis with missing data. UCLA Statistical Series - Report No. 193, Los Angeles: University of California.