Image Anal Stereol 2014;33:107-119 doi:10.5566/ias.v33.p107-119
Original Research Paper

A QUASI-LIKELIHOOD APPROACH TO PARAMETER ESTIMATION FOR SIMULATABLE STATISTICAL MODELS

MARKUS BAASKE^C,1, FELIX BALLANI^1 AND KARL GERALD VAN DEN BOOGAART^1,2

1 Institute of Stochastics, Faculty of Mathematics and Computer Science, TU Bergakademie Freiberg, D-09596 Freiberg, Germany; 2 Helmholtz Institute Freiberg for Resource Technology, Halsbrücker Str. 34, D-09599 Freiberg, Germany
e-mail: markus.baaske@math.tu-freiberg.de, ballani@math.tu-freiberg.de, boogaart@hzdr.de
(Received November 13, 2013; revised March 21, 2014; accepted March 24, 2014)

ABSTRACT

This paper introduces a parameter estimation method for a general class of statistical models. The method exclusively relies on the possibility to conduct simulations for the construction of interpolation-based meta-models of informative empirical characteristics and some subjectively chosen correlation structure of the underlying spatial random process. In the absence of likelihood functions for such statistical models, which is often the case in stochastic geometric modelling, the idea is to follow a quasi-likelihood (QL) approach to construct an optimal estimating function surrogate based on a set of interpolated summary statistics. Solving these estimating equations one can account for both the random errors due to simulations and the uncertainty about the meta-models. Thus, putting the QL approach to parameter estimation into a stochastic simulation setting, the proposed method essentially consists of finding roots to a sequence of approximating quasi-score functions. As a simple demonstrating example, the proposed method is applied to a special parameter estimation problem of a planar Boolean model with discs. Here, the quasi-score function has a half-analytical, numerically tractable representation and allows for the comparison of the model parameter estimates found by the simulation-based method and obtained from solving the exact quasi-score equations.

Keywords: kriging meta-modelling, parameter estimation, quasi-likelihood, simulation-based optimization.

INTRODUCTION

Our initial aim motivating this research is fitting parameters of statistical models from stochastic geometry for modelling porous media. In many of these models no closed forms or direct computation algorithms for likelihood functions or distribution characteristics as functions of the model parameters are known. This precludes many standard techniques of parameter estimation, such as maximum likelihood (ML), typical Bayesian algorithms (including Markov-chain-Monte-Carlo-type algorithms), or least squares (including the method of moments) based on exact model characteristics. It is, however, relatively easy – though still computationally demanding and afflicted with substantial random error – to approximately compute expected values of empirical characteristics through Monte Carlo simulations, by doing a random simulation of the process and computing the characteristics from the simulation result.
Such empirical characteristics can be any type of descriptive statistic, for instance, for a random closed set, the densities of the Minkowski functionals like volume fraction or specific surface area, the covariance and certain contact distribution functions (Schneider and Weil, 2008; Ohser and Schladitz, 2009; Chiu et al., 2013), or, for a random point pattern, the intensity (function), the pair correlation function and the nearest neighbour distance distribution function (Møller and Waagepetersen, 2003; Illian et al., 2008), or any other sort of summary statistic, maybe tailored specifically to the given estimation problem. In particular, this includes those cases where a spatial random structure is practically observable only via planar sections or projections and, hence, only statistics based on sections or projections are available.

If we could compute these characteristics precisely and fast enough, a general approach to estimation would be to find those model parameters producing characteristics most similar to the observed ones, following a least squares, minimum contrast or estimating equation approach. We would formulate the problem as a minimization problem and solve it using a standard optimization code. This is exactly how model parameters are commonly estimated in such a situation (see, e.g., Redenbach, 2009). However, on the one hand optimization codes expect their objective functions to be evaluated precisely. On the other hand, both the Monte Carlo simulations of the model as well as the computation of the empirical characteristics are often very time-consuming. In particular, this applies to stochastic geometric models of complex random structures such as fiber systems (Altendorf and Jeulin, 2011; Gaiselmann et al., 2013) or foam structures (Lautensack, 2008; Redenbach, 2009). That leaves us with the task of solving a global non-convex optimization or adjustment problem where the objective function can be evaluated only slowly and with a substantial random error. The aim of this paper is to give a fast algorithm for solving this sort of problem. Although it is in principle possible to apply the ideas of this algorithm also to a least squares approach, we will concentrate on a certain estimating equation approach coming from quasi-likelihood (QL) theory (Godambe and Heyde, 1987; Heyde, 1997), which proposes to use the root of the so-called quasi-score function as the parameter estimate.

The paper is organized as follows. First, the necessary ideas from QL theory are introduced. Then, in a step-by-step manner, a detailed description of our ideas to solve the problem of finding a root of the quasi-score function in practice is given. Then, for the purpose of comparison, we apply our method to the example of a planar Boolean model with discs, where also other ways of parameter estimation are available. Finally, we end with some discussion.

QUASI-LIKELIHOOD METHOD

Let X be a random variable on the sample space 𝒳 whose distribution depends on an unknown parameter θ taking values in an open subset Θ of the q-dimensional Euclidean space R^q. The possible probability measures {P_θ} for X form a parametric family indexed by θ. We define a mapping Y = T(X) ∈ R^p denoting a transformation of X to a set of possible summary statistics. The objective is the optimal estimation of the parameter θ
in the sense of (asymptotic) minimum variance unbiased estimation, based on the specification of the first two moments of Y = T(X) and without imposing any distributional assumptions, in the situation when likelihood functions are unavailable.

Let Z(θ) = E_θ[Y] ∈ R^p be the function mapping the parameter to the expected value of the summary statistics, and denote by V(θ) = Var_θ(Y) ∈ R^{p×p} the variance of Y as a function of the parameter under P_θ. An unbiased estimating function for θ is defined as a function G(θ, y) of the data Y = y and the parameter of interest such that E_θ[G(θ, Y)] = 0 for each P_θ. For the class 𝒢 = {A(θ)(y − Z(θ))} of all linear unbiased estimating functions, where A(θ) is any non-singular matrix, an estimating function is optimal if the "information criterion"

  ℰ(G) = (E_θ[∂G/∂θ])^t (E_θ[G G^t])^{-1} (E_θ[∂G/∂θ])   (1)

is maximized in the partial order of non-negative definite matrices (Heyde, 1997). The QL theory then states that

  Q(θ, y) = Z'(θ)^t V(θ)^{-1} (y − Z(θ)) ,   (2)

where Z' is the Jacobian of Z, is such an optimal estimating function among all functions of 𝒢; Q is called the quasi-score function. The "information criterion" (Eq. 1) can be seen as a generalization of the well-known Fisher information and coincides with it in case a likelihood is available and G is the usual score function, i.e., the derivative of the log-likelihood w.r.t. the parameter. Thus, in analogy to ML estimation, ℰ(G)^{-1} has a direct interpretation as the asymptotic variance of the estimator θ̂. Moreover, ℰ(G) might serve as a measure of how much the characteristics Y = T(X) contribute to the precision of the estimator derived from the quasi-score function. Under rather minor regularity conditions, the estimator θ̂ obtained from solving the estimating equation Q(θ̂, y) = 0 has minimum asymptotic variance among all functions G ∈ 𝒢, whereas consistency is mainly established due to the unbiasedness assumption of the estimating equation (Liang and Zeger, 1995), which yields, in terms of its root, a consistent estimator θ̂ even if the covariance structure V(θ) is incorrectly specified. Note that this is in general not the case if we applied weighted least squares with θ-dependent weighting to our setting. In what follows we always assume that the family of models {P_θ} is rich enough to find a root of the quasi-score equation. This could fail, as in classical ML, when the observed characteristics are outside the range of the expected model characteristics or if the mapping from the parameter space to the expected characteristics is not continuous.

It is worth noting that quasi-likelihood is a concept differing from pseudo-likelihood, which assumes the existence of a likelihood and estimates parameters by maximizing an approximation of the full likelihood rather than the full likelihood itself, e.g., because the latter is difficult to evaluate in case of complex dependencies. The most important variants are composite marginal likelihoods (see Varin (2008) for a comprehensive overview), first of all pairwise likelihoods, which have been used, e.g., for parameter estimation in random set models (Nott and Rydén, 1999; Wilson and Nott, 2001).

QUASI-LIKELIHOOD-BASED ESTIMATION APPROACH

SIMULATION AND INTERPOLATION OF THE QUASI-SCORE

Since closed form expressions of expectations are generally unavailable for complex stochastic models, we assume there is no other way of inferring Z(θ) or V(θ) than through extensive simulations of X.
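Concretely, both moment functions can be estimated unbiasedly from repeated model runs. The following is a minimal Python sketch of this simulation step, assuming a user-supplied function simulate(theta) that returns one realization of the p-dimensional statistic T(X) under P_θ (a hypothetical interface, not part of the paper):

```python
import numpy as np

def moments_from_simulations(simulate, theta, n_rep=100):
    """Monte Carlo estimates of Z(theta) = E[T(X)] and V(theta) = Var(T(X)).

    simulate(theta) is assumed to return one realization of the
    p-dimensional summary statistic T(X) under P_theta.
    """
    y = np.array([simulate(theta) for _ in range(n_rep)])  # shape (n_rep, p)
    z_hat = y.mean(axis=0)            # sample mean, unbiased for Z(theta)
    v_hat = np.cov(y, rowvar=False)   # sample covariance, unbiased for V(theta)
    return z_hat, v_hat
```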
We assume that we can simulate realizations from the distributions P_θ for any θ ∈ Θ. An iterative algorithm solving the estimating equations based on the quasi-score function thus has to conduct simulations of X for various θ, inferring Z(θ) and V(θ) from the simulation output. Depending on the complexity of the simulation models this can be very time-consuming, and typical solvers will be confused by the remaining random fluctuation. We solve the problem by estimating an approximating model for the dependency of the quasi-score function on the parameters from individual simulations, and then solving the quasi-score equation on an iteratively improved version of this approximation. We propose the use of an interpolation-based meta-model of Z(θ) and later V(θ) for the construction of an approximating quasi-score function since, unlike the quasi-score function itself, these can be estimated unbiasedly from simulations. We use kriging (Cressie, 1993; Chilès and Delfiner, 1999) for this interpolation since it allows automatic adaption of the prediction rule to the properties of the unknown function to be interpolated and provides an indication of the precision of the interpolation. Kriging is typically defined in a probabilistic context, but it can also be understood as a method to approximate or interpolate data such as splines do (Wahba, 1990). It is primarily based on estimated covariance functions which quantify the change rates of the interpolated functions and their derivatives. Besides its usage in truly geostatistical problems, kriging has also been widely applied in the context of simulation optimization (Kleijnen and van Beers, 2009) and independently studied within the optimization literature on Design and Analysis of Computer Experiments (DACE) (Sacks et al., 1989) for expensive-to-evaluate, scalar-valued objective functions. Unlike these approaches, which use stationary random functions, we propose the use of intrinsic random functions because quasi-scores typically diverge.

Prediction model

Let Z(θ) denote one of the functions to be interpolated, e.g., Z(θ) = E_θ[Y]. The problem considered now is the prediction of the value Z(θ_0) for some new sampling point θ_0 in the parameter space, given a finite set S_n = {(θ_i, z_i), θ_i ∈ R^q, i = 1,...,n} of pairs of a sampling point θ_i and an "observed" value z_i ≈ Z(θ_i). The observations z_i = Z(θ_i) + ε_i are affected by the variance from the Monte Carlo simulations through the errors ε_i. The ε_i are assumed to be stochastically independent of each other; their variance is estimated in the sampling procedure. In our setting, given S_n, we model the functional dependence between the parameter θ and the value Z(θ) by

  Z(θ) = m_β(θ) + W(θ) ,   (3)

where m_β(θ) = Σ_{l=0}^r β_l f_l(θ) denotes a linear model for the expected value of the "random" function Z. The functions f_l are known basis functions, typically monomials for a polynomial approximation of the mean of Z. The index l has the meaning of a power, e.g., in 1D, f_l(θ) = θ^l. The case l = 0 is included for the constant mean, i.e., f_0 = 1. In the q-dimensional space there are C(q+k, k) monomials of degree ≤ k, which determines the number of basis functions f_l in the term m_β. The unknown coefficients β_l have to be estimated from the data. W is a zero-mean intrinsic random function (see the next section). Though the form of W is in general unknown, the covariance structure of W is subjectively chosen so that it relates to the data phenomenon and its smoothness properties, as is usually done in geostatistics.
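To illustrate the trend part m_β(θ) of Eq. 3, the following sketch (with our own helper name, not from the paper) builds the design matrix of all monomials of degree ≤ k in q variables:

```python
import numpy as np
from itertools import product

def monomial_design(thetas, k):
    """Design matrix with columns f_l(theta) = theta_1^l1 * ... * theta_q^lq
    over all multi-indices l with |l| <= k; the number of columns equals the
    binomial coefficient C(q + k, k), cf. the trend m_beta in Eq. 3."""
    thetas = np.asarray(thetas, dtype=float)      # shape (n, q)
    q = thetas.shape[1]
    exps = [l for l in product(range(k + 1), repeat=q) if sum(l) <= k]
    cols = [np.prod(thetas ** np.asarray(l), axis=1) for l in exps]
    return np.column_stack(cols)                  # shape (n, C(q+k, k))
```

For q = 2 and k = 2, for instance, this yields the C(4, 2) = 6 monomials 1, θ_1, θ_2, θ_1², θ_1θ_2, θ_2².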
Intrinsic kriging

Intrinsic kriging extends the scope of kriging to the case of intrinsic random functions (IRF) (Matheron, 1973) with unknown mean whose increments are second-order stationary. The main idea is to filter out the mean so as to consider a zero-mean process again. Here, we shortly recall the basic notion of an IRF. For a set of weights λ_i ∈ R and points θ_i ∈ Θ ⊆ R^q, i = 1,...,n, we define a discrete measure λ, say λ = Σ_{i=1}^n λ_i δ_{θ_i}. Let F be the finite-dimensional vector space generated by the f_l. For a function f ∈ F we define the linear application of λ on the function f by f(λ) = Σ_{i=1}^n λ_i f(θ_i). For k ≥ 0, Λ_k denotes the set of all measures λ for which f(λ) = 0, that is, λ annihilates separately all monomials of degree ≤ k, i.e., f_l(θ) = θ_1^{l_1} ··· θ_q^{l_q}, θ ∈ Θ, with the vector-valued index l = (l_1,...,l_q), l_j = 0,1,...,k and |l| = Σ_{j=1}^q l_j ≤ k. Such a measure λ ∈ Λ_k is called an allowable measure, see, e.g., Chilès and Delfiner (1999). For any translation vector h ∈ R^q and λ, we define the translate of λ by τ_h λ = Σ_{i=1}^n λ_i δ_{θ_i + h} with the same weights as λ. A second-order random function Z(θ) is said to be an intrinsic random function of order k (IRF-k) if for any allowable measure λ ∈ Λ_k the random function Z(τ_h λ) is second-order stationary. Since the mean of Z(θ) is in F, Z(λ), λ ∈ Λ_k, is a zero-mean random function with covariance Cov(Z(τ_{h_1} λ), Z(τ_{h_2} λ)) being a function of h_1 − h_2 only. From Matheron (1973) we know that any continuous IRF-k has a continuous generalized covariance function K(h) such that for any pair of measures λ = Σ_{i=1}^{n_1} λ_i δ_{θ_i} ∈ Λ_k, μ = Σ_{j=1}^{n_2} μ_j δ_{θ*_j} ∈ Λ_k and sequences of points θ = (θ_i), θ* = (θ*_j),

  Cov(Z(λ), Z(μ)) = Σ_{i=1}^{n_1} Σ_{j=1}^{n_2} λ_i μ_j K(θ_i − θ*_j) ,

where K is unique up to an even polynomial of degree 2k for k ≥ 0. For further details the reader is referred to the general geostatistical literature (Cressie, 1993; Chilès and Delfiner, 1999; Wackernagel, 2003). In our examples we use the generalized covariance functions proposed by Mardia (1996). We need these advanced covariance functions to adequately describe the typical divergence properties of the moments as functions of the parameters.

Kriging the statistics

Let the sample mean at some evaluation location θ be

  Ȳ(θ) = (1/N) Σ_{j=1}^N Y(θ, j) ,

where Y(θ, j) ∈ R denotes the jth simulation, j = 1,...,N, of some real-valued characteristic. The value Ȳ is regarded as a realization of an IRF-k of known order k and known generalized covariance function K(h). We are concerned with the estimation of the value Ȳ(θ_0) for some evaluation location θ_0. The kriging predictor (best linear unbiased estimator) is given by

  Ŷ(θ_0) = Σ_{i=1}^n λ̂_i(θ_0) Ȳ(θ_i) ,   (4)

where the weights λ̂_i, depending on the prediction location, are the solutions to the so-called intrinsic kriging equations (Chilès and Delfiner, 1999, sect. 4.6). The prediction variance of this estimator is then given by

  σ̂²(θ_0) = K(0) − Σ_{i=1}^n λ̂_i K(θ_0 − θ_i) − Σ_{|l|=0}^k μ_l θ_0^l ,   (5)

where μ is the vector of Lagrange multipliers introduced in order to ensure the unbiasedness of the kriging predictor.

Modelling the variance of statistics

The quasi-score function (Eq. 2) involves the variance-covariance matrix V(θ) of the data Y = y, which in general depends on the unknown parameter. Since we only assume that we can infer V(θ) from simulations, the idea is to construct an interpolated version of the sample variance-covariance matrix based on a Cholesky decomposition. Let the sample covariance matrix be V̄(θ) = [v̄_{lk}], where, for l,k = 1,...,p,

  v̄_{lk} = (1/(N−1)) Σ_{j=1}^N [Y_l(θ, j) − Ȳ_l(θ)][Y_k(θ, j) − Ȳ_k(θ)] .

The Cholesky decomposition of V̄ is of the form V̄ = LL^t, where L is a unique lower triangular matrix with real and positive diagonal entries, since V̄ is (assumed to be) positive definite. Suppose we have previously computed a series of such matrices V̄(θ_1),...,V̄(θ_n) for various sample points. In order to apply kriging, we rewrite each Cholesky decomposition V̄(θ_k) = L^(k)(L^(k))^t as a row vector, which allows us to treat the entries L^(k)_{ij}, 1 ≤ j ≤ i ≤ p, k = 1,...,n, as the data to essentially the same type of kriging model as noted in Eq. 3. We define the following data matrix:

  [ L^(1)_{11}  L^(1)_{21}  ···  L^(1)_{pp} ]
  [     ⋮           ⋮                ⋮      ]  ∈ R^{n×m} ,   (6)
  [ L^(n)_{11}  L^(n)_{21}  ···  L^(n)_{pp} ]

where m = p(p+1)/2. Now each column represents the kriging data for a single kriging interpolation model, which amounts to m kriging models overall. For some θ_0 the kriging prediction for each model results in the kriging estimates L̂^(0) = (L̂^(0)_{11},...,L̂^(0)_{pp})^t. Finally, as a unique kriging approximation to V̄(θ_0), we obtain

  V̂(θ_0) = L̂^(0)(L̂^(0))^t .   (7)

As done before in the case of kriging the statistics, one could also incorporate a measurement error of the sample variances. However, as this quantity, being a fourth-order moment, is difficult to estimate, we use a global nugget effect in the covariance function model to capture it.

Modelling the covariance

We use the restricted maximum likelihood method (REML) (Mardia and Marshall, 1984; Zimmermann, 1989) to estimate the vector τ ∈ R^r of covariance parameters. The variable under study is still Z(θ), which is treated as non-stationary in its mean E[Z(θ)] = m_β(θ) and has a generalized covariance model K(·;τ). Clearly the response Ȳ(θ) cannot be observed without noise. The variance of the measurement noise ε(θ), which is assumed Gaussian with zero mean, is estimated by

  σ̂_ε²(θ) = (1/(N(N−1))) Σ_{j=1}^N (Y(θ, j) − Ȳ(θ))² .

In the application of kriging in geostatistics the variance of the measurement error is due to the so-called "nugget effect". To account for the noise due to simulation replications we take Σ̂_ε(·) = Diag{σ̂_ε²(θ_1),...,σ̂_ε²(θ_n)} as the diagonal matrix of estimated variances. Then the REML estimation is based on the overall covariance matrix

  K̃(·;τ) = K(·;τ) + Σ̂_ε(·) ,   (8)

which yields an estimate τ̂ of the covariance parameters. The kriging estimator and its prediction variance are further derived from Eq. 4 and Eq. 5 with K replaced by K̃(·;τ̂). Other approaches would be possible, but we found this approach to be reasonably stable and effective.

Quasi-score approximation

We are now in the position to construct the approximated quasi-score based on the interpolated quantities derived in the previous sections. Suppose that for a given set S_n of sampled points and corresponding values we have constructed the kriging predictors Ẑ = (Ŷ_1,...,Ŷ_p)^t for each statistic together with the approximating variance-covariance matrix V̂. Note that all predictors depend on the given sampled data set S_n. Let

  Σ̂_K(θ) = Diag{σ̂_1²(θ),...,σ̂_p²(θ)}   (9)

denote the diagonal matrix of kriging variances (Eq. 5) for each statistic. Based on the optimal estimating function (Eq. 2) we construct the approximated quasi-score function Q̂(θ, y) ∈ R^q by replacing the involved expectations and variances with the corresponding kriging approximations,

  Q̂(θ, y) = Ẑ'(θ)^t (V̂(θ) + Σ̂_K(θ))^{-1} (y − Ẑ(θ)) ,   (10)
such that the approximated "information criterion" becomes

  Î(θ) = Ẑ'(θ)^t (V̂(θ) + Σ̂_K(θ))^{-1} Ẑ'(θ) .   (11)

The term Σ̂_K explicitly accounts for the kriging prediction error of the variances of the statistics. This modification of the variances leads to a systematic reduction of the prediction error induced by the kriging model when new solutions are added to the set S_n of all previously sampled points and values. Recall that for a consistent estimation we do not need to correctly specify the (unknown) covariance structure V. The Jacobian of Ẑ, i.e., the matrix of partial derivatives in Eq. 10, is numerically calculated based on finite differences of the kriging-interpolated statistics, though one could also obtain estimates of Ẑ'(θ) by a cokriging approach (Chilès and Delfiner, 1999, sect. 5.5.1).

SOLVING THE QUASI-SCORE EQUATION

In analogy to the ML method, one can regard the quasi-score function as some kind of gradient specification of a "quantity" similar to a likelihood which, however, is not considered in the QL theory. Likewise, the "information criterion" (Eq. 1) is a generalization of the well-known Fisher information defined in the ML theory. Therefore, the same Newton-Raphson-type iteration used to find the maximum likelihood estimate as a root of the score function, which is usually called Fisher scoring, can be applied in the QL setting (Osborne, 1992). This leads to the Fisher quasi-scoring iteration

  θ_{k+1} = θ_k + t_k d_k ,   d_k = Î(θ_k)^{-1} Q̂(θ_k) .   (12)

The step-length parameter t_k in Eq. 12 has to be chosen so that the iteration does not become unstable. Therefore t_k is chosen as a minimizer of the auxiliary optimization problem

  min_{t ∈ (0,1]} ‖Q̂(θ_k + t d_k)‖₂²   (13)

(see Osborne, 1992, sect. 4 for details). In particular, Osborne (1992) shows that the quasi-scoring iteration converges a.s. to the QL estimator provided that the effective sample size is large enough and the chosen starting point θ_0 is sufficiently close to a root of Eq. 10. The aspect of the necessary proximity of the starting point is dealt with by a simple restart. If the iteration fails, a direct search method (Spall, 2003) is used to find a better starting value. This is possible since the kriging interpolations are relatively cheap to evaluate. The increasing sample size corresponds to an increasing window size in a spatial setting; hence, ergodicity or mixing properties are required. Note that the iteration is based on the approximated quasi-score function (Eq. 10) and therefore will find a root of Eq. 10 rather than of the theoretical quasi-score function (Eq. 2).
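In code, one inner quasi-scoring pass might look as follows; this is a minimal sketch under the assumption that qscore and qinfo evaluate the kriging-based surrogates Q̂ and Î of Eqs. 10-11 (hypothetical interfaces), with the step-length rule of Eq. 13 replaced by simple backtracking:

```python
import numpy as np

def fisher_quasi_scoring(qscore, qinfo, theta0, eps=1e-7, max_iter=100):
    """Fisher quasi-scoring iteration (Eq. 12) with a crude step-length
    control in the spirit of Eq. 13 (backtracking instead of an exact
    minimizer of ||Q||^2 over t in (0, 1])."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        q = qscore(theta)
        info = qinfo(theta)
        if q @ np.linalg.solve(info, q) < eps:   # Mahalanobis-type criterion
            break
        d = np.linalg.solve(info, q)             # direction d_k = I^{-1} Q
        t, q2 = 1.0, np.sum(q ** 2)
        while t > 1e-3 and np.sum(qscore(theta + t * d) ** 2) >= q2:
            t *= 0.5                             # shrink step until ||Q|| decreases
        theta = theta + t * d
    return theta
```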
UPDATING THE QUASI-SCORE INTERPOLATION

Since the overall aim is to sequentially improve the QL estimate based on Eq. 10 towards the solution of Eq. 2, the approximate quasi-score is updated as follows. To the current set S_n of all previous sample points and values, the root θ̂ of the quasi-score equation (Eq. 10) as well as further points sampled from N(θ̂, Î(θ̂)^{-1}) are added. This involves new simulations at all new locations, followed by refining the approximation Q̂ through an update of the kriging-interpolated statistics and of the variance-covariance matrix (Eq. 7) along the lines of the previous subsections. The additional locations are used to improve not only the estimate of the value but also the estimate of the Jacobian Ẑ'(θ) and, hence, also that of the quasi-score function. The chosen sampling distribution for the additional sample points is motivated by the idea that the estimation error is (under certain conditions ensuring asymptotic normality of the statistics T, like ergodicity and sufficient regularity) asymptotically normal with this variance (see the subsequent subsection), and thus the true parameter value could be any of these (Godambe and Heyde, 1987, sect. 4.3).

ESTIMATION ERROR

The precision of the QL estimator is under certain conditions asymptotically equivalent to the inverse of the "information criterion" (Eq. 1). The variance of the quasi-score function Q(θ, y) is given by Eq. 1 (Heyde, 1997) and reads

  I(θ) = Z'(θ)^t V(θ)^{-1} Z'(θ) ,   (14)

which we call the quasi-information matrix. We can use a first-order Taylor approximation of Q(θ̂, y) at θ̂ = θ and y = Z(θ),

  Q(θ̂, y) ≈ Q(θ, Z(θ)) + (∂/∂θ)Q(θ, Z(θ)) (θ̂ − θ) + Z'(θ)^t V(θ)^{-1} (y − Z(θ))
           = 0 − I(θ)(θ̂ − θ) + Q(θ, y) ,

which, setting Q(θ̂, y) = 0, leads to θ̂ ≈ θ + I(θ)^{-1} Q(θ, y). Using the law of error propagation we finally get

  Var(θ̂) ≈ I(θ)^{-1} I(θ) I(θ)^{-1} = I(θ)^{-1}   (15)

for sufficiently large datasets such that y is near E[Y], just as in ML estimation.

STOPPING CRITERIA

The estimation algorithm has a twofold iteration. The inner loop searches for a root of the approximated quasi-score by means of the stabilized Fisher quasi-scoring iteration. This inner iteration terminates when the "Mahalanobis distance" of the quasi-score (Heyde, 1997, sect. 4.4) is numerically negligible, i.e.,

  C = Q^t I^{-1} Q ≤ ε ≪ 1 ,   (16)

e.g., ε = 10^{-7} in our simulation study. If the algorithm converges numerically (i.e., the step size in θ drops below some limit which is small relative to the extent of the parameter space, in our simulation study set to 10^{-6}), we distinguish two cases. If C is significantly distinguishable from 0 according to its approximate χ²_q distribution, the Fisher quasi-scoring iteration is restarted from multiple (e.g., 15 in our simulation study) random locations in the parameter space, following the reasoning that the current point is not an approximate root. The root with the smallest sum of eigenvalues of the inverse quasi-information matrix (Eq. 14) is selected, according to the idea that if really multiple substantially different parameters occur which would explain the data, we should select one with a small local estimation error. Obviously this idea can be improved towards some more sophisticated method to select the best root according to additional evidence in the data, e.g., based on a difference quotient quasi-score with ith component

  Σ_j [(Z_j(θ_2) − Z_j(θ_1)) / (θ_{2i} − θ_{1i})] [((V(θ_2) + V(θ_1))/2)^{-1} (Y − Z(θ))]_j ,

for evaluation locations θ_1, θ_2 ∈ Θ. If it could be a root statistically according to Eq. 16, the inner iteration is interrupted to improve the approximated quasi-score locally by new simulations in this area of the parameter space.

The stopping rule for the outer iteration, which samples further locations, is based on the comparison of the interpolation error of the quasi-score approximation to the quasi-information matrix (Eq. 14). This interpolation error is measured by the kriging error of Z(θ) and transformed into an error of the quasi-score by the law of error propagation, assuming Z'(θ)V(θ)^{-1} and the expected statistical error of the quasi-score to be known. The approximation error of the quasi-score is calculated by

  B(θ) = Z'(θ)^t V(θ)^{-1} Σ̂_K(θ) V(θ)^{-1} Z'(θ) .

The outer iteration is stopped as soon as the first is negligible compared to the second in all directions, i.e., if the largest generalized eigenvalue (Golub and Van Loan, 1996) of the two matrices drops below some limit b ≪ 1 (e.g., b = 10^{-4} in our simulation study),

  B(θ) ≼_L b · I(θ) ,

in the Loewner half-order of non-negative definite matrices.
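This outer stopping test can be carried out with a symmetric generalized eigenvalue solver; a minimal sketch (the helper name and the use of SciPy are our choices, not the paper's):

```python
import numpy as np
from scipy.linalg import eigh

def outer_stop(B, I, b=1e-4):
    """Check B <= b * I in the Loewner order via the largest generalized
    eigenvalue of the pencil (B, I); B symmetric non-negative definite,
    I symmetric positive definite."""
    lam_max = eigh(B, I, eigvals_only=True)[-1]   # eigenvalues in ascending order
    return lam_max <= b
```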
SUMMARY OF QUASI-SCORE ESTIMATION

This section gives a short outline of the intermediate steps of the overall estimation procedure based on the approximated quasi-score. The parameter estimation method requires the definition of a stochastic simulation model which – to some extent – describes the data-generating process. For such a model, we assume that a fixed set of chosen statistics can be efficiently simulated for various values of the model parameters. For the initialization of the method one has to select a start sample of parameter values S_{n_0} of sufficiently large size n_0 at which to simulate the model and to compute the values of those statistics needed to construct the initial kriging interpolation models. It is not important to select S_{n_0} according to a specific design methodology as long as it "reasonably" covers the parameter space as space-filling designs do, e.g., a Latin hypercube design (Santner et al., 2003, sect. 5). One could also account for some "user" knowledge of the most likely regions of possible roots of the approximated quasi-score and generate a sample of initial model parameters from a pre-specified a-priori distribution.

The construction and calculation of the approximate quasi-score involves the following steps. Suppose that in the kth iteration of the algorithm a root θ̂_k of Eq. 10 was found. Given a set of sampling locations S_{n_k} together with the corresponding simulation results, the mean values and variances of the statistics are computed and the kriging interpolation models are individually updated. This step involves the REML estimation of the covariance parameters based on the modified covariance matrix (Eq. 8), once for each kriging predictor of the statistics and for all entries of the Cholesky decomposition of the variance-covariance matrix of the statistics. The kriging interpolation of the Cholesky decomposition of the variance-covariance matrix of the statistics is based on the same kriging model (Eq. 3) as used for the interpolation of the statistics, but with separately fitted parameters. Besides the calculation of the Jacobian of the statistics, as a final step to calculate the approximation to the quasi-score, the kriging prediction variances (Eq. 9) of the statistics are added to the kriging interpolation V̂ in Eq. 10. Note that this only requires the calculation of the kriging prediction variances by Eq. 5 once for each statistic, since at this point the kriging equations have already been solved for the prediction of the statistics at some specific parameter value. Based on the current estimated root θ̂_k and the kriging interpolation models, a stabilized version of the Fisher quasi-scoring iteration is applied to solve Eq. 10, including the step-length control problem (Eq. 13) combined with a multi-start strategy to ensure convergence. This leads to an improved estimate θ̂_{k+1}, followed by the above described update procedure of the quasi-score interpolation involving further simulations.

AN ILLUSTRATIVE EXAMPLE

By way of example we shall demonstrate the power of the proposed general parameter estimation method in what follows.

2D BOOLEAN MODEL WITH DISCS

Let X be a stationary Boolean model in R² with intensity λ and with "typical" grain equal to a disc of some fixed radius R, that is, X is a random closed set of the form X = ∪_i B(x_i, R), where the centres x_i of the discs B(x_i, R) of radius R are the points of a homogeneous Poisson point process in R² with intensity λ (Schneider and Weil, 2008). Since Boolean models in general have received attention in many respects, there exist many results on the statistics of Boolean models in the literature; in particular, several special methods of parameter estimation in Boolean models have been established and investigated (Heinrich, 1993; Molchanov, 1997; Chiu et al., 2013). A certain method, which is known as the method of intensities (Molchanov, 1997) or method of densities (Chiu et al., 2013), relies on the fact that the area fraction A_A, the specific boundary length L_A and the specific Euler number χ_A of X, as the intensities/densities of the three Minkowski functionals in R² (Schneider and Weil, 2008), are given as closed-form expressions of, in the present example, λ and R:

  A_A = 1 − exp(−λπR²) ,   (17)
  L_A = 2λπR exp(−λπR²) ,   (18)
  χ_A = (λ − λ²πR²) exp(−λπR²) .   (19)

Solving for λ and R leads to

  λ = χ_A/(1 − A_A) + L_A²/(4π(1 − A_A)²) ,   (20)
  R = 2L_A(1 − A_A)/(4π(1 − A_A)χ_A + L_A²) ,   (21)

cf. Molchanov (1997, p. 82), or, using only A_A and L_A, to

  λ = −L_A²/(4π(1 − A_A)² ln(1 − A_A)) ,   (22)
  R = −2(1 − A_A) ln(1 − A_A)/L_A .   (23)

Replacing the model characteristics A_A, L_A and χ_A by certain corresponding empirical counterparts leads to estimators of λ and R. In the following we always assume that we observe our model X in some rectangular window W. However, we distinguish the following two cases. On the one hand, we use the fact that X restricted to W is essentially a finite union of discs, from which the area A_X(W), the boundary length L_X(W) and the Euler number χ_X(W) of X within W can be computed exactly (Tscheschel, 2005). This results in the (unbiased) estimators

  Â_A = A_X(W)/A(W) ,   L̂_A = L_X(W)/A(W) ,   χ̂_A = χ_X(W)/A(W) ,   (24)

where A(W) is the area of W. Inserting these estimators into Eqs. 20 and 21 gives the estimator (λ̂_MIa, R̂_MIa) for (λ, R); inserting them instead into Eqs. 22 and 23 gives (λ̂_MIb, R̂_MIb). On the other hand, we consider the case that X restricted to W is only observable in a discretized form, i.e., on a rectangular lattice with some lattice constant a. In this case we use estimators Ã_A, L̃_A and χ̃_A for A_A, L_A and χ_A, respectively, along the lines given in Ohser and Mücklich (2000, sect. 4.2, filter mask F1), finally resulting in (λ̃_MIa, R̃_MIa) when inserting in Eqs. 20 and 21, and (λ̃_MIb, R̃_MIb) when inserting in Eqs. 22 and 23. Subsequently, in the application of quasi-score estimation, either the triple (Â_A, L̂_A, χ̂_A) or the triple (Ã_A, L̃_A, χ̃_A) will play the role of the set of statistics Y = T(X).
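For concreteness, the method-of-intensities estimates of Eqs. 20-23 can be computed directly from the three observed densities; a small sketch (own helper name, not from the paper):

```python
import numpy as np

def mi_estimates(aa, la, xa):
    """Method-of-intensities estimates of (lambda, R) for the Boolean disc
    model: (Eqs. 20-21) from (A_A, L_A, chi_A), and (Eqs. 22-23) from
    (A_A, L_A) only."""
    lam_a = xa / (1.0 - aa) + la**2 / (4.0 * np.pi * (1.0 - aa) ** 2)        # Eq. 20
    r_a = 2.0 * la * (1.0 - aa) / (4.0 * np.pi * (1.0 - aa) * xa + la**2)    # Eq. 21
    lam_b = -la**2 / (4.0 * np.pi * (1.0 - aa) ** 2 * np.log(1.0 - aa))      # Eq. 22
    r_b = -2.0 * (1.0 - aa) * np.log(1.0 - aa) / la                          # Eq. 23
    return (lam_a, r_a), (lam_b, r_b)

# sanity check with the exact densities (Eqs. 17-19) at lambda = 15, R = 0.15:
lam, R = 15.0, 0.15
aa = 1.0 - np.exp(-lam * np.pi * R**2)
la = 2.0 * lam * np.pi * R * np.exp(-lam * np.pi * R**2)
xa = (lam - lam**2 * np.pi * R**2) * np.exp(-lam * np.pi * R**2)
print(mi_estimates(aa, la, xa))   # both pairs recover (15, 0.15)
```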
EXACT QUASI-SCORE ESTIMATION

Due to unbiasedness, the expected values Z(λ, R) of (Â_A, L̂_A, χ̂_A) are explicitly given by Eqs. 17-19. This in turn leads to the Jacobian

  ∂Z(λ, R)/∂(λ, R) = exp(−λπR²) ·
    [ πR²                    2λπR               ]
    [ 2πR(1 − λπR²)          2λπ(1 − 2λπR²)     ]
    [ 1 − 3λπR² + λ²π²R⁴     2λ²πR(λπR² − 2)    ] .

Furthermore, the matrix V(λ, R) of variances and covariances of (Â_A, L̂_A, χ̂_A) can also be calculated, at least numerically. The simplest case is the variance V_AA = E[A_X(W)²/A(W)²] − A_A² of Â_A. We have

  V_AA = (1/A(W)²) ∫ C_AA(h) γ_W(h) dh − A_A² ,

where C_AA(h) = 2A_A − 1 + (1 − A_A)² exp(λ γ_{B(o,R)}(h)) is the covariance of X and γ_W(h) = A(W ∩ (W − h)) is the set covariance of W (Chiu et al., 2013). V_AA is directly connected to the second-order moment measure M_AA of the random measure A_X, i.e., M_AA(W_1, W_2) := E[A_X(W_1) A_X(W_2)], and V_AA = M_AA(W, W)/A(W)² − A_A². Note that C_AA is a density of M_AA with respect to the 4-dimensional Lebesgue measure. In order to determine the variances and covariances V_ij, i, j ∈ {A, L, χ}, it is thus necessary to know the (mixed) second-order moment measures M_ij, i, j ∈ {A, L, χ}, and, if existing, their densities C_ij with respect to Lebesgue measure. For exactly the model under consideration these densities have been derived in the literature (Mecke, 2001; Torquato, 2002; Ballani, 2007). For instance, we have

  V_LL = (1/A(W)²) ∫ C_LL(h) γ_W(h) dh − L_A² ,

  C_LL(h) = [ λ²(2πR − 1_{[0,2R)}(‖h‖) 2R arccos(‖h‖/(2R)))² + λ 1_{[0,2R)}(‖h‖) 4R²/(r√(4R² − r²)) ] C̃_AA(h) ,   r = ‖h‖ ,

and

  V_AL = V_LA = (1/A(W)²) ∫ C_AL(h) γ_W(h) dh − A_A L_A ,

  C_AL(h) = L_A − λ C̃_AA(h) (2πR − 1_{[0,2R)}(‖h‖) 2R arccos(‖h‖/(2R))) ,

where C̃_AA(h) = (1 − A_A)² exp(λ γ_{B(o,R)}(h)). It turns out that all measures M_ij but M_χχ have a density with respect to Lebesgue measure. The "density" of M_χχ contains a certain delta component (Mecke, 2001), which can be treated separately without any problem. All in all, for the present example of a Boolean model the quasi-score function Q is known exactly in the sense that it can be evaluated numerically. Hence, here we are in the position to find the estimates λ̂_EQS and R̂_EQS as the root of the exact quasi-score (EQS) equation Q((λ, R), y) = 0 by the type of iteration shown in Eq. 12.

SIMULATED QUASI-SCORE ESTIMATION

For the simulation-based quasi-score (SQS) estimation we consider the two cases that both the observation y and each of the realizations of the statistics Ȳ(λ, R) of the Boolean model used during the estimation procedure are either given by the exact estimators (Â_A, L̂_A, χ̂_A) or by the estimators (Ã_A, L̃_A, χ̃_A) coming from the discretization. Thus we obtain again two sets of estimators, (λ̂_SQS, R̂_SQS) and (λ̃_SQS, R̃_SQS), respectively, for (λ, R).
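The closed-form mean Z(λ, R) (Eqs. 17-19) and its Jacobian above are easily coded; a brief sketch usable, e.g., as the analytic part of the EQS iteration (own function names):

```python
import numpy as np

def boolean_moments(lam, R):
    """Exact densities (A_A, L_A, chi_A) of the Boolean disc model, Eqs. 17-19."""
    e = np.exp(-lam * np.pi * R**2)
    return np.array([1.0 - e,
                     2.0 * lam * np.pi * R * e,
                     (lam - lam**2 * np.pi * R**2) * e])

def boolean_jacobian(lam, R):
    """Jacobian dZ/d(lambda, R) of Eqs. 17-19 (columns: d/dlambda, d/dR)."""
    e = np.exp(-lam * np.pi * R**2)
    u = lam * np.pi * R**2                       # shorthand for lambda*pi*R^2
    return e * np.array([
        [np.pi * R**2,                2.0 * lam * np.pi * R],
        [2.0 * np.pi * R * (1 - u),   2.0 * lam * np.pi * (1 - 2 * u)],
        [1 - 3 * u + u**2,            2.0 * lam**2 * np.pi * R * (u - 2)],
    ])
```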
SIMULATION STUDY

We conducted a simulation study by applying the above described estimation methods SQS, EQS and MIa, MIb to the same set of model realizations. The first part of this study investigates the mean squared errors between re-estimated model parameters and true ones over randomly sampled locations of the parameter space. The second part addresses the estimation precision of the proposed method based on the SQS.

Throughout the whole simulation study the observation window W was chosen as [0,5]². Intentionally, the lattice constant a was 0.05, leading to a relatively coarse discretization of 100 × 100 pixels. Further, the following process parameters of the SQS estimation method were fixed. The number of sampling locations n_0 used to construct the initial quasi-score approximation was set to 12. The number of simulation replications spent for each sample location was fixed to 100 throughout the whole estimation procedure. The parameter space for the feasible values of θ = (λ, R) was set to the intervals [5,30] and [0.01,0.2] for λ and R, respectively. The maximum number of admissible new sampling locations added during the iterations of the method was restricted to 50.

Within the first part of the study, N = 200 parameter values of λ and R were sampled from uniform distributions on the intervals [8,25] and [0.05,0.18], respectively. Simulating the model X once for each θ_i = (λ_i, R_i), i = 1,...,N, yielded the observations x_i. The exact statistics (Â_A, L̂_A, χ̂_A) and their counterparts (Ã_A, L̃_A, χ̃_A) for the discretizations were both applied to these observations in order to compare the performance of the estimation methods. Fig. 1 presents the true parameters plotted against their estimated values obtained from a single-run SQS and EQS estimation for each θ_i. The mean number of newly added locations when running the SQS estimation method based on the exact statistics (Eq. 24) amounted to 20, whereas using the discretized versions required 35 additional sampling locations on average. The Bayesian root mean square error (BRMSE)

  BRMSE = √( (1/N) Σ_{i=1}^N (θ_{ik} − θ̂_{ik})² ) ,   k = 1, 2 ,

of the N re-estimated parameters θ̂_i = (λ̂_i, R̂_i) was calculated separately for each estimation method. For 13 out of 200 single runs of the SQS estimation method the current implementation caused the method to fail to converge towards the "correct" root. This algorithmic behaviour of the current implementation of the SQS estimation method is induced by the non-convexity properties of the quasi-score in general, thus making it difficult to distinguish between multiple roots (Heyde, 1997, sect. 13.3).

First of all, the BRMSE for the estimators based on the method of intensities, see Table 1, shows a well-known behaviour, namely, that MIb outperforms MIa, due to the fact that the estimators for the specific Euler number χ_A have much higher variability than those for A_A and L_A (Chiu et al., 2013). For both MIa and MIb the relatively high BRMSE of λ̃ and R̃ compared to that of λ̂ and R̂ is due to the fact that the estimators L̃_A and χ̃_A based on the discretized realizations are not unbiased for L_A and χ_A, respectively, see Serra (1982); Ohser and Mücklich (2000). This also leads to a bias for λ̃ and R̃, which is here quite pronounced because of the very coarse applied discretization.

Table 1. Bayesian root mean square error (BRMSE) for the estimators of λ and R according to the method of intensities with (A_A, L_A, χ_A) (MIa), with (A_A, L_A) (MIb), exact quasi-score (EQS), and simulated quasi-score (SQS) estimation. Values are rounded to two significant digits. The simulation study standard errors are given in parentheses. The exact quasi-score is not applicable to the discretized data.

  Method   BRMSE(λ̂)       BRMSE(λ̃)
  MIa      1.8 (0.21)      9.5 (0.36)
  MIb      1.3 (0.10)      7.0 (0.21)
  EQS      1.1 (0.07)      –
  SQS      2.5 (0.32)      2.5 (0.34)

  Method   BRMSE(R̂)        BRMSE(R̃)
  MIa      0.011 (0.0029)   0.299 (0.0674)
  MIb      0.003 (0.0003)   0.044 (0.0017)
  EQS      0.003 (0.0002)   –
  SQS      0.005 (0.0009)   0.007 (0.0011)

Fig. 1. Plot of 200 re-estimated parameters vs. true parameters for (a) λ and (b) R using the estimators (λ̂_SQS, R̂_SQS) (red), (λ̃_SQS, R̃_SQS) (blue) and (λ̂_EQS, R̂_EQS) (black).

Furthermore, although the BRMSE of the new SQS estimation method using exact statistics is higher than that of MIa and MIb, the BRMSE of the EQS estimation method is smaller than that of MIb, i.e., EQS even outperforms MIb. On the one hand this demonstrates that the quasi-score estimation method can use the available statistical information in an optimal way, thus yielding more precise parameter estimates. On the other hand it shows that the applied simulation-based implementation of the quasi-score method (SQS) cannot fully exhaust the theoretically possible potential. The reasons for the latter are that Monte Carlo simulation is involved and that in a few cases "wrong" roots are found (see the above discussion), leading to a higher BRMSE.

However, Table 1 also shows that the BRMSEs of the new SQS estimation method using the exact or the discretized version of the statistics are of the same order of magnitude (cf. also Fig. 1) and that for the case of discretized data SQS clearly outperforms MIa and MIb. This is because in the SQS estimation method the observed and the simulation-based statistics are treated equally with respect to the way the data is observed, namely, in both cases the estimation of the statistics is based on the discretized model realizations. In particular, it is not important whether or not the characteristics (A_A, L_A, χ_A) are estimated unbiasedly. More generally, it is even irrelevant for the SQS estimation procedure whether the used statistics possess any direct interpretation, e.g., whether they are estimators of any certain model characteristics. This is one of the fundamental strengths of the proposed SQS estimation method.

In the remainder of this section we assess the precision of the SQS estimates and compare it to the error predicted by the inverse quasi-information matrix. We generated N = 100 observations of the model at the fixed parameter value θ = (15, 0.15) and applied the above two kinds of statistics to these observations as in the first part. The SQS estimation method was applied; the resulting estimates are shown in Fig. 2, including the MIa and MIb estimates. Fig. 2a shows the comparable performance of the three estimators (MIa, MIb and SQS) in the case of exact statistics. Fig. 2b shows again the problem generated by the bias of the estimators of L_A and χ_A in the case of discretized data, leading to a substantial bias for the method-of-intensities estimators. The comparison of the empirical estimation error with the predicted error is based on the empirical mean squared error matrix MSEM = (1/N) Σ_{i=1}^N (θ − θ̂_i)(θ − θ̂_i)^t, denoted by M̂SEM in the case of the exact statistics and by M̃SEM for their discretized counterparts. They read

  M̂SEM = [  2.03         −3.05·10⁻³ ]
          [ −3.05·10⁻³     1.56·10⁻⁵ ] ,

  M̃SEM = [  2.68         −5.00·10⁻³ ]
          [ −5.00·10⁻³     2.80·10⁻⁵ ] .

Eq. 15 suggests that the inverse of the quasi-information matrix (Eq. 14) should be very similar to the mean squared error matrix of θ̂. We thus compare this to the average

  Σ̄ = (1/N) Σ_{i=1}^N Î(θ̂_i)^{-1} ∈ R^{q×q}   (25)

of the approximated quasi-information matrices at the estimated parameter value for each of the 100 simulations, which read, respectively,

  Σ̂ = [  2.03         −4.27·10⁻³ ]
       [ −4.27·10⁻³     2.99·10⁻⁵ ] ,

  Σ̃ = [  3.19         −2.86·10⁻³ ]
       [ −2.86·10⁻³     2.61·10⁻⁵ ] .

Both compare well to the corresponding observed mean squared error matrix; Fig. 3 compares them graphically. The exact quasi-score method provides a different quasi-information matrix,

  I(θ)⁻¹ = [  1.30         −1.89·10⁻³ ]
           [ −1.89·10⁻³     5.07·10⁻⁶ ] ,

which would suggest a lower estimation error. This corresponds to the fact that this method is more precise. It can be expected that using more simulations would allow one to increase the performance of the SQS method towards EQS. The theoretical and empirical evidence thus suggests that the inverse quasi-information might serve as an estimate for the precision of the method. More studies are, however, required to support the generalizability of this result.
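The graphical comparison in Fig. 3 uses 2σ ellipses of 2×2 covariance matrices; a minimal matplotlib sketch of such an ellipse (our own helper, not from the paper):

```python
import numpy as np
import matplotlib.pyplot as plt

def two_sigma_ellipse(ax, mean, cov, **kwargs):
    """Draw the 2-sigma ellipse {x : (x-mean)^t cov^{-1} (x-mean) = 4}
    of a symmetric positive definite 2x2 covariance matrix."""
    vals, vecs = np.linalg.eigh(cov)                 # principal axes of cov
    t = np.linspace(0.0, 2.0 * np.pi, 200)
    circ = np.stack([np.cos(t), np.sin(t)])          # unit circle
    xy = mean[:, None] + vecs @ (2.0 * np.sqrt(vals)[:, None] * circ)
    ax.plot(xy[0], xy[1], **kwargs)

fig, ax = plt.subplots()
sigma = np.array([[2.03, -4.27e-3], [-4.27e-3, 2.99e-5]])  # averaged inverse quasi-information (Eq. 25)
two_sigma_ellipse(ax, np.array([15.0, 0.15]), sigma, color="black")
```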
DISCUSSION

The new simulation-based quasi-likelihood method allows one to estimate parameters for statistical models where neither the likelihood nor moments or characteristics can be computed directly. The user is only required to provide informative statistics and a method to simulate realizations of these statistics for different model parameters. Based on an iterative simulation procedure providing successively improved approximations to the quasi-score function, the estimation algorithm provides a numerical approximation to the efficient quasi-likelihood estimation method. The inverse of the quasi-information matrix, computable from the approximation, seems to provide a good measure of the estimation error. This allows estimation of model parameters in very general settings.

Fig. 2. Re-estimated parameters by SQS (red), MIa (green), MIb (blue) for 100 realizations of the data at (λ, R) = (15, 0.15) using the statistics (a) (Â_A, L̂_A, χ̂_A) or (b) (Ã_A, L̃_A, χ̃_A).

Fig. 3. Re-estimated parameters by the SQS for 100 realizations of the data at (λ, R) = (15, 0.15) using the statistics (a) (Â_A, L̂_A, χ̂_A) or (b) (Ã_A, L̃_A, χ̃_A). The 2σ-ellipses are given for the averaged inverse quasi-information (black), the empirical mean squared error matrix of the residuals (red), and the exact inverse quasi-information matrix of the exact quasi-likelihood method (green). The figure illustrates the matching of the first two.

Obviously the procedure is still in its infancy. Global convergence is not guaranteed by the current algorithm, which led to substantial errors in about 7% of the runs. The error is still substantially above the theoretical limit given by the exact quasi-likelihood method. The exact quasi-likelihood method can typically not be applied since the necessary ingredients (moments of statistics and their derivatives) often cannot be computed. However, if they could be computed and the statistics are well chosen, one would typically expect better or similar performance compared to most known estimation procedures. ML estimators in exponential families are special cases of quasi-likelihood estimators. Quasi-likelihood estimators use moments more efficiently than typical method-of-moments estimators and can use more information.

ACKNOWLEDGMENTS

This research has been supported by the German Federal Ministry of Education and Research under grant 03MS603B/05M10OFA. We are grateful to André Tscheschel for the opportunity to use his Morph2D software.

REFERENCES

Altendorf H, Jeulin D (2011). Random-walk-based stochastic modeling of three-dimensional fiber systems. Phys Rev E 83:041804.
Ballani F (2007). The surface pair correlation function for stationary Boolean models. Adv Appl Probab SGSA 39:1–15.
Chilès JP, Delfiner P (1999). Geostatistics: Modelling spatial uncertainty. New York: J. Wiley & Sons.
Chiu SN, Stoyan D, Kendall WS, Mecke J (2013). Stochastic geometry and its applications. Chichester: J. Wiley & Sons, 3rd ed.
Cressie NAC (1993). Statistics for spatial data. New York: J. Wiley & Sons.
Gaiselmann G, Froning D, Toetzke C, Quick C, Manke I, Lehnert W, et al. (2013). Stochastic 3D modeling of non-woven materials with wet-proofing agent. Int J Hydrogen Energ 38:8448–60.
Godambe VP, Heyde CC (1987). Quasi-likelihood and optimal estimation. Int Stat Rev 55:231–44.
Golub GH, Van Loan CF (1996). Matrix computations. Baltimore, MD, USA: JHU Press.
Heinrich L (1993). Asymptotic properties of minimum contrast estimators for parameters of Boolean models. Metrika 31:349–60.
Heyde CC (1997). Quasi-likelihood and its applications: A general approach to optimal parameter estimation. New York: Springer.
Illian J, Penttinen A, Stoyan H, Stoyan D (2008). Statistical analysis and modelling of spatial point patterns. Chichester: J. Wiley & Sons.
Kleijnen JPC, van Beers WCM (2009). Kriging metamodeling in simulation: A review. Eur J Oper Res 192:707–16.
Lautensack C (2008). Fitting three-dimensional Laguerre tessellations to foam structures. J Appl Statist 35:985–95.
Liang KY, Zeger SL (1995). Inference based on estimating functions in the presence of nuisance parameters. Statist Sci 10:158–73.
Mardia KV (1996). Kriging and splines with derivative information. Biometrika 83:207–21.
Mardia KV, Marshall RJ (1984). Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika 71:135–46.
Matheron G (1973). The intrinsic random functions and their applications. Adv Appl Probab 5:439–68.
Mecke KR (2001). Exact moments of curvature measures in the Boolean model. J Stat Phys 102:1343–81.
Molchanov IS (1997). Statistics of the Boolean model for practitioners and mathematicians. New York: J. Wiley & Sons.
Møller J, Waagepetersen RP (2003). Statistical inference and simulation for spatial point processes. Boca Raton: Chapman & Hall/CRC.
Nott DJ, Rydén T (1999). Pairwise likelihood methods for inference in image models. Biometrika 86:661–76.
Ohser J, Mücklich F (2000). Statistical analysis of microstructures in materials science. Chichester: J. Wiley & Sons.
Ohser J, Schladitz K (2009). 3D images of materials structures. Weinheim: Wiley-VCH.
Osborne MR (1992). Fisher's method of scoring. Int Stat Rev 60:99–117.
Redenbach C (2009). Microstructure models for cellular materials. Comp Mater Sci 44:1397–407.
Sacks J, Welch WJ, Mitchell TJ, Wynn HP (1989). Design and analysis of computer experiments. Statist Sci 4:409–23.
Santner T, Williams B, Notz W (2003). The design and analysis of computer experiments. New York: Springer.
Schneider R, Weil W (2008). Stochastic and integral geometry. Berlin: Springer.
Serra J (1982). Image analysis and mathematical morphology. London: Academic Press.
Spall JC (2003). Introduction to stochastic search and optimization: Estimation, simulation, and control. Hoboken, New Jersey: J. Wiley & Sons.
Torquato S (2002). Random heterogeneous materials: Microstructure and macroscopic properties. New York: Springer.
Tscheschel A (2005). Räumliche Statistik zur Charakterisierung gefüllter Elastomere. Ph.D. thesis, TU Bergakademie Freiberg, Germany.
Varin C (2008). On composite marginal likelihoods. Adv Statist Anal 92:1–28.
Wackernagel H (2003). Multivariate geostatistics. Berlin: Springer.
Wahba G (1990). Spline models for observational data. Philadelphia, PA: SIAM.
Wilson RJ, Nott DJ (2001). Review of recent results on excursion set models. Image Anal Stereol 20:71–8.
Zimmermann DL (1989). Computationally efficient restricted maximum likelihood estimation of generalized covariance functions. Math Geol 21:655–72.