Image Anal Stereol 2003;22:133-145 Original Research Paper RANDOM CLOSED SET MODELS: ESTIMATING AND SIMULATING BINARY IMAGES Angeles M GALLEGO and Amelia SlmÖ Department of Mathematics, University Jaume I, 12071 Castellon, Spain. e-mail: simo@mat.uji.es (Accepted November 3, 2003) ABSTRACT In this paper we show the use of the Boolean model and a class of RACS models that is a generalization of it to obtain simulations of random binary images able to imitate natural textures such as marble or wood. The different tasks required, parameter estimation, goodness-of-fit test and simulation, are reviewed. In addition to a brief review of the theory, simulation studies of each model are included. Keywords: Boolean model, germ-grain model, parameter estimation, simulation. INTRODUCTION There are many practical situations in which the imitation of natural textures such as marble or wood is of great interest. In other words, apparently equal but random binary or gray level images must be obtained. Examples include textile and tile manufacture. Stochastic Geometry is a branch of Applied Probability that provides very powerful tools for this task. In particular, Random Closed Sets (RACS) models can be used. The most widely known and used Random Closed Set model is the Boolean model. In this paper we show how the Boolean model and a class of RACS models that is a generalization of it, can be used to obtain these simulations. With the exception of one case, we consider only binary images because the corresponding mathematical techniques are simpler and better developed. In any case, a grey level image can be transformed to a family of binary images by means of a family of thresholds. If we want to use a RACS model to simulate similar images to a natural one these steps should be followed. A model that fits the data should first be found. Then the goodness of this fit should be checked. Thirdly the parameters of this model should be estimated from the observed image. Finally the simulation can be carried out. All these steps are considered in this paper. A brief review of the theory of the Boolean model and the most important generalizations of it is provided. These include germ-grain models, three phased models and Poisson-Boolean models. Parameter estimation methods and simulation algorithms are proposed for each of these models. In order to evaluate the goodness of the fit, the Monte Carlo test should be used and this is also briefly reviewed in this paper. iMATLAB is a trademark of The MathWorks Inc. We have written a library of functions to be used with MATLAB1, to carry out the different tasks that we refer to: parameter estimation, goodness-of-fit test and simulation. This library is available at http://www3.uji.es/~simo. There is a vast literature about Random closed Set Models, mainly about Boolean Model. But many of the papers are focused on theoretical properties and most of them are devoted to a particular model. The goal of this paper is to put together the most useful and interesting models and to emphasize their practical aspects, doing them easy to use. Although the statistical methods proposed are mainly well-known, specially those of section “The Boolean model”, as far as we know they never have been used in other kind of models like those of Section “Non-homogeneous germ-grain model” and “Three-phase Boolean model or sinter textures”. The rest of the paper is organized as follows. The next section begins with the definition of RACS and general properties are reviewed. The second section is devoted to the Boolean model: definition, basic properties, parameter estimation and simulation. In the third section the same is carried out for the Boolean model generalizations. Finally, conclusions and future research are given in the fourth section. RANDOM CLOSED SETS DEFINITIONS AND ESTIMATORS Random closed sets are mathematical models for irregular random area patterns whose formal definition 133 Gallego MA et al : RACS models and binary images was provided by Matheron (1974). A random closed set (RACS) S is a random variable taking values in (j£",oI , the class of closed subsets in the Euclidean space, R2, with the cr-algebra generated by the Hit or Miss topology. The RACS models considered here will be, with the exception of one case, stationary and ergodic. Stationarity means that its probability distribution is invariant against translations. A RACS is said to be ergodic if its mean characteristics can be obtained from spatial averages of the functional of this RACS. The parameters of the probability distribution of a RACS can be classified into two types: aggregate and individual. The former are directly observable and can thus be easily estimated by using the ergodic property. Their definitions and estimators are common to all the RACS models. The latter are not directly observed, although they are of greatest interest when fitting a model to real data and they are specific to each model. Individual parameters have to be estimated through estimates of aggregate parameters and their estimation is very difficult except in the Boolean case. In this section the most important aggregate parameters of a general stationary RACS model are studied. The probability distribution of any general RACS is uniquely determined by its capacity functional (Matheron, 1974); Tz(K)=P(Zr)K^0/) (1) where K is any compact subset of R2. If S is observed in a window W, an unbiased estimator of T will be given by: T ˆK) A((E®K)nWeK)) A(WeK) where S © K ? is the dilation of S with K,W QK the erosion and for B c R2, A(B) denotes the area of B. We will call T ˆs the empirical capacity functional of S and it is a consistent estimator of Ts. The simplest aggregate parameter of a RACS is the area fraction that is defined as the mean of the area of S in a unitary window W: p = E(A(Zr\W))=P(0£Z). An unbiased estimator of p is: A(snW) p A(W) (2) A further, very useful aggregate functional parameter is the contact distribution function, which provides a kind of measurement of ‘size’ when identification of single particles is inappropriate. The definition of the contact distribution function depends on the choice of a structuring element B. If B is a compact set containing the origin, then the contact distribution function is defined by: HB(r) = 1 - 1-Ts(rB) 1 ¦p (3) for r > 0. The cases of particular practical importance are the linear contact distribution function where B is a segment of unit length and the spherical contact distribution function where B = B(0,1) is a unit disk. The contact distribution function can be estimated by combining the estimators of the capacity functional and the area fraction. The last three reviewed aggregate parameters are less important, they are defined only for Random Closed Sets that fulfill certain conditions of regularity (convexity, smooth boundary,...) and will be used only in the Boolean case. The specific convexity number, NA, of a stationary and isotropic RACS is defined as the average number of exposed tangent points in a given direction. The estimation of the specific convexity number can be performed as follows. We choose an arbitrary direction u and we count the number of tangent points in the direction u inside the window W, N(N+(u) CW). Thus N(N+(u)nW) NA A(W) is a strong consistent estimator of NA. The specific boundary length, LA of S is its expected boundary length per unit area. The following is an unbiased estimator (Weil and Weaieacker, 1984): L ˆ _ U{znW)-U(znsW) A~ AW] ' where 8+W is the upper-right boundary of W and U(Sn 8+W) is the length of the corresponding intersection. Simpler estimators, as for example, the length of the boundary of S inside W divided by the area of W are strong consistent but biased. Finally, the specific connectivity number %A of a RACS is defined as the expected connectivity number per unit area, roughly speaking the mean of the difference between the number of clumps and the number of holes per unit area. An unbiased estimator of the specific connectivity number is given by Xa x(zr)W)-x(zn8+W) A(W) where for B = Un(0,n), x2 = Un(0,m), x,- (x 1 ,x2 ) 139 Gallego MA et al : RACS models and binary images (b) generate p = Un(0,1) if p < X{xi)/X0, generate Z{ following the distribution of E0 and draw Z{ + x{ else x{ is deleted Fig. 3 shows some simulations of the example of the non-homogeneous germ-grain model given in this section with Gaussian radii. Fig. 3. Two simulations of a non-homogeneous germ-grain model whose intensity is proportional to the abcisa coordinate. Grains are random balls with Gaussian radii and parameters: a) X0 = 0.00001, E{R) = 10 and E(R2) = 104, b) fa = 0.000001, E{R) = 20 andE(R2) = 425. Simulation study A simulation study was again carried out in this section. 50 realizations of the example of the previously studied non-homogeneous germ-grain model were generated in a 512x512 window. The grains are random balls with Gaussian radii and all of them have parameters A0 = 2e~6, E(R) = 20 and E(R2) = 449. We estimated the parameters by using the method explained in Section “Parameter estimation”. The results are shown in Table 3 and, as can be seen, they were quite acceptable. x E ˆ R) E(R2) 3e"6(1.03e-6) 14.3(7.4) 344.5(134.1) Table 3. Results of the experimental study of Section 4.3.4: sample mean (standard deviance) of the estimates of 50 realizations with X0 = 2e~6, E(R) = 20 and E(R2) = 449. Gibbs process of non-intersecting grains This model relaxes the assumption of independence between germs and grains in such a way that the grains do not overlap. Their formal definition is as follows (see Stoyan et al., 1995). Definition 3 Let Q = R2 IK = {w = (z,K) : z G R2;KeIK} Let A0 = {wn}n>i be a point process in £1 where: 1. {zn}n>i is a stationary point process in R2. 2. {Kn}n>i i.i.dK0. 3. {zn)n>\ and {Kn}n>i are independent. Let us consider the following neighborhood relation: wi ~ w2 iffZi n E2 / 0/ where Zn = Kn + zn. Let A = {wn} be a Gibbs process with density with respect to A0: p(wh...:wn) = zY[exp{a + 0(wuwj)} , where e(w w) = { ~°° i f"wi~w2, ^ !' ¦>' | 0 otherwise. The Gibbs process of non-intersecting grains is: E=l|S»- w„eA This model was studied in Mase (1986) and Stoyan (1989) with circular grains. In Ayala and Simo´ (1995) this model with elliptical grains was used to model nerve fiber. Parameter estimation Individual parameters of this model are a and the parameters of the probability distribution of K0. Their estimation was studied for the case of circular grains with random radii in Stoyan (1989) and is as follows. Let X be the intensity of the Gibbs process, m(r) the density function of the probability distribution of the radii of the Gibbs process while p(r) denotes the probability that a disk centered in the origin and radius r intersects the Gibbs process. All these parameters are directly estimable from the observation window. Let m0(r) be the density function of the probability distribution of the radius of K0. We have the following relation: Xm{r) = e~ap(r)m0(r). And the estimation of m0(r) is obtained: ˆ m ˆ(r)ea m ˆ0(r) with ä chosen so that: p ˆ(r) /»oo 0 m ˆ0(r)d Jo r=1. This method could be generalized, for example, to ellipses or line segments. 140 Image Anal Stereol 2003;22:133-145 Simulation Different methods of simulating general Gibbs processes can be found in the literature (Ripley, 1981; Van Lieshout, 2000), most of which are based on the so-called spatial birth-and-death (b-and-d) processes. The simulation begins with a start configuration which is then changed step by step, where points disappear (‘die’) and new ones are generated (‘born’). In our experiments, we have used a Metropolis-Hasting (M-H) type algorithm. A M-H algorithm is a discrete time Markov process where the transitions are defined in two steps, a proposal for a new state (in our case a birth or a death) is made that it is subsequently accepted or rejected based on the likelihood ratio of the new state compared to the old one. The spatial b-and-d process is a time continuous Markov process in which all transitions are accepted with probability one, but, the process stay in state x for an exponentially distributed random sojourn time. We have used one of the simplest M-H algorithm: births and deaths are equally likely and sampled uniformly. Let W be the window where the process is generated. Algorithm 2 1. An initial configuration So with k non intersecting disks is generated, with k being arbitrary. Make n = k, Z = Zq. 2. with probability \: (a) Generate a new point x' uniform in W and radius r' from mo(r).IfZr)B(x',l')^0/ the new disk is rejected. If not, i. If n < e~aA(W) — 1 the new disk is accepted. ii. If n > e~aA(W) the new disk is accepted with probability e ^ ' . (b) Choose (x, r) at random ofZ. i. Ifn > e~aA(W) this disk dies. ii. If n < e~aA(W) - 1 the disk dies with probability e-anAW) . Although we have choose this algorithm because it is simpler to implement, we have to warn that, in general, this algorithm could result in a low acceptance probability, especially for models exhibiting strong interaction, that would make it inefficient. In this case we should use other kind of proposals or well a spatial b-and-d process. See Clifford and Nicholls (1994) for an excellent comparison. Another problem here is to asses how long the chain should run in order to achieve the desired approximation. This problem could be solved by using the recently developed perfect or exact simulation (Propp and Wilson, 1996). Fig. 4 shows some simulations of this model. a) b) Fig. 4. Two simulations of a Gibbs process of non-intersecting grains with: a) a = 3 and b) a — 4, the grains are random balls with radii following a Gaussian distribution N(20,36). Simulation study In this section, a simulation study is again carried out to experimentally test the performance of the previously described parameter estimation procedure. In order to do this, we simulated 20 Gibbs processes of non-intersecting grains with a = 3 and 20 with a = 4. The grains are in both cases random balls following a Gaussian distribution with first and second moment 20 and 449 respectively. The sample means and standard deviation of the estimates are shown in Table 4. The results are in general fairly good. The results for a = 4 are a slightly better, they are less variable and their mean is nearer to the real value. ä 1 e ˆ r) 1 ER 2) 1 a = 3 2.7(1.7) 23.5 (2.9) 572.5 (46.4) 1 a = 4 4.4 (0.67) 22.9 (3.6) 557.41 (38.2) Table 4. Results of the experimental study of Section 4.3.8: sample means (standard deviation) of the estimates of 20 realizations with E(R) = 20, E(R2) = 449 and a = 3 and a = 4, respectively. THREE-PHASE BOOLEAN MODEL OR SINTER TEXTURES The departure from the Boolean model studied in this section consists of considering a three-phased Boolean model. This model is defined in a very simple way. Suppose that X\ and X2 are independent Boolean models. Then we define: Si =X\, z2 = X2nXi, s3 = X 2 cnX[, 141 Gallego MA et al : RACS models and binary images and we call (S1,S2,S3) the three-phased Boolean model. Note that the complete Boolean model X1 forms component 1. Component 2 can be interpreted as a pattern destroyed in part by X1 and S3 is the background. This three-phased model was introduced in Serra (1982) and applied to the description of sinter materials. It can easily be shown that the area fraction fulfills: Parameter estimation Since X1 is a completely observable Boolean model its individual parameters can be estimated using the methods from section “The Boolean Model”. In order to estimate the parameters of X2, we can use the equation of the capacity functional for union-censoring given in Molchanov (1997): t (is\ TXiUX2(K) - TXl(K) Tx2 K) =11111111)-----¦ (15) Using Eq. (15) the contact distribution function of X2 can be estimated and the parameter of X2 can be estimated using the minimum contrast method applied to the empirical logarithmic contact distribution function as explained in Section “The Boolean model”. Simulation The simulation of this model is trivial when its definition and the algorithm 1 are taken into account. Fig. 5 shows a simulation of a three-phased Boolean model, both Boolean models have intensity X = 0.0005 and random balls with Gaussian radii with parameters E(R) = 20 and E(R2) = 449. A line process is a random collection of lines in the plane which is locally-finite; i.e. only a finite number of lines hit each compact planar set. A line process can be seen as a particular case of point process on R2, because we can parameterize it as: (p, ) e IRx Fig. 5. A simulation of a three-phased Boolean model. Both Boolean models have intensity X = 0.0005 and random balls with Gaussian radii with parameters E(R) = 20 and E(R2) = 449. SIMULATION STUDY In this section we study the performance of the parameter estimation procedure, again by means of a simulation study. We applied the previously explained procedure to 20 simulations of a three-phased Boolean model. X1 and X2 are both Boolean models with intensity X = 0.0005 and random balls with Gaussian radii with parameters E (R) = 20 and E(R2) = 449. The results are provided in Table 5. As was expected, the estimates of X1 parameters are more precise because the model is completely observed. THE BOOLE-POISSON MODEL The last Boolean model generalization studied in this paper is obtained by restricting the Boolean realizations to remain inside the polygons generated by a Poisson process of lines. It is called the Boole-Poisson model (Serra, 1982). (0,27r] c IR2, with p being the perpendicular distance of the line to the origin and the angle that the perpendicular makes with the positive x-axis. Under this parameterization, the set G of all the lines in the plane is equivalent to: Table 5. Results of the experimental study of Section 4.5: sample means (standard deviation) of the estimates of 20 realizations with E(R) = 20, E(R2) = 449 and X = 0.0005 for both Boolean models. X E ˆ R) E(R2) Boolean 1 0.00056 (0.00023) 22.53 (14.52) 483.8 (313.6) Boolean 2 (hidden) 0.00068 (0.0005) 26.07 (25.5) 568.14(629.6) 142 Image Anal Stereol 2003;22:133-145 S = {(p,) : 0 12 go to (5) 3. v = Un(0,1) 4. p = -22:0 = 2nv: go back to (1) 5. v = Un(0,1):0 = 27rv 6. T = V2(sin(T then (1) 8. p = -22: go back to (1). Fig. 6 shows two simulations of this model. 143 Gallego MA et al : RACS models and binary images (a) (b) Fig. 6. Two simulations of a Poisson-Boolean model where the grains are random balls with radii following a Gaussian distribution with parameters (a) X = 0.0001, d = 0.001, E(R) = 20 and E{R2) = 449; (b) I = 0.0005, d = 0.002, E(R) = 10 andE{R2) = 109. BE HE HE EEEH FfflTFR Fig. 7. Pixel patterns used to detect lower tangent points. Simulation study In this section, we again carry out a simulation study to show the performance of the former method. We simulated 20 Poisson-Boolean model realizations with parameters A = 0.002, 6 = 0.00005. The grains of the Boolean model are balls with random radii following a Gaussian distribution with E(R) = 20 and REFERENCES Ayala G (1988). Inferencia en Modelos Booleanos. Tesis Doctoral. Universidad de Valencia. Ayala G, Simo´ A (1995). Random closed sets and nerve fiber degeneration. Adv Appl Probab 27:293-305. Clifford P Nicholls G (1994). Comparison of birth-and-death and metropolis-hastings markov chain monte carlo for the strauss process. Tech. rep., Manuscript. Oxford University. Cressie N (1993). Statistics for spatial data. Wiley Series in Probability and Mathematical Statistics. Diggle P (1981). Binary mosaics and the spatial pattern of heather. Biometrics 37:531-9. E(R2) = 449. The results can be seen in Table 6. They are relatively good in the estimation of A and 0, a slightly less so for the mean of the radius and not so good for the variance. CONCLUSIONS In this paper we reviewed the use of Random Closed Sets as a powerful tool in simulating random binary images able to imitate natural textures. The Boolean model and a class of RACS models that is a generalization derived from it have been studied. For each model simulation algorithms and parameter estimation procedures were given. In addition a library of functions to be used with MATLAB was written to carry out the different tasks that we refer to. Future work could include other generalizations of Boolean Model with different point processes like Strauss point processes or area-interaction point processes. ACKNOWLEDGMENTS We would like to thank to the anonymous referees for correcting errors and improving the paper. This work has been partially funded by grants GV01-307 andBSA2001-0803-C02 Diggle P (1983). Statistical analysis of spatial point patterns. Academic Press. Dupac V (1980). Parameter estimation in the poisson field of disc. Biometrika 67:187-90. Hahn U, Micheletti A, Pohlink R, Stoyan D, Wendrock H (1999). Stereological analysis and modeling of gradient structures. J Microsc 195:113-24. Last G, Holtmann M (1999). On the empty space function of some germ-grain models. Pattern Recognition 32:1587-600. Lyman T (1972). Metals Handbook. Am. Soc. Metals. Ohio. Margalef R(1974). Ecolog´ia. Omega. Barcelona. Table 6. Results of the experimental study of Section 4.7.1: sample means (standard deviation) of the estimates of a Poisson-Boolean model with I = 0.002, 0 = 0.00005, E(R) = 20 and E(R2) = 449. X 0 E ˆ R) e ˆ r2) 0.001 (5.84e7) 0.00008 (3.41e-5) 15 (10.39) 130.6 (54.6) 144 Image Anal Stereol 2003;22:133-145 Mase S (1986). On the possible form of size distributions for gibbsian processes of mutually non-intersecting balls. Appl Prob :646–59. Matheron G (1974). Random Sets and Integral Geometry. New York: J. Wiley and Sons. Molchanov I (1997). Statistics of the Boolean Model for Practitioners and Mathematicians. Wiley Series in Probability and Mathematical Statistics. Neyman J (1939). On new class of contagious distributions, applicable in entomology and bacteriology. Ann Math Statist 10:35–57. Neyman J, Scott E (1979). Statistical aproach to problems of cosmology (with discussion). R Statist Soc B 20:1–43. Plaza M (1991). Contrastes en modelos germen y grano. Tesis Doctoral. Universidad de Valencia. Propp J, Wilson D (1996). Exact sampling with coupled markov chains and applications to statistical mechanics. Proceedings of the Seventh International Conference on Random Structures and Algorithms 9:223–52. Rataj J, Saxl I (1997). Boolean cluster models: mean cluster dilations and spherical contact distances. Math Bohem 122:21–36. Ripley B (1981). Spatial Statistics. J. Wiley and Sons. Ripley B (1987). Stochastic simulation. J. Wiley and Sons. Saxl I, Rataj J (1996). Spherical contact and nearest neighbour distances in boolean cluster fields. Acta Stereol 15:91–5. Serra J (1982). Image analysis and mathematical morphology. Academic Press. Stoyan D (1989). Statistical inference for a gibbs point process of mutually non-intersecting discs. Biometrik J 31:153–61. Stoyan D, Kendall W, Mecke J (1995). Stochastic Geometry and its applications. Chichester: J. Wiley and Sons. Stoyan D, Stoyan H (1994). Fractals, random shapes and point fields. Methods of geometrical statistics. Wiley series in probability and mathematical statistics. Van Lieshout M (2000). Markov point processes and their applications. Imperial College Press. Weil W, Weaieacker J (1984). Densities for stationary random sets and point processes. Adv Apply Probab 16:324–46. 145