Image Anal Stereol 2014;33:219-224 doi:10.5566/ias.1073 Original Research Paper CONDITIONS FOR EXACT CAVALIERI ESTIMATION M´, OMEZ2 AND LUIS M. ONICA TINAJERO-BRAVO1 GUILLERMINA ESLAVA-G ´CRUZ-ORIVEC,3 1Graduate studies in Mathematics, National Autonomous University of Mexico (UNAM), Ciudad Universitaria, 04510, D.F., Mexico; 2Department of Mathematics, Faculty of Sciences, National Autonomous University of Mexico (UNAM), Ciudad Universitaria, 04510, D.F., Mexico; 3Department of Mathematics, Statistics and Computation, Faculty of Sciences, University of Cantabria, Avda. Los Castros s/n, E-39005 Santander, Spain. e-mail: tinajerobm@gmail.com, eslava@matematicas.unam.mx, luis.cruz@unican.es (Received October 25, 2013; revised January 20, 2014; accepted March 19, 2014) ABSTRACT Exact Cavalieri estimation amounts to zero variance estimation of an integral with systematic observations along a sampling axis. A suf.cient condition is given, both in the continuous and the discrete cases, for exact Cavalieri sampling. The conclusions suggest improvements on the current stereological application of fractionator-type sampling. Keywords: Murthy-Gundersen arrangement, order statistics, smooth fractionator, symmetric population, systematic sampling, zero variance estimation. INTRODUCTION Cavalieri sampling, namely systematic sampling along an axis, is highly ef.cient to estimate the population total whenever the measurement function satis.es certain conditions. In extreme cases the estimation may be exact, that is, the estimation variance may be zero. A trivial example arises when the measurement function is constant and the number of observations is a .xed integer (see, for instance, Fig. 2a in Mattfeldt, 1989). As a second example, sampling a .xed even number of Cavalieri observations along the base of an isosceles triangle will always yield the exact area of the triangle (see Fig. 2f in Mattfeldt, 1989). Such properties have a natural counterpart in the discrete case, namely whenever the population consists of a .nite number of units. Murthy (1967, p. 165) considers a .nite population in which the relevant values associated with the units follow a linear trend. He then de.nes “balanced systematic sampling” (bss), and notes that, if the population and the sample sizes are .xed even integers, then the estimation of the population total is exact. Actually, bss was rediscovered in a stereology context by Gundersen (2002), who called it the “smooth fractionator” design. Under the smooth fractionator arrangement a population following the linear trend is rearranged into a pattern analogous to an isosceles triangle, hence Murthy’s observation. In this note a suf.cient condition is given in the continuous and discrete cases for Cavalieri estimation to be exact. A simple procedure is given to construct measurement functions satisfying such condition. It is also shown that the smooth fractionator arrangement of any such population will also yield exact Cavalieri estimation. A different problem is how to arrange an arbitrary population so that the variance of the Cavalieri estimator is minimal (“best possible arrangement”, see Fig. 2 in Gundersen, 2002). Intents pioneered by Sethi (1965) suggest that no general procedure may exist. Nonetheless the results given here suggest simple recommendations (see Discussion) that may help in a general case. PRELIMINARIES The basic stereological problem considered here is to estimate the volume of a .xed three dimensional subset from parallel systematic sections a constant distance apart, namely from planar Cavalieri sections. In this case the measurement function is section area. In the continuous case the general target parameter is the integral . Q =f (x)dx , (1) -. where f : R› R+ is a bounded non-random function, called the measurement function, which is square integrable in a bounded domain D . R (consisting of .nite union of non empty closed intervals on a sampling axis), and vanishes outside D. Without loss of generality, in the sequel it is convenient to TINAJERO-BRAVO M ET AL: Exact Cavalieri estimation assume that D =[0,2h], h > 0. The standard unbiased where µ = Q/(2h) is the mean value of f (x) in the estimator of Q is the Cavalieri estimator, interval [0,2h]. Q = T . f (z + kT ) , (2) Proof. From the de.nition (Eq. 8) and the condition k.Z C2, it trivially follows that, where T > 0 is the sampling period, namely the m-1 Qh Q constant distance between consecutive observations, . = Q , m . N . D Qm(z)= T = · m · h h whereas z ~ UR[0,T ) is a uniform random variable in m k=0 the interval [0, T ). (10) In the discrete case the target parameter is the total Remark 1. Under the assumption C1, the identity N Qm(z)= Q for all z . [0,h/m), for some m > 1 , . Q = yi (3) (11) i=1 of a population consisting of a .nite number N . N of units. The values associated with the units are, Y := {y1,y2,...,yN } . (4) For convenience we set yi = 0, i ./{1,2,...,N}. The standard unbiased estimator of Q is, Q Q = T . yz+kT , (5) does not imply the condition C2: Q1(z)= Q for all z . [0,h) – thus, the condition C2 is suf.cient but not necessary. As an example, consider the following measurement function, . . 2x, x . [0,1/4) . [1/2,3/4) , f (x)= 3 - 2x, x . [1/4,1/2) , (12) . 2 - 2x, x . [3/4,1) . k.Z where T . N is the (.nite) sampling period and z is a uniform random integer from the set {1,2,...,T }. CONTINUOUS CASE MAIN RESULT (CONTINUOUS CASE) Consider the following special case of the Cavalieri estimator (Eq. 2) for which the sample size is a .xed integer n, namely, n-1 2h . Q (z)= T f (z+kT ), T = , n . N, z ~ UR[0,T ) . It can be veri.ed that Qm(z)= 1 = Q is constant for even m and for all z . [0,1/(2m)), whereas 4x + 1, x . [0,1/4) , f (x)+ f (x + 1/2)= 4 - 4x, x . [1/4,1/2) , (13) is not. IMPLICATIONS (CONTINUOUS CASE) Construction Proposition 1 suggests a construction method for a measurement function f (x), x . [0,2h], which satis.es n Eq. 9. De.ne a square integrable function . : [0,h] › [0,Q/h]. Then, k=0 (6) Besides, consider the following assumption, .(x), x . [0,h] , Q/h - .(x - h), x . (h,2h] . (14) C1: n = 2m, m . N, .xed, (7) f (x)= that is, n is a .xed even number. Then, the Cavalieri estimator may be written as To provide an intuitive interpretation, suppose that .(x) > 0, x . [0,h]. Then the graph of .(x) splits m-1 h the rectangle R :=[0,h] × [0,Q/h] into two non­ . Qm(z)= T { f (z + kT )+ f (z + kT + h)} , T = . overlapping regions R- (lower region) and R+ (upper m k=0 (8) region), that is, R = R-. R+, R-. R+= 0. Thus the /region under the measurement function f (x) is the Proposition 1. Under the assumption C1, a suf.cient union of R- and a translate of the re.ection of R+ condition that Qm(z)= Q for each m . N and all with respect to the upper side of R, see Fig. 1. If it z . [0,T ), is that Q1(z)= Q for all z . [0,T ), namely, is desired that f (x) is continuous, then choose .(x) C2: f (z)+ f (z + h)= 2µ,(constant) , to be continuous in [0,h] with .(h)= Q/h - .(0), (9) f or all z . [0,h] , (Fig. 1c, d). Image Anal Stereol 2014;33:219-224 Fig. 1. (a), (b) Construction of a measurement function f (x) – according to Eq. 14 – for which Cavalieri estimation is exact for any .xed even number of observations (red line segments in (b)). The target parameter Q, namely the grey area under the measurement function in (b), is equal to the area of the rectangle in (a). The sum of any pair of Cavalieri observations a distance h apart in (b) is always equal to the height of the rectangle in (a), namely Q/h, see Eq. 9. (c), (d) Example of a continuous measurement function with the same property, see text. Symmetric measurement function Proposition 2. Suppose that the measurement function satis.es the following symmetry property, f (x)= f (2h - x), x . [0,2h] . (15) Then, the condition C2 (Eq. 9), is a necessary and suf.cient condition that the measurement function satis.es the additional symmetry property: f (h/2)- f (x)= f (h-x)- f (h/2), x . [0,h] . (16) Proof. First we prove the suf.ciency namely, if the conditions (Eq. 9) and (Eq. 15) hold, then Eq. 16 holds. The identity (Eq. 15) implies f (h - x)= f (h + x), x . [0, h], which together with condition C2 yields f (x)+ f (h - x)= 2µ, x . [0,h]. Thus f (h/2)= µ and therefore f (x)+ f (h - x)= 2 f (h/2), x . [0,h], which is Eq. 16. Now we prove the necessity, that is, if Eq. 15 and Eq. 16 hold, then Eq. 9 holds. Substitution of f (h - x) with f (h + x) in the r.h.s. of Eq. 16 yields f (x)+ f (h + x)= 2 f (h/2). Integration of both sides of the preceding identity from x = 0 to x = h, yields Q = 2hf (h/2), whereby f (h/2)= µ, and the identity (Eq. 9) follows. D Thus, under the assumption C1 (Eq. 7), if a measurement function satis.es the two symmetry properties (Eqs. 15 and 16), then C2 holds, and by Proposition 1 it will yield exact Cavalieri estimation. Example 1. The ordinary symmetry property (Eq. 15) alone does not warrant exact Cavalieri estimation. For instance, the measurement function x2 , x . [0,1) , f (x)= (17) (2 - x)2 , x . [1,2] , satis.es f (x)= f (2 - x), x . [0,2] but not the condition (Eq. 16), and therefore it does not yield exact Cavalieri estimation. Example 2. The following measurement function 1 +(x - 1)3 , x . [0,2] , f (x)= (18) 1 - (x - 3)3 , x . (2,4] , (see Fig. 2) satis.es the symmetry conditions (Eqs. 15, 16) and hence the condition (Eq. 9). Thus, any Cavalieri sample of a .xed even size will yield the target parameter Q = 4 without error. Further examples are considered in Ch. 4 of Tinajero-Bravo (2014). Fig. 2. Graph of the symmetric measurement function of Example 2. An even number of Cavalieri observations (red line segments) will always yield the target parameter Q without error. DISCRETE CASE MAIN RESULT (DISCRETE CASE) Consider Cavalieri sampling of the population Y (Eq. 4), with the following assumptions, D1: n = 2m, m . N,.xed, D2: N = 2M, M . N, .xed, (19) D3: T = M/m . N. TINAJERO-BRAVO M ET AL: Exact Cavalieri estimation Thus, both the sample and the population sizes are .xed even integers and the latter is multiple of the former. Note that, unlike the continuous case, m does not necessarily take consecutive values 1,2,...,N/2 for .xed N. Now the unbiased Cavalieri estimator (Eq. 5) may be written, m-1 Qm(z)= T . {yz+kT + yz+kT +M} , (20) k=0 z ~ UR{1,2,...,T } . Proposition 3. Under the assumptions D1–D3, a suf.cient condition that Qm(z)= Q for each m . N and all z .{1,2,...,T }, is that Q1(z)= Q for all z .{1,2,...,M}, namely, D4: yz + yz+M = 2µ, (constant) , f or all z .{1,2,...,M} , (21) where µ = Q/(2M) is the mean value of yx, x = 1,2,...,N. Proof. Analogous to the proof of Proposition 1. D Remark 2. Under the assumptions D1–D3, the condition D4 is not necessary for the identity Qm(z)= Q to hold for each m . N and for all z .{1,2,...,T }. As an example, consider the following discrete population, Y = {6,6, 3,12,8, 4,1,3,11,5,7,6} . (22) Because N = 12, conditions D1 and D3 imply that m .{1,2,3}. It can readily be veri.ed that Q2(z)= Q3(z)= 72 = Q, whereas Q1(z) Q for all = z . {1,2,...,6}. Remark 3. If conditions D1–D4 hold, then the population Y consists of M pairs of element values such that the sum of each pair is equal to 2µ. An element of any such pair is called a ’match’ of the other element in the pair, and the pair itself is called a ’matched pair’. The preceding assertion is easy to verify because, if there is a k .{1,2,...,N} for which the element yk does not have a match in the population, then Eq. 21 fails to hold for z = k. Remark 4. Under conditions D1–D4, any Cavalieri sample consists of m matched pairs. This is easy to verify bearing in mind the form of the r.h.s. of Eq. 20 and the identity in Eq. 21. IMPLICATIONS (DISCRETE CASE) Construction The rectangular construction idea proposed for the continuous case extends here with the proper adaptation. In practice the populations of interest are usually discrete, and therefore their elements are often easy to rearrange prior to sampling. Relevant tools in this context are the order statistics. The order statistics {y(k),k = 1,2,...,N} of the population Y are the successive elements of the population resulting when the values {yk,k = 1,2,...,N} are sorted in non decreasing order of magnitude, namely, y(1) . y(2) . ··· . y(N-1) . y(N) . (23) Although Corollary 1 below is fairly direct, it is stated as such for convenience of exposition. Corollary 1. If condition D4 holds, then the order statistics of the population Y may be expressed as follows oo Y0:= y(1),y(2),...,y(M),2µ - y(M),...,2µ - y(1) . (24) Proof. For simplicity, and without loss of generality, suppose that the N values from Y are different. Then, in any matched pair one of the two element values is less than µ, whereas the other is necessarily larger than µ because the sum of both matched values is 2µ, see Remark 3. Thus the population consists of M values less than µ — which are precisely the .rst M elements of Y0 — and another M values larger than µ, which are the last M elements of Y0. D Symmetries (discrete case) If the identity (Eq. 21) holds, then the order statistics of the population Y are symmetric about the population mean µ, that is, µ - y(k)= y(N-k+1) - µ, k = 1,2,...,M . (25) In fact, Eq. 24 implies that y(N-k+1)= 2µ - y(k), k = 1,2,...,M , (26) which is equivalent to Eq. 25. The Murthy-Gundersen smoothing arrangement If the population Y satis.es the property D2, (mid Eq. 19), then a Murthy-Gundersen arrangement (called the smooth fractionator arrangement in Gundersen, 2002) is, oo YMG := y(1),y(3),...,y(N-1),y(N),...,y(4),y(2) . (27) Thus, Y is .rst sorted in non decreasing order of magnitude to form the order statistics YOS = oo y(k),k = 1, 2,...,N . Let Yo, Ye denote the two Image Anal Stereol 2014;33:219-224 subsets of odd and even orders from YOS, respectively, and let the subset Yr denote the reverse of Ye. Then e choose among YMG = {Yo,Yr}, or its reverse Yr MG, with e probability 1/2. Corollary 2. If the identity (Eq. 21) holds for the population Y, then the corresponding Murthy-Gundersen arrangement YMG also satis.es the identity. Proof. Write the identity (Eq. 27) as follows, ** * YMG = {y1, y2,...,yN } , y * = y(2i-1), i = 1,2,...M , (28) i y * = y(N-2i+2), i = 1,2,...M . i+M By replacing k with 2i - 1 on both sides of Eq. 26, the required identity y * i + y * = 2µ is obtained. D i+M Corollary 3. Suppose that the order statistics of the population Y are symmetric about µ (Eq. 25), the assumptions D1–D3 hold, and there is an arrangement of Y such that, whenever y(i) belongs to a Cavalieri sample for any i .{1,2,...,N}, the order statistic y(N-i+1) also does. Then, the identity in Eq. 21 holds. Proof. For m = 1 the Cavalieri estimator (Eq. 20) reads, Q1(z)= M(yz + yz+M), z .{1,2,...,M} . (29) Without loss of generality, suppose that the N population values are different. For any .xed z . {1,2,...,M} there is a unique index i .{1,2,...,N}such that yz = y(i), whereby, Q1(z)= M(y(i)+ yz+M) . (30) Corollary 3 states that, if y(i) belongs to a Cavalieri sample, then y(N-i+1) also does. Therefore, Q1(z)= M(y(i)+ y(N-i+1)) . (31) Moreover, if the order statistics are symmetric about µ, then the identity (Eq. 26) holds. Thus, Q1(z)= 2µM which, equated to the r.h.s. of Eq. 29, yields the identity (Eq. 21). D Remark 5. The Murthy-Gundersen arrangement (Eq. 27), for instance, satis.es Corollary 3. Example 3. The discrete population Y = {3, 1,4,2,7,9,6,8} (32) has even size N = 2M = 8 and satis.es the identity (Eq. 21). Therefore, Cavalieri estimation of the population total Q = 40 is exact for n = 2,4, (namely for T = 4,2 respectively). Note that n = 6 is not eligible because T = 8/6 is not an integer. The corresponding order statistics Y0 = {1,2, 3,4,6,7,8,9} (33) are symmetric about µ = 5, namely 5 - 1 = 9 - 5, 5 - 2 = 8 - 5, etc., Eq. 25. The two possible Murthy-Gundersen arrangements are, YMG = {1,3,6, 8,9,7,4,2}, Yr = {2,4,7, 9,8,6,3,1}. (34) MG Note that both arrangements satisfy the identity (Eq. 21), and hence they also yield exact Cavalieri estimation (Corollary 2). DISCUSSION Propositions 1 and 3 constitute the core of this note – they establish suf.cient conditions for Cavalieri sampling to yield the target parameter Q without error for any even sample size from a population of even size. These results constitute a theoretical guideline only, because the identities (Eq. 9) and (Eq. 21), the construction (Eq. 14), etc., involve the unknown parameter Q. On the other hand, the Murthy-Gundersen arrangement (Eq. 27) involves the values {yk} of the target population Y , which are usually unobservable prior to sampling. Typically in stereology, the kth sampling unit is an opaque block of material containing yk particles, which are not directly observable. Thus the mentioned arrangement has to be based on the apparent size of the blocks (Gundersen, 1986; 2002). The theory would apply exactly if yk was proportional to say the observable volume, or the weight xk of the kth block. Failing this, exact estimation is hardly attainable in practice. Notwithstanding the foregoing remarks, practical recommendations stemming from this note may be the following: (i) Aim at an even population size N, that is, start with an even number of blocks. (ii) Fix an even sample size n, ensuring that the sampling period T = N/n is an integer. The preceding two recommendations are likely to help in practice, but they are only speculative and do not need to reduce estimation variance in all cases. Hitherto it has largely been advocated to start with a population whose values follow an approximately linear trend (i.e. consisting of a smooth gradation of block sizes, from rather small to fairly large) followed by a Murthy-Gundersen arrangement, with little concern about the sample size. In Fig. 3a from Cruz-Orive and Weibel (1981), a .xed sample size n was advocated and ensured for Cavalieri sampling by adopting a distance T = H/n between sections, where H was the caliper length of the object, namely TINAJERO-BRAVO M ET AL: Exact Cavalieri estimation the orthogonal projected length of the object onto the sampling axis. Later, the in.uential paper of Gundersen and Jensen (1987) relaxed the restriction of .xing n, because H may be dif.cult, or at least laborious, to measure in the laboratory. In fact, the target object is often embedded in agar (which is not transparent) prior to slicing, see for instance Michel and Cruz-Orive (1988). Unfortunately the latter strategy has prevailed also in the discrete case, and specially in the fractionator context, in which .xing N and n should usually be easy. This is strange because it is well known that allowing the sample size n to be random increases the so called Zitterbewegung, which often constitutes an important contribution to the total estimation variance (Ki^eu et al., 1999; Garc´ia­Fi~nana and Cruz-Orive, 2000). The bene.cial effect on the estimation accuracy of adopting even and .xed population and sample sizes would still be enhanced if the population consisted of “matched pairs” of blocks, (Remark 3), and if the Murthy-Gundersen arrangement was then used (Corollary 2). The jumps of the measurement function, or of its derivatives, also contribute to the variance (Ki^eu et al., 1999; Garc´ia-Fi~nana and Cruz-Orive, 2004). However, in the ideal case in which N, n are both even, the population consists of matched pairs, and the Murthy-Gundersen arrangement is adopted, the variance tends to vanish in spite of the presence of any jumps among consecutive population values in the arrangement. Fig. 1b illustrates this effect in the continuous case. ACKNOWLEDGMENTS The authors are grateful to an anonymous referee for a number of constructive remarks. MTB acknowledges a scholarship for graduate estudies of the Mexican National Council for Science and Technology (CONACYT). LMCO acknowledges support of the Ministry of Education and Science I+D Project MTM-2009–14500–C02–01. REFERENCES Cruz-Orive LM, Weibel ER (1981). Sampling designs for stereology. J Microsc 122:235–72. Garc´ia-Fi~M, Cruz-Orive LM (2000). New nana approximations for the variance in Cavalieri sampling. J Microsc 199:224–38. Garc´ia-Fi~nana M, Cruz-Orive LM (2004). Improved variance prediction for systematic sampling on R. Statistics 38:243–72. Gundersen HJG (1986). Stereology of arbitrary particles. A review of unbiased number and size estimators and the presentation of some new ones, in memory of William R. Thompson. J Microsc 143:3–45. Gundersen HJG (2002). The smooth fractionator. J Microsc 207:191–210. Gundersen HJG, Jensen EB (1987). The ef.ciency of systematic sampling in stereology and its prediction. J Microsc 147:229–63. Ki^ eu K, Souchet S, Istas J (1999). Precision of systematic sampling and transitive methods. J Statist Plan Inf 77:263–79. Mattfeldt T (1989). The accuracy of one-dimensional systematic sampling. J Microsc 153:301–13. Michel RP, Cruz-Orive LM (1988). Application of the Cavalieri principle and vertical sections method to lung: estimation of volume and pleural surface area. J Microsc 150:117–36. Murthy MN (1967). Sampling theory and methods. Statistical Publishing Society, Calcutta. Sethi VK (1965). On optimum pairing of units. Sankhya 27: 315–20. Tinajero-Bravo M (2014). Varianza del estimador del total bajo muestreo sistem´atico de algunas poblaciones sim´etricas. Unpublished PhD Thesis, UNAM, Mexico.