Image Anal Stereol 2016;35:1-14 doi:10.5566/ias.1412 Original Research Paper ON THE PRECISION OF CURVE LENGTH ESTIMATION IN THE PLANE ANA I. GÓMEZB, MARCOS CRUZ AND LUIS M. CRUZ-ORIVE Department of Mathematics, Statistics and Computation, Faculty of Sciences. University of Cantabria, E-39005. Santander, Spain. e-mail: gomezanab@gmail.com, marcos.cruz@unican.es, luis.cruz@unican.es (Received October 19, 2015; revised February 2, 2016; accepted February 2, 2016) ABSTRACT The estimator of planar curve length based on intersection counting with a square grid, called the Buffon- Steinhaus estimator, is simple, design unbiased and efficient. However, the prediction of its error variance from a single grid superimposition is a non trivial problem. A previously published predictor is checked here by means of repeated Monte Carlo superimpositions of a curve onto a square grid, with isotropic uniform randomness relative to each other. Nine curvilinear features (namely flattened DNA molecule projections) were considered, and complete data are shown for two of them. Automatization required image processing to transform the original tiff image of each curve into a polygonal approximation consisting of between 180 and 416 straight line segments or ‘links’ for the different curves. The performance of the variance prediction formula proved to be satisfactory for practical use (at least for the curves studied). Keywords: Buffon-Steinhaus estimator, Cauchy estimator, curve length, error variance prediction, Monte Carlo resampling, test grid. INTRODUCTION Consider a bounded, planar, piecewise smooth curve Y ⊂ R2 of finite length B, which is the target parameter. The Buffon-Steinhaus unbiased estimator of B is based on intersection counting with a square grid of test lines which is IUR relative to Y , namely, B̂ = π 4 ·T · I , (1) (Steinhaus, 1930 – for references see Baddeley and Jensen, 2005) where T is the gap length between test lines and I the total number of intersections. The abbreviation ‘IUR’, introduced in Miles and Davy (1976), means ‘isotropic uniform random’ – details are given below. In this paper we address the problem of predicting the variance Var(B̂) from a single grid superimposition. The interest in the problem is old. Moran (1966) gives an exact variance expression when Y is a straight line segment. For a general curve, he gives an approximation for the variance component due to the grid position, but not for the one due to orientation – see also Cruz- Orive (1989). To our knowledge the only estimators hitherto available for both variance components were given in Cruz-Orive and Gual-Arnau (2002). Such estimators are generally not unbiased – they are only theoretical approximations. Our purpose is to check the performance of such approximations by means of Monte Carlo simulations. Often Y is a model for profile boundary (hence the notation ‘B’ for its length), namely the boundary of the intersection between a closed surface (e.g., a cell membrane, or a grain boundary) and a sectioning plane. A planar curve may also be the orthogonal projection of a spatial curve onto an observation plane, or even a flexible linear feature in space flattened onto a planar surface, in which case the real and the observed lengths should approximately coincide. This is the case for the material used in this paper, namely DNA molecules which appear as open linear features on the observation plane (Podestà et al., 2004). Fig. 1a below, kindly provided in tiff format by Professor Alessandro Podestà, is Fig. 1(A) from the latter paper. The laborious steps leading to the conversion of the tiff image features into polygonal curves in vector graphics (Section Processing of the curve images), which constitute necessary prerequisites to perform automatic measurements, suggest that the intersection counting method to estimate feature length directly on the original images is an attractive option – first because it is fast, design unbiased and efficient, second because the error variance can be predicted and last, but not least, because the processed curve lengths will seldom coincide with the lengths of the original projections (automatic measurements are generally biased by a wide variety of artifacts depending on curve shape, image resolution, etc.). 1 GOMEZ AI ET AL: On the precision of planar curve length estimation Fig. 1. (a) The linear features, used as model curves in this paper, represent flattened projections of DNA molecules onto the observation plane. Redrawn from Podestà et al. (2004), with permission from John Wiley and Sons. (b) Raster bitmap image version of the linear features. (c) Vector graphics image with polygonal approximations of the original curves, ready for automatic Monte Carlo experiment. The two curves marked in red were selected for the results shown in Figs. 3, 4. CURVE MODEL AND LENGTH ESTIMATION CURVE MODEL Each of the selected curvilinear features shown in Fig. 1, namely each of the selected DNA molecules, was converted from tiff format into a polygonal curve in vector graphics following the steps described in the section Processing of the curve images. A fixed polygonal curve Y , henceforth called a ‘curve’, for short, may be represented as follows, Y = N⋃ i=1 yi , (2) where yi denotes the ith straight line segment or ‘link’ of Y , and N is the finite total number of links. The total length of Y is B = N ∑ i=1 bi , (3) where bi, denotes the length of yi. Fix a rectangular frame Ox1x2 in the plane of the curve. The position and orientation of the curve is determined by a unit vector (x, ω) rigidly attached to the curve, emanating from a point x ∈ R2, and making an angle ω ∈ [0,2π) with Ox1. The point x = (x1, x2) is called the associated point (AP) of Y , (Miles, 1974, see also Baddeley and Jensen, 2005) and the vector (x, ω) is called the associated vector (AV). The choice of the AV is arbitrary, but once it is chosen it must remain rigidly attached to the curve. When the AV is at its initial position (x = 0, ω = 0), then the corresponding initial position of the curve is denoted by Y0,0, so that the AP of Y0,0 is at the origin O. Here this AP was the lower left corner of the smallest rectangle (with its sides parallel to the reference axes) enclosing Y0,0, see Fig. 2a. Under a rigid motion defined by the vector (x,ω), (namely a rotation ω about the origin followed by a translation x, or equivalently a translation x followed by a rotation ω about the point x) the curve Y0,0 is transformed into the curve Yx,ω see Fig. 2b. The triplet (x1, x2, ω) constitute the ‘coordinates’ of the curve. The generation of a IUR curve in the plane requires the use of the motion invariant density associated with the coordinates of the curve. This density is called the kinematic density (Santaló, 1976), namely, dYx,ω = dx dω , (4) where dx = dx1 dx2 is the area element in the plane and dω is the arc element in the unit circle. In practice, the preceding concept implies that the AP of the curve 2 Image Anal Stereol 2016;35:1-14 must be uniform random (UR) in any bounded region of the plane, whereas the orientation of the AV must be isotropic random (IR) namely UR in the unit circle, and independent from the position of the AP. O AP Y 0,0 AP x ω Yx,ω O a b c x O J 0 Λ0 2 1 1 1 T T Fig. 2. (a) Curve number 6 from Fig. 1c, with its rigidly attached associated point (AP) and associated vector (arrow). Each of the straight line link segments constituting the curve (indistinguishable by eye) share the same AP and the same associated vector. (b) Result of applying a rigid motion to the curve, namely a translation of the AP to the point x followed by a rotation ω about x. (c) A IUR superimposition of the curve onto a square grid Λ0. The AP is UR in the fundamental tile J0 of the grid, whereas the rotation angle ω is independent and UR in the interval [0,2π) . Here the relevant intersection counts read: I11 = 1, I12 = 1, I21 = 1, I22 = 2. The procedure was programmed in Python to generate automatic replications. CAUCHY (ONE STAGE) ESTIMATION OF CURVE LENGTH A special case of Cauchy’s projection formula (Santaló, 1976) expresses the length B of the curve Y0,ω in terms of its total orthogonal projected length l(ω) onto Ox1, with all points counted in their multiplicity, see Cruz-Orive (1989, Fig. 1), or Cruz-Orive and Gual- Arnau (2002, Fig. 4a). Let {α1,α2, ...,αN} denote the fixed angles of the oriented links of Y0,0 with Ox1. Then, l(ω) = N ∑ i=1 bi|cos(αi−ω)| . (5) Integration from ω = 0 to ω = 2π yields Cauchy’s formula, B = 1 4 ∫ 2π 0 l(ω)dω . (6) Suppose that the angle ω is UR in the interval [0,2π), with probability element, P(dω) = dω 2π , ω ∈ [0,2π) , (7) and suppose also that l(ω) can be measured exactly. Then, by Cauchy’s formula B = π 2 E{l(ω)} , (8) and therefore, B̃(ω) = π 2 l(ω) (9) is an unbiased estimator (UE) of B. Here ω may alternatively be UR in the interval [0,π), i.e., ω ∼ UR[0,π), because l(ω) = l(ω + π) for all ω . Estimation precision may be gained by measuring also the projected length l(ω + π/2) of Y0,ω onto the Ox2 axis, whereby, B̂(ω) = π 4 [ l(ω)+ l(ω +π/2) ] , (10) is also unbiased for B, and it can be expected to be more precise than B̃(ω). Note that here we may take ω ∼ UR[0,π/2) because B̂(ω) = B̂(ω +π/2). The estimator B̂(ω) is based on n = 2 systematic observations from the measurement function l(ω) in the interval [0,π), i.e., with period π/2, in the unit semicircle. A predictor of Var( B̂ ) is available from Eq. 4.4 of Cruz-Orive and Gual-Arnau (2002) with r = π , n = 2 and m = 1. Here m is a parameter of the global model adopted for the covariogram of the measurement function l(ω). In principle, the smoother l(ω), the more appropriate should be the choice m = 1. The local error is v̂2 = 0 because l(ω) is measured without error. Further, C0−C1 = [l(ω)− l(ω +π/2)]2/4. Thus, var { B̂(ω) } = π2 240 · [l(ω)− l(ω +π/2)]2 . (11) Note: In this paper a true variance is denoted by Var(·), whereas the corresponding predictor or estimator is denoted by var(·). 3 GOMEZ AI ET AL: On the precision of planar curve length estimation BUFFON-STEINHAUS (TWO STAGE) ESTIMATION OF CURVE LENGTH For a given orientation ω of the curve, the total orthogonal projected length l(ω) and l(ω + π/2) can be estimated without bias at a second stage by intersection counting with a square grid of test lines. More precisely: (i) Fix a square grid Λ0 of test lines parallel to the Ox1x2 axes, with gap length T > 0. It is convenient to fix the origin O at a central vertex of the grid. The adopted fundamental tile of the grid is the square J0 = [0,T )2, see Fig. 2c. (ii) Rotate the curve Y0,0 (with its AP at O) isotropically about O into Y0,ω , where ω ∼ UR[0,2π). Next, shift the AP of Y0,ω into a point x∼UR(J0), thus translating the curve together into Yx,ω , (Fig. 2c). (iii) Score the intersections counts { I1 j(ω,x), j = 1,2, ...,n1 } between the curve and the vertical lines of the grid in an ordered sequence, where I11(ω,x), I1n1(ω,x) denote the first, and the last non zero counts, respectively. Score similarly the intersection counts {I2 j(ω + π/2,x), j = 1,2, ..,n2} with the horizontal lines. The Buffon-Steinhaus estimator B̂(ω,x) of the curve length B is the result of replacing the projection lengths in the right hand side of Eq. 10, (i.e of the Cauchy estimator), with the corresponding Cavalieri type estimators (Eqs. 12b, 12c below) based on the intersection counts, namely, B̂(ω,x) = π 4 [ l̂(ω,x)+ l̂(ω +π/2,x) ] , (12a) l̂(ω,x) = T n1 ∑ j=1 I1 j(ω,x) , (12b) l̂(ω +π/2,x) = T n2 ∑ j=1 I2 j(ω +π/2,x) . (12c) The estimator B̂(ω,x) is therefore a two stage UE of B. The fist stage design is the Cauchy design generating two mutually perpendicular, total projections of the curve, whereas the second stage incorporates Cavalieri sampling to estimate the lengths of these projections by intersection counting. By the unbiasedness of the Cavalieri design, for each ω ∈ [0,2π) the conditional mean of the second stage estimator is equal to the fist stage (Cauchy) estimator, namely, Ex { B̂(ω,x)|ω } = B̂(ω) . (13) Because the Cauchy estimator is unbiased, namely because Eω { B̂(ω) } = B, it follows that E { B̂(ω,x) } = EωEx { B̂(ω,x)|ω } = B, which verifies the unbiasedness of B̂(ω,x). The exact variance of B̂(ω,x) may be represented using the standard variance decomposition formula. For simplicity set, B̂1 = B̂(ω) , B̂2 = B̂(ω,x) . (14) Then, Var(B̂2) = Varω { Ex ( B̂2 ∣∣ω)}+Eω{Varx(B̂2∣∣ω)} = Var ( B̂1 ) +Eω { Varx ( B̂2 ∣∣ω)} . (15) The first term in the right hand side of the preceding identity represents variance component due to orientations, whereas the second term represents the component due to intersection counting, averaged over orientations. The latter component, also called the nugget, or local error component, may be predicted using the standard Matheron’s formula for systematic sampling along an axis (generally called Cavalieri sampling). For each ω ∈ [0,2π) set σ21 = Varx { l̂(ω,x)|ω } , σ22 = Varx { l̂(ω +π/2,x)|ω } , (16) namely the local error variances due to intersection counting with the vertical and the horizontal test lines of the grid, respectively. Then, the second term in the right hand side of Eq. 15 is estimated as follows, var ( B̂2|ω ) = ( π 4 )2 ( σ̂21 + σ̂ 2 2 ) , (17a) σ̂2i = T 2 12 · (3C0i−4C1i +C2i), i = 1,2, ni ≥ 3 , (17b) σ̂2i = T 2 6 · (C0i−C1i), i = 1,2, ni = 2 , (17c) Cki = ni−k ∑ j=1 Ii jIi, j+k, k = 0,1,2; i = 1,2 , (17d) where, I1 j = I1 j(ω,x) , I2 j = I2 j(ω +π/2,x) . (18) On the other hand, the first term in the right hand side of Eq. 15, namely the orientations or Cauchy component, cannot be estimated with Eq. 11 directly 4 Image Anal Stereol 2016;35:1-14 because the curve projections are no longer measured exactly. Instead, var2 ( B̂1 ) = π2 240 · {[ l̂(ω,x)− l̂(ω +π/2,x) ]2 − (σ̂21 + σ̂22 ) } , (19) where the subscript ‘2’ in var2(·) indicates that the estimator is computed from the two stage data, namely from the intersection counts. Thus the total variance predictor of the two stage estimator is (Cruz-Orive and Gual-Arnau, 2002) var ( B̂2 ) = var2 ( B̂1 ) +var ( B̂2|ω ) . (20) REMARKS 1. Provided that the curve Y0,ω is connected, its orthogonal projection onto Ox1 will be a bounded interval Y ′ω , say. Define the integer valued measurement function IY ′ω (z) as the number of intersection between Y0,ω and the vertical straight line x1 = z. Then its integral l(ω) = ∫ R IY ′ω (z)dz (21) is the total orthogonal projected length defined above, and Eq. 12b is the standard Cavalieri estimator of it. Moreover, because the measurement function IY ′ω (z) is integer valued it will exhibit jumps, and therefore its smoothness constant will be q = 0 (Kiêu et al., 1999; Garcı́a- Fiñana and Cruz-Orive, 2004). In consequence, the σ̂2i are computed with q = 0. 2. While l(ω)= l(ω+π) for all ω , the corresponding Cavalieri estimators, see Eqs. 12b, 12c, will in general be different for each ω . This justifies the choice of the range [0,2π) for ω in step (ii) of the preceding subsection. PROCESSING OF THE CURVE IMAGES Here we describe the steps involved in the conversion of each original tiff curve image (Fig. 1a) into a polygonal curve in Scalar Vector Graphics (SVG, Fig. 1c) which can be imported into a programming environment to perform automatic Monte Carlo experiments. Preference was given to free, GPL licensed software. Step 1. The original tiff image (containing all the relevant curves) was edited using the GIMP software package (http://www.gimp.org/). The image was desaturated and converted into a black-and-white (B/W) one. Brightness, contrast and B/W colour threshold were set at −46, +6 and 108, respectively. The resulting image was a raster (or ‘bitmap’) image consisting of pixels which were coloured in black (Fig. 1b). Each curve, or group of nearly adjacent curves, was selected and recorded separately into a JPEG output file, which was submitted to the next step. Step 2. Each of the preceding (bitmap) JPEG images was submitted to AutoTrace (http://autotrace. sourceforge.net/) to transform it into vector graphics. A ‘centerline’ option was used whereby a pixelized curve was approximated by a ‘spine’ curve consisting of the union of Bèzier arcs. The output files were exported as SVG files. Step 3. The preceding SVG files were edited manually with the aid of Inkscape (http://inkscape. org/) to remove background noise, and to eventually adjust the curve arcs properly. Step 4. The result (Fig. 1c) was imported into Blender V 2.68a (http://www.blender.org/) to convert it into a connected polygonal of small linear segments or ‘links’ with known endpoint XY coordinates. The number of links ranged between 180 and 416, depending on the local curvatures of the original curves. With the final polygonal curves a Python script was written to compute curve length (Eq. 3), to rotate and shift the curves as prescribed in the preceding section, and to perform the Monte Carlo procedures described in the next section. MONTE CARLO EXPERIMENT TO TEST THE PERFORMANCE OF THE VARIANCE PREDICTORS PURPOSES From the curvilinear features (DNA molecules) shown in Fig. 1a, the two SVG curves numbered 5 and 6 in Fig. 1c were selected to illustrate the results. The choice was based on the fact that, qualitatively, these two curves give the visual impression of being ‘fairly isotropic’ and ‘fairly anisotropic’, respectively. For each curve the following aspects were studied. (1) Behaviour of the Cauchy estimator B̃(ω) as a function of the angle ω ∈ [0,π). In polar coordinates the graph of l(ω) is the rose of total orthogonal projected lengths of the curve, (i.e., the ‘rose of projections’, for short). 5 GOMEZ AI ET AL: On the precision of planar curve length estimation (2) Behaviour of the Cauchy estimator B̂(ω) based on two mutually orthogonal projections. (3) Replicates of the Buffon-Steinhaus estimator B̂(ω,x), which is based on intersection counts with a square grid, as a function of the mean total number of intersections E(I). (4) Comparison of the variance predictor var { B̂(ω) } of the Cauchy estimator, see Eq. 11, against the empirical variance Vare { B̂(ω) } . (5) Comparison of the variance predictor var { B̂(ω,x) } of the Buffon-Steinhaus estimator, and of the two variance component predictors in the right hand side of Eq. 20, against the corresponding empirical variances. In the corresponding graphs, each empirical error variance was divided by the square of the true curve length in order to represent the square coefficient of error. For instance, CE2e { B̂(ω) } = Vare { B̂(ω) } B2 . (22) For convenience the corresponding variance predictors were normalized in the same way, e.g., ce2 { B̂(ω) } = var { B̂(ω) } B2 . (23) If the interest was focused on the sample coefficient of error itself, however, then B2 should be replaced with its sample version B̂2(ω) in the denominator of the preceding expression. CAUCHY ESTIMATOR For each of the selected curves, the Cauchy estimator B̂(ω) was computed according to Eq. 10 at each of M = 322 = 1024 values of ω in the interval [0,π/2), namely {ωk = (U + k−1) π 2M , k = 1,2, ...,M} , U ∼ UR[0,1) . (24) From the corresponding M replicates of B̂(ω), the empirical mean and the error variance of the estimator were computed respectively as follows, Ee { B̂(ω) } = 1 M M ∑ k=1 B̂(ωk) , (25a) Vare { B̂(ω) } = 1 M M ∑ k=1 [ B̂(ωk)−Ee { B̂(ω) }]2 . (25b) On the other hand the corresponding M replicates{ var{B̂(ωk)},k = 1,2, ....,M } of the variance predictor given by Eq. 11 were computed for comparison against the empirical variance. The Cauchy estimator B̃(ω) was also computed in a similar way at M = 1024 points in the interval [0,π). Here, however, no variance predictor exists because the estimator is based on a single projection. BUFFON-STEINHAUS ESTIMATOR For a given curve Y0,0 and a fixed square grid Λ0 of test lines of gap T > 0, a total of M = K2 = 322 = 1024 replicates of the Buffon-Steinhaus estimator B̂(ω,x) were computed from Eq. 12 by means of M superimpositions of the curve onto the grid, (Fig. 2c), according to a systematic design as follows. Recall that the AP of the curve is UR in the fundamental square of the grid, namely x ∈ J0 = [0,T )2,whereas ω is UR in [0,2π) . The Cartesian coordinates of the M = K2 positions of the AP, e.g., {(x1i,x2 j), i, j = 1,2, ....,K} , (26) were the vertices of a fine systematic square grid of gap T/K generated within the fundamental square J0, namely, x1i = (U1 + i−1) ·T/K , x2 j = (U2 + j−1) ·T/K , (27) where U1,U2 are two independent UR numbers in the interval [0,1). Renumber the resulting sequence of M associated points as {x1,x2, ...,xM} in any convenient way. Next, generate M systematic rotation angles in the interval [0,2π), namely,{ (U3 + i−1) 2π M , i = 1,2, ...,M } , (28) where U3 is a third UR number in the interval [0,1), independent from U1,U2. Now, generate a random permutation of the preceding set of M systematic angles, and denote it by {ω1,ω2, ...,ωM}. Then, the curve coordinates of the M superimpositions of the curve onto the grid are {(xk,ωk), k = 1,2, ...,M} . (29) Note that the angle permutations avoid any correlation between curve location (xk) and orientation (ωk). Different values of the gap T of the grid were chosen so that the expected total number E(I) of intersections did not exceed about 130. With the abbreviation defined in Eq. 14, and bearing in mind 6 Image Anal Stereol 2016;35:1-14 that E(B̂2) = B, for a desired value of E(I) the gap was computed by the following formula, T = 4B/(πE(I)) . (30) For each curve and each value of T , the M replicates { B̂(ωk,xk), k = 1,2, ..,M } , (31) were obtained as described above, and the empirical mean and variance of the Buffon-Steinhaus estimator were thereby computed similarly as in Eqs. 25a, 25b, namely, Ee { B̂(ω,x) } = 1 M M ∑ k=1 B̂(ωk,xk) , (32a) Vare { B̂(ω,x) } = 1 M M ∑ k=1 [ B̂(ωk,xk)−Ee { B̂(ω,x) }]2 . (32b) From Eqs. 15, 25b, the empirical value of the Cavalieri (intersection counting) contribution was computed as follows, EeVar{B̂(ω,x)|ω}= Vare{B̂(ω,x)}−Vare{B̂(ω)} . (33) Further, for each pair (ωk,xk) the predictors of the Cavalieri and the Cauchy error variance components were computed from Eq. 17 and 19, respectively, as well as the total error variance predictor given by their sum (Eq. 20). RESULTS CAUCHY ESTIMATOR Each of the two curves studied, and their corresponding roses of projections, are shown at the top of Fig. 3. The graph of the single projection estimator B̃(ω) is displayed for each curve in Fig. 3a,b, whereas those for the double projection estimator B̂(ω) are displayed in Fig. 3c,d. For either curve the superior precision of B̂(ω) is apparent. The more flattened the rose of projections, the more anisotropic is the curve. This is visually reflected by the different degrees of accuracy of each of the two Cauchy estimators for each of the two curves. The empirical CE2e{B̂(ω)}, (Fig. 3e, f, pink straight lines), computed via Eq. 25b, is negligible for the first curve (3.86 · 10−5), and rather small for the second (5.06 · 10−4). The graph of the predictor ce2{B̂(ω)}, see Eq. 11, is also displayed for each curve in Fig.3e,f, (red oscillating curves). The corresponding means (6.00 ·10−4 and 1.30 ·10−3, respectively) are shown as red dotted lines. The biases are small in absolute terms (5.62 ·10−4 and 7.99 ·10−4 respectively), but very large in relative terms (14.56 and 1.58 respectively) because the actual empirical errors are small. BUFFON-STEINHAUS ESTIMATOR For each of the two curves considered (see top of Fig. 4), the M = 1024 Monte Carlo replicates of the Buffon-Steinhaus estimator B̂(ω,x) , see Eq. 12a, are displayed in Fig. 4a, b, respectively, at each of 10 values of the mean total number of intersections E(I). As a check of the Monte Carlo procedure, the corresponding means (red dots), computed via Eq. 32a, visually coincide with the known value of B in each case, as expected by unbiasedness. Approximate confidence bands of 95% (coloured in grey), and of 100%, are also displayed for the individual realizations of B̂(ω,x). The empirical CE2{B̂(ω,x) was computed for each curve, via Eq. 32b, at each of the 10 values of E(I) and joined by a polygonal line (thick black line in Fig. 4c, d). The individual realizations of the corresponding predictor ce2{B̂(ω,x)} (computed via Eq. 20) are also displayed in Fig. 4c, d. The corresponding 10 group means of the predictors were joined by a polygonal curve (thick red line). Approximate 95% (coloured in grey), and 100% confidence bands are displayed together. It is seen that, in the range of E(I) considered, the empirical curve is always captured by the 95% confidence band. For E(I)≈ 50 the empirical CE{B̂(ω,x)} is about 5% for either curve (horizontal, blue broken line). Such degree of accuracy is however unnecessary if the target quantity is the mean length of a population of curves. For a simple random sample of n curves, the stereological contribution to the total square coefficient of error of the population mean length estimate would be reduced to CE2{B̂(ω,x)}/n. The graph of the empirical Cavalieri component EeCE2{B̂(ω,x)|ω}, computed via Eq. 33, is shown for each curve in Fig. 4e, f, respectively (thick blue polygonal), together with its mean predictor computed via Eq. 10, (broken blue line). On the other hand, the corresponding Cauchy components are represented in pink and in red (broken line), respectively. 7 GOMEZ AI ET AL: On the precision of planar curve length estimation 600 500 400 300 200 100 0 0 π/2 π C u rv e l e n g th , n m 600 500 400 300 200 100 0 0 π/2 π C u rv e l e n g th , n m 600 500 400 300 200 100 0 0 π/2 π (ω),Cauchy estimator, B two projections True length, B 0 π/2 π 0.06 0.05 0.04 0.03 0.02 0.01 0.00 2 2 2 2 2 2 ω a b dc e f 0 π/2 π 0.06 0.05 0.04 0.03 0.02 0.01 0.00 2 2 2 2 2 2 ce { B }(ω) 2 Mean ce { B }(ω) 2 CE { B }(ω) 2 Emp. 600 500 400 300 200 100 0 0 π/2 π (ω), ∼ True length, B Cauchy estimator, B one projection Fig. 3. Top: curves numbered 5 (with corresponding results (a, c, e)) and 6 (with corresponding results (b, d, f)), from Fig. 1c, respectively, with their corresponding roses of total projected length. (a) (b) Cauchy curve length estimates from a single total projected length (red curves) as a function of the orientation of the projection axis. (c), (d) Idem for the two projection Cauchy estimates. Note the drastic increase in precision with respect to the single projection estimates. (e, f) The red wavy curves represent the model based predictors of the square coefficient of error of the two projection Cauchy estimator (Eq. 11). 8 Image Anal Stereol 2016;35:1-14 As anticipated in Fig. 3 for the Cauchy estimator, the predictor of the Cauchy (orientations) component shows a relatively poor accuracy, due in part to the fact that the true value of such component was very small relative to the total error variance. Thus, the total error variance practically coincides with the Cavalieri component in either case. Conditional on a given orientation ω , the measurement function IY ′ω (z), see Eq. 21, exhibits jumps because it is an intersection count, namely an integer. For this reason the smoothness constant used to obtain Eq. 17b was q = 0. Theory establishes that, under certain conditions, the trend or ‘extension term’ of the variance of a Cavalieri estimator is of order O(T 2q+2), (Kiêu et al., 1999; Garcı́a-Fiñana and Cruz-Orive, 2004). In the present study the expected trend of CE2{B̂(ω,x)} should therefore be approximately of O(E(I)−(2q+2)) = O(E(I)−2). A linear regression of logCE2{B̂(ω,x)} versus logE(I) yielded slope values of −1.90 and −1.49 for curves 5 and 6 respectively. Direct non linear least squares fitting of a exponential curve yielded exponent values of −2.45 and −2.13 respectively. For the Cavalieri components (Fig. 4e, f) the corresponding exponent estimates were −1.93 and −2.14 with the linear log-log regression, and −2.45,−2.22 with the non linear (exponential) regression, respectively. As a reference, a straight line segment corresponding to an exponent of −2 is represented in Fig. 4c-f. CASE OF A POPULATION OF CURVES DATA For the sake of illustration suppose that Fig. 1a is a UR quadrat from a large observation region of the plane containing the curves of interest. For instance, this quadrat could be one from an extensive grid of systematic quadrats encompassing the whole population. The curves analyzed from a properly sampled quadrat should be chosen according to some unbiased rule, such as the unbiased frame (also called the forbidden line) rule (Gundersen, 1977, see also Howard and Reed, 2005, or Baddeley and Jensen, 2005). An unbiased counting – or sampling – rule, combined with UR quadrat sampling, warrants a UR sample of items because all the items in the population will have identical a priori probabilities of being included in the sample. A guard area should also be uniquely defined for all quadrats, to warrant that any curve that is sampled can be entirely observed for measurement. This means that the effective quadrat analysed (not shown in Fig. 1a) will generally be smaller than the raw photograph. Suppose that the 9 curves numbered in Fig. 1c constitute a proper UR sample from the relevant population. The curve next to curve number 1 is apparently the union of two curves that could not be separated, and it has been removed as convenient for the present illustration. Imagine a horizontal line sweeping the quadrat from top to bottom. The curves tagged 1,2, . . . ,9 were the 1st, 2nd, . . ., 9th met by such line. (Actually, the sweeping line rule – see for instance Howard and Reed, 2005, Fig. 5.4 – is also unbiased for sampling, or counting, bounded objects in a bounded region). Table 1. Data corresponding to the curves labelled 1– 9 in Fig. 1c. B: curve length computed via Eq. 3. B̂: curve length estimate, based on intersection counting, computed via Eq. 12a-c with a square grid of gap length T = 25 nm. varw(B̂|B) : error variance of B̂ computed by Eq. 20 via Eq. 16–19. ce(B̂|B) = 100 · {varw(B̂|B)}1/2 / B̂. Curve B B̂ varw(B̂|B) ce(B̂|B) (nm) (nm) (%) 1 404 438 2615 11.7 2 397 430 2156 10.8 3 385 399 1254 8.9 4 399 339 2558 14.9 5 393 333 1484 11.6 6 392 351 1532 11.2 7 427 402 1081 8.2 8 417 354 2114 13.0 9 389 385 951 8.0 Mean 400.3 381.2 1749.44 - Var 187.75 1515.94 - - The data pertinent to the 9 tagged curves in Fig. 1c are collected in Table 1. The ‘exact’ length B of each curve was computed from Eq. 3. The Buffon-Steinhaus estimator B̂(ω,x) of B was computed in all cases from a single, computer generated IUR superimposition of the curve onto a fixed square grid of gap length T = 25 nm. This gap length was approximated from Eq. 30 to warrant E(I) ≈ 20 intersections, with B replaced with the sample mean of the 9 true lengths, namely 400.3 nm. Further, a single predictor var{B̂(ω,x)|B} was computed for each curve via Eq. 20 using the automatically scored intersection counts. ASSESSMENT OF THE MODEL ERROR VARIANCE PREDICTOR The data in Table 1 allows an indirect assessment of the quality of the stereological error variance 9 GOMEZ, AI et al.: On the precision of planar curve length estimation Fig. 4. (a), (b) Empirical Monte Carlo replicates of the intersection counting curve length estimates with 10 square grids of different sizes (ranging from T = 25 nm to the left, to T = 4 nm to the right end of the horizontal axis). A total of 1024 replications were generated in each case. The red dots represent the empirical means, which approximately coincide with the true curve length in all cases (as expected because the estimator is unbiased). The grey region is a 95% confidence band, that is, it encloses 95% of the replications; the outer line bounds enclose 100% of them. (c, d) The grey region is a 95% confidence band for the 1024 Monte Carlo replications of the model based predictor given by Eq. 20. The outer limits enclose all the replications. It is seen that the 95% confidence band always contains the empirical (i.e, the nearly true) CE2, (black polygonal line) for the entire range considered, and for each of the two curves. (e, f) Empirical (continuous curves) and corresponding model based means (dotted) squared coefficient of error components. The Cauchy components are in pink and in red colour, respectively, whereas the Cavalieri components are in blue. 10 Image Anal Stereol 2016;35:1-14 predictor proposed in this paper (Eq. 20). The variance Varb(B̂) of the unbiased length estimator B̂ = B̂(ω,x) between curves (hence the subscript ‘b’) may be decomposed as follows, Varb(B̂) = Varb(B)+EbVarw(B̂|B) , (34) where Varw(B̂|B) represents the stereological error variance within a curve (hence the subscript ‘w’). The sample version of the preceding decomposition reads, varb(B̂) = varb(B)+meanb { varw(B̂|B) } . (35) From the data displayed in Table 1 we obtain varb(B̂) = 1515.94 and varb(B) = 187.75, whereby Eq. 35 yields the indirect, model free estimate, meanb { varw(B̂|B) } = 1515.94−187.75 = 1328.19 . (36) Note that the preceding estimate is model free because Eq. 34 is universally valid provided that B̂ is unbiased, namely that E(B̂|B) = B. On the other hand, from the fourth column of Table 1 the direct sample mean of the individual model based variance predictors is, meanb { varw(B̂|B) } = 1749.44 , (37) which deviates from the model free estimate by a reasonable 32%. ESTIMATION OF THE POPULATION MEAN CURVE LENGTH Often, the target parameter is the mean length of a population of curves such as the DNA molecules in Fig. 1a. Let E(B) represent the population mean curve length. If the curves were digitized and measured automatically, then E(B) could be estimated by B = 400.3 nm , ce(B)% = 100 · √ 187.75/9 400.3 = 1.14% . (38) On the other hand, if curve length was estimated from about 20 intersections each, then E(B) would be estimated by B̂ = 381.2 nm , ce(B̂ )% = 100 · √ 1515.94/9 381.2 = 3.40% . (39) The preceding result would be similar if the intersections were counted directly by hand on the original images, with no image processing. Apart from time and effort, biases arising in the digitization procedure would be avoided. Note that the within curve error variance estimates varw(B̂|B) do not enter in the preceding calculations. Their knowledge, however, may be used to design the sampling protocol. The required coefficient of error of the mean depends on the purposes of the experiment – see Section 8.4 How many animals? from Cruz- Orive et al. (2004) for details, (here the latter title should better read How many curves?). For the present purposes we simplify the situation as follows. Suppose that we want to count a total of 400 intersections. If we count 20 intersections in each of 20 curves, then we might expect, ce(B )% = 100 · √ 1515.94/20 381.2 = 2.28% . (40) On the other hand, if we rather want to count 40 intersections in each of 10 curves, then Varw(B̂|B) may be expected to be reduced by a factor of about 4, because the latter variance is roughly proportional to (EI)−2, (Fig. 4), and we are doubling the number of intersections per curve. Thus, in this case, meanb { varw(B̂|B) } = 1328.19/4 = 332.05 , (41a) varb(B̂) = 187.75+332.05 = 519.80 (41b) ce( B̂ )% = 100 · √ 519.80/10 381.2 = 1.89% . (41c) The predicted precision of the mean is therefore of about 2% in either case. The former option (i.e., working less in more curves) should generally be the better one (Gundersen and Østerby, 1981). First, counting too many intersections per curve is tedious. Second, the foregoing formulae for the coefficient of error of the mean require independence between curve estimates. Sampling a few curves in a relatively small region is likely to violate this condition. It is worth mentioning that, if the target parameter is E(B), then it is not necessary to measure individual curve lengths, as above. A more convenient (and probably more efficient) design consists in superimposing – on the (extensive) field of interest, see for instance Fig. 1 of Podestà et al. (2005) – systematic quadrats combined with test lines. The upper edge of each quadrat, for instance, could be adopted as a test line, although the test lines may be placed outside the quadrats. The superimpositions should be IUR relative to the curves, or just UR if the test lines consisted of half circles. Let l/a cm−1 denote the known test line length per quadrat area. Further, let Qi denote the number of individual curves counted in the ith quadrat according to the unbiased frame rule, say. Also, let Ii denote the corresponding number of intersections scored with test lines associated with the ith quadrat, 11 GOMEZ AI ET AL: On the precision of planar curve length estimation and let n represent the total number of quadrats. Then, a ratio unbiased estimator of E(B) is, B = π 2 · a l · n ∑ i=1 Ii / n ∑ i=1 Qi cm . (42) In this case Var(B) may be predicted by Cochran’s formula (Cochran, 1977; Howard and Reed, 2005, p.158), provided that the distance between quadrats is large enough to assume independence between the n data pairs {(Ii,Qi)}. Similar designs are illustrated in Chapter 12 of the latter book. Note that: (a) If a quadrat does not lie entirely inside the reference area, then we count curve(s) just in the available quadrat area anyway, always respecting the unbiased rule adopted. (b) The intersections are counted on any curve that is intersected by any test line, whether it was counted in the quadrats, or not. In other words, there will generally be counted curves that do not contribute any intersections, and curves that do contribute intersections but are not counted. DISCUSSION The main purpose of this paper was to check the accuracy of a previously published error variance prediction formula for curve length estimation by intersection counting with a square grid. Each of nine finite linear features (namely flattened DNA molecule images, see Fig. 1a) was repeatedly superimposed with isotropic uniform randomness onto fixed square grids of various sizes. The subsequent replicates of the variance predictors were thereby compared against the empirical (i.e., close to true) error variance. The overall performance of the prediction formula was satisfactory – the results corresponding to two of the curves (namely a fairly isotropic and a fairly anisotropic one) are displayed in Fig. 4c, d. The error variance has a rotation (Cauchy) and a translation (Cavalieri) component. The prediction of the former (Eq. 11) was relatively poor, whereas the one of the latter was satisfactory (Fig. 4e, f). Because the Cauchy contribution was relatively very low, however, the overall prediction was satisfactory. It may be didactic to consider the extreme case in which the target curve is a straight line segment of length B. Here the Cauchy estimator based on two mutually orthogonal projections (Eq. 10) reads, B̂(ω) = π 4 B · (cosω + sinω) , P(dω) = 2 π dω, (0≤ ω < π/2) , (43) and its exact square coefficient of error is, CE2{B̂(ω)}= π 2 +2π 16 −1 , (44) so that CE{B̂(ω)}%≈ 9.77%, which is non negligible. As soon as the curve is not too anisotropic, however, the preceding CE decreases rapidly. On the other hand, for a straight line segment the Cauchy error variance predictor given by Eq. 11 reads var{B̂(ω)}= π 2 240 B2 · (cosω− sinω)2 , (45) and its relative bias is, Eωvar{B̂(ω)} Var{B̂(ω)} −1 = 1 15 · π 2−2π π2 +2π−16 −1 ≈ 0.56 , (46) namely 56%. It should be borne in mind, however, that B̂(ω) is based on only two observations. The Monte Carlo experiment involved 1024 automatic IUR superimpositions of each curve onto each fixed grid. To do this each curve was approximated as described in Section Processing of the curve images by a polygonal curve with known vertex coordinates. The procedure involved the successive use of four software packages (with some manual editing at the later stages) and took about 10–15 min per curve on average. Moreover, the manipulations will introduce some bias in the curve length. On the contrary, intersection counting estimation can be implemented directly on each original curve in less than 20s, and the only bias present may be due to the possible difference between the length of a ‘free’ DNA molecule, and its flattened projection onto the observation plane. It is in fact pausible that all the DNA molecules considered here had the same length, and that the different lengths reported in the first column of Table 1 were due to the sum of two biases, namely the flattening bias plus the image processing one. This is impossible to ascertain: the best way to handle bias is to avoid it. Note also that the curves themselves might not be separable automatically – see for instance the curve lying between curves 1 and 2 in Fig. 1c. A human, however, will soon realize that this curve must be the union of two. Thus, apart from the possibility of performing automatic experiments to investigate higher order properties, as illustrated in this study, automatic image processing cannot be recommended if the only purpose is to estimate mean curve length in a not too extensive case study. The present study is relatively academic, in the sense that it is restricted to a single bounded curve. Our main concern was the prediction of the within curve 12 Image Anal Stereol 2016;35:1-14 error variance Varw(B̂|B). Often, however, the target quantity may be the population mean curve length (Subsection Estimation of the population mean curve length). On the other hand, only the square grid was considered here. In practice, grids consisting of separate straight line segments, half circles, etc., can be more efficient than the square grid (which often tends to yield too many intersections). In addition, test systems consisting of half circles, such as the Merz grid (Weibel, 1979; Howard and Reed, 2005) may be convenient because they do not require a random orientation relative to the curve. Unfortunately, to our knowledge no general error variance prediction formula exists for a general grid. It is worth mentioning, however, that the ‘fakir predictor’ proposed for a test grid of wavy cycloids in Cruz- Orive et al. (2014, Eq. 42) proved to be satisfactory for digitized brain sections (Fig. 5 of that study). The formula could be tried also for separate, systematically arranged test segments, half circles, or more general bounded test curves. The formula requires recording the intersection counts separately for each fundamental test curve, and arranging them into a rectangular matrix (p. 132 of the latter paper). The problem, however, remains basically open. In short, the intersection counting method is highly recommended to estimate curve length because it is direct (i.e., it does not require image processing as long as the intersection points can be scored unambiguously in the original images), design unbiased, and rather efficient. Also, if a square grid is used, then this study shows that the error variance may be predicted fairly reliably from a single grid superimposition on the target curves considered. Semiautomatic stereological devices (several brands of which are available on the market) may be of considerable help here because they will generate the desired test grid automatically on the computer screen, with IUR position relative to the target curve images. The intersection counts are then recorded manually, and may be exported for instance into the free software R (http://www.r-project.org/) to compute the estimates. ACKNOWLEDGEMENTS We are indebted to David Dunlap (Emory University, GA, USA) and to Alessandro Podestà (University of Milano, Italy) for lending us an original tiff micrograph of Fig. 1a. The authors acknowledge financial support from the Spanish MICINN Project AYA2015-66357-R. REFERENCES Baddeley AJ, Jensen EBV (2005). Stereology for statisticians. London: Chapman & Hall/CRC. Cochran W (1977). Sampling Techniques. 3rd Ed. New York: J. Wiley & Sons. Cruz-Orive LM (1989). Precision of systematic sampling on a step function. In: Hubler A, Nagel W, Ripley B, Werner G, eds., Geobild’89, pp. 185–193. Berlin: Akademie Verlag. Cruz-Orive LM, Gelšvartas J, Roberts N (2014). Sampling theory and automated simulations for vertical sections, applied to human brain. J Microsc 253:119–50. Cruz-Orive LM, Gual-Arnau X (2002). Precision of circular systematic sampling. J Microsc 207:225–42. Cruz-Orive LM, Insausti A, Insausti R, Crespo D (2004). A case study from Neuroscience involving Stereology and Multivariate Analysis. In: Evans S, Janson A, Nyengaard J, eds., Quantitative Methods in Neuroscience, chap. 2. Oxford: Practical Neuroscience Series, Oxford University Press, 16–64. Garcı́a-Fiñana M, Cruz-Orive LM (2004). Improved variance prediction for systematic sampling on R. Statistics 38:243–72. Gundersen HJG (1977). Notes on the estimation of the numerical density of arbitrary profiles: the edge effect. J Microsc 111:219–23. Gundersen HJG, Østerby R (1981). Optimizing sampling efficiency of stereological studies in biology: or ’do more less well!’. J Microsc 121:65–73. Howard V, Reed MG (2005). Unbiased stereology: Three- dimensional measurement in microscopy, 2nd Ed. Oxford: Bios/ Taylor & Francis. Kiêu K, Souchet S, Istas J (1999). Precision of systematic sampling and transitive methods. J Stat Plan Inference 77:263–79. Miles R (1974). On the elimitation of edge effects in planar sampling. In: Kendal E, Harding D, eds., Stochastic Geometry: A tribute to the memory of Rollo Davidson. London: John Wiley and Sons, 228–47. Miles R, Davy P (1976). Precise and general conditions for the validity of a comprehensive set of stereological fundamental formulae. J Microsc 107:211–26. Moran PAP (1966). Measuring the length of a curve. Biometrika 53:359–64. Podestà A, Imperadori L, Colnaghi W, Finzi L, Milani P, Dunlap D (2004). Atomic force microscopy study of DNA deposited on poly l-ornithine-coated mica. J Microsc 215:236–40. Podestà A, Indrieri M, Brogioli D, Manning GS, Milani P, Guerra R, Finzi L, Dunlap D (2005). Positively charged surfaces increase the flexibility of DNA. Biophys J 13 GOMEZ AI ET AL: On the precision of planar curve length estimation 89:2558–63. Santaló L (1976). Integral geometry and geometric probability. Reading, MA: Addison-Wesley. Steinhaus H (1930). Zur Praxis der Rektifikation und zum Längenbegriff. Berichte Sachsischen Akad Wiss Leipzig 82:120–30. Weibel ER (1979). Stereological methods. Practical methods for biological morphometry, vol. 1. London: Academic Press. 14