Image Anal Stereol 2016;35:1-14 doi:10.5566/ias.1412
Original Research Paper
ON THE PRECISION OF CURVE LENGTH ESTIMATION IN THE PLANE
ANA I. GÓMEZB, MARCOS CRUZ AND LUIS M. CRUZ-ORIVE
Department of Mathematics, Statistics and Computation, Faculty of Sciences. University of Cantabria, E-39005.
Santander, Spain.
e-mail: gomezanab@gmail.com, marcos.cruz@unican.es, luis.cruz@unican.es
(Received October 19, 2015; revised February 2, 2016; accepted February 2, 2016)
ABSTRACT
The estimator of planar curve length based on intersection counting with a square grid, called the Buffon-
Steinhaus estimator, is simple, design unbiased and efficient. However, the prediction of its error variance
from a single grid superimposition is a non trivial problem. A previously published predictor is checked here
by means of repeated Monte Carlo superimpositions of a curve onto a square grid, with isotropic uniform
randomness relative to each other. Nine curvilinear features (namely flattened DNA molecule projections)
were considered, and complete data are shown for two of them. Automatization required image processing
to transform the original tiff image of each curve into a polygonal approximation consisting of between 180
and 416 straight line segments or ‘links’ for the different curves. The performance of the variance prediction
formula proved to be satisfactory for practical use (at least for the curves studied).
Keywords: Buffon-Steinhaus estimator, Cauchy estimator, curve length, error variance prediction, Monte
Carlo resampling, test grid.
INTRODUCTION
Consider a bounded, planar, piecewise smooth
curve Y ⊂ R2 of finite length B, which is the target
parameter. The Buffon-Steinhaus unbiased estimator
of B is based on intersection counting with a square
grid of test lines which is IUR relative to Y , namely,
B̂ =
π
4
·T · I , (1)
(Steinhaus, 1930 – for references see Baddeley and
Jensen, 2005) where T is the gap length between
test lines and I the total number of intersections.
The abbreviation ‘IUR’, introduced in Miles and
Davy (1976), means ‘isotropic uniform random’ –
details are given below. In this paper we address
the problem of predicting the variance Var(B̂) from
a single grid superimposition. The interest in the
problem is old. Moran (1966) gives an exact variance
expression when Y is a straight line segment. For
a general curve, he gives an approximation for the
variance component due to the grid position, but
not for the one due to orientation – see also Cruz-
Orive (1989). To our knowledge the only estimators
hitherto available for both variance components were
given in Cruz-Orive and Gual-Arnau (2002). Such
estimators are generally not unbiased – they are only
theoretical approximations. Our purpose is to check
the performance of such approximations by means of
Monte Carlo simulations.
Often Y is a model for profile boundary (hence
the notation ‘B’ for its length), namely the boundary
of the intersection between a closed surface (e.g., a
cell membrane, or a grain boundary) and a sectioning
plane. A planar curve may also be the orthogonal
projection of a spatial curve onto an observation plane,
or even a flexible linear feature in space flattened
onto a planar surface, in which case the real and the
observed lengths should approximately coincide. This
is the case for the material used in this paper, namely
DNA molecules which appear as open linear features
on the observation plane (Podestà et al., 2004). Fig. 1a
below, kindly provided in tiff format by Professor
Alessandro Podestà, is Fig. 1(A) from the latter paper.
The laborious steps leading to the conversion of
the tiff image features into polygonal curves in vector
graphics (Section Processing of the curve images),
which constitute necessary prerequisites to perform
automatic measurements, suggest that the intersection
counting method to estimate feature length directly
on the original images is an attractive option – first
because it is fast, design unbiased and efficient, second
because the error variance can be predicted and last,
but not least, because the processed curve lengths
will seldom coincide with the lengths of the original
projections (automatic measurements are generally
biased by a wide variety of artifacts depending on
curve shape, image resolution, etc.).
1
GOMEZ AI ET AL: On the precision of planar curve length estimation
Fig. 1. (a) The linear features, used as model curves
in this paper, represent flattened projections of DNA
molecules onto the observation plane. Redrawn from
Podestà et al. (2004), with permission from John
Wiley and Sons. (b) Raster bitmap image version of
the linear features. (c) Vector graphics image with
polygonal approximations of the original curves, ready
for automatic Monte Carlo experiment. The two curves
marked in red were selected for the results shown in
Figs. 3, 4.
CURVE MODEL AND LENGTH
ESTIMATION
CURVE MODEL
Each of the selected curvilinear features shown in
Fig. 1, namely each of the selected DNA molecules,
was converted from tiff format into a polygonal curve
in vector graphics following the steps described in
the section Processing of the curve images. A fixed
polygonal curve Y , henceforth called a ‘curve’, for
short, may be represented as follows,
Y =
N⋃
i=1
yi , (2)
where yi denotes the ith straight line segment or ‘link’
of Y , and N is the finite total number of links. The total
length of Y is
B =
N
∑
i=1
bi , (3)
where bi, denotes the length of yi.
Fix a rectangular frame Ox1x2 in the plane of the
curve. The position and orientation of the curve is
determined by a unit vector (x, ω) rigidly attached
to the curve, emanating from a point x ∈ R2, and
making an angle ω ∈ [0,2π) with Ox1. The point
x = (x1, x2) is called the associated point (AP) of Y ,
(Miles, 1974, see also Baddeley and Jensen, 2005)
and the vector (x, ω) is called the associated vector
(AV). The choice of the AV is arbitrary, but once it
is chosen it must remain rigidly attached to the curve.
When the AV is at its initial position (x = 0, ω = 0),
then the corresponding initial position of the curve
is denoted by Y0,0, so that the AP of Y0,0 is at the
origin O. Here this AP was the lower left corner of
the smallest rectangle (with its sides parallel to the
reference axes) enclosing Y0,0, see Fig. 2a. Under a
rigid motion defined by the vector (x,ω), (namely a
rotation ω about the origin followed by a translation x,
or equivalently a translation x followed by a rotation ω
about the point x) the curve Y0,0 is transformed into
the curve Yx,ω see Fig. 2b. The triplet (x1, x2, ω)
constitute the ‘coordinates’ of the curve.
The generation of a IUR curve in the plane requires
the use of the motion invariant density associated with
the coordinates of the curve. This density is called the
kinematic density (Santaló, 1976), namely,
dYx,ω = dx dω , (4)
where dx = dx1 dx2 is the area element in the plane and
dω is the arc element in the unit circle. In practice,
the preceding concept implies that the AP of the curve
2
Image Anal Stereol 2016;35:1-14
must be uniform random (UR) in any bounded region
of the plane, whereas the orientation of the AV must
be isotropic random (IR) namely UR in the unit circle,
and independent from the position of the AP.
O
AP
Y
0,0
AP x
ω
Yx,ω
O
a b
c
x
O
J 0
Λ0
2
1
1 1
T
T
Fig. 2. (a) Curve number 6 from Fig. 1c, with its
rigidly attached associated point (AP) and associated
vector (arrow). Each of the straight line link segments
constituting the curve (indistinguishable by eye) share
the same AP and the same associated vector. (b) Result
of applying a rigid motion to the curve, namely a
translation of the AP to the point x followed by a
rotation ω about x. (c) A IUR superimposition of
the curve onto a square grid Λ0. The AP is UR
in the fundamental tile J0 of the grid, whereas the
rotation angle ω is independent and UR in the interval
[0,2π) . Here the relevant intersection counts read:
I11 = 1, I12 = 1, I21 = 1, I22 = 2. The procedure
was programmed in Python to generate automatic
replications.
CAUCHY (ONE STAGE) ESTIMATION OF
CURVE LENGTH
A special case of Cauchy’s projection formula
(Santaló, 1976) expresses the length B of the curve Y0,ω
in terms of its total orthogonal projected length l(ω)
onto Ox1, with all points counted in their multiplicity,
see Cruz-Orive (1989, Fig. 1), or Cruz-Orive and Gual-
Arnau (2002, Fig. 4a). Let {α1,α2, ...,αN} denote the
fixed angles of the oriented links of Y0,0 with Ox1.
Then,
l(ω) =
N
∑
i=1
bi|cos(αi−ω)| . (5)
Integration from ω = 0 to ω = 2π yields Cauchy’s
formula,
B =
1
4
∫ 2π
0
l(ω)dω . (6)
Suppose that the angle ω is UR in the interval
[0,2π), with probability element,
P(dω) =
dω
2π
, ω ∈ [0,2π) , (7)
and suppose also that l(ω) can be measured exactly.
Then, by Cauchy’s formula
B =
π
2
E{l(ω)} , (8)
and therefore,
B̃(ω) =
π
2
l(ω) (9)
is an unbiased estimator (UE) of B. Here ω may
alternatively be UR in the interval [0,π), i.e., ω ∼
UR[0,π), because l(ω) = l(ω + π) for all ω .
Estimation precision may be gained by measuring also
the projected length l(ω + π/2) of Y0,ω onto the Ox2
axis, whereby,
B̂(ω) =
π
4
[
l(ω)+ l(ω +π/2)
]
, (10)
is also unbiased for B, and it can be expected to be
more precise than B̃(ω). Note that here we may take
ω ∼ UR[0,π/2) because B̂(ω) = B̂(ω +π/2).
The estimator B̂(ω) is based on n = 2 systematic
observations from the measurement function l(ω) in
the interval [0,π), i.e., with period π/2, in the unit
semicircle. A predictor of Var( B̂ ) is available from
Eq. 4.4 of Cruz-Orive and Gual-Arnau (2002) with
r = π , n = 2 and m = 1. Here m is a parameter
of the global model adopted for the covariogram
of the measurement function l(ω). In principle, the
smoother l(ω), the more appropriate should be the
choice m = 1. The local error is v̂2 = 0 because l(ω)
is measured without error. Further, C0−C1 = [l(ω)−
l(ω +π/2)]2/4. Thus,
var
{
B̂(ω)
}
=
π2
240
· [l(ω)− l(ω +π/2)]2 . (11)
Note: In this paper a true variance is denoted by Var(·),
whereas the corresponding predictor or estimator is
denoted by var(·).
3
GOMEZ AI ET AL: On the precision of planar curve length estimation
BUFFON-STEINHAUS (TWO STAGE)
ESTIMATION OF CURVE LENGTH
For a given orientation ω of the curve, the total
orthogonal projected length l(ω) and l(ω + π/2)
can be estimated without bias at a second stage by
intersection counting with a square grid of test lines.
More precisely:
(i) Fix a square grid Λ0 of test lines parallel to the
Ox1x2 axes, with gap length T > 0. It is convenient
to fix the origin O at a central vertex of the grid.
The adopted fundamental tile of the grid is the
square J0 = [0,T )2, see Fig. 2c.
(ii) Rotate the curve Y0,0 (with its AP at O)
isotropically about O into Y0,ω , where ω ∼
UR[0,2π). Next, shift the AP of Y0,ω into a point
x∼UR(J0), thus translating the curve together into
Yx,ω , (Fig. 2c).
(iii) Score the intersections counts
{
I1 j(ω,x), j =
1,2, ...,n1
}
between the curve and the vertical
lines of the grid in an ordered sequence, where
I11(ω,x), I1n1(ω,x) denote the first, and the last
non zero counts, respectively. Score similarly
the intersection counts {I2 j(ω + π/2,x), j =
1,2, ..,n2} with the horizontal lines.
The Buffon-Steinhaus estimator B̂(ω,x) of the
curve length B is the result of replacing the projection
lengths in the right hand side of Eq. 10, (i.e of the
Cauchy estimator), with the corresponding Cavalieri
type estimators (Eqs. 12b, 12c below) based on the
intersection counts, namely,
B̂(ω,x) =
π
4
[
l̂(ω,x)+ l̂(ω +π/2,x)
]
, (12a)
l̂(ω,x) = T
n1
∑
j=1
I1 j(ω,x) , (12b)
l̂(ω +π/2,x) = T
n2
∑
j=1
I2 j(ω +π/2,x) . (12c)
The estimator B̂(ω,x) is therefore a two stage
UE of B. The fist stage design is the Cauchy
design generating two mutually perpendicular, total
projections of the curve, whereas the second stage
incorporates Cavalieri sampling to estimate the lengths
of these projections by intersection counting. By the
unbiasedness of the Cavalieri design, for each ω ∈
[0,2π) the conditional mean of the second stage
estimator is equal to the fist stage (Cauchy) estimator,
namely,
Ex
{
B̂(ω,x)|ω
}
= B̂(ω) . (13)
Because the Cauchy estimator is unbiased,
namely because Eω
{
B̂(ω)
}
= B, it follows that
E
{
B̂(ω,x)
}
= EωEx
{
B̂(ω,x)|ω
}
= B, which verifies
the unbiasedness of B̂(ω,x).
The exact variance of B̂(ω,x) may be represented
using the standard variance decomposition formula.
For simplicity set,
B̂1 = B̂(ω) ,
B̂2 = B̂(ω,x) .
(14)
Then,
Var(B̂2) = Varω
{
Ex
(
B̂2
∣∣ω)}+Eω{Varx(B̂2∣∣ω)}
= Var
(
B̂1
)
+Eω
{
Varx
(
B̂2
∣∣ω)} . (15)
The first term in the right hand side of the
preceding identity represents variance component due
to orientations, whereas the second term represents
the component due to intersection counting, averaged
over orientations. The latter component, also called the
nugget, or local error component, may be predicted
using the standard Matheron’s formula for systematic
sampling along an axis (generally called Cavalieri
sampling). For each ω ∈ [0,2π) set
σ21 = Varx
{
l̂(ω,x)|ω
}
,
σ22 = Varx
{
l̂(ω +π/2,x)|ω
}
,
(16)
namely the local error variances due to intersection
counting with the vertical and the horizontal test lines
of the grid, respectively. Then, the second term in the
right hand side of Eq. 15 is estimated as follows,
var
(
B̂2|ω
)
=
(
π
4
)2 (
σ̂21 + σ̂
2
2
)
, (17a)
σ̂2i =
T 2
12
· (3C0i−4C1i +C2i), i = 1,2, ni ≥ 3 ,
(17b)
σ̂2i =
T 2
6
· (C0i−C1i), i = 1,2, ni = 2 , (17c)
Cki =
ni−k
∑
j=1
Ii jIi, j+k, k = 0,1,2; i = 1,2 , (17d)
where,
I1 j = I1 j(ω,x) ,
I2 j = I2 j(ω +π/2,x) .
(18)
On the other hand, the first term in the right hand
side of Eq. 15, namely the orientations or Cauchy
component, cannot be estimated with Eq. 11 directly
4
Image Anal Stereol 2016;35:1-14
because the curve projections are no longer measured
exactly. Instead,
var2
(
B̂1
)
=
π2
240
·
{[
l̂(ω,x)− l̂(ω +π/2,x)
]2
− (σ̂21 + σ̂22 )
}
,
(19)
where the subscript ‘2’ in var2(·) indicates that the
estimator is computed from the two stage data, namely
from the intersection counts. Thus the total variance
predictor of the two stage estimator is (Cruz-Orive and
Gual-Arnau, 2002)
var
(
B̂2
)
= var2
(
B̂1
)
+var
(
B̂2|ω
)
. (20)
REMARKS
1. Provided that the curve Y0,ω is connected, its
orthogonal projection onto Ox1 will be a bounded
interval Y ′ω , say. Define the integer valued
measurement function IY ′ω (z) as the number of
intersection between Y0,ω and the vertical straight
line x1 = z. Then its integral
l(ω) =
∫
R
IY ′ω (z)dz (21)
is the total orthogonal projected length
defined above, and Eq. 12b is the standard
Cavalieri estimator of it. Moreover, because the
measurement function IY ′ω (z) is integer valued it
will exhibit jumps, and therefore its smoothness
constant will be q = 0 (Kiêu et al., 1999; Garcı́a-
Fiñana and Cruz-Orive, 2004). In consequence, the
σ̂2i are computed with q = 0.
2. While l(ω)= l(ω+π) for all ω , the corresponding
Cavalieri estimators, see Eqs. 12b, 12c, will in
general be different for each ω . This justifies the
choice of the range [0,2π) for ω in step (ii) of the
preceding subsection.
PROCESSING OF THE CURVE
IMAGES
Here we describe the steps involved in the
conversion of each original tiff curve image (Fig. 1a)
into a polygonal curve in Scalar Vector Graphics
(SVG, Fig. 1c) which can be imported into a
programming environment to perform automatic
Monte Carlo experiments. Preference was given to
free, GPL licensed software.
Step 1. The original tiff image (containing all
the relevant curves) was edited using the GIMP
software package (http://www.gimp.org/). The image
was desaturated and converted into a black-and-white
(B/W) one. Brightness, contrast and B/W colour
threshold were set at −46, +6 and 108, respectively.
The resulting image was a raster (or ‘bitmap’) image
consisting of pixels which were coloured in black
(Fig. 1b). Each curve, or group of nearly adjacent
curves, was selected and recorded separately into a
JPEG output file, which was submitted to the next step.
Step 2. Each of the preceding (bitmap) JPEG
images was submitted to AutoTrace (http://autotrace.
sourceforge.net/) to transform it into vector graphics.
A ‘centerline’ option was used whereby a pixelized
curve was approximated by a ‘spine’ curve consisting
of the union of Bèzier arcs. The output files were
exported as SVG files.
Step 3. The preceding SVG files were edited
manually with the aid of Inkscape (http://inkscape.
org/) to remove background noise, and to eventually
adjust the curve arcs properly.
Step 4. The result (Fig. 1c) was imported into
Blender V 2.68a (http://www.blender.org/) to convert
it into a connected polygonal of small linear segments
or ‘links’ with known endpoint XY coordinates.
The number of links ranged between 180 and 416,
depending on the local curvatures of the original
curves.
With the final polygonal curves a Python script was
written to compute curve length (Eq. 3), to rotate and
shift the curves as prescribed in the preceding section,
and to perform the Monte Carlo procedures described
in the next section.
MONTE CARLO EXPERIMENT TO
TEST THE PERFORMANCE OF
THE VARIANCE PREDICTORS
PURPOSES
From the curvilinear features (DNA molecules)
shown in Fig. 1a, the two SVG curves numbered 5
and 6 in Fig. 1c were selected to illustrate the results.
The choice was based on the fact that, qualitatively,
these two curves give the visual impression of being
‘fairly isotropic’ and ‘fairly anisotropic’, respectively.
For each curve the following aspects were studied.
(1) Behaviour of the Cauchy estimator B̃(ω) as
a function of the angle ω ∈ [0,π). In polar
coordinates the graph of l(ω) is the rose of total
orthogonal projected lengths of the curve, (i.e., the
‘rose of projections’, for short).
5
GOMEZ AI ET AL: On the precision of planar curve length estimation
(2) Behaviour of the Cauchy estimator B̂(ω) based on
two mutually orthogonal projections.
(3) Replicates of the Buffon-Steinhaus estimator
B̂(ω,x), which is based on intersection counts with
a square grid, as a function of the mean total
number of intersections E(I).
(4) Comparison of the variance predictor var
{
B̂(ω)
}
of the Cauchy estimator, see Eq. 11, against the
empirical variance Vare
{
B̂(ω)
}
.
(5) Comparison of the variance predictor
var
{
B̂(ω,x)
}
of the Buffon-Steinhaus estimator,
and of the two variance component predictors
in the right hand side of Eq. 20, against the
corresponding empirical variances.
In the corresponding graphs, each empirical error
variance was divided by the square of the true curve
length in order to represent the square coefficient of
error. For instance,
CE2e
{
B̂(ω)
}
=
Vare
{
B̂(ω)
}
B2
. (22)
For convenience the corresponding variance predictors
were normalized in the same way, e.g.,
ce2
{
B̂(ω)
}
=
var
{
B̂(ω)
}
B2
. (23)
If the interest was focused on the sample coefficient
of error itself, however, then B2 should be replaced
with its sample version B̂2(ω) in the denominator of
the preceding expression.
CAUCHY ESTIMATOR
For each of the selected curves, the Cauchy
estimator B̂(ω) was computed according to Eq. 10 at
each of M = 322 = 1024 values of ω in the interval
[0,π/2), namely
{ωk = (U + k−1)
π
2M
, k = 1,2, ...,M} ,
U ∼ UR[0,1) .
(24)
From the corresponding M replicates of B̂(ω), the
empirical mean and the error variance of the estimator
were computed respectively as follows,
Ee
{
B̂(ω)
}
=
1
M
M
∑
k=1
B̂(ωk) , (25a)
Vare
{
B̂(ω)
}
=
1
M
M
∑
k=1
[
B̂(ωk)−Ee
{
B̂(ω)
}]2
.
(25b)
On the other hand the corresponding M replicates{
var{B̂(ωk)},k = 1,2, ....,M
}
of the variance
predictor given by Eq. 11 were computed for
comparison against the empirical variance.
The Cauchy estimator B̃(ω) was also computed in
a similar way at M = 1024 points in the interval [0,π).
Here, however, no variance predictor exists because
the estimator is based on a single projection.
BUFFON-STEINHAUS ESTIMATOR
For a given curve Y0,0 and a fixed square grid Λ0 of
test lines of gap T > 0, a total of M = K2 = 322 =
1024 replicates of the Buffon-Steinhaus estimator
B̂(ω,x) were computed from Eq. 12 by means of M
superimpositions of the curve onto the grid, (Fig. 2c),
according to a systematic design as follows. Recall that
the AP of the curve is UR in the fundamental square
of the grid, namely x ∈ J0 = [0,T )2,whereas ω is UR
in [0,2π) . The Cartesian coordinates of the M = K2
positions of the AP, e.g.,
{(x1i,x2 j), i, j = 1,2, ....,K} , (26)
were the vertices of a fine systematic square grid of
gap T/K generated within the fundamental square J0,
namely,
x1i = (U1 + i−1) ·T/K ,
x2 j = (U2 + j−1) ·T/K ,
(27)
where U1,U2 are two independent UR numbers in the
interval [0,1). Renumber the resulting sequence of M
associated points as {x1,x2, ...,xM} in any convenient
way. Next, generate M systematic rotation angles in the
interval [0,2π), namely,{
(U3 + i−1)
2π
M
, i = 1,2, ...,M
}
, (28)
where U3 is a third UR number in the interval [0,1),
independent from U1,U2. Now, generate a random
permutation of the preceding set of M systematic
angles, and denote it by {ω1,ω2, ...,ωM}. Then, the
curve coordinates of the M superimpositions of the
curve onto the grid are
{(xk,ωk), k = 1,2, ...,M} . (29)
Note that the angle permutations avoid any correlation
between curve location (xk) and orientation (ωk).
Different values of the gap T of the grid were
chosen so that the expected total number E(I) of
intersections did not exceed about 130. With the
abbreviation defined in Eq. 14, and bearing in mind
6
Image Anal Stereol 2016;35:1-14
that E(B̂2) = B, for a desired value of E(I) the gap was
computed by the following formula,
T = 4B/(πE(I)) . (30)
For each curve and each value of T , the M
replicates {
B̂(ωk,xk), k = 1,2, ..,M
}
, (31)
were obtained as described above, and the empirical
mean and variance of the Buffon-Steinhaus estimator
were thereby computed similarly as in Eqs. 25a, 25b,
namely,
Ee
{
B̂(ω,x)
}
=
1
M
M
∑
k=1
B̂(ωk,xk) , (32a)
Vare
{
B̂(ω,x)
}
=
1
M
M
∑
k=1
[
B̂(ωk,xk)−Ee
{
B̂(ω,x)
}]2
.
(32b)
From Eqs. 15, 25b, the empirical value of the Cavalieri
(intersection counting) contribution was computed as
follows,
EeVar{B̂(ω,x)|ω}= Vare{B̂(ω,x)}−Vare{B̂(ω)} .
(33)
Further, for each pair (ωk,xk) the predictors of the
Cavalieri and the Cauchy error variance components
were computed from Eq. 17 and 19, respectively, as
well as the total error variance predictor given by their
sum (Eq. 20).
RESULTS
CAUCHY ESTIMATOR
Each of the two curves studied, and their
corresponding roses of projections, are shown at the
top of Fig. 3.
The graph of the single projection estimator B̃(ω)
is displayed for each curve in Fig. 3a,b, whereas those
for the double projection estimator B̂(ω) are displayed
in Fig. 3c,d. For either curve the superior precision of
B̂(ω) is apparent.
The more flattened the rose of projections, the
more anisotropic is the curve. This is visually reflected
by the different degrees of accuracy of each of the two
Cauchy estimators for each of the two curves.
The empirical CE2e{B̂(ω)}, (Fig. 3e, f, pink
straight lines), computed via Eq. 25b, is negligible
for the first curve (3.86 · 10−5), and rather small
for the second (5.06 · 10−4). The graph of the
predictor ce2{B̂(ω)}, see Eq. 11, is also displayed
for each curve in Fig.3e,f, (red oscillating curves).
The corresponding means (6.00 ·10−4 and 1.30 ·10−3,
respectively) are shown as red dotted lines. The biases
are small in absolute terms (5.62 ·10−4 and 7.99 ·10−4
respectively), but very large in relative terms (14.56
and 1.58 respectively) because the actual empirical
errors are small.
BUFFON-STEINHAUS ESTIMATOR
For each of the two curves considered (see top
of Fig. 4), the M = 1024 Monte Carlo replicates of
the Buffon-Steinhaus estimator B̂(ω,x) , see Eq. 12a,
are displayed in Fig. 4a, b, respectively, at each of
10 values of the mean total number of intersections
E(I). As a check of the Monte Carlo procedure,
the corresponding means (red dots), computed via
Eq. 32a, visually coincide with the known value of B in
each case, as expected by unbiasedness. Approximate
confidence bands of 95% (coloured in grey), and of
100%, are also displayed for the individual realizations
of B̂(ω,x).
The empirical CE2{B̂(ω,x) was computed for each
curve, via Eq. 32b, at each of the 10 values of E(I) and
joined by a polygonal line (thick black line in Fig. 4c,
d).
The individual realizations of the corresponding
predictor ce2{B̂(ω,x)} (computed via Eq. 20) are
also displayed in Fig. 4c, d. The corresponding 10
group means of the predictors were joined by a
polygonal curve (thick red line). Approximate 95%
(coloured in grey), and 100% confidence bands are
displayed together. It is seen that, in the range of E(I)
considered, the empirical curve is always captured by
the 95% confidence band. For E(I)≈ 50 the empirical
CE{B̂(ω,x)} is about 5% for either curve (horizontal,
blue broken line). Such degree of accuracy is however
unnecessary if the target quantity is the mean length
of a population of curves. For a simple random sample
of n curves, the stereological contribution to the total
square coefficient of error of the population mean
length estimate would be reduced to CE2{B̂(ω,x)}/n.
The graph of the empirical Cavalieri component
EeCE2{B̂(ω,x)|ω}, computed via Eq. 33, is shown
for each curve in Fig. 4e, f, respectively (thick blue
polygonal), together with its mean predictor computed
via Eq. 10, (broken blue line). On the other hand, the
corresponding Cauchy components are represented in
pink and in red (broken line), respectively.
7
GOMEZ AI ET AL: On the precision of planar curve length estimation
600
500
400
300
200
100
    0
0 π/2 π
C
u
rv
e
 l
e
n
g
th
, 
n
m
600
500
400
300
200
100
    0
0 π/2 π
C
u
rv
e
 l
e
n
g
th
, 
n
m
600
500
400
300
200
100
    0
0 π/2 π
(ω),Cauchy estimator, B  
two projections
True length, B
0 π/2 π
0.06
0.05
0.04
0.03
0.02
0.01
0.00
2
2
2
2
2
2
ω
a b
dc
e f
0 π/2 π
0.06
0.05
0.04
0.03
0.02
0.01
0.00
2
2
2
2
2
2
ce  { B      }(ω)
2
Mean ce  { B      }(ω)
2
CE  { B      }(ω)
2
Emp.
600
500
400
300
200
100
    0
0 π/2 π
(ω),
∼
True length, B
Cauchy estimator, B 
one projection
Fig. 3. Top: curves numbered 5 (with corresponding results (a, c, e)) and 6 (with corresponding results (b, d,
f)), from Fig. 1c, respectively, with their corresponding roses of total projected length. (a) (b) Cauchy curve
length estimates from a single total projected length (red curves) as a function of the orientation of the projection
axis. (c), (d) Idem for the two projection Cauchy estimates. Note the drastic increase in precision with respect
to the single projection estimates. (e, f) The red wavy curves represent the model based predictors of the square
coefficient of error of the two projection Cauchy estimator (Eq. 11).
8
Image Anal Stereol 2016;35:1-14
As anticipated in Fig. 3 for the Cauchy estimator,
the predictor of the Cauchy (orientations) component
shows a relatively poor accuracy, due in part to
the fact that the true value of such component was
very small relative to the total error variance. Thus,
the total error variance practically coincides with
the Cavalieri component in either case. Conditional
on a given orientation ω , the measurement function
IY ′ω (z), see Eq. 21, exhibits jumps because it is an
intersection count, namely an integer. For this reason
the smoothness constant used to obtain Eq. 17b
was q = 0. Theory establishes that, under certain
conditions, the trend or ‘extension term’ of the
variance of a Cavalieri estimator is of order O(T 2q+2),
(Kiêu et al., 1999; Garcı́a-Fiñana and Cruz-Orive,
2004). In the present study the expected trend of
CE2{B̂(ω,x)} should therefore be approximately of
O(E(I)−(2q+2)) = O(E(I)−2). A linear regression
of logCE2{B̂(ω,x)} versus logE(I) yielded slope
values of −1.90 and −1.49 for curves 5 and 6
respectively. Direct non linear least squares fitting of
a exponential curve yielded exponent values of −2.45
and −2.13 respectively. For the Cavalieri components
(Fig. 4e, f) the corresponding exponent estimates were
−1.93 and −2.14 with the linear log-log regression,
and −2.45,−2.22 with the non linear (exponential)
regression, respectively. As a reference, a straight
line segment corresponding to an exponent of −2 is
represented in Fig. 4c-f.
CASE OF A POPULATION OF
CURVES
DATA
For the sake of illustration suppose that Fig. 1a
is a UR quadrat from a large observation region
of the plane containing the curves of interest. For
instance, this quadrat could be one from an extensive
grid of systematic quadrats encompassing the whole
population. The curves analyzed from a properly
sampled quadrat should be chosen according to some
unbiased rule, such as the unbiased frame (also called
the forbidden line) rule (Gundersen, 1977, see also
Howard and Reed, 2005, or Baddeley and Jensen,
2005). An unbiased counting – or sampling – rule,
combined with UR quadrat sampling, warrants a UR
sample of items because all the items in the population
will have identical a priori probabilities of being
included in the sample. A guard area should also be
uniquely defined for all quadrats, to warrant that any
curve that is sampled can be entirely observed for
measurement. This means that the effective quadrat
analysed (not shown in Fig. 1a) will generally be
smaller than the raw photograph. Suppose that the 9
curves numbered in Fig. 1c constitute a proper UR
sample from the relevant population. The curve next to
curve number 1 is apparently the union of two curves
that could not be separated, and it has been removed
as convenient for the present illustration. Imagine
a horizontal line sweeping the quadrat from top to
bottom. The curves tagged 1,2, . . . ,9 were the 1st, 2nd,
. . ., 9th met by such line. (Actually, the sweeping line
rule – see for instance Howard and Reed, 2005, Fig. 5.4
– is also unbiased for sampling, or counting, bounded
objects in a bounded region).
Table 1. Data corresponding to the curves labelled 1–
9 in Fig. 1c. B: curve length computed via Eq. 3. B̂:
curve length estimate, based on intersection counting,
computed via Eq. 12a-c with a square grid of gap
length T = 25 nm. varw(B̂|B) : error variance of B̂
computed by Eq. 20 via Eq. 16–19. ce(B̂|B) = 100 ·
{varw(B̂|B)}1/2
/
B̂.
Curve B B̂ varw(B̂|B) ce(B̂|B)
(nm) (nm) (%)
1 404 438 2615 11.7
2 397 430 2156 10.8
3 385 399 1254 8.9
4 399 339 2558 14.9
5 393 333 1484 11.6
6 392 351 1532 11.2
7 427 402 1081 8.2
8 417 354 2114 13.0
9 389 385 951 8.0
Mean 400.3 381.2 1749.44 -
Var 187.75 1515.94 - -
The data pertinent to the 9 tagged curves in Fig. 1c
are collected in Table 1. The ‘exact’ length B of each
curve was computed from Eq. 3. The Buffon-Steinhaus
estimator B̂(ω,x) of B was computed in all cases from
a single, computer generated IUR superimposition of
the curve onto a fixed square grid of gap length T = 25
nm. This gap length was approximated from Eq. 30
to warrant E(I) ≈ 20 intersections, with B replaced
with the sample mean of the 9 true lengths, namely
400.3 nm. Further, a single predictor var{B̂(ω,x)|B}
was computed for each curve via Eq. 20 using the
automatically scored intersection counts.
ASSESSMENT OF THE MODEL ERROR
VARIANCE PREDICTOR
The data in Table 1 allows an indirect assessment
of the quality of the stereological error variance
9
GOMEZ, AI et al.: On the precision of planar curve length estimation
Fig. 4. (a), (b) Empirical Monte Carlo replicates of the intersection counting curve length estimates with 10
square grids of different sizes (ranging from T = 25 nm to the left, to T = 4 nm to the right end of the horizontal
axis). A total of 1024 replications were generated in each case. The red dots represent the empirical means, which
approximately coincide with the true curve length in all cases (as expected because the estimator is unbiased).
The grey region is a 95% confidence band, that is, it encloses 95% of the replications; the outer line bounds
enclose 100% of them. (c, d) The grey region is a 95% confidence band for the 1024 Monte Carlo replications of
the model based predictor given by Eq. 20. The outer limits enclose all the replications. It is seen that the 95%
confidence band always contains the empirical (i.e, the nearly true) CE2, (black polygonal line) for the entire
range considered, and for each of the two curves. (e, f) Empirical (continuous curves) and corresponding model
based means (dotted) squared coefficient of error components. The Cauchy components are in pink and in red
colour, respectively, whereas the Cavalieri components are in blue.
10
Image Anal Stereol 2016;35:1-14
predictor proposed in this paper (Eq. 20).
The variance Varb(B̂) of the unbiased length
estimator B̂ = B̂(ω,x) between curves (hence the
subscript ‘b’) may be decomposed as follows,
Varb(B̂) = Varb(B)+EbVarw(B̂|B) , (34)
where Varw(B̂|B) represents the stereological error
variance within a curve (hence the subscript ‘w’). The
sample version of the preceding decomposition reads,
varb(B̂) = varb(B)+meanb
{
varw(B̂|B)
}
. (35)
From the data displayed in Table 1 we obtain
varb(B̂) = 1515.94 and varb(B) = 187.75, whereby
Eq. 35 yields the indirect, model free estimate,
meanb
{
varw(B̂|B)
}
= 1515.94−187.75
= 1328.19 .
(36)
Note that the preceding estimate is model free because
Eq. 34 is universally valid provided that B̂ is unbiased,
namely that E(B̂|B) = B. On the other hand, from the
fourth column of Table 1 the direct sample mean of the
individual model based variance predictors is,
meanb
{
varw(B̂|B)
}
= 1749.44 , (37)
which deviates from the model free estimate by a
reasonable 32%.
ESTIMATION OF THE POPULATION
MEAN CURVE LENGTH
Often, the target parameter is the mean length of
a population of curves such as the DNA molecules
in Fig. 1a. Let E(B) represent the population mean
curve length. If the curves were digitized and measured
automatically, then E(B) could be estimated by
B = 400.3 nm ,
ce(B)% = 100 ·
√
187.75/9
400.3
= 1.14% .
(38)
On the other hand, if curve length was estimated
from about 20 intersections each, then E(B) would be
estimated by
B̂ = 381.2 nm ,
ce(B̂ )% = 100 ·
√
1515.94/9
381.2
= 3.40% .
(39)
The preceding result would be similar if the
intersections were counted directly by hand on the
original images, with no image processing. Apart
from time and effort, biases arising in the digitization
procedure would be avoided.
Note that the within curve error variance estimates
varw(B̂|B) do not enter in the preceding calculations.
Their knowledge, however, may be used to design the
sampling protocol. The required coefficient of error of
the mean depends on the purposes of the experiment
– see Section 8.4 How many animals? from Cruz-
Orive et al. (2004) for details, (here the latter title
should better read How many curves?). For the present
purposes we simplify the situation as follows. Suppose
that we want to count a total of 400 intersections. If we
count 20 intersections in each of 20 curves, then we
might expect,
ce(B )% = 100 ·
√
1515.94/20
381.2
= 2.28% . (40)
On the other hand, if we rather want to count 40
intersections in each of 10 curves, then Varw(B̂|B) may
be expected to be reduced by a factor of about 4,
because the latter variance is roughly proportional to
(EI)−2, (Fig. 4), and we are doubling the number of
intersections per curve. Thus, in this case,
meanb
{
varw(B̂|B)
}
= 1328.19/4 = 332.05 , (41a)
varb(B̂) = 187.75+332.05 = 519.80 (41b)
ce( B̂ )% = 100 ·
√
519.80/10
381.2
= 1.89% . (41c)
The predicted precision of the mean is therefore of
about 2% in either case. The former option (i.e.,
working less in more curves) should generally be
the better one (Gundersen and Østerby, 1981). First,
counting too many intersections per curve is tedious.
Second, the foregoing formulae for the coefficient of
error of the mean require independence between curve
estimates. Sampling a few curves in a relatively small
region is likely to violate this condition.
It is worth mentioning that, if the target parameter
is E(B), then it is not necessary to measure
individual curve lengths, as above. A more convenient
(and probably more efficient) design consists in
superimposing – on the (extensive) field of interest, see
for instance Fig. 1 of Podestà et al. (2005) – systematic
quadrats combined with test lines. The upper edge of
each quadrat, for instance, could be adopted as a test
line, although the test lines may be placed outside the
quadrats. The superimpositions should be IUR relative
to the curves, or just UR if the test lines consisted
of half circles. Let l/a cm−1 denote the known test
line length per quadrat area. Further, let Qi denote the
number of individual curves counted in the ith quadrat
according to the unbiased frame rule, say. Also, let
Ii denote the corresponding number of intersections
scored with test lines associated with the ith quadrat,
11
GOMEZ AI ET AL: On the precision of planar curve length estimation
and let n represent the total number of quadrats. Then,
a ratio unbiased estimator of E(B) is,
B =
π
2
· a
l
·
n
∑
i=1
Ii
/ n
∑
i=1
Qi cm . (42)
In this case Var(B) may be predicted by Cochran’s
formula (Cochran, 1977; Howard and Reed, 2005,
p.158), provided that the distance between quadrats
is large enough to assume independence between the
n data pairs {(Ii,Qi)}. Similar designs are illustrated
in Chapter 12 of the latter book. Note that: (a) If
a quadrat does not lie entirely inside the reference
area, then we count curve(s) just in the available
quadrat area anyway, always respecting the unbiased
rule adopted. (b) The intersections are counted on
any curve that is intersected by any test line, whether
it was counted in the quadrats, or not. In other
words, there will generally be counted curves that do
not contribute any intersections, and curves that do
contribute intersections but are not counted.
DISCUSSION
The main purpose of this paper was to check
the accuracy of a previously published error variance
prediction formula for curve length estimation by
intersection counting with a square grid. Each of nine
finite linear features (namely flattened DNA molecule
images, see Fig. 1a) was repeatedly superimposed
with isotropic uniform randomness onto fixed square
grids of various sizes. The subsequent replicates of the
variance predictors were thereby compared against the
empirical (i.e., close to true) error variance. The overall
performance of the prediction formula was satisfactory
– the results corresponding to two of the curves
(namely a fairly isotropic and a fairly anisotropic one)
are displayed in Fig. 4c, d.
The error variance has a rotation (Cauchy) and
a translation (Cavalieri) component. The prediction
of the former (Eq. 11) was relatively poor, whereas
the one of the latter was satisfactory (Fig. 4e, f).
Because the Cauchy contribution was relatively very
low, however, the overall prediction was satisfactory.
It may be didactic to consider the extreme case
in which the target curve is a straight line segment
of length B. Here the Cauchy estimator based on two
mutually orthogonal projections (Eq. 10) reads,
B̂(ω) =
π
4
B · (cosω + sinω) ,
P(dω) =
2
π
dω, (0≤ ω < π/2) ,
(43)
and its exact square coefficient of error is,
CE2{B̂(ω)}= π
2 +2π
16
−1 , (44)
so that CE{B̂(ω)}%≈ 9.77%, which is non negligible.
As soon as the curve is not too anisotropic, however,
the preceding CE decreases rapidly. On the other hand,
for a straight line segment the Cauchy error variance
predictor given by Eq. 11 reads
var{B̂(ω)}= π
2
240
B2 · (cosω− sinω)2 , (45)
and its relative bias is,
Eωvar{B̂(ω)}
Var{B̂(ω)}
−1 = 1
15
· π
2−2π
π2 +2π−16
−1
≈ 0.56 ,
(46)
namely 56%. It should be borne in mind, however, that
B̂(ω) is based on only two observations.
The Monte Carlo experiment involved 1024
automatic IUR superimpositions of each curve onto
each fixed grid. To do this each curve was
approximated as described in Section Processing of the
curve images by a polygonal curve with known vertex
coordinates. The procedure involved the successive
use of four software packages (with some manual
editing at the later stages) and took about 10–15 min
per curve on average. Moreover, the manipulations
will introduce some bias in the curve length. On
the contrary, intersection counting estimation can be
implemented directly on each original curve in less
than 20s, and the only bias present may be due to
the possible difference between the length of a ‘free’
DNA molecule, and its flattened projection onto the
observation plane. It is in fact pausible that all the DNA
molecules considered here had the same length, and
that the different lengths reported in the first column
of Table 1 were due to the sum of two biases, namely
the flattening bias plus the image processing one. This
is impossible to ascertain: the best way to handle bias
is to avoid it. Note also that the curves themselves
might not be separable automatically – see for instance
the curve lying between curves 1 and 2 in Fig. 1c. A
human, however, will soon realize that this curve must
be the union of two. Thus, apart from the possibility
of performing automatic experiments to investigate
higher order properties, as illustrated in this study,
automatic image processing cannot be recommended
if the only purpose is to estimate mean curve length in
a not too extensive case study.
The present study is relatively academic, in the
sense that it is restricted to a single bounded curve. Our
main concern was the prediction of the within curve
12
Image Anal Stereol 2016;35:1-14
error variance Varw(B̂|B). Often, however, the target
quantity may be the population mean curve length
(Subsection Estimation of the population mean curve
length).
On the other hand, only the square grid was
considered here. In practice, grids consisting of
separate straight line segments, half circles, etc.,
can be more efficient than the square grid (which
often tends to yield too many intersections). In
addition, test systems consisting of half circles,
such as the Merz grid (Weibel, 1979; Howard and
Reed, 2005) may be convenient because they do not
require a random orientation relative to the curve.
Unfortunately, to our knowledge no general error
variance prediction formula exists for a general grid. It
is worth mentioning, however, that the ‘fakir predictor’
proposed for a test grid of wavy cycloids in Cruz-
Orive et al. (2014, Eq. 42) proved to be satisfactory
for digitized brain sections (Fig. 5 of that study). The
formula could be tried also for separate, systematically
arranged test segments, half circles, or more general
bounded test curves. The formula requires recording
the intersection counts separately for each fundamental
test curve, and arranging them into a rectangular
matrix (p. 132 of the latter paper). The problem,
however, remains basically open.
In short, the intersection counting method is highly
recommended to estimate curve length because it is
direct (i.e., it does not require image processing as long
as the intersection points can be scored unambiguously
in the original images), design unbiased, and rather
efficient. Also, if a square grid is used, then this study
shows that the error variance may be predicted fairly
reliably from a single grid superimposition on the
target curves considered. Semiautomatic stereological
devices (several brands of which are available on the
market) may be of considerable help here because they
will generate the desired test grid automatically on
the computer screen, with IUR position relative to the
target curve images. The intersection counts are then
recorded manually, and may be exported for instance
into the free software R (http://www.r-project.org/) to
compute the estimates.
ACKNOWLEDGEMENTS
We are indebted to David Dunlap (Emory
University, GA, USA) and to Alessandro Podestà
(University of Milano, Italy) for lending us an original
tiff micrograph of Fig. 1a.
The authors acknowledge financial support from
the Spanish MICINN Project AYA2015-66357-R.
REFERENCES
Baddeley AJ, Jensen EBV (2005). Stereology for
statisticians. London: Chapman & Hall/CRC.
Cochran W (1977). Sampling Techniques. 3rd Ed. New
York: J. Wiley & Sons.
Cruz-Orive LM (1989). Precision of systematic sampling
on a step function. In: Hubler A, Nagel W, Ripley
B, Werner G, eds., Geobild’89, pp. 185–193. Berlin:
Akademie Verlag.
Cruz-Orive LM, Gelšvartas J, Roberts N (2014). Sampling
theory and automated simulations for vertical sections,
applied to human brain. J Microsc 253:119–50.
Cruz-Orive LM, Gual-Arnau X (2002). Precision of circular
systematic sampling. J Microsc 207:225–42.
Cruz-Orive LM, Insausti A, Insausti R, Crespo D
(2004). A case study from Neuroscience involving
Stereology and Multivariate Analysis. In: Evans S,
Janson A, Nyengaard J, eds., Quantitative Methods in
Neuroscience, chap. 2. Oxford: Practical Neuroscience
Series, Oxford University Press, 16–64.
Garcı́a-Fiñana M, Cruz-Orive LM (2004). Improved
variance prediction for systematic sampling on R.
Statistics 38:243–72.
Gundersen HJG (1977). Notes on the estimation of the
numerical density of arbitrary profiles: the edge effect.
J Microsc 111:219–23.
Gundersen HJG, Østerby R (1981). Optimizing sampling
efficiency of stereological studies in biology: or ’do
more less well!’. J Microsc 121:65–73.
Howard V, Reed MG (2005). Unbiased stereology: Three-
dimensional measurement in microscopy, 2nd Ed.
Oxford: Bios/ Taylor & Francis.
Kiêu K, Souchet S, Istas J (1999). Precision of systematic
sampling and transitive methods. J Stat Plan Inference
77:263–79.
Miles R (1974). On the elimitation of edge effects in planar
sampling. In: Kendal E, Harding D, eds., Stochastic
Geometry: A tribute to the memory of Rollo Davidson.
London: John Wiley and Sons, 228–47.
Miles R, Davy P (1976). Precise and general conditions
for the validity of a comprehensive set of stereological
fundamental formulae. J Microsc 107:211–26.
Moran PAP (1966). Measuring the length of a curve.
Biometrika 53:359–64.
Podestà A, Imperadori L, Colnaghi W, Finzi L, Milani
P, Dunlap D (2004). Atomic force microscopy study
of DNA deposited on poly l-ornithine-coated mica. J
Microsc 215:236–40.
Podestà A, Indrieri M, Brogioli D, Manning GS, Milani P,
Guerra R, Finzi L, Dunlap D (2005). Positively charged
surfaces increase the flexibility of DNA. Biophys J
13
GOMEZ AI ET AL: On the precision of planar curve length estimation
89:2558–63.
Santaló L (1976). Integral geometry and geometric
probability. Reading, MA: Addison-Wesley.
Steinhaus H (1930). Zur Praxis der Rektifikation und
zum Längenbegriff. Berichte Sachsischen Akad Wiss
Leipzig 82:120–30.
Weibel ER (1979). Stereological methods. Practical
methods for biological morphometry, vol. 1. London:
Academic Press.
14