Image Anal Stereol 2016;35:15-28 doi:10.5566/ias.1378
Original Research Paper
A ROBUST ESTIMATION TECHNIQUE FOR 3D POINT CLOUD
REGISTRATION
DHANYA S PANKAJ AND RAMA RAO NIDAMANURIB
Indian Institute of Space Science and Technology, Valiamala, Thiruvananthapuram, Kerala, India
e-mail: dhanyaspankaj@gmail.com, ramarao.iit@gmail.com
(Received July 30, 2015; revised November 5, 2015; accepted December 4, 2015)
ABSTRACT
The 3D modeling pipeline involves registration of partially overlapping 3D scans of an object. The automatic
pairwise coarse alignment of partially overlapping 3D images is generally performed using 3D feature
matching. The transformation estimation from matched features generally requires robust estimation due to
the presence of outliers. RANSAC is a method of choice in problems where model estimation is to be done from
data samples containing outliers. The number of RANSAC iterations depends on the number of data points and
inliers to the model. Convergence of RANSAC can be very slow in the case of large number of outliers. This
paper presents a novel algorithm for the 3D registration task which provides more accurate results in lesser
computational time compared to RANSAC. The proposed algorithm is also compared against the existing
modifications of RANSAC for 3D pairwise registration. The results indicate that the proposed algorithm tends
to obtain the best 3D transformation matrix in lesser time compared to the other algorithms.
Keywords: 3D registration, robust estimation, RANSAC.
INTRODUCTION
A majority of the 3D vision applications require a
complete or registered 3D image of the object under
consideration. Due to the scanning limitations of the
3D acquisition devices and self occlusion caused by
the geometry of the objects, the complete 3D scan of
an object is usually acquired as multiple overlapping
partial scans. The registration of these overlapping
partial 3D scans is an important stage of 3D modeling
pipeline (Gomes et al., 2014). The partial scans are
acquired by moving the object in front of the scanning
device, or by moving the scanner around the object or
by using multiple scanning devices. The partial scan
data are represented in local co-ordinate system of the
current scanner position. The goal of 3D registration is
to bring all of the partial scan data into a common co-
ordinate system. The registration of the partial scans
into a complete single 3D scan is usually accomplished
in multiple stages. Two of the common stages are ‘pair-
wise registration’ and ‘multi-view registration’ (Dorai
et al., 1996). The survey in Tam et al. (2013) gives
an excellent account of the various 3D registration
algorithms.
The pairwise registration stage involves
registration of two partially overlapping scans at a
time. It aims to find a 3D rigid body transformation that
transforms one of the partial scans into the co-ordinate
system of the other. The multi-view registration aims
at finding a set of rigid body transformations that will
seamlessly register all the partial scans into a common
co-ordinate system. The information from the pair-
wise registration stage is used for guiding the global
multi-view alignment stage (Pulli, 1999). We deal with
the pair-wise registration problem in this paper. The
pair-wise registration problem is generally solved in
two stages – coarse registration or initial alignment and
fine registration (Campbell and Flynn, 2001). Coarse
registration stage involves finding out a rough initial
alignment between the scans. Once an initial alignment
is obtained, fine registration stage refines the alignment
to further improve the same. The commonly used
algorithm for fine registration is iterative closest point
(ICP) and its variants (Besl and McKay, 1992; Chen
and Medioni, 1992; Rusinkiewicz and Levoy, 2001).
Most of these ICP based algorithms require a good
initial alignment to obtain a good fine registration.
The initial alignment can be obtained using different
techniques. A controlled scanning environment may be
used to measure the transformations, but it is usually
expensive and limits the possible views. Another
commonly used technique is to align the scans by
manually selecting the corresponding points. The
popular and automatic algorithm to align the scans
is by feature matching. A set of corresponding points
in the two scans is obtained by matching local 3D
features.
In order to align the scan pairs, corresponding
points in the two point clouds should be determined.
The point clouds are preprocessed to remove
noise,surface normals are estimated (Klasing et al.,
2009) and then the keypoints are extracted. A
classification of the 3D keypoints in literature can
15
PANKAJ DS ET AL: 3D Registration
be found in Salti et al. (2011). Some commonly
used 3D keypoint detectors are local surface patches
(Chen and Bhanu, 2007), shape index (Koenderink
and van Doorn, 1992), keypoint detector presented
in intrinsic shape descriptors (ISS) (Zhong, 2009),
keypoint quality index (Mian et al., 2010), 3D SIFT,
3D SURF (Knopp et al., 2010) etc. A recent evaluation
of the 3D Keypoint detection algorithms is presented
in Filipe and Alexandre (2013). This work (Filipe and
Alexandre, 2013) indicates that ISS (Zhong, 2009) is
an efficient keypoint detector in terms of repeatability
and efficiency. 3D feature descriptors, which are
invariant to rigid transformations, are extracted at
the keypoints, based on a local neighborhood around
them. Structural indexing (Stein and Medioni, 1992),
point signatures (Chua and Jarvis, 1997), spin
image (Johnson and Hebert, 1999), shape indexes
(Koenderink and van Doorn, 1992), 3D shape context
(Frome et al., 2004), ISS (Zhong, 2009), PFH (Rusu et
al., 2008), FPFH (Rusu et al., 2009), SHOT (Tombari
et al., 2010) etc, are some of the widely cited 3D
descriptors available in literature.
Once the features are extracted at the keypoints,
they are matched across the two point clouds to find
the corresponding points. The features are matched by
constructing feature k-d trees (Friedman et al., 1977)
of the two point clouds and then finding the nearest
neighbor of a source feature point in the target k-d
tree. The k-d tree search is made efficient with the use
of approximation algorithms like FLANN (Muja and
Lowe, 2009). Correspondence rejection algorithms can
be employed to reject the wrong correspondences
obtained by this algorithm.
Once the corresponding points are obtained, least
square algorithms are available in literature to calculate
the 3D transformation (Arun et al., 1987). However,
points may get wrongly matched because of partial
overlap, noise in the scan, local nature of features,
inaccuracies in the previous stages etc. Thus the
percentage of correct matches (inliers) in a set of
correspondences can be very low. The performance
of least square estimators deteriorates as the number
of inliers to the model goes down. Robust estimators
like least median of squares (Rousseeuw, 1984), least
trimmed squares (Rousseeuw and Leroy, 2005) etc,
which can tolerate outliers are developed by the
statistical community. Most of these estimators have
a breakdown point up to 50% inliers. The higher the
breakdown point, the more robust the estimator is
(Wang, 2004). However as Wang (2004) points out,
most of these estimators are not practical in the case of
many computer vision applications as many problems
require a breakdown point of more than 50%. To
deal with such problems, many robust estimators like
RANSAC (Fischler and Bolles, 1981), Hough transform
(Hough, 1962) etc are developed within the computer
vision community. In the case of 3D registration,
the percentage of inliers depends upon many factors
as mentioned above and cannot be guaranteed to be
greater than 50%. Hence the 3D transformation model
estimation using feature matching usually employs one
of the robust estimators.
The number of RANSAC iterations depends on
the number of correspondences, the complexity of
the model and the percentage of inliers (Chum and
Matas, 2005). If the percentage of inliers is very
low, the convergence of RANSAC is generally slow.
To deal with these issues, many variants to RANSAC
like MLESAC (Torr and Zisserman, 2000), progressive
sample consensus (PROSAC) (Chum and Matas, 2005),
locally optimized RANSAC (LoSAC) (Chum et al.,
2003), NAPSAC (Myatt et al., 2002) etc have been
proposed by the computer vision community (Choi et
al., 2009). Many of these techniques are applied to
2D vision problems like epipolar geometry estimation,
motion segmentation, homography estimation, object
recognition, image retrieval etc. However, adaptations
of these methods to the 3D registration problem are not
available in literature to the best of our knowledge.
In this paper we propose a novel robust estimation
algorithm which provides accurate results for 3D
registration in less computational time. A fast and
accurate estimation of 3D transformation is crucial in
applications where real time acquisition of 3D data
and modeling are done. A good initial alignment phase
helps in avoiding a large number of closely spaced
scans and also reduces the time spent in the fine
alignment stage.
Section Methods discusses the 3D registration
problem and the proposed approach. Results section
explains the experiments conducted for evaluating
the algorithm and the results obtained. Section
Discussion provides a brief discussion of the results
and factors influencing the algorithm. Conclusion
section concludes the paper.
METHODS
In the proposed approach, the 3D transformation is
estimated from minimal samples, similar to RANSAC.
In the case of 3D feature matching, the distance
between the features is available as a rough estimate
of the quality of the match,even though not completely
reliable. This information can be efficiently utilized for
guiding the sampling. Utilizing additional information
is more efficient than the random sampling strategy
of RANSAC as pointed out by Chum and Matas
16
Image Anal Stereol 2016;35:15-28
(2005). Different guided sampling strategies have been
proposed in literature. NAPSAC (Myatt et al., 2002) is
based on the assumption that the probability of finding
an inlier adjacent to another inlier is high. This is
based on the assumption that inliers of a model tends
to lie closely. However this assumption is problem
dependent and generally does not hold in the case of
3D transformation estimation where all the selected
points confined to a particular area may lead to a local
transformation model. In fact, steps are usually taken
to avoid the sample points lying close together and
they should be distributed across the shape for more
accurate alignment.
Another variant of RANSAC which uses a guided
sampling process is PROSAC (Chum and Matas, 2005).
It utilizes the extra information, like the quality of the
data points, usually available in most of the computer
vision tasks. Instead of sampling randomly from the
data point pairs, samples are selected randomly from
a subset of the correspondence set sorted by quality
of correspondences. The subset is formed initially
from the top-ranked correspondences, grows gradually
and eventually degenerates into the random sampling
set, as the iteration progresses. Thus the sampling
method finds balance between the random sampling
and the sampling based purely on the quality of
correspondence matches. As more promising samples
are considered early, this can lead to significant
computation time improvements to the algorithm.
However, if the quality score is not very indicative of
the correctness of the match, then the accuracy of the
estimate may suffer. Having a lot of self similar points
in the model can also lead to sub-optimal solutions
as the feature match quality of these wrong matches
may be high. Since only a subset of the whole set
of corresponding points is considered for sampling
where all combinations of matches are not explored,
and since in some cases the matching score may not
very reliable, this sampling can lead to sub-optimal
solutions.
In order to address the sub-optimal solution
obtained by selective sampling of subsets of
correspondences, we propose to perform a refinement
of the calculated transformation model. Once the
transformation model is calculated from the minimal
sample, the number of inliers to the model is found
out. The corresponding points which fall within a
distance threshold, after the transformation is applied,
are treated as inliers. The threshold selection depends
upon the magnitude of noise in the data. Then the
transformation model which gives the maximum
number of inliers is calculated. An all-inlier sample
may not lead to a transformation model which finds
all the inliers (Chum et al., 2003). The solution
obtained usually lie near to the optimal solution and
can be refined locally. The refinement is performed
when the the current best solution is found in the
iteration. Multiple loops of ‘model estimation –
inlier calculation’ is executed by gradually varying
the threshold from high to the normal value. The
process of varying the threshold helps in considering
points lying closer to the present inliers initially for
model estimation. The solution is then refined as
the threshold is decreased gradually. This process of
local refinement typically leads to an optimal solution
compared to the current best solution. However
since the process is repeated every time the current
best solution is obtained, this can result in more
computational time. The computational burden caused
by the local optimization is compensated by the
computational savings obtained by guided sampling.
PROPOSED ALGORITHM
The proposed algorithm for transformation
estimation is given in Algorithm 1.
The set of N correspondences obtained from
3D feature matching are provided as input to
the algorithm. The correspondences are then sorted
according to the quality of the matched features. A
set of n corresponding points with highest quality is
denoted as Un. A sample consists of the minimum
number of corresponding points (m = 3) required
to estimate the 3D transformation. The samples are
drawn from the (growing) subset Un of the total
correspondences. Quality of a sample is the quality
of the corresponding point pair in it with minimum
quality. Let TN be the number of samples (sample
size m) drawn by standard RANSAC from the set of
N correspondences. Let Tn be the average number of
samples in the original set of TN samples, having data
points only from Un and this is given by Eq. 1:
Tn = TN
(n
m
)(N
m
) ,
Tn+1 =
n+1
n+1−m
Tn , (1)
T ′n+1 = T
′
n + dTn+1−Tne .
A recurrence relation for finding Tn+1 is given by
Eq. 1. As the values of Tn are not integer in general,
T ′n+1 is given by Eq. 1 where T
′
n = 1. The growth
function for the sample generation set Un is formed
from T ′n using the Eq. 2. The growth function should
reflect the result of previous tests and this is indicated
by k, the number of tests conducted so far, since
RANSAC runs are typically characterized by a success
followed by a number of failures. Samples are drawn
17
PANKAJ DS ET AL: 3D Registration
randomly from the subset Un which grows in size from
n = m to TN according to the growth function:
g(k) = min
{
n : T ′n ≥ k
}
. (2)
Once the minimum sample is obtained as
described, a 3D transformation model is computed
from it using least square technique (Arun et al., 1987;
Umeyama, 1991). Given a set of point pairs {xi,yi},
i= 1 : N, a rigid body transformation, which minimizes
the accumulated distance between the corresponding
points after applying the transformation, is calculated.
This distance is given by Eq. 3 where R and t
represents the 3D rotation matrix and translation vector
respectively, xi and yi represents the 3D co-ordinate
vectors of the point pair i:
E2 =
1
N
N
∑
i=1
‖yi− (Rxi + t)‖2 , (3)
e2 = ‖yi− (Rxi + t)‖2 . (4)
The 3D transformation is applied to the source
point in each of the corresponding point pairs and the
distance between the target point and the transformed
source point is computed for each of the point pairs.
This distance for a point pair i is given by Eq. 4. If
this error falls within a distance threshold, then the
point is considered as an inlier to the model calculated.
The threshold is calculated empirically and is based on
the precision or inter-point distance of the point cloud.
The set of all such inliers is formed. The sampling
and model generation phase is repeated until a local
best solution or the current best number of inliers is
obtained.
Once a local best solution is obtained, a refinement
of the solution is carried out. A hybrid iterative
and inner RANSAC algorithm (Chum et al., 2003) is
implemented for model refinement. Once the current
best solution is obtained, a sample is selected randomly
from the set of inliers. This sample size can be greater
than the minimal size as we are sampling from the
set of inliers. In our experiments it is taken to be the
maximum of 3 and half the number of inliers. Then
the transformation model computed from this sample
is further refined iteratively by varying the threshold
for finding the inliers and estimating transformation
from the inliers. This is repeated kLO times and the best
number of inliers is updated depending on the result.
The proposed solution takes care of the extra
computational burden caused by local optimization
steps by utilizing the time savings obtained by guided
sampling. Also the sub-optimal solution attributed to
the limited number of samples due to guided sampling
is taken care of by the local optimization where
samples are generated from the set of current best
inliers. Thus samples other than those from the sample
generation set in guided sampling are also considered.
The stopping criterion of the loop is calculated as
follows. Let m be the size of the minimum sample, and
ε be the percentage of inliers. In k iterations we will
be drawing k samples. If we are drawing k samples,
then the probability of not finding an all-inlier sample
in k samples is (1− εm)k. k is selected such that this
probability falls below η0. This is given by Eq. 5 where
PIn represents the probability of obtaining an inlier
which is approximated as the percentage of inliers, ε:
k ≥ log(η0)/ log(1−PIn) . (5)
In the proposed algorithm, the samples are
generated from a small subset of correspondences.
Hence an additional check is also employed to ensure
that the model obtained is not randomly endorsed by
a set of outliers (Chum and Matas, 2005). Assuming
binomial distribution for the cardinality of the set of
random ‘inliers’, the probability of obtaining a random
inlier set with size i is given by Eq. 6:
PRn (i) = β
i−m(1−β )n−i+m
(
n−m
i−m
)
. (6)
For each sample generation set with size n, the
minimum number of inliers Inmin is calculated so that
the probability of obtaining a random support of that
size falls below a threshold, ψ . This is given by Eq. 7.
Iminn = min{ j :
n
∑
i= j
PRn (i)< ψ} . (7)
Then, only a model which provides the number
of inliers In greater than this minimum support size is
selected as a solution, given by Eq. 8.
In > Iminn . (8)
The iterations are repeated until the two conditions
given by Eqs. 5, 8 are satisfied.
RESULTS
The partially overlapping point clouds of various
objects acquired for 3D modeling task were used
for testing. Scans from two different datasets were
considered for the study. Scans of three models
from the Stanford dataset (Levoy, 2005), acquired
using Cyberware 3030 MS scanner – Buddha, Bunny
and Dragon models – were registered. The scans of
Chef, Chicken and Parasaurolophus models, acquired
18
Image Anal Stereol 2016;35:15-28
Algorithm 1 Proposed Algorithm
{%comment N : Number of correspondences
m : Minimum Sample Size( 3 for 3D registration )
η0 : Threshold for stopping criterion % comment }
n∗ ← N, n← m, k ← 0, I∗ ← 0, k∗ ← 1 T ′n ← 1,
T ′N ← 200000,
k < k∗
1. Select model generation set
k← k+1 k = T ′n and n < n∗
n← n+1
where Tn, Tn+1 and T ′n+1 are given by Eq. 1
Sample selection of size m T ′n < k
Sample contains m−1 points at random from Un−1
at random and un
Select m points from Un at random
2. Model Generation
Generate model or estimate 3D transformation from
the sample
3. Model Verification
Find inliers Ik Ik > I∗
A. Run Local Optimization Step
mmin← min(Ik/2,m)
kLO← 10
K← 5
a. Select mmin samples from the inliers Ik
b. Generate model from the samples
c. Iterative
Find inliers with distance < Kθ
Generate model from these inliers using linear
algorithm
Reduce threshold and repeat step Iterative until
threshold = θ
d. Repeat steps a to c kLO times and store the best
model and best I∗
B. Select termination length n∗ and k∗ such that
equations
kn∗ ≥ log(η0)/log(1− εmn∗)
and
I∗n ≥ Iminn∗
are satisfied.
C. I∗← Ik
D. k← k+1
by Minolta scanner from the University of Western
Australia dataset (Mian et al., 2006) were also tested.
Actual transformations are available for the Stanford
models and hence this information was used for
evaluating the performance of the algorithms. In
the case of the UWA dataset models, the actual
transformations are not available and hence the
distance between the corresponding point pairs after
registration was considered as an error measure for
evaluating the performance of the algorithms. The .ply
files were converted to .pcd files before processing.
The .pcd file contains the 3D co-ordinates information.
We have not considered intensity or color information
for this study.
In order to evaluate the performance of the
proposed algorithm, 3D pairwise registration of
partially overlapping point clouds was implemented
using the three existing algorithms RANSAC, the
variants of Ransac – PROSAC (Chum and Matas, 2005)
and LoSAC (Chum et al., 2003) – in addition to the
proposed algorithm.
One of the 3D images (source) was registered to
the other (target) by estimating the 3D rigid body
transformation. The 3D image pairs were down-
sampled using voxel-grid filtering and processed
using statistical filtering. The normals at data points
were computed using PCA and ISS keypoints were
extracted. 3D shape context descriptors were extracted
at the keypoints in both the point clouds. The
correspondences of the keypoints in the source point
cloud were found in the target point cloud by matching
the features. The features were matched by using
the nearest neighbor search in feature k-d trees. The
best correspondences were found using the compared
algorithms where the 3D transformation estimation
was obtained by SVD based least squares method
(Arun et al., 1987; Umeyama, 1991). A minimum
distance constraint was applied while sampling the
points which ensure that the sampled points are
distributed across the point cloud and are not crowded
together and leads to better transformation estimation.
The point clouds were processed and the set
of corresponding point pairs were obtained along
with the distance between the corresponding feature
vectors. These corresponding points were given as
input to all the four algorithms and the resulting
transformation results were compared. In the case of
Stanford models, the actual transformation in terms of
3D translation units and the 3D rotation represented
as unit quaternions, with respect to the common co-
ordinate system is available. This information was
used to calculate the transformation between the pairs
of point clouds considered for pairwise registration.
After registration, the obtained 3D transformation
matrix was converted into x, y, z translation units and
the rotation unit quaternion. The norm of the distance
19
PANKAJ DS ET AL: 3D Registration
between the translation units and the dot product
between the rotation unit quaternions were computed.
Lesser the norm of the translation difference, better
the computation of 3D translation. The rotations are
compared by taking the dot product of the estimated
and actual quaternions. Since the quaternions are
normalized, the dot product of value of 1 indicates
that the 3D rotations are approximately equal. The
dot product value indicates how close the estimated
rotation is to the actual rotation. Higher the dot product
value, better the estimated rotation. The time taken by
the algorithms in the main loop on an Intel I3 3.10
GHz processor was also computed and is reported in
milliseconds.
For the UWA datasets, the actual transformations
are not available. Hence for evaluation, instead of
translational distance norm and dot product between
quaternions, an error measure called transformation
validation score was used. This score is a measure
of the distance between the corresponding points of
the pair of scans after transformation is applied. To
account for the points with no actual corresponding
points, a distance threshold was applied so that the
points greater than a distance apart are filtered from
the calculation. The score is normalized with respect
to the number of correspondences found. The lower
the transformation validation score, the better is the
alignment and the registered clouds were additionally
verified manually for ensuring this. The number of
inliers to the computed transformation model was
also computed. The number of hypotheses or models
evaluated was also computed and this is an indication
of the number of samples tested before finding the
final transformation.The experiments were repeated
50 times and the average values are reported. The
algorithm was implemented in C++ with the help of
PCL library (Rusu and Cousins, 2011).
The partially overlapping scans of the Stanford
model Buddha were registered and the various
evaluation metrics were compared across the different
algorithms. The x-axis indicates the different scan
pairs in all the figures. The dot product of the actual
rotation unit quaternion and the one obtained from
the registration was computed and plotted for different
set of scan-pairs, for the different algorithms and is
given in Fig. 1. From Fig. 1, we can see that the
proposed algorithm and LoSAC obtain consistently
good rotation estimates across the different scan pairs.
PROSAC incurs more rotation errors and the RANSAC
results are not consistently good. The result shows that
the proposed algorithm obtains good and consistent
results for rotation estimation.
Fig. 1. Buddha dataset – dot product between actual
and estimated rotation quaternions.
Fig. 2. Buddha dataset – error in translation estimate.
The error in computation of 3D translation for the
same set of scans is plotted in Fig. 2. Fig. 2 indicates
that the proposed algorithm achieves comparatively
lower error in the estimation of 3D translation.
RANSAC and PROSAC results are much worse. LoSAC
and the proposed algorithm achieves comparable
results.
The time taken in the main loop is also plotted
for different algorithms in Fig. 3. From Fig. 3, it can
be clearly seen that the time taken by the proposed
algorithm is much lower compared to LoSAC or
RANSAC for all the scans. Only PROSAC runs faster
than the proposed algorithm and it should be noted that
the rotational and translational accuracy of the PROSAC
is much less compared to the proposed algorithm.
20
Image Anal Stereol 2016;35:15-28
Another set of experiments were conducted using
the overlapping scans of the Stanford Bunny dataset
and the results are given in Figs. 4–6. The percentage
of inliers in the Bunny dataset is much less as
compared to the Buddha dataset. However, the trend in
the results is similar to that of the Buddha dataset. The
proposed algorithm attains good 3D transformation
estimates compared to the other algorithms in lesser
time. In this dataset, the time taken by the proposed
algorithm is even lesser than that of PROSAC. This may
be attributed to the lower percentage of inliers in this
dataset. When the percentage of inliers is very low,
the proposed algorithm even outperforms PROSAC in
execution time. It shows better results than RANSAC
and PROSAC in terms of accuracy.
The same set of experiments was also conducted
using the scans of the Dragon dataset. The results
are given in Figs. 7–9. The inlier percentage of
correspondences in this dataset is also low for the
selected feature. The results are similar to that of
the Bunny dataset. Here also the proposed algorithm
outperforms the PROSAC and RANSAC algorithms in
terms of accuracy of the estimates. It is comparable
to LoSAC in the case of accuracy but performs better
in terms of execution time. The execution time of
the proposed algorithm is also comparable to that of
PROSAC and even outperforms it in some instances.
Scans of 3D models from the UWA Dataset
are also used for evaluating the proposed algorithm.
True transformations are not available and hence
the transformation validation score is reported. The
transformation validation score indicates the average
error per corresponding pair. The unit of this score
is in accordance with the unit of the real world
coordinates of the point cloud used. Fig. 10 shows
the transformation validation scores of the different
algorithms on the Chef dataset scan pairs. Since this
is not mentioned in the dataset, we have not indicated
it in our results. From the figure, we can see that
the proposed algorithm obtains the minimum error
compared to other algorithms. The time taken by the
proposed algorithm is much less than LoSAC and
RANSAC and slightly more than PROSAC, in most
cases, as in Fig. 11. For one instance for the scan pair
5, RANSAC is faster than the proposed algorithm, but
the estimation error for RANSAC is higher.
Fig. 3. Buddha dataset – time taken for RANSAC
iterations in milliseconds.
Fig. 4. Bunny dataset – dot product between actual and
computed rotation quaternions.
Fig. 5. Bunny dataset – error in translation estimate.
21
PANKAJ DS ET AL: 3D Registration
Fig. 6. Bunny dataset – time taken for RANSAC
iterations in milliseconds.
Fig. 7. Dragon dataset – dot product between actual
and computed rotation quaternions.
Fig. 8. Dragon dataset – error in translation estimate.
Fig. 9. Dragon dataset – time taken for RANSAC
iterations in milliseconds.
The number of inliers found out by the proposed
algorithm is the highest and is comparable to that of
LoSAC, as given by Fig. 12. The number of model
hypotheses tested by the algorithms is also reported
in Fig. 13 and the results indicate that this is lowest
for the proposed algorithm. The models tested inside
the local optimization stage are not included. However,
the numbers of models tested by LoSAC, without
including the inner models are also much higher than
that the proposed algorithm.
Similar experiments were conducted on the scan
pairs from two other models of the UWA dataset
– Chicken and Parasaurolophus models. The results
obtained are given in Figs. 14–21. The results on these
datasets also support the reasoning in the previous
cases. There is a case in the Chicken dataset where
the accuracy of the proposed approach is lower than
that of other algorithms and similar to that of PROSAC.
We attribute this to the lower percentage of inliers for
the scan pair and the small subset of correspondences
selected by guided sampling. In other cases, the
proposed algorithm performs consistently well.
The results of registration of some of the scan
pairs are shown in Figs. 22–23. The results obtained
by RANSAC and the proposed algorithm are shown
for comparison. The figures show that the registration
obtained by the proposed algorithm is very accurate
compared to RANSAC and its variants. This helps
in reducing the time taken by the fine registration
stage to achieve convergence. Accurate results help the
subsequent stages to converge to a precise solution.
The unregistered point clouds and the clouds after
initial registration by the proposed method are shown
in Figs. 24–29.
22
Image Anal Stereol 2016;35:15-28
Fig. 10. Chef dataset – transformation validation
score.
Fig. 11. Chef dataset – time taken for RANSAC
iterations in milliseconds.
Fig. 12. Chef dataset – number of inliers.
Fig. 13. Chef dataset – number of models tested.
Fig. 14. Chicken dataset – transformation validation
score.
Fig. 15. Chicken dataset – time taken in milliseconds.
23
PANKAJ DS ET AL: 3D Registration
Fig. 16. Chicken dataset – number of inliers.
Fig. 17. Chicken dataset – number of models tested.
Fig. 18. Parasaurolophus dataset – transformation
validation score.
Fig. 19. Parasaurolophus dataset – time taken in
milliseconds.
Fig. 20. Parasaurolophus dataset – number of inliers.
Fig. 21. Parasaurolophus dataset – number of models
tested.
24
Image Anal Stereol 2016;35:15-28
Fig. 22. Registration results of Parasaurolophus model
scan pair by RANSAC (a) and proposed method (b)
Fig. 23. Registration results of Buddha model scan pair
by RANSAC (a) and proposed method (b).
Fig. 24. Buddha dataset scan pair before registration
(a) and after initial registration (b).
Fig. 25. Bunny dataset scan pair before registration (a)
and after initial registration (b).
Fig. 26. Dragon dataset scan pair before registration
(a) and after initial registration (b).
Fig. 27. Chef dataset scan pair before registration (a)
and after initial registration (b).
25
PANKAJ DS ET AL: 3D Registration
Fig. 28. Chicken dataset scan pair before registration
(a) and after initial registration (b).
Fig. 29. Parasaurolophus dataset scan pair before
registration (a) and after initial registration (b).
DISCUSSION
In the proposed approach, the guided sampling
effectively utilizes the correspondence match scores
from the 3D feature matching. Considering the
probably best correspondence pairs in early iterations
helps in reducing the number of models tested and
thus results in less computational time. This is evident
from the various results discussed. The algorithms
which utilizes guided sampling, like PROSAC (Chum
and Matas, 2005) and the proposed algorithm, achieves
the best running time. However the results obtained
by PROSAC are less accurate compared to the other
algorithms. This can be attributed to the fact that
many possibly good correspondence pairs may not
be considered by PROSAC for model generation due
to the early termination. This drawback is rectified
with the help of a local optimization step in the
proposed algorithm. In the local optimization step,
the model generation samples are formed from the
set of inliers. This sample is also a non-minimal
sample, which helps in including many possibly good
correspondences for computing the model. Thus the
non-minimal samples formed from inliers results in the
generation of good models. In addition to this, since
varying thresholds are used for computing the model in
various iterations, a local refinement of the computed
transformation is carried out. This results in a more
accurate estimation of the transformation model. This
can be seen from the results obtained. The proposed
algorithm achieves the accuracy comparable to that
of LoSAC (Chum et al., 2003). LoSAC, however is
computationally more intensive which can be seen
from the running time which is more than the other
algorithms considered. Random sampling results in a
large number of models to be tested and this may
lead to many local optimization steps. This extra
computational burden is compensated by the guided
sampling stage in the proposed method. The most
likely correspondences evaluated early in the loop
leads to local optimization of good models initially.
This helps in reducing the number of models tested
and hence the running time. The stopping criterion
ensures that a good model (i.e formed from inliers)
is obtained by the algorithm and this is not a model
supported by random outliers. From the above results,
we can see that the proposed algorithm performs the
3D registration task with good accuracy as well as
computational time.
The proposed method uses the correspondences
found by 3D feature matching. The influence of
the feature selection process is in deciding the ratio
of inliers to outliers in the corresponding pairs.
So the influence of feature selection process can
be considered equivalent to the influence of the
percentage of inliers. The correctness of 3D feature
matching depends upon the geometric distinctiveness
of the point cloud surface. If the point cloud has less
geometrical features, the ratio of inliers to outliers
will be less in the found correspondences. A very low
inlier percentage will lead to more computational time
requirements by the proposed algorithm. In such cases,
other 3D features which make use of the colour or
texture information may be used. Also in the case
of point cloud pairs where the inlier percentage is
very high, the extra computational overhead by the
local optimization may be made optional by including
additional verification in the algorithm.
The proposed algorithm aims at the coarse
registration of point clouds and hence can match
26
Image Anal Stereol 2016;35:15-28
partially overlapping point clouds at arbitrary
locations. The convergence funnel using the Stanford
bunny model point cloud was examined. The model
point cloud was rotated and translated and then
registered with the original point cloud. Translations
(radially) along the x-z plane up to a distance of 6
times the height of the bunny and rotations about y
axis with 30 degree increments up to 360 degrees were
considered. The method achieved global convergence
in all cases. Hence the method can be considered to be
not limited by specific rotation or translation limits but
only by the distinctiveness of the features selected for
the input point clouds.
CONCLUSION
The pair-wise initial alignment of partially
overlapping 3D point clouds generally employs 3D
feature matching. On account of the possibility of
high percentage of outliers in 3D correspondences,
robust estimation techniques are often used. The
paper presents a novel approach for dealing with
the outliers in the set of correspondences for 3D
transformation estimation. The proposed algorithm
has been implemented for the 3D registration of six
different models from two datasets. The performance
of the algorithm has been compared with the popular
algorithm RANSAC and its variants and the results
indicate that the proposed algorithm outperforms
them in relative domains of time or accuracy. Since
fine alignment stage like ICP requires good initial
alignment, the results of the proposed algorithm leads
to faster and accurate ICP convergence compared to
other methods. The proposed method thus finds a
balance between accuracy in the estimate and time of
execution.
ACKNOWLEDGEMENT
The authors would like to thank the Stanford
University and the University of Western Australia
for providing the datasets freely available online. The
authors acknowledge the comments and suggestions
of the anonymous reviewers which helped improve
our submission. The work of the first author is
supported by a Doctoral Fellowship provided by the
Indian Institute of Space Science and Technology,
Department of Space, Government of India.
REFERENCES
Arun K S, Huang T S, Blostein S D (1987). Least-squares
fitting of two 3-D point sets. IEEE T Pattern Anal
5:698–700.
Besl P J, McKay ND (1992). Method for registration of 3-D
shapes. Proc SPIE 1611:586–606.
Campbell R J, Flynn P J (2001). A survey of free-
form object representation and recognition techniques.
Comput Vis Image Und 2:166–210.
Chen H, Bhanu B (2007). 3D free-form object recognition
in range images using local surface patches. Pattern
Recogn Lett 28(10):1252–62.
Chen Y, Medioni G (1992). Object modelling by registration
of multiple range images. Image Vision Comput
10(3):145–55.
Choi S, Kim T, Yu W (2009). Performance evaluation of
RANSAC family. Proc Brit Mach Vision Conf (BMVC
’09) 81.1–12
Chua C S, Jarvis R (1997). Point signatures: A new
representation for 3D object recognition. Int J Comput
Vision 25(1):63–85.
Chum O, Matas J (2005). Matching with PROSAC –
progressive sample consensus. P IEEE Comput Soc
Conf Comput Vision Pattern Recogn (CVPR 2005)
1:220–6.
Chum O, Matas J, Kittler J (2003). Locally optimized
ransac. In: Michaelis B, Krell G, eds. Pattern Recogn.
Lect Not Comput Sci 2781:236–43.
Dorai C, Wang G, Jain AK, Mercer C (1996). From images
to models: Automatic 3D object model construction
from multiple views. P 13th IEEE Conf Pattern Recogn
1:770–4.
Filipe S, Alexandre LA (2013). A comparative evaluation
of 3D keypoint detectors. Proc 9th Conf Telecomm
Conftele 145–8.
Fischler MA, Bolles RC (1981). Random sample consensus:
a paradigm for model fitting with applications to image
analysis and automated cartography. Commun ACM
24(6):381–95.
Friedman JH, Bentley JL, Finkel RA (1977). An algorithm
for finding best matches in logarithmic expected time.
ACM T Math Software 3(3):209–26.
Frome A, Huber D, Kolluri R, Bülow T, Malik J (2004).
Recognizing objects in range data using regional point
descriptors. In: Pajdla T, Matas J, eds. P ECCV 2004.
Lect Not Comput Sci 3023:224–37.
Gomes L, Pereira Bellon OR, Silva L (2014). 3D
reconstruction methods for digital preservation of
cultural heritage: A survey. Pattern Recogn Lett 50:3-
14.
Hough PV (1962). Method and means for recognizing
complex patterns. US Patent 3,069,654.
Johnson AE, Hebert M (1999). Using spin images for
efficient object recognition in cluttered 3D scenes. T
Pattern Anal 21(5):433–49.
27
PANKAJ DS ET AL: 3D Registration
Klasing K, Althoff D, Wollherr D, Buss M (2009).
Comparison of surface normal estimation methods for
range sensing applications. P IEEE Int Conf Robot
Autom (ICRA ’09) 3206–11.
Knopp J, Prasad M, Willems G, Timofte R, Van Gool L
(2010). Hough transform and 3D surf for robust three
dimensional classification. In: Daniilidis K, Maragos P,
Paragias N, eds. P ECCV 2010. Lect Not Comput Sci
6316:589–602.
Koenderink JJ, van Doorn AJ (1992). Surface shape and
curvature scales. Image Vision Comput 10(8):557–64.
Mian A, Bennamoun M, Owens R (2010). On the
repeatability and quality of keypoints for local feature-
based 3D object retrieval from cluttered scenes. Int J
Comput Vision 89(2-3):348–61.
Mian A, Bennamoun M, Owens R (2006). Three-
dimensional model-based object recognition and
segmentation in cluttered scenes. IEEE T Pattern Anal
28(10):1584–601.
Muja M, Lowe DG (2009). FLANN, fast library for
approximate nearest neighbors. P Int Conf Comput Vis
Theory (VISAPP’09).
Myatt DR, Torr PHS, Nasuto SJ, Bishop JM, Craddock R
(2002). NAPSAC: High noise, high dimensional robust
estimation-its in the bag. Proc 13th Brit Mach Vision
Conf (BMVC 2002) 2:458–67.
Pulli K (1999). Multiview registration for large data sets. P
IEEE 2nd Int Conf 3-D Digital Imaging Model 160–8.
Rousseeuw PJ (1984). Least median of squares regression. J
Am Stat Assoc 79(388):871–80.
Rousseeuw PJ, Leroy AM (2005). Robust regression and
outlier detection, vol. 589. John Wiley & Sons.
Rusinkiewicz S, Levoy M (2001). Efficient variants of
the ICP algorithm. P IEEE 3rd Int Conf 3-D Digital
Imaging Model 145–52.
Rusu RB, Blodow N, Marton ZC, Beetz M (2008). Aligning
point cloud views using persistent feature histograms.
P IEEE/RSJ Int Conf Intell Robot Syst (IROS 2008)
3384–91.
Rusu RB, Blodow N, Beetz M (2009). Fast point feature
histograms (FPFH) for 3D registration. P IEEE Int
Conference on Robotics and Automation 2009, 3212–
17.
Rusu RB, Cousins S (2011). 3D is here: Point cloud library
(PCL). P IEEE Int Conf Robot Autom (ICRA) 1–4.
Salti S, Tombari F, Di Stefano L (2011). A performance
evaluation of 3D keypoint detectors. P IEEE Int
Conf 3D Imaging Model Proces Visual Transmiss
(3DIMPVT) 236–43.
Stein F, Medioni G (1992). Structural indexing: Efficient 3-
D object recognition. IEEE T Pattern Anal 14(2):125–
45.
Tam GK, Cheng ZQ, Lai YK, Langbein FC, Liu Y, Marshall
D, Martin RR, Sun XF, Rosin PL (2013). Registration
of 3D point clouds and meshes: A survey from rigid to
non-rigid. IEEE T Vis Comput Gr 19(7):1199–217.
Tombari F, Salti S, Di Stefano L (2010). Unique
signatures of histograms for local surface description.
In: Daniilidis K, Maragos P, Paragias N, eds. P ECCV
2010. Lect Not Comput Sci 6313:356–69.
Torr PHS, Zisserman A (2000). MLESAC: A new
robust estimator with application to estimating image
geometry. Comput Vis Image Und 78(1):138–56.
Levoy M. (2005) The Stanford 3D Scanning Repository.
http://www-graphics.stanford.edu/data/3dscanrep
Umeyama S (1991). Least-squares estimation of
transformation parameters between two point patterns.
IEEE T Pattern Anal 4:376–80.
Wang H (2004). Robust statistics for computer vision: model
fitting, image segmentation and visual motion analysis.
PhD Thesis. Monash University.
Zhong Y (2009). Intrinsic shape signatures: A shape
descriptor for 3D object recognition. P IEEE 12th Int
Conf Comput Vision Worksh (ICCV Workshops) 689–
96.
28