Elektrotehniški vestnik 85(4): 197-205, 2018
Original scientific paper

A radial keypoint detector

Jasna Maver
Department of Library and Information Science and Book Studies, Faculty of Arts, University of Ljubljana, Aškerčeva 2, 1000 Ljubljana, Slovenia
E-mail: jasna.maver@ff.uni-lj.si

Received 19 June 2018; accepted 24 August 2018

Abstract. The work presents the radial keypoint detector, a new version of one of the self-similarity detectors described in [14]. Several improvements are proposed to make the detector more efficient. We extend the detected keypoints with a local-region orientation, which makes the keypoints invariant to the image-plane rotation. The radial detector is compared with the DoG detector at the theoretical level. In the experimental evaluation, the radial detector is tested on the full image sequences of the HPatches dataset. The radial detector gives repeatability and matching scores competitive with those of the DoG detector, which is one of the most popular keypoint detectors.

Key words: local feature detector, dependent effects model, keypoint detector evaluation

Radialni detektor značilnih točk (Slovenian abstract): In this work we present the radial detector, a new version of one of the self-similarity keypoint detectors described in [14]. We propose several improvements that make the detector more efficient. The detected keypoints are extended with a local-region orientation, which makes them invariant to image rotation. At the theoretical level, we compare the radial detector with the DoG detector. The radial detector is evaluated experimentally on the image sequences of the HPatches dataset. The results obtained for the measured repeatability and for the descriptor-based matching are competitive with those obtained with the DoG detector, one of the most popular keypoint detectors.

1 Introduction

A feature or keypoint detector is an algorithm which identifies salient locations, or keypoints, in images. The appearance of an object or scene changes across images due to different capture conditions, e.g. a change in illumination or viewpoint, or due to image deformations, e.g. a lossy image compression. We expect keypoints to repeat in images of the same scene. Different measures based on the image intensity in a local neighbourhood of a keypoint form different descriptors. The intensity variations near the keypoints should make the descriptors easily distinguishable. Many computer-vision applications, such as camera calibration, scene reconstruction [21], [6], object or scene recognition [19], [15], [10], panorama creation [7], [4], robot navigation [5], and object tracking, rely on keypoint matching.

Computer-vision researchers have developed many different keypoint detectors. An extensive survey is given by Tuytelaars and Mikolajczyk in [22]. The Kadir and Brady salient-region detector [8] is inspired by information theory: the saliency of a location corresponds to its local complexity or unpredictability. SIFT [9] detects blob-like keypoints; they are local extrema in the scale space obtained by filtering the image with differences of Gaussians. SURF [3], based on the Hessian matrix, approximates the Gaussian filters with box filters, which enables a fast computation of the saliency maps using integral images and thus speeds up the keypoint detection. The traditional Gaussian scale-space approach has its limitations, since it blurs the noise and the fine details to the same degree, reducing the localization accuracy and distinctiveness.
The KAZE detector [1] uses nonlinear diffusion filtering to encourage smoothing within a region instead of smoothing across boundaries. MSER, or Maximally Stable Extremal Regions, proposed by Matas et al. [13], detects regions where all pixels inside the region have either a higher or a lower intensity than those on its outer boundary. It belongs to the group of affine region detectors, which also includes the Harris-affine and Hessian-affine region detectors [16]. SUSAN [20] is a simple and efficient corner detector. It computes the fraction of pixels within a neighbourhood which have an intensity similar to that of the centre pixel. Corners can then be localized by thresholding this measure and selecting the local minima. The FAST detector [18] builds on the SUSAN detector. FAST evaluates only 16 pixels on a circle of radius r = 3. If a set of N contiguous pixels on the circle are all brighter or all darker than the candidate pixel, the candidate pixel is classified as a corner. The algorithm computes a decision tree by learning the distribution of the corner configurations from a training set of a specific environment. Unlike FAST, the AGAST detector proposed by Mair et al. [12] uses a binary decision tree which is generic and does not have to be adapted to new environments.

The self-similarity detectors proposed in [14] exploit three complementary parts of the variance computed on a local region. The three proposed saliencies, i.e. the radial, tangential, and residual saliency, are invariant to rotation, photometric shift, and the magnitude of the local-region contrast; they are covariant with translation and scaling, and are robust to intra-class variations. The detectors did not gain much popularity, mainly due to a low computational speed, although a hardware solution would enable a fast computation. The algorithm also does not provide the orientation of a local region. The computation of the radial saliency alone can be sped up, and in this paper we explain the steps taken to achieve this goal. We also use additional techniques to improve the quality of the keypoint detection and extend the keypoints with a local-region orientation.

2 Method

The radial detector uses the $\alpha$ effects, which are one kind of effects of the dependent effects model (DEM).

2.1 Local region represented by the DEM

A circular local region $\tilde{P}$ is transformed into a rectangular patch $P$ by means of polar raster sampling. Samples are taken at $N$ angles, $\phi_n = \frac{2\pi n}{N}$; $n = 0, 1, \ldots, N-1$, and $M$ circles with radii $r_m = m$; $m = 0, \ldots, M-1$. The intensity value at location $(r_m, \phi_n)$ in $\tilde{P}$ is transferred to the $m$-th column and the $n$-th row of $P$, where we denote it as $I_{mn} = I(m, n)$, as shown in Fig. 1.

Figure 1. Polar raster sampling of the circular local region $\tilde{P}$.

Patch $P$ is decomposed by the DEM in the following way. Let $C_m$ and $R_n$ denote the sums of the intensity values of the $m$-th column and the $n$-th row of $P$, respectively, i.e., $C_m = \sum_{n=0}^{N-1} I_{mn}$ and $R_n = \sum_{m=0}^{M-1} I_{mn}$. The dependent effects model represents $P$ with three types of effects: radial or column effects $\alpha_m$, tangential or row effects $\beta_n$, and interaction effects $\gamma_{mn}$, where $m = 0, \ldots, M-1$; $n = 0, \ldots, N-1$. Each element $I_{mn}$ of $P$ can be written as the linear combination $I_{mn} = \bar{I} + \alpha_m + \beta_n + \gamma_{mn}$, where $\bar{I}$ denotes the mean value of $P$, i.e.,

$$\bar{I} = \frac{1}{MN} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} I_{mn},$$

while $\alpha_m$ and $\beta_n$ denote the deviations of the column and row means from $\bar{I}$:

$$\alpha_m = \frac{1}{N} C_m - \bar{I}, \qquad \beta_n = \frac{1}{M} R_n - \bar{I}. \quad (1)$$

The interaction effects $\gamma_{mn}$ are therefore computed as

$$\gamma_{mn} = I_{mn} - \bar{I} - \alpha_m - \beta_n.$$

2.2 Local region saliency

The radial detector uses the $\alpha$ effects of the DEM. The total variation of $P$, represented by the total sum-of-squares

$$VP_{M \times N} = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} (I_{mn} - \bar{I}_{M \times N})^2,$$

can be described as a sum of two component variations, $VP_{M \times N} = WP_{M \times N} + BP_{M \times N}$. Here $WP_{M \times N}$ tells how much variation there is within each of the columns of $P$, with the columns being represented by the column means, while $BP_{M \times N}$ tells how much variation there is between the means of the columns of $P$.
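To make the decomposition concrete, the following minimal Python sketch (an illustration only, not the author's implementation; NumPy is assumed) computes the DEM effects of a polar-sampled patch and checks the variance decomposition $VP_{M \times N} = WP_{M \times N} + BP_{M \times N}$:

import numpy as np

def dem_effects(P):
    # P: N x M patch; column m holds the N samples of the m-th circle.
    I_bar = P.mean()                        # overall mean of the patch
    alpha = P.mean(axis=0) - I_bar          # radial (column) effects, length M
    beta = P.mean(axis=1) - I_bar           # tangential (row) effects, length N
    gamma = P - I_bar - alpha[None, :] - beta[:, None]   # interaction effects
    return I_bar, alpha, beta, gamma

def radial_saliency(P):
    # The saliency used by the radial detector: the fraction of the total
    # variation VP explained by the between-columns variation BP.
    N, M = P.shape
    I_bar, alpha, _, _ = dem_effects(P)
    VP = ((P - I_bar) ** 2).sum()                           # total variation
    WP = ((P - P.mean(axis=0, keepdims=True)) ** 2).sum()   # within columns
    BP = N * (alpha ** 2).sum()                             # between columns
    assert np.isclose(VP, WP + BP)          # the decomposition holds exactly
    return BP / VP if VP > 0 else 0.0

# Example: a patch sampled at N = 720 angles on M = 11 circles.
P = np.random.default_rng(0).random((720, 11))
print(radial_saliency(P))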
For a reader who is unfamiliar with this technique, Fig. 3 illustrates an example. When the variation within the columns of $P$ is small, $P$ can be well approximated by the means of its columns. Remember that the columns of $P$ correspond to the circles of $\tilde{P}$ (see Fig. 2). Saliency is defined by the ratio

$$S_M = \frac{BP_{M \times N}}{VP_{M \times N}}, \quad (2)$$

with

$$BP_{M \times N} = N \sum_{m=0}^{M-1} \alpha_m^2.$$

The values of the saliency (2) lie on the interval from 0 to 1. The value 0 represents the situations where all the variation of $P$ is due to the variation within the columns, while the value 1 represents the situations where all the variation is due to the variation between the columns.

Figure 2. Blob features obtained by the radial detector. The radial detector approximates local regions with the averages computed on circles. (a) Original image with a resolution of 66 × 72 pixels. (b) Green circles denote the detected features obtained for region sizes between 4 and 12 circles. (c) Reconstruction obtained with the feature approximations. The larger features are drawn before the smaller and the less salient before the more salient. (d), (e) Features and their approximations.

3 Implementation

3.1 Scale space

Computed at every image location $(x, y)$ for a region of $m$ circles, the saliency (2) defines a scale space of saliency maps

$$S(x, y, m) = \frac{BP_{m \times N}(x, y)}{VP_{m \times N}(x, y)}, \quad (3)$$

in which the local scale-space maxima are sought. For larger keypoints, to speed up the computation, the up-sampled image is down-sampled two times. For each of the down-sampled images, the saliency maps are computed for eleven circles and the keypoint detection is applied to the maps with $m > 6$. Before the saliency maps are computed, the image is first smoothed with a Gaussian filter with a small $\sigma$. In this way, we acquire keypoints with radii from 2.25 pixels to 21 pixels.

3.2 Computation of sums on circles

The transformation of a local region from the Cartesian to the polar representation (Fig. 1) is time consuming. Instead of addressing the locations on circles to compute the sums $C_m(x, y)$ and $\sum_{j=0}^{N-1} I_{mj}^2(x, y)$; $m = 0, \ldots, M-1$, they are computed by convolving the intensity image $I(x, y)$ and the image $I^{.2}(x, y)$ with the convolution filters $A_m$; $m = 0, \ldots, M-1$. We have

$$C_m(x, y) = A_m * I(x, y),$$

$$\sum_{j=0}^{N-1} I_{mj}^2(x, y) = A_m * I^{.2}(x, y); \quad m = 0, \ldots, M-1.$$

Here $*$ is the convolution operation in $x$ and $y$, and the operation $.2$ squares the image intensity at each pixel in the image. Filter $A_m(x, y)$ represents the $m$-th circle, a 1-pixel-wide annulus (see Fig. 4). The sum of the weights on the circle is for each filter equal to the value $N$. The filters $A_m$; $m = 0, \ldots, M-1$ are generated in a double nested loop (see Algorithm 1).

Figure 4. Filters $A_m(x, y)$; $m = 1, \ldots, M-1$. The radial detector uses 11 circles to compute the saliency maps. The sum of the pixel weights is for each circle equal to $N$. The first filter $A_0(x, y)$ with $r_0 = 0$, not shown in the image, just multiplies the image intensity by $N$.

Result: Filters $A_m$; $m = 1, \ldots, M-1$
N = 720;
for m = 1 to M-1 do
    A_m = zeros(2m+1, 2m+1);
    c = m + 1;
    for n = 1 to N do
        r = m;
        x = c + round(r * cos(2π/N * (n-1)));
        y = c + round(r * sin(2π/N * (n-1)));
        A_m(x, y) = A_m(x, y) + 1;
    end
end
Algorithm 1: Generation of the filters $A_m$; $m = 1, \ldots, M-1$, shown in Fig. 4.
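As an illustration of Sections 3.1 and 3.2, the following Python sketch (a simplified single-scale version under the above definitions, not the author's implementation; NumPy and SciPy are assumed) generates the annulus filters of Algorithm 1 and evaluates the saliency maps of Eq. (3) by convolution:

import numpy as np
from scipy.signal import fftconvolve

def make_filters(M, N=720):
    # Algorithm 1: accumulate N points of the m-th circle into a
    # (2m+1) x (2m+1) kernel; the weights on each circle sum to N.
    filters = [np.full((1, 1), float(N))]      # A_0 multiplies the image by N
    for m in range(1, M):
        A = np.zeros((2 * m + 1, 2 * m + 1))
        c = m                                   # centre index (0-based)
        for n in range(N):
            phi = 2 * np.pi * n / N
            col = c + int(round(m * np.cos(phi)))
            row = c + int(round(m * np.sin(phi)))
            A[row, col] += 1.0
        filters.append(A)
    return filters

def saliency_maps(I, M=11, N=720):
    # Eq. (3): S(x,y,m) = BP/VP, with the circle sums obtained by filtering
    # I and its pixel-wise square with the annuli A_m.
    A = make_filters(M, N)
    C = [fftconvolve(I, a, mode='same') for a in A]        # C_m(x,y)
    Q = [fftconvolve(I ** 2, a, mode='same') for a in A]   # sums of I^2 on circles
    S, sumC, sumC2, sumQ = [], 0.0, 0.0, 0.0
    for m in range(1, M + 1):
        sumC = sumC + C[m - 1]                 # sum of C_i, i = 0..m-1
        sumC2 = sumC2 + C[m - 1] ** 2
        sumQ = sumQ + Q[m - 1]
        T2 = sumC ** 2 / (m * N)
        BP = sumC2 / N - T2                    # between-circles variation
        VP = sumQ - T2                         # total variation
        S.append(BP / np.maximum(VP, 1e-12))   # guard against flat regions
    return S                                   # S[m-1] is the map for m circles

The incremental sums reflect how the radial detector enlarges a region circle by circle, reusing the filtered images for all scales.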
3.3 Selecting keypoints with a large contrast

It is a common practice to apply a threshold on the saliency maps for the keypoint selection. Only the local scale-space maxima with a saliency value above a pre-specified threshold are considered as detected keypoints. Another option is to sort the detected local scale-space maxima by their saliency in a descending order and then take only the first $n$ local scale-space maxima as keypoints. Usually, a larger saliency corresponds to a larger image contrast. In the case of the saliency (3), this is not true. To exclude the local extrema with a low contrast, which are more sensitive to the noise than the extrema with a large contrast, the radial detector applies a threshold on the maps

$$B_{nor}(x, y, m) = \sqrt{\frac{N \sum_{i=0}^{m-1} \alpha_i^2(x, y)}{mN - 1}}. \quad (4)$$

The threshold value is derived from the image intensities as

$$t = \frac{1}{3} \Big( \frac{1}{N_{col}} \sum_{c=1}^{N_{col}} I_c + \frac{1}{N_{row}} \sum_{r=1}^{N_{row}} I_r + I_{max} \Big).$$

Here $I_c$ and $I_r$ are the maximal intensity values of the $c$-th column and the $r$-th row of the image, respectively, and $I_{max}$ is the maximal intensity value of the image. $N_{col}$ and $N_{row}$ are the numbers of columns and rows of the image. Hence, only those local scale-space maxima obtained on the saliency maps of (3) that have the value of (4) above the threshold value are taken into account. The same applies to the case of sorting: the local scale-space maxima are sorted according to the values of (4). Fig. 5 shows both maps, $S(x, y, 5)$ and $B_{nor}(x, y, 5)$.

Figure 5. Saliency maps. (a) Original image. (b) Saliency map $S(x, y, 5)$ given by Eq. 3. (c) Saliency map $B_{nor}(x, y, 5)$ given by Eq. 4.

3.4 Eliminating the edge responses

The saliency measure (3) has a strong response also along the edges. These locations are unstable at small levels of the noise, therefore we would like to reject the keypoints at these locations. Here we use the same technique as proposed in [9]. The two principal curvatures computed at the keypoint location and scale are considered: along the edge the saliency has a small principal curvature, while across the edge the curvature is large; hence, by checking the ratio of the principal curvatures, the keypoints along the edges are discarded.

3.5 Orientation assignment

The invariance to the image-plane rotation is achieved by assigning an orientation to the keypoint. Here we follow the approach proposed in [9]; however, we use another type of the average image for the gradient computation. The radius $r_m$ of the keypoint is used to select the average image

$$\bar{I}_m(x, y) = \frac{1}{mN} \sum_{i=0}^{m-1} C_i(x, y),$$

so that all the computations are performed in a scale-invariant manner. For each image sample in the area of $3.5\, r_m$ around the keypoint, the gradient magnitude $g_{mag}(x, y)$ and orientation $\theta(x, y)$ are computed using the pixel differences:

$$g_x = \bar{I}_m(x+1, y) - \bar{I}_m(x-1, y), \qquad g_y = \bar{I}_m(x, y+1) - \bar{I}_m(x, y-1),$$

$$g_{mag}(x, y) = \sqrt{g_x^2 + g_y^2}, \qquad \theta(x, y) = \tan^{-1}(g_y / g_x).$$

An orientation histogram is then formed from the gradient orientations of the sample points. The orientation histogram has 36 bins covering the 360-degree range of orientations. Each sample added to the histogram is weighted by its gradient magnitude and by a Gaussian-weighted circular window with a $\sigma$ that is 1.5 times the radius $r_m$ of the keypoint. The dominant orientation is determined by detecting the global maximum of the histogram. Since the radial detector detects locations with a radial and mirror symmetry, some local maxima of the histogram can also be high, close to the value of the global maximum. Therefore, any local maximum that is within 80% of the global maximum is used to create a keypoint with that orientation. For a better accuracy, the same approach as in [9] is used to estimate the positions of the maxima: a parabola is fit to the three histogram values closest to the global or selected local maximum to interpolate the position of the maximum.
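A condensed Python sketch of the orientation assignment just described (an illustration following [9], not the author's implementation; the array Im stands for the average image $\bar{I}_m$ of the keypoint scale, and the bound handling is simplified):

import numpy as np

def keypoint_orientations(Im, kx, ky, rm, nbins=36):
    # Im: average image for the keypoint scale; (kx, ky): keypoint location;
    # rm: keypoint radius. Returns the dominant orientations in radians.
    R = int(np.ceil(3.5 * rm))
    sigma = 1.5 * rm                          # Gaussian window of the histogram
    hist = np.zeros(nbins)
    for dy in range(-R, R + 1):
        for dx in range(-R, R + 1):
            x, y = kx + dx, ky + dy
            if x < 1 or y < 1 or x >= Im.shape[1] - 1 or y >= Im.shape[0] - 1:
                continue                      # skip samples outside the image
            gx = Im[y, x + 1] - Im[y, x - 1]  # pixel differences
            gy = Im[y + 1, x] - Im[y - 1, x]
            mag = np.hypot(gx, gy)
            theta = np.arctan2(gy, gx) % (2 * np.pi)
            w = np.exp(-(dx * dx + dy * dy) / (2 * sigma ** 2))
            hist[int(theta / (2 * np.pi) * nbins) % nbins] += w * mag
    peaks, gmax = [], hist.max()
    for b in range(nbins):
        left, right = hist[(b - 1) % nbins], hist[(b + 1) % nbins]
        if hist[b] >= 0.8 * gmax and hist[b] > left and hist[b] > right:
            # parabolic interpolation of the peak position, as in [9]
            db = 0.5 * (left - right) / (left - 2 * hist[b] + right)
            peaks.append(2 * np.pi * (b + 0.5 + db) / nbins)
    return peaks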
4 A comparison of the DoG detector with the radial detector

To compare the detector based on the difference of two Gaussian filters, or DoG, which is a part of the SIFT algorithm [9], with the radial detector, we start with the two-dimensional unit-height Gaussian function

$$g_K(x, y) = e^{-\frac{x^2 + y^2}{2\sigma^2 K^2}}, \quad (5)$$

which determines the weighting of the locations in a local region. Let us divide the weights of (5) into two parts: the circular centre and its surrounding. This can be accomplished by representing the centre with the two-dimensional unit-height Gaussian function with a $K$-times smaller standard deviation,

$$g(x, y) = e^{-\frac{x^2 + y^2}{2\sigma^2}}, \quad (6)$$

and the surrounding by the difference between the two unit-height Gaussian functions,

$$g_s(x, y) = e^{-\frac{x^2 + y^2}{2\sigma^2 K^2}} - e^{-\frac{x^2 + y^2}{2\sigma^2}}. \quad (7)$$

Hence, $\sigma_K = K\sigma$ and $g_K(x, y) = g(x, y) + g_s(x, y)$. Fig. 6 shows the proposed decomposition.

Figure 6. Decomposition of the unit-height Gaussian $g_K(x, y)$ into the centre $g(x, y)$ and the surrounding $g_s(x, y)$. From left to right: $g_K(x, y)$, $g(x, y)$, and $g_s(x, y)$ computed for $\sigma = 3$ and $K = \sqrt{2}$.

Let $G(x, y)$ and $G_K(x, y)$ denote the two images obtained by convolving the intensity image $I(x, y)$ with the filters (6) and (5), respectively:

$$G(x, y) = g(x, y) * I(x, y) \quad (8)$$

and

$$G_K(x, y) = g_K(x, y) * I(x, y). \quad (9)$$

4.1 The DoG filter and the $\alpha_0$ effect

The weighting functions (6) and (7) determine two zones of a circular shape (see Fig. 6). Following the idea of the DEM, (8) and (9) can be used to compute two $\alpha$ effects, but this time with the local-region weighting as defined by (6) and (7). Let $w_K$, $w$, and $w_s$ be the normalizing constants of the filters $g_K(x, y)$, $g(x, y)$, and $g_s(x, y)$, respectively:

$$w_K = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g_K(x, y)\, dx\, dy = 2\pi\sigma^2 K^2,$$

$$w = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\, dx\, dy = 2\pi\sigma^2,$$

and $w_s = w_K - w = 2\pi\sigma^2(K^2 - 1)$. Seeing $G(x, y)$ as the weighted centre and $G_K(x, y) - G(x, y)$ as the first annulus, two $\alpha^G$ effects can be computed by using the same idea as represented by Eq. 1:

$$\alpha_0^G(x, y) = \frac{G(x, y)}{w} - \frac{G_K(x, y)}{w_K}, \quad (10)$$

$$\alpha_1^G(x, y) = \frac{G_K(x, y) - G(x, y)}{w_s} - \frac{G_K(x, y)}{w_K}. \quad (11)$$

The superscript $G$ denotes that the region weighting is based on the unit-height Gaussian function. We can notice that the $\alpha_0^G(x, y)$ effect given by Eq. 10 represents the convolution of the intensity image $I(x, y)$ with the DoG filter:

$$\alpha_0^G(x, y) = -D(x, y) * I(x, y) \quad \text{with} \quad D(x, y) = \frac{g_K(x, y) - K^2 g(x, y)}{w_K}.$$

Using Eqs. 10 and 11 and $w_K = K^2 w$, the relation between the effects $\alpha_1^G(x, y)$ and $\alpha_0^G(x, y)$ can easily be derived:

$$\alpha_1^G(x, y) = -\frac{1}{K^2 - 1}\, \alpha_0^G(x, y).$$

Hence, by knowing $\alpha_0^G(x, y)$, we know $\alpha_1^G(x, y)$ as well. For $K = \sqrt{2}$, we obtain $\alpha_0^G(x, y) = -\alpha_1^G(x, y)$. In this case, both regions, the centre and the surrounding, are equally important. Following the above, we can conclude that the SIFT algorithm uses the $\alpha_0^G(x, y)$ effect as a saliency measure for the keypoint detection. SIFT uses only the two $\alpha$s; therefore, the local scale-space maxima and minima represent locations with a dark centre and a light surrounding, or the reverse. The larger the contrast between the centre and the surrounding, the larger the saliency. The region size is increased by increasing the parameter $\sigma$. The saliency maps (3) used by the radial detector do not distinguish between the low- and high-contrast regions, therefore we introduce the additional measure (4). The radial detector enlarges the local region by adding a new circle to the current local region.
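The derived relation can be checked numerically. The short Python sketch below (an illustrative check with arbitrarily chosen $\sigma$, $K$, grid size, and test image; NumPy and SciPy are assumed) builds the filters of Eqs. (5)-(7) and verifies that $\alpha_1^G = -\alpha_0^G / (K^2 - 1)$ at every pixel:

import numpy as np
from scipy.signal import fftconvolve

sigma, K, R = 3.0, np.sqrt(2.0), 16
yy, xx = np.mgrid[-R:R + 1, -R:R + 1]
r2 = xx ** 2 + yy ** 2
gK = np.exp(-r2 / (2 * sigma ** 2 * K ** 2))   # Eq. (5), unit-height Gaussian
g = np.exp(-r2 / (2 * sigma ** 2))             # Eq. (6), K-times narrower
wK, w = 2 * np.pi * sigma ** 2 * K ** 2, 2 * np.pi * sigma ** 2
ws = wK - w                                    # normalizer of the surrounding

I = np.random.default_rng(0).random((64, 64))  # arbitrary test image
GK = fftconvolve(I, gK, mode='same')           # Eq. (9)
G = fftconvolve(I, g, mode='same')             # Eq. (8)

a0 = G / w - GK / wK                           # Eq. (10), a scaled DoG response
a1 = (GK - G) / ws - GK / wK                   # Eq. (11)
assert np.allclose(a1, -a0 / (K ** 2 - 1))     # the derived relation holds

For $K = \sqrt{2}$ the two effects differ only in sign, so detecting the extrema of $\alpha_0^G$, as the DoG detector does, is equivalent to detecting the extrema of $\alpha_1^G$.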
The radial detector is slower to compute than SIFT. Computation of saliency map S(x,y,m) requires filtering of two images I(x,y) and I.2(x, y). Also, filters Am;m = 0,... ,M — 1 are not separable as are the Gaussian filters. 5 Experimental evaluation Figure 7. Examples of image pairs from full-image sequences in the HPatches dataset [2]: ijenis, i_dome, i_leuven, v_there, v_gmffiti, and v_sunseason. The radial detector is evaluated on the HPathes dataset [2] which contains 57 sequences with linear and nonlinear illumination changes and 59 sequences with viewpoint changes captured on different scenes. The HPathes dataset is built by including image sequences from several pre-existing datasets and by many new image sequences. Each image sequence consists of six images, with the first one being reference, and the remaining five containing different degrees of deformation in question. The geometric transformations with respect to the reference image are given in a form of homographies. Examples of the image pairs are shown in Fig. 7. In experimental testing we compare the radial with the DoG detector. DoG is a part of SIFT and it is the most often used keypoint detector. In the experimental evaluation we use the protocol suggested in [17]. The number of keypoints is limited to 3000. We compute two metrics: 1) Repeatability score It measures the ratio between the corresponding keypoints and the minimum number of keypoints visible in both images. 2) Matching score Each keypoint is represented by the SIFT descriptor. This allows us to find matches based on the keypoint descriptor. A match is the nearest neighbor in the descriptor space. If the match is also a geometric correspondence, then it is a correct match. The matching score is computed as the ratio between the number of correct matches and the minimum number of keypoints visible in the pair of images. Results for the illumination and viewpoint changes are shown in Fig. 8 and Fig. 9, respectively. For both types of changes, the radial detector outperforms the DoG detector, as both tested metrics, e.g. the repeatability and matching score, are higher for a large number of image sequences. The DoG detector is usually better for the image sequences with a substantial change in the scale. The averages, computed over all image sequences, are shown in Table. 1. 6 Conclusions The radial detector is a new version of one of the self-similarity detectors represented in [14]. The algorithm is upgraded with different techniques that speed up computation and eliminate the noise-sensitive keypoints. The keypoints are extended with a local region orientation. The experimental evaluation shows competitive results compared to the DoG detector, which is one of the most popular keypoint detectors. The radial detector is publicly available [11]. References [1] P. Alcantarilla, Fernández , A. Bartoli, and A. J. Davison, KAZE features, Proceedings of 12th European Conf. Computer Vision, pp. 214-227, 2012. [2] V. Balntas, K. Lenc, A. Vedaldi, K. Mikolajczyk, Hpatches: A benchmark and evaluation of handcrafted and learned local descriptors, Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2017. [3] H. Bay, A. Ess, T. Tuytelaars, L. V. Gool, Speeded-up robust features (SURF) Comput. Vis. Image Understanding, Vol. 110(3), pp. 346-359, 2008. [4] M. Brown, D. G. Lowe, Automatic panoramic image stitching using invariant features, International Journal of Computer Vision, Vol. 74 (1), pp. 59-73, 2007. [5] C. Cadena, L. Carlone, H. Carrillo, Y. Latif, D. 
The results for the illumination and viewpoint changes are shown in Fig. 8 and Fig. 9, respectively. For both types of changes, the radial detector outperforms the DoG detector, as both tested metrics, i.e. the repeatability and matching scores, are higher for a large number of image sequences. The DoG detector is usually better for the image sequences with a substantial change in the scale. The averages, computed over all image sequences, are shown in Table 1.

Figure 8. Illumination changes: the average values computed for the repeatability scores, numbers of correspondences, matching scores, and numbers of correct matches obtained on the first 57 image sequences of the HPatches dataset.

Figure 9. Viewpoint changes: the average values computed for the repeatability scores, numbers of correspondences, matching scores, and numbers of correct matches obtained on the last 59 image sequences of the HPatches dataset.

                Avg.           Avg. num.  Avg.            Avg. num.
Detector        repeatability  corresp.   matching score  correct matches
ILLUMINATION
Radial          55.64          986        37.31           642
DoG             52.59          963        26.55           468
VIEWPOINT
Radial          49.69          888        32.87           731
DoG             46.62          916        26.88           669

Table 1. Average values for the repeatability, number of correspondences, matching score, and number of correct matches computed over all image sequences with the illumination and viewpoint changes of the HPatches dataset.

6 Conclusions

The radial detector is a new version of one of the self-similarity detectors presented in [14]. The algorithm is upgraded with different techniques that speed up the computation and eliminate the noise-sensitive keypoints. The keypoints are extended with a local-region orientation. The experimental evaluation shows competitive results compared to the DoG detector, which is one of the most popular keypoint detectors. The radial detector is publicly available [11].

References

[1] P. F. Alcantarilla, A. Bartoli, A. J. Davison, KAZE features, Proc. 12th European Conf. Computer Vision, pp. 214-227, 2012.
[2] V. Balntas, K. Lenc, A. Vedaldi, K. Mikolajczyk, HPatches: A benchmark and evaluation of handcrafted and learned local descriptors, Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2017.
[3] H. Bay, A. Ess, T. Tuytelaars, L. Van Gool, Speeded-up robust features (SURF), Comput. Vis. Image Understanding, Vol. 110 (3), pp. 346-359, 2008.
[4] M. Brown, D. G. Lowe, Automatic panoramic image stitching using invariant features, International Journal of Computer Vision, Vol. 74 (1), pp. 59-73, 2007.
[5] C. Cadena, L. Carlone, H. Carrillo, Y. Latif, D. Scaramuzza, J. Neira, I. Reid, J. J. Leonard, Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age, IEEE Trans. Robot., Vol. 32 (6), pp. 1309-1332, 2016.
[6] J. M. Frahm, P. Fite-Georgel, D. Gallup, T. Johnson, R. Raguram, C. Wu, Y. H. Jen, E. Dunn, B. Clipp, S. Lazebnik, M. Pollefeys, Building Rome on a cloudless day, Proc. 11th European Conf. Computer Vision, pp. 368-381, 2010.
[7] D. Ghosh, N. Kaabouch, A survey on image mosaicing techniques, Journal of Visual Communication and Image Representation, Vol. 34, pp. 1-11, 2016.
[8] T. Kadir, M. Brady, Scale, saliency and image description, International Journal of Computer Vision, Vol. 45 (2), pp. 83-105, 2001.
[9] D. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision, Vol. 60 (2), pp. 91-110, 2004.
[10] S. Lowry, N. Sünderhauf, P. Newman, J. J. Leonard, D. Cox, P. Corke, M. J. Milford, Visual place recognition: A survey, IEEE Trans. Robot., Vol. 32 (1), pp. 1-19, 2016.
[11] R. Mandeljc, AlphaGamma descriptor, https://github.com/rokm/alphagamma-descriptor, 2017.
[12] E. Mair, G. D. Hager, D. Burschka, M. Suppa, G. Hirzinger, Adaptive and generic corner detection based on the accelerated segment test, Proc. 11th European Conf. Computer Vision, pp. 183-196, 2010.
[13] J. Matas, O. Chum, M. Urban, T. Pajdla, Robust wide baseline stereo from maximally stable extremal regions, Proc. British Machine Vision Conference, pp. 36.1-36.10, 2002.
[14] J. Maver, Self-similarity and points of interest, IEEE Trans. Pattern Anal. Mach. Intell., Vol. 32 (7), pp. 1211-1226, 2010.
[15] K. Mele, D. Suc, J. Maver, Local probabilistic descriptors for image categorisation, IET Computer Vision, Vol. 3 (1), pp. 8-23, 2009.
[16] K. Mikolajczyk, C. Schmid, Scale and affine invariant interest point detectors, International Journal of Computer Vision, Vol. 60 (1), pp. 63-86, 2004.
[17] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, L. Van Gool, A comparison of affine region detectors, International Journal of Computer Vision, Vol. 65 (1/2), pp. 43-72, 2005.
[18] E. Rosten, R. Porter, T. Drummond, Faster and better: A machine learning approach to corner detection, IEEE Trans. Pattern Anal. Mach. Intell., Vol. 32 (1), pp. 105-119, 2010.
[19] K. van de Sande, T. Gevers, C. Snoek, Evaluating color descriptors for object and scene recognition, IEEE Trans. Pattern Anal. Mach. Intell., Vol. 32 (9), pp. 1582-1596, 2010.
[20] S. Smith, J. Brady, SUSAN—A new approach to low level image processing, International Journal of Computer Vision, Vol. 23 (1), pp. 45-78, 1997.
[21] N. Snavely, I. Simon, M. Goesele, R. Szeliski, S. M. Seitz, Scene reconstruction and visualization from community photo collections, Proceedings of the IEEE, Vol. 98 (8), pp. 1370-1390, 2010.
[22] T. Tuytelaars, K. Mikolajczyk, Local invariant feature detectors: A survey, Foundations and Trends in Computer Graphics and Vision, Vol. 3 (3), pp. 177-280, 2008.

Jasna Maver received her PhD degree in computer science from the University of Ljubljana in 1995. From 1990 to 1991, she was with the Department of Computer and Information Science, GRASP Laboratory, University of Pennsylvania, working on the problem of next-view planning. She is an Associate Professor of computer and information science at the University of Ljubljana. She holds a position at the Department of Library and Information Science and Book Studies of the Faculty of Arts. She is also a research associate of the Computer Vision Laboratory and the Visual Cognitive Systems Laboratory of the Faculty of Computer and Information Science. Her research interests include methods for keypoint detection and local region representation.