Image Anal Stereol 2016;35:39-52
doi:10.5566/ias.1369
Original Research Paper

STEREO MATCHING ALGORITHM BASED ON ILLUMINATION CONTROL TO IMPROVE THE ACCURACY

ROSTAM AFFENDI HAMZAH1,2, HAIDI IBRAHIM1 AND ANWAR HASNI ABU HASSAN1

1School of Electrical & Electronic Engineering, Engineering Campus, Universiti Sains Malaysia, 14300 Nibong Tebal, Penang, Malaysia; 2Faculty of Engineering Technology, Universiti Teknikal Malaysia Melaka, 76100 Durian Tunggal, Melaka, Malaysia
e-mail: rostamaffendi@utem.edu.my, haidi_ibrahim@ieee.org, eeanwar@usm.my
(Received July 14, 2015; revised December 11, 2015; accepted January 9, 2016)

ABSTRACT

This paper presents a new pixel-based stereo matching algorithm that uses illumination control. The state-of-the-art absolute difference (AD) algorithm is fast, but it is precise only in low-texture areas. It is also sensitive to radiometric distortions (i.e., contrast or brightness) and to discontinuity areas. To overcome these problems, this paper proposes an algorithm that uses an illumination control to enhance the image quality of AD matching. As a result, pixel intensities at this step are more consistent, especially at object boundaries. The gradient difference value is then added to further reduce the radiometric errors, since gradient characteristics are known for their robustness to such errors. The experimental results on the standard benchmarking dataset from the Middlebury Stereo Vision database demonstrate that the proposed algorithm performs much better with the illumination control than without it. The main contribution of this work is a reduction of discontinuity errors, which leads to a significant improvement in the matching quality and accuracy of the disparity maps.

Keywords: computer vision, digital image processing, disparity map, gradient matching, pixel-based matching, stereo matching algorithm.

INTRODUCTION

In recent years, there has been considerable progress in the field of vision and image processing.
This research is useful for commercial and scientific purposes in many fields. One of the great advances in this area is the extraction of three-dimensional (3D) information from two-dimensional (2D) data. Many research groups have studied this discipline in depth, developing mechanisms for 3D representation by using more than one camera. Seeking to resemble human vision, researchers have investigated camera-based systems inspired by it (Dominguez-Morales et al., 2011). From this, stereo vision was established. It uses two parallel digital cameras to acquire the depth of a scene. The benefits of using a stereo camera are high resolution, low price, and the fact that the images can be used for other applications as well (Humenberger et al., 2010). Stereo vision cameras are used in robot applications such as people or scene recognition and robot navigation. Schmid et al. (2013) used a stereo camera as the main sensor for autonomous navigation and obstacle avoidance on indoor and outdoor flying robots. The task of the sensor is to continuously track the state of the visualized environment in order to provide visual information to the path planning and decision modules of the robot. This allows the movement of the robotic system to adapt to the state variations appearing in the imaged scene. The sensor provides 3D perception of the surrounding environment for on-board planning, frontier-based exploration and wall following. Another interesting application of stereo vision is the visual support system for blind navigation. Fernandes et al. (2010) developed a prototype system that acquires a 3D structure representing the environment in front of the user. Object recognition is then executed. The results are converted and provided to the user via a 3D virtual sound for navigation. The loudness of the sound is proportional to the intensity of the pixel.
The stereo camera is well suited to these uses because it is a purely passive technology and does not affect the surrounding environment. In the field of stereo vision, stereo matching is the process of establishing the correspondence between a pair of images. The matching problem, which is the search for the corresponding projections of the same scene point onto both camera planes, has to be solved. A main difficulty in developing image matching is the computational cost required to achieve appropriate results (Humenberger et al., 2010). The result of the matching process is presented in a disparity map. This map provides the depth data, which is important for 3D image reconstruction. The disparity map estimation consists of finding the correspondence for each pixel pair from two images at designated coordinates (i.e., reference image coordinates). According to Scharstein and Szeliski (2002), most stereo matching algorithms rely on four steps, which are explained below:

Step 1: Matching cost computation (i.e., matching process at each pixel from left and right images).
Step 2: Cost aggregation (i.e., aggregate initial costs over a support region).
Step 3: Disparity optimization (i.e., select the disparity level that optimizes the function).
Step 4: Disparity refinement (i.e., post-processing to refine the final disparity map).

Algorithms for disparity map development can be classified as either local or global methods. This classification depends on the technique by which the disparity map is calculated (Scharstein and Szeliski, 2002). Local methods determine the disparity using the correspondence between the gray or color values, or texture patterns, within a given local support window. A local method uses a small number of pixels around the pixel of interest. These methods are also referred to as window-based or area-based methods (Hamzah and Ibrahim, 2016).
There are several window-based approaches, such as fixed window (Yang et al., 2014), multiple window (Hirschmuller et al., 2002) and adaptive window (Lu et al., 2008). Such methods use only local information. Therefore, they have a low computational requirement and a short runtime. The disparity map is determined by selecting the smallest matching cost value among the disparity candidates. This selection approach is commonly known as the winner take all (WTA) approach. Although local methods can produce disparity maps quickly, their precision is low, especially in depth discontinuity regions. On the other hand, global optimization methods treat the disparity assignment problem as the minimization of a predefined global energy function. These methods apply energy minimization over the entire image. They are usually less sensitive to local peculiarities and tend to be computationally expensive. The energy is measured from the global data and an additional smoothness constraint on neighboring pixels (Wang et al., 2013a). The smoothness constraint is used to preserve the disparity smoothness of pixels in the same region while simultaneously refining the object boundaries. Numerous methods have been proposed for solving the global energy minimization problem by using a graph from a Markov random field (MRF). These methods can be categorized as either graph cut (GC) methods (Wang et al., 2013b) or belief propagation methods (Xiang et al., 2012). The GC method obtains the minimal energy solution by applying a minimum cut and max-flow algorithm to the energy flow structure extracted from the MRF graph. In contrast, the belief propagation method minimizes the energy function by iteratively passing messages from the current node to neighboring nodes in the MRF graph. Global methods can obtain high accuracy, but unfortunately they have a high computational requirement.
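As a brief illustration, the global energy function mentioned above is commonly written in the generic form used by Scharstein and Szeliski (2002). The weight λ and the term names below are notation assumed here for the sketch, not taken from the cited global methods.

```latex
E(d) = E_{\mathrm{data}}(d) + \lambda \, E_{\mathrm{smooth}}(d)
```

Here the data term measures how well the disparity assignment d agrees with the image pair, and the smoothness term penalizes disparity differences between neighboring pixels.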
CONTRIBUTION

In this work, a survey of the main challenges of existing methods is conducted, organized by the steps of algorithm development. According to this survey (Hamzah and Ibrahim, 2016), the crucial part is the construction of the preliminary disparity map, which contributes to the overall accuracy of the final disparity map. The preliminary disparity map is produced in the matching cost computation process (i.e., Step 1). Based on this finding, this work proposes a new matching cost function to increase the overall accuracy. The proposed local method uses a combination of two different similarity measurements, namely gradient-based and absolute difference (AD) approaches. The AD algorithm is augmented with an illumination control to enhance the quality of the preliminary disparity map. The aggregation step is implemented by using a guided filter as proposed by He et al. (2013). The post-processing steps use left-right cross consistency checking, filling in of the invalid pixels, and a weighted bilateral filter. With the proposed algorithm, the accuracy of the disparity maps in low-texture and discontinuity areas is improved by the illumination control. The computational complexity can also be reduced compared to window-based approaches at the matching cost step (i.e., Step 1). The rest of this paper is organized as follows. The next section provides a detailed discussion of previous works on local stereo matching algorithms. The subsequent section explains the structure of the proposed stereo matching algorithm. This is followed by the section on the experimental arrangements and the results. The conclusion and future works are provided in the last section.

STEREO VISION GEOMETRY

Basically, stereo vision can be illustrated by using triangulation analysis. This analysis computes the 3D position of points in the images.
It gives the disparity and the depth estimation from the geometry of the stereo setting, as given by Eqs. 1,2.

$$ d = x_l - x_r , \tag{1} $$

$$ z = \frac{b f}{d} , \tag{2} $$

where $d$ represents the disparity value, while $x_l$ and $x_r$ represent the coordinates on the left and right image planes respectively. Here $z$ is the depth of a target point, $b$ is the baseline and $f$ denotes the focal length of the stereo camera. The triangulation model for the projection of a target point P onto the image planes is shown in Fig. 1.

Fig. 1: Two pinhole camera models (i.e., left (L) and right (R)) with parallel optical axes.

RELATED WORKS ON STEREO MATCHING ALGORITHMS

Recently, many local methods have been developed to obtain a disparity map. Several of them share similar cost matching functions based on window-based techniques. These are the sum of absolute differences (SAD; Tippetts et al., 2011), the sum of squared differences (SSD; Yang and Pollefeys, 2003) and the normalized cross correlation (NCC; Satoh, 2011). As observed from all of these works, the main problem of the window-based technique is that it normally assumes that the pixels within the support region have the same disparity. This is not necessarily valid for pixels near depth discontinuities or edges. Hence, an incorrect selection of the size and shape of the matching window leads to poor disparity value estimations. In the work by Gupta and Cho (2010), an adaptive binary window was used to build the matching cost. They formed a variable-shape adaptive binary window by marking all positions in the support window as one region. Their approach increases the number of mismatched pixels at the boundaries and in textureless areas. This happens because of wrong pixel assumptions in the window-based matching technique when boundaries or untextured areas are present. In the work by Hu et al. (2011), the virtual support window was introduced, which required a more complex solution to resize the windows.
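The three window-based costs named above (SAD, SSD and NCC) can be sketched as follows. This is a minimal illustration on flattened patches, not the implementation of any cited work; the function names are hypothetical, and the plain (non-zero-mean) form of NCC is assumed.

```python
import math

def sad(a, b):
    """Sum of absolute differences between two flattened patches."""
    return sum(abs(x - y) for x, y in zip(a, b))

def ssd(a, b):
    """Sum of squared differences between two flattened patches."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ncc(a, b):
    """Normalized cross correlation (plain form, assumed here)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0

left = [1, 2, 3]
right = [2, 4, 6]  # same pattern as left, with doubled brightness
print(sad(left, right), ssd(left, right), ncc(left, right))
```

For this pair, SAD and SSD are nonzero although the pattern is identical, while NCC equals 1, which illustrates why intensity-difference costs are sensitive to radiometric changes.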
Another complex solution was proposed by Einecke and Eggert (2013), who implemented a modification of the normalized cross correlation (NCC) algorithm at the matching stage. This method suffered from the same problem as the implementation by Gupta and Cho (2010). The window-based matching approach tends to produce incorrect matching approximations of the pixel values in the support regions, especially at the boundaries. Sharma et al. (2011) and Wang et al. (2014) introduced methods which used a feature-based matching technique. Sharma et al. (2011) developed a matching technique using the scale invariant feature transform (SIFT) algorithm. They adapted the SIFT algorithm by using self-organizing mapping to perform a more efficient feature matching. Wang et al. (2014) implemented feature-based matching through two different expansion phases. The first phase used an image segmentation process to obtain sparse matching points. The second phase then used a regular seed-growing algorithm to produce quasi-dense disparity maps. From their results, the feature-based matching accuracy was low because this approach targeted only the object features and was less sensitive to occlusion and textureless areas. Other approaches to pixel-based matching were shown in the works of Samadi and Othman (2013), Jung et al. (2014) and Ma et al. (2013a). Samadi and Othman (2013) implemented census-based pixel matching with a reduced number of bits in the comparison numbers. Their modifications increase the speed and accuracy, and have been applied to mobile robot navigation. Jung et al. (2014) used a census filter based on a higher-order matching cost, with the assumption that all disparities within a patch have the same value. Their work produces a better quality disparity map. Ma et al. (2013a) implemented a census transform based on neighbourhood information.
They utilised more bits to represent the difference between the pixel and its neighbour. However, their results showed that the discontinuity regions still produce high errors. The essential property of disparity maps that supports high quality processing is their accuracy at object boundaries and depth discontinuities. Generally, local methods do not achieve high accuracy in those regions due to inappropriate selection of window sizes. However, in previous years, adaptive support weight (ASW) approaches at the cost aggregation stage have gained attention due to their high quality disparity map approximation, as shown by Yoon and Kweon (2006) and Yang et al. (2009). This approach assigns weights within the support window based on intensity similarity. Therefore, during the aggregation process for each pixel, neighbouring pixels with closer intensities exert a greater weight (i.e., higher support). This method produces a constant weight sharing for smooth regions, which increases robustness against noise and helps to preserve the edge features. Several approximations of ASW have been proposed in order to increase its efficiency. The most well-organized approach that uses ASW was proposed by Yang et al. (2009). They applied a constant-time bilateral filter in the cost aggregation stage. In their approach, a piecewise linear discretization of intensity levels was implemented to reach the weighted average within a support window. They produced good results, but with high computational complexity. In general, the support weight methods require independent calculations for each pixel in a neighbourhood, which increases the computational complexity and requires a huge amount of memory. Moreover, in the work of Kowalczuk et al.
(2013), the two-stage adaptive support weights (ASW) algorithm was used, with a combination of window-based cost aggregation and an iterative refinement technique. Their work produced high accuracy. However, the complexity of their work was high, as it used ASW and iterative approaches in the same algorithm. The ASW approach requires high computational complexity to adaptively determine the shapes or textures. Furthermore, the iterative refinement technique comprises multiple loops, which involve complex programming to refine the mismatched pixels. In the work by Liu et al. (2012), the algorithm was implemented using a multi-scale Weber (MSW) descriptor and a weighted linear regression bilateral filter. The MSW descriptor was utilized to combine the raw matching cost in two layers of structures. The matching cost was aggregated by spatial moving discrete sampling and weighted least squares. With this technique, the use of the bilateral filter in the aggregation stage led to high complexity and a slow processing time. Usually, disparity map refinement methods reach an effective refinement by manipulating the values of the pixels neighbouring the pixel of interest (Gu et al., 2008; Min et al., 2012). Two established local disparity refinement techniques are median filtering and Gaussian convolution. The median filter is able to remove small, isolated mismatched disparities while preserving edges. This filter picks the middle value among the pixels in the window as the final result for the centre pixel. In the work by Michael et al. (2013), the median filter was used in the refinement stage for real-time applications. Ma et al. (2013b) used a constant-time weighted technique at the refinement stage. Their results demonstrated high accuracy in removing noise and errors with respect to the edges of the depth map produced.
Another approach uses a Gaussian convolution, which combines a disparity estimate with those of its neighbours according to weights defined by a Gaussian distribution. This method decreases the noise in the disparity map. Unfortunately, it also decreases the amount of fine detail in the final disparity maps. An effort was made by Vijayanagar et al. (2013), who imposed weights on the Gaussian filter, approximating the lost depth values by using the nearby good depth pixels as a guide to prevent filtering across the object boundaries.

MATERIAL AND METHODS

THE PROPOSED STEREO MATCHING ALGORITHM

Similar to standard local stereo matching approaches, the proposed algorithm involves the steps shown in Fig. 2. The matching cost computation uses the gradient and absolute difference values with an illumination control. The cost aggregation is implemented by using a guided filter. The disparity optimization then uses the winner take all (WTA) strategy, which selects the minimal aggregated cost for each valid pixel. At this stage, invalid pixels still occur, especially in occluded and untextured areas. The left-right (L-R) cross consistency checking process is applied to determine those invalid pixel areas. Then, the invalid pixels are filled in. In this work, the invalid pixels are replaced with a valid minimum pixel value detected by the previous process. The final stage consists of applying the weighted bilateral filter to remove the remaining noise, which usually arises in the fill-in process.

MATCHING COST COMPUTATION

In this sub-section, the proposed pixel-based matching stage uses a combination of gradient-based similarity measurement and absolute differences (AD) with an illumination control. To calculate the gradient components of each image, the Gx and Gy masks are used as implemented in De-Maeztu et al. (2011) and von Gioi et al.
(2012).

Fig. 2: A stereo matching system.

Fundamentally, the gradient process works in the horizontal and vertical directions respectively, as given by Eqs. 3,4.

$$ G_x = \begin{bmatrix} 1 & 0 & -1 \end{bmatrix} * I , \tag{3} $$

$$ G_y = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} * I , \tag{4} $$

where $I$ is the image and $*$ is the convolution operation. Using both gradient components, the magnitude $m$ and phase $\phi$ are computed using Eqs. 5,6 respectively.

$$ m = \sqrt{G_x^2 + G_y^2} , \tag{5} $$

$$ \phi = \arctan\left( \frac{G_y}{G_x} \right) . \tag{6} $$

In the x-direction of gradient displacement, the gradient matching cost $C_G'(p,d)$ is given by Eq. 7.

$$ C_G'(p,d) = \left| \nabla_x\left(m_l(p)\right) - \nabla_x\left(m_r(p-d)\right) \right| , \tag{7} $$

where the coordinates $(x,y)$ of the pixel of interest are represented by $p$, and $d$ is the disparity value, while $m_l$ and $m_r$ are the gradient magnitudes of the left and right grayscale images respectively. The final value of the gradient difference $C_G(p,d)$ is truncated so that it does not exceed $\tau_{CG}$. This is given by Eq. 8;

$$ C_G(p,d) = \begin{cases} \tau_{CG}, & \text{if } |C_G'(p,d)| > \tau_{CG} , \\ |C_G'(p,d)|, & \text{otherwise} . \end{cases} \tag{8} $$

Meanwhile, the absolute difference algorithm $AF$ is given by Eq. 9, which relies on the intensity difference between two pixels in the left image $I_l$ and the right image $I_r$.

$$ AF(p,d) = \left| \beta \left( I_l(p) - I_r(p-d) \right) \right| . \tag{9} $$

In this work, the illumination control $\beta$ is imposed on the $AF$ algorithm. This new control parameter enhances the image features in the preliminary disparity map. Eq. 10 gives the condition when the truncation value $\tau_{AD}$ is applied, as implemented by Yoon and Kweon (2006), in order to increase the robustness against outliers.

$$ AD(p,d) = \begin{cases} \tau_{AD}, & \text{if } |AF(p,d)| > \tau_{AD} , \\ |AF(p,d)|, & \text{otherwise} . \end{cases} \tag{10} $$

The final cost function $M(p,d)$ at this stage is the combination of $AD(p,d)$ and $C_G(p,d)$ as given by Eq. 11.

$$ M(p,d) = \alpha \, AD(p,d) + (1-\alpha) \, C_G(p,d) , \tag{11} $$

where $\alpha$ is added to balance the color and gradient terms as implemented by Yang et al. (2014). The value of $\alpha$ controls the sensitivity to radiometric differences.
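The per-pixel cost of Eqs. 7 to 11 can be sketched as below. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, intensities and gradients are assumed to be normalized to [0, 1], border pixels are assumed to be replicated, and the default parameter values are those reported in the RESULTS section of this paper.

```python
def gradient_x(row):
    """Horizontal gradient of one scanline with the [1 0 -1] mask (Eq. 3).

    Convolution with the centred mask gives row[x+1] - row[x-1];
    border pixels are replicated (an assumption of this sketch).
    """
    n = len(row)
    return [row[min(x + 1, n - 1)] - row[max(x - 1, 0)] for x in range(n)]

def truncated(value, tau):
    """Truncate an absolute cost at tau (Eqs. 8 and 10)."""
    return min(abs(value), tau)

def matching_cost(i_left, i_right, g_left, g_right,
                  alpha=0.18, beta=0.1, tau_ad=0.02155, tau_cg=0.00855):
    """Combined cost M(p, d) for one pixel pair.

    i_left, i_right: intensities I_l(p) and I_r(p - d).
    g_left, g_right: x-gradients of the gradient magnitude at p and p - d.
    """
    ad = truncated(beta * (i_left - i_right), tau_ad)  # Eqs. 9, 10
    cg = truncated(g_left - g_right, tau_cg)           # Eqs. 7, 8
    return alpha * ad + (1.0 - alpha) * cg             # Eq. 11

print(gradient_x([0, 1, 4, 9]))
print(matching_cost(0.5, 0.5, 0.2, 0.2))
```

Both truncations bound the contribution of any single pixel pair, so one outlier cannot dominate the cost that is later aggregated.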
However, in this work, the weight $\alpha$ is applied to $AD(p,d)$ instead of $C_G(p,d)$.

COST AGGREGATION

Cost aggregation is the most critical stage, since it minimizes the matching uncertainties. It largely determines the overall performance of the disparity maps for local methods. The raw disparity values from the cost matching step vary widely and are too sensitive to noise. In this work, the guided filter is chosen since it is designed to reduce the noise and preserve the edges (Yang et al., 2014). The term guided means that the filter uses a selected guidance image (i.e., the left or right grayscale image) as a guide for the filtering process. The left image is selected in this work as the reference and guidance for the filtering process. The filter kernel of the weighted guided filter is given by Eq. 12,

$$ G_{p,q}(I) = \frac{1}{|w|^2} \sum_{q \in w_k} \left( 1 + \frac{(I_p - \mu_k)(I_q - \mu_k)}{\sigma_k^2 + \varepsilon} \right) . \tag{12} $$

$I$ is the guidance grayscale image and $p$ represents the $(x,y)$ coordinates of the pixel of interest within a support window on the same image. $q$ denotes the neighbouring pixels in the support region of size $(r \times r)$. $\sigma_k^2$ and $\mu_k$ are the variance and mean of the intensity values in a square window $w_k$, which is centred at pixel $k$ on the guidance image. $|w|$ is the number of pixels in the square window $w_k$. The value $\varepsilon$ is the control element for the smoothness term. The aggregation cost volume $C_A(p,d)$ at this stage is given by Eq. 13,

$$ C_A(p,d) = G_{p,q}(I) \, M(p,d) , \tag{13} $$

where $G_{p,q}(I)$ is the weight of the guided filter and $M(p,d)$ represents the filtering input image. The weight of the edge preserving factor is determined by the sum over the neighbouring pixels $q$ in Eq. 12 within the support region of the guidance image.

DISPARITY OPTIMIZATION

To obtain an accurate disparity map, this work computes the final disparity by selecting the minimal aggregated cost value for each pixel using the WTA strategy.
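The aggregation of Eq. 13 followed by the WTA selection described above can be sketched for a single scanline as follows. This is a simplified one-dimensional illustration, not the authors' code: it uses the box-filter formulation of the guided filter from He et al. (2013) rather than the explicit kernel of Eq. 12, windows are shrunk at the borders, and all function names are hypothetical.

```python
def box_mean(values, radius):
    """Mean over a (2 * radius + 1) window, shrunk at the borders."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def guided_filter_1d(guide, src, radius=1, eps=1e-4):
    """Filter src (one cost slice) using guide (the left image scanline)."""
    mean_i = box_mean(guide, radius)
    mean_p = box_mean(src, radius)
    mean_ip = box_mean([g * s for g, s in zip(guide, src)], radius)
    mean_ii = box_mean([g * g for g in guide], radius)
    a, b = [], []
    for mi, mp, mip, mii in zip(mean_i, mean_p, mean_ip, mean_ii):
        var_i = mii - mi * mi      # variance of the guide in the window
        cov_ip = mip - mi * mp     # covariance of guide and input
        ak = cov_ip / (var_i + eps)
        a.append(ak)
        b.append(mp - ak * mi)
    mean_a, mean_b = box_mean(a, radius), box_mean(b, radius)
    return [ma * g + mb for ma, mb, g in zip(mean_a, mean_b, guide)]

def wta(cost_slices):
    """cost_slices[d][x] -> disparity of minimum cost at each x."""
    width = len(cost_slices[0])
    levels = range(len(cost_slices))
    return [min(levels, key=lambda d: cost_slices[d][x]) for x in range(width)]

guide = [0.1, 0.1, 0.1, 0.9, 0.9, 0.9]
costs = [[0.3, 0.3, 0.3, 0.1, 0.1, 0.1],   # cost slice for d = 0
         [0.1, 0.1, 0.1, 0.3, 0.3, 0.3]]   # cost slice for d = 1
aggregated = [guided_filter_1d(guide, c) for c in costs]
print(wta(aggregated))
```

Each disparity slice is filtered independently with the same guidance scanline, so edges in the guide are carried into every slice before the minimum is taken.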
The use of the WTA strategy in local algorithms reduces the computational complexity, as in the implementations by Huang and Wang (2010) and Zhang et al. (2011). However, according to their findings, the disparity maps attained at this stage still contain errors at unmatched pixels or in occluded regions. As given by Eq. 14, the disparity $d_p$ associated with the minimum aggregated cost at each pixel is chosen. $C_A(p,d)$ is the cost aggregation volume acquired after the cost aggregation process and $D$ represents the set of all valid and allowed discrete disparity values,

$$ d_p = \arg\min_{d \in D} C_A(p,d) . \tag{14} $$

POST PROCESSING

The final stage of the proposed algorithm is the post-processing step. This step comprises three sequential processes, which are occlusion handling or invalid pixel detection, filling in of the invalid pixels, and filtering. The occluded or invalid areas are detected by the left-right (L-R) cross consistency checking process. The task of this process is to find the invalid pixels in the disparity map. It checks whether the disparity map computed with the left image as reference coincides with the disparity map computed with the right image as reference. Pixels that fail this check, typically in flat regions and occluded areas of the scene, are marked as invalid and rejected. Next, the invalid pixels are filled in. In this work, since the left image is the reference, the filling process starts with the right to the left valid pixel replacement. The invalid pixel is replaced with the nearest valid pixel value. The valid pixel must be located on the same scan line or at the start of the scan line, as shown by Eq. 15.

$$ d(x) = \begin{cases} d(x-1), & \text{if } d(x) = 0 , \\ d(x), & \text{otherwise} , \end{cases} \tag{15} $$

where $d(x)$ is the value of a pixel in the disparity map and $x$ represents the location of the pixel. However, this filling and replacing process produces unwanted streak artefacts in the disparity maps. To remove that noise, the weighted bilateral filter is utilized as given by Eq. 16.
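The L-R consistency check and the fill-in rule of Eq. 15 described above can be sketched on one scanline as follows. The source does not state the exact consistency test, so the standard form is assumed here: a left pixel x with disparity d is kept only if the right map at x - d holds the same disparity within a tolerance. Function names and the tolerance parameter are hypothetical, and the value 0 marks an invalid pixel as in Eq. 15.

```python
def lr_check(disp_left, disp_right, tol=0):
    """Mark left disparities that disagree with the right map as 0 (invalid)."""
    checked = []
    for x, d in enumerate(disp_left):
        xr = x - d  # matching column in the right disparity map
        ok = 0 <= xr < len(disp_right) and abs(disp_right[xr] - d) <= tol
        checked.append(d if ok else 0)
    return checked

def fill_invalid(disp):
    """Eq. 15: replace each invalid pixel with its neighbour d(x - 1).

    Scanning from left to right lets a valid value propagate through a run
    of invalid pixels; leading invalid pixels are left unchanged.
    """
    filled = list(disp)
    for x in range(1, len(filled)):
        if filled[x] == 0:
            filled[x] = filled[x - 1]
    return filled

left = [0, 1, 1, 2]
right = [1, 1, 2, 2]
checked = lr_check(left, right)
print(checked, fill_invalid(checked))
```

Because a whole run of invalid pixels receives the same neighbouring value, this step is the source of the horizontal streaks that the subsequent filter is meant to remove.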
This filter is an edge-preserving filter and is able to improve the disparity map quality,

$$ WMBF_{p,q} = \exp\left( -\frac{|p-q|^2}{\sigma_s^2} \right) \exp\left( -\frac{|I_p - I_q|^2}{\sigma_c^2} \right) , \tag{16} $$

where $p$ is the pixel to be denoised using the weight of the neighbouring pixel $q$. $\sigma_s$ represents a spatial adjustment parameter and $\sigma_c$ corresponds to the color similarity parameter. $|p-q|$ refers to the spatial Euclidean distance and $|I_p - I_q|$ is the Euclidean distance in color space. This filter applies a higher weight to pixels that are spatially close and have a similar color, according to the sigma adjustment (Tan and Monasse, 2014). The weighted histogram $h$ is accumulated by Eq. 17, where each contribution is weighted by Eq. 16,

$$ h(d) = \sum_{q \in w_p} WMBF_{p,q} , \tag{17} $$

over the disparity range of Eq. 18;

$$ d \in [d_{min}, d_{max}] , \tag{18} $$

where $w_p$ is the window of size $(r \times r)$ centred at pixel $p$. $d_{min}$ is the minimum disparity value and $d_{max}$ denotes the maximum disparity value. The final disparity value is determined by the median value of $h(d)$ from Eq. 19.

$$ d(p) = \min \left\{ d \;\middle|\; h(d) \geq \tfrac{1}{2} h(d_{max}) \right\} . \tag{19} $$

RESULTS

In this work, the experimental images and results use a standard benchmarking dataset from the Middlebury Stereo Vision database that has been widely used by researchers (Scharstein and Szeliski, 2015). Fig. 3 shows the input images, which consist of Tsukuba, Venus, Teddy and Cones, with the left reference image, the ground truth image and the frame size. According to Scharstein and Szeliski (2015), the accuracy of a disparity map is measured by the percentage of bad pixels among all pixels in non-occluded regions (nonocc), all pixels detected with valid pixels (all) and pixels in regions near depth discontinuities and occluded regions (disc). The experiments in this work are carried out on Windows 8.1 on a desktop PC with a 3.2GHz processor and 8GB memory.

Fig. 3: Standard benchmarking datasets from the Middlebury.

Fig. 4: 2D slices of the Cones image for different values of β.

The parameters in this work are explained as follows:

Step 1: The value of β is determined through simulation on a test image (i.e., in this work, the Cones image is selected). To visualize the illumination difference, Fig. 4 shows two-dimensional slices of the Cones image. In this work, β is set to about 0.1. The image produced at this value is more reliable in terms of its brightness output compared to the other images. The value of τCG is 0.00855 and τAD is 0.02155. The constant value of α is 0.18.

Step 2: The selection of the filter at this step is based on an experimental analysis of several established methods or filters. They are the guided filter (GF) (He et al., 2013), non-local (NL) (Yang, 2012b), adaptive support weight (ASW) (Hosni et al., 2013), segment tree (ST) (Mei et al., 2013), recursive bilateral filter (RBF) (Yang, 2012a), median (MD) and box (BX) filters. The results are shown in Table 1. Based on these experiments, the GF is selected because it produces the lowest average error. Since the cost aggregation step is important for local method algorithms, this paper presents an analysis of the relationship between ε and the filter window size (r × r) for the GF. The range of ε is about 0 to 1 and the filter window size ranges from 7 to 11. Figs. 5-7 show the results for nonocc, all and disc. The average results of these analyses are shown in Fig. 8, with the lowest average error at ε equal to 0.0001 and a window size of (9×9).

Fig. 5: All pixels in non-occluded regions (nonocc) for different filter sizes: (a) (7×7), (b) (9×9), (c) (11×11).

Fig. 6: All pixels detected with the valid pixels (all) for different filter sizes: (a) (7×7), (b) (9×9), (c) (11×11).

Fig. 7: Pixels in regions near the depth discontinuities and occluded regions (disc) for different filter sizes: (a) (7×7), (b) (9×9), (c) (11×11).
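The bad-pixel percentages reported for the nonocc, all and disc attributes can be sketched as below. This follows the usual Middlebury error measure, in which a pixel is bad when its disparity differs from the ground truth by more than a threshold (1 in the tables of this paper). The function name and the optional region mask are hypothetical choices for this sketch.

```python
def bad_pixel_percentage(disp, truth, threshold=1.0, mask=None):
    """Percentage of evaluated pixels with |disp - truth| > threshold.

    mask, if given, selects the evaluated region (for example nonocc or disc).
    """
    total = 0
    bad = 0
    for i, (d, t) in enumerate(zip(disp, truth)):
        if mask is not None and not mask[i]:
            continue  # pixel lies outside the evaluated region
        total += 1
        if abs(d - t) > threshold:
            bad += 1
    return 100.0 * bad / total if total else 0.0

estimated = [10, 12, 15, 20]
ground_truth = [10, 10, 15, 18]
print(bad_pixel_percentage(estimated, ground_truth))
```

Evaluating the same disparity map with three different masks yields the three attributes, which is why a method can rank differently in nonocc and disc.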
Fig. 8: The comparison of average results based on different filter sizes from all attributes on the tested parameters.

Table 1: Comparison results with different methods or filters at Step 2.

| Algorithm/Filter | GF | ST | NL | RBF | ASW | MD | BX |
|---|---|---|---|---|---|---|---|
| Ave (%) | 5.45 | 5.49 | 5.68 | 5.68 | 5.78 | 7.56 | 9.78 |

Step 3: The optimization stage in this work uses the WTA strategy, which is similar to other local method algorithms. The minimum value of disparity intensity is chosen within the support region from Step 2.

Step 4: To make the algorithm robust against the image size, a different test image (i.e., the Tsukuba image) is used to determine the weighted bilateral filter parameters. The other parameters in the previous steps remain unchanged. The test results are shown in Fig. 9. The lowest average bad pixel value over the attributes (i.e., nonocc, all and disc) is obtained at σs equal to 17 and σc equal to 0.3. The window size of wp is 19.

The quantitative results of every disparity map are submitted to the authoritative testing website (Scharstein and Szeliski, 2015). Table 2 shows the comparative results with and without the β implementation, while the other parameters are fixed at the same values. The results demonstrate that the proposed algorithm with the illumination control performs much better. The proposed algorithm with β reduces the error in all of the attributes evaluated. The most significant reduction is in the discontinuity areas. This is due to the image enhancement, especially at the depth boundaries of objects. The final results of this work are shown in Fig. 10. The figure includes the results (i.e., images) of the raw matching disparity at Step 1, the disparity before post processing at Step 3, the L-R cross checking process and the regions of bad pixels. The running time is estimated at 0.01 second for each image.
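The weighted bilateral filtering of Eqs. 16 to 19, with the Step 4 parameters reported above as defaults, can be sketched on one scanline as follows. This is an illustrative reading rather than the authors' implementation: Eq. 19 is interpreted as picking the smallest disparity whose cumulative histogram weight reaches half of the total, disparities are assumed to be non-negative integers, and the function names are hypothetical.

```python
import math

def bilateral_weight(dx, di, sigma_s, sigma_c):
    """Eq. 16: spatial closeness term times colour similarity term."""
    return math.exp(-(dx * dx) / sigma_s ** 2) * math.exp(-(di * di) / sigma_c ** 2)

def weighted_median_1d(disp, guide, radius=9, sigma_s=17.0, sigma_c=0.3):
    """Refine integer disparities on one scanline with a weighted median."""
    n = len(disp)
    d_max = max(disp)
    refined = []
    for p in range(n):
        # Eq. 17: weighted histogram over the window centred at p.
        hist = [0.0] * (d_max + 1)
        for q in range(max(0, p - radius), min(n, p + radius + 1)):
            hist[disp[q]] += bilateral_weight(p - q, guide[p] - guide[q],
                                              sigma_s, sigma_c)
        # Eq. 19, read as a cumulative histogram reaching half the total.
        half = 0.5 * sum(hist)
        acc = 0.0
        for d, h in enumerate(hist):
            acc += h
            if acc >= half:
                refined.append(d)
                break
    return refined

# An isolated outlier on a uniformly coloured surface is removed.
print(weighted_median_1d([5, 5, 9, 5, 5], [0.5] * 5, radius=2))
```

A radius of 9 corresponds to the window size of 19 reported in Step 4. Because pixels with a different colour receive almost no weight, a disparity step that coincides with an intensity edge is left in place while streaks inside a uniform region are smoothed out.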
DISCUSSION

The performance of the proposed algorithm is compared with some other established local algorithms listed in (Scharstein and Szeliski, 2015) through Table 3. This table summarizes the quantitative performance in descending order of overall performance, and the best results are in bold for every attribute. These algorithms are chosen for comparison because their development approaches and complexities differ from those of the proposed algorithm. The proposed method is highly ranked among the local approaches and is more accurate than the window-based matching cost algorithms such as those implemented in RealTimeABW (Gupta and Cho, 2010), VSW (Hu et al., 2011) and SNCC+AM (Einecke and Eggert, 2013). Moreover, the proposed algorithm outperforms recently published census-based algorithms (i.e., Differential (Samadi and Othman, 2013) and RINCensus (Ma et al., 2013a)) and a feature-based algorithm (i.e., TwoStep (Wang et al., 2014)) in most of the attributes evaluated. The proposed algorithm also outperforms two complex algorithms with an iterative method and two-layer structures (i.e., RTAdaptWgt (Kowalczuk et al., 2013) and MSWLinRegr (Liu et al., 2012)), producing better results than both in the nonocc, all and disc regions of the highly textured Cones image. Additionally, Fig. 11 shows that the proposed algorithm has the lowest average percentage of bad pixels in the discontinuity regions.

Fig. 9: The results for the Tsukuba image with different values of sigma space σs, sigma colour σc and window size wp: (a) wp = (17×17), (b) wp = (19×19), (c) wp = (21×21).

Fig. 10: The final results on the Middlebury dataset. The running time of the images is Tsukuba (0.005s), Venus (0.009s), Teddy (0.01s) and Cones (0.01s).

Table 2: Results with and without β implementations.
Algorithms (threshold=1)      Tsukuba                 Venus                   Teddy                   Cones                   Ave (%)
                              nonocc  all     disc    nonocc  all     disc    nonocc  all     disc    nonocc  all     disc
Proposed algorithm without β  1.81    2.06    7.96    0.38    0.56    4.16    7.56    13.1    17.7    3.88    9.37    10.6    6.60
Proposed algorithm with β     1.63    1.91    7.14    0.28    0.49    2.63    6.10    11.4    15.6    2.62    8.11    7.49    5.45
Total difference              0.18    0.15    0.82    0.10    0.07    1.53    1.46    1.70    2.10    1.26    1.26    3.11    1.15

To test the capability of the proposed algorithm, it is also applied to the other stereo images shown in Fig. 12. In general, the results demonstrate that the proposed algorithm generates good disparity maps. For the Barn and Sawtooth images, the foreground objects are separated from the background with clear and precise contours, and the disparity values are accurate and consistent with the depth order. For the complex scenes (i.e., the Reindeer, Books, Moebius, Art and Dolls stereo pairs), the disparity values of the layered objects are correctly reconstructed in accordance with their respective depths. Objects situated at increasing depth are assigned disparity values that change step by step from near to far. The results show that the proposed algorithm produces accurate and smooth disparity maps with clear and detailed edge contours.

Table 3: Performance comparison with established methods.
Algorithms (threshold=1)  Tsukuba                 Venus                   Teddy                   Cones                   Ave (%)
                          nonocc  all     disc    nonocc  all     disc    nonocc  all     disc    nonocc  all     disc
Proposed algorithm        1.63    1.91    7.14    0.28    0.49    2.63    6.10    11.4    15.6    2.62    8.11    7.49    5.45
HCFilter, 2013            1.56    1.78    8.07    0.22    0.34    2.96    6.18    11.5    16.1    3.02    8.07    8.19    5.67
MSWLinRegr, 2012          1.46    1.72    7.89    0.57    0.92    6.71    6.11    11.0    15.6    3.12    8.76    8.52    6.04
RTAdaptWgt, 2013          1.45    1.99    7.59    0.40    0.81    3.38    7.65    13.3    16.2    3.48    9.34    8.81    6.20
VSW, 2011                 1.62    1.88    6.96    0.47    0.81    3.40    8.67    13.3    18.0    3.37    8.85    8.12    6.29
SNCC+AM, 2013             3.21    3.57    13.6    0.22    0.45    3.00    6.41    10.4    17.7    3.11    8.61    9.27    6.63
TwoStep, 2014             2.91    3.68    13.3    0.27    0.45    2.63    7.42    12.6    18.0    4.09    10.1    10.3    7.14
RealTimeABW, 2010         1.26    1.67    6.83    0.33    0.65    3.56    10.7    18.3    23.3    4.81    12.6    10.7    7.90
Differential, 2013        4.74    6.77    19.4    1.69    2.62    20.4    8.29    10.1    23.2    4.25    10.3    12.2    10.3
RINCensus, 2014           4.78    6.00    14.4    1.11    1.76    7.91    9.76    17.3    26.1    8.09    16.2    17.6    10.9

Fig. 11: The average percentage of bad pixels in the discontinuity regions for all images from Table 3.

Fig. 12: Results on the disparity maps tested from other stereo images provided by Scharstein and Szeliski (2015).

The proposed algorithm is also tested for its adaptability to real environments, using images from the KITTI dataset (Menze and Geiger, 2015) and from the Universiti Sains Malaysia (USM) laboratory. Fig. 13 shows the results for six consecutive frames from the KITTI database. Every frame consists of three images: the left image, the ground truth and the result of the proposed algorithm. The results demonstrate that the proposed algorithm generates smooth disparity maps and is able to identify the vehicles and obstacles in each frame. The results on the USM images (Fig. 14) also show smooth disparity maps. Both experiments show that the proposed method is able to reconstruct disparity maps from real environments.
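The error measure behind Tables 2 and 3 and the discontinuity averages summarized in Fig. 11 can be sketched as follows. This is an illustrative sketch only: the function names and the optional region `mask` argument are assumptions, the bad-pixel measure follows the usual Middlebury definition (a pixel is bad when its absolute disparity error exceeds the threshold), and it is assumed that Fig. 11 plots the plain mean of the four disc values per algorithm. The numbers are copied from the disc columns of Table 3.

```python
def bad_pixel_percentage(disparity, ground_truth, threshold=1.0, mask=None):
    """Percentage of evaluated pixels whose disparity error exceeds threshold.

    `mask` (optional) selects the evaluated region, e.g. nonocc, all or disc;
    when it is None every pixel is evaluated.
    """
    bad = total = 0
    for y, row in enumerate(disparity):
        for x, d in enumerate(row):
            if mask is not None and not mask[y][x]:
                continue
            total += 1
            if abs(d - ground_truth[y][x]) > threshold:
                bad += 1
    if total == 0:
        raise ValueError("no pixels selected for evaluation")
    return 100.0 * bad / total


# disc-region errors (Tsukuba, Venus, Teddy, Cones) copied from Table 3.
DISC_ERRORS = {
    "Proposed algorithm": (7.14, 2.63, 15.6, 7.49),
    "HCFilter": (8.07, 2.96, 16.1, 8.19),
    "MSWLinRegr": (7.89, 6.71, 15.6, 8.52),
    "RTAdaptWgt": (7.59, 3.38, 16.2, 8.81),
    "VSW": (6.96, 3.40, 18.0, 8.12),
    "SNCC+AM": (13.6, 3.00, 17.7, 9.27),
    "TwoStep": (13.3, 2.63, 18.0, 10.3),
    "RealTimeABW": (6.83, 3.56, 23.3, 10.7),
    "Differential": (19.4, 20.4, 23.2, 12.2),
    "RINCensus": (14.4, 7.91, 26.1, 17.6),
}


def mean_error(values):
    """Plain arithmetic mean of a sequence of error percentages."""
    return sum(values) / len(values)


disc_averages = {name: mean_error(v) for name, v in DISC_ERRORS.items()}
lowest = min(disc_averages, key=disc_averages.get)

# Print the algorithms from lowest to highest average discontinuity error.
for name, avg in sorted(disc_averages.items(), key=lambda item: item[1]):
    print(f"{name:20s} {avg:6.2f}")
```

Under these assumptions the proposed algorithm has the smallest mean disc error among the ten listed methods, which agrees with the statement made about Fig. 11.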
CONCLUSION AND FUTURE WORKS

In this work, a stereo matching algorithm with improved accuracy is presented. On the standard quantitative benchmarking dataset, the proposed algorithm, with the illumination control imposed on the AD algorithm, increases the robustness in the discontinuity regions and reduces the errors, as shown in Table 2. The average error reduction between the implementations with and without β is 1.15 percentage points. This shows that the method increases the accuracy and, most significantly, reduces the discontinuity errors. The proposed algorithm is also tested on other datasets that contain low and high texture images, and good results are produced, as shown in Figs. 12-14. The aggregation of the matching cost uses a guided filter, whose advantage is its edge-preserving property. The optimization stage implements a WTA strategy, which selects the minimum value. In the last stage, the weighted bilateral filter in the post-processing step reduces the remaining noise and smooths the disparity maps. For future work, this method will be tested in a standalone system (i.e., FPGA), and the energy consumption of the system will be reported to show its viability and behaviour.

Fig. 13: Results on the disparity maps from the KITTI dataset.

Fig. 14: Results on the disparity maps from the USM lab.

ACKNOWLEDGEMENT

This work was supported by Universiti Sains Malaysia's Research University Individual (RUI) with account no. 1001/PELECT/814169 and Universiti Teknikal Malaysia Melaka.

REFERENCES

De-Maeztu L, Villanueva A, Cabeza R (2011). Stereo matching using gradient similarity and locally adaptive support-weight. Pattern Recogn Lett 32:1643-51.
Dominguez-Morales M, Cerezuela-Escudero E, Jimenez-Fernandez A, Paz-Vicente R, Font-Calvo JL, Inigo-Blasco P, et al. (2011).
Image matching algorithms in stereo vision using address-event-representation: A theoretical study and evaluation of the different algorithms. In: Proc Int Conf Signal Process Multimedia Appl (SIGMAP), 2011 Jul 18-21; Seville, Spain. 1-6.
Einecke N, Eggert J (2013). Anisotropic median filtering for stereo disparity map refinement. In: Proc Int Conf Comput Vision Theo Appl (VISAPP), 2013 Feb 21-24; Barcelona, Spain. 189-98.
Fernandes H, Costa P, Filipe V, Hadjileontiadis L, Barroso J (2010). Stereo vision in blind navigation assistance. In: Proc World Automa Cong (WAC), 2010 Sep 19-23; Kobe, Japan. 1-6.
Gu Z, Su X, Liu Y, Zhang Q (2008). Local stereo matching with adaptive support-weight, rank transform and disparity calibration. Pattern Recogn Lett 29:1230-5.
Gupta RK, Cho S (2010). Real-time stereo matching using adaptive binary window. In: Proc Int Symp 3D Data Process Visual Trans, 2010 May 17-20; Paris, France. 735-9.
Hamzah RA, Ibrahim H (2016). Literature survey on stereo vision disparity map algorithms. J Sensors 2016:1-23.
He K, Sun J, Tang X (2013). Guided image filtering. IEEE T Pattern Anal 35:1397-409.
Hirschmuller H, Innocent PR, Garibaldi J (2002). Real-time correlation-based stereo vision with reduced border errors. Int J Comput Vision 47:229-46.
Hosni A, Bleyer M, Gelautz M (2013). Secrets of adaptive support weight techniques for local stereo matching. Comput Vis Image Und 117:620-32.
Hu W, Zhang K, Sun L, Li J, Li Y, Yang S (2011). Virtual support window for adaptive-weight stereo matching. In: Proc Visual Comm Image Process (VCIP), 2011 Nov 6-9; Tainan, Taiwan. 1-4.
Huang H, Wang Q (2010). A region and feature-based matching algorithm for dynamic object recognition. In: Proc IEEE Int Conf Intel Comput Intel Syst, 2010 Oct 29-31; Xiamen, China. 735-9.
Humenberger M, Engelke T, Kubinger W (2010). A census-based stereo vision algorithm using modified semi-global matching and plane fitting to improve matching quality. In: Proc Comput Vision Pattern Recogn Worksh (CVPRW), 2010 Jun 13-18; San Francisco, USA. 77-84.
Jung HY, Park H, Park IK, Lee KM, Lee SU (2014). Stereo reconstruction using high-order likelihoods. Comput Vis Image Und 125:223-36.
Kowalczuk J, Psota ET, Perez LC (2013). Real-time stereo matching on CUDA using an iterative refinement method for adaptive support-weight correspondences. IEEE T Circ Syst Vid 23:94-104.
Lin Y, Lu N, Lou X, Zou F, Yao Y, Du Z (2013). Matching cost filtering for dense stereo correspondence. Math Probl Eng 2013:1-11.
Liu T, Dai X, Huo Z, Zhu X, Luo L (2012). A cost construction via MSW and linear regression for stereo matching. In: Proc Int Conf Pattern Recogn (ICPR), 2012 Nov 11-15; Tsukuba, Japan. 914-7.
Lu J, Lafruit G, Catthoor F (2008). Anisotropic local high-confidence voting for accurate stereo correspondence. In: Proc Electr Imaging 2008. Int Soc Optics Photonics, 2008 Jan 27; San Jose, USA. 68120J.
Ma L, Li J, Ma J, Zhang H (2013a). A modified census transform based on the neighborhood information for stereo matching algorithm. In: Proc 7th Int Conf Image Graphics (ICIG), 2013 Jul 26-28; Qingdao, China. 533-8.
Ma Z, He K, Wei Y, Sun J, Wu E (2013b). Constant time weighted median filtering for stereo matching and beyond. In: Proc IEEE Int Conf Comput Vision (ICCV), 2013 Dec 1-8; Sydney, Australia. 49-56.
Mei X, Sun X, Dong W, Wang H, Zhang X (2013). Segment-tree based cost aggregation for stereo matching. In: Proc IEEE Conf Comput Vision Pattern Recogn (CVPR), 2013 Jun 23-28; Portland, USA. 313-20.
Menze M, Geiger A (2015). Object scene flow for autonomous vehicles. In: Proc IEEE Conf Comput Vision Pattern Recogn (CVPR), 2015 Jun 7-12; Boston, USA. 3061-70.
Michael M, Salmen J, Stallkamp J, Schlipsing M (2013). Real-time stereo vision: Optimizing semi-global matching. In: Proc IEEE Intel Vehicles Symp, 2013 Jun 23-26; Gold Coast, Australia. 1197-202.
Min D, Lu J, Do MN (2012). Depth video enhancement based on weighted mode filtering. IEEE T Image Proc 21:1176-90.
Samadi M, Othman MF (2013). A new fast and robust stereo matching algorithm for robotic systems. In: Proc Int Conf Comput Inform Tech, 2013 May 9-10; Bangkok, Thailand. 281-90.
Satoh SI (2011). Simple low-dimensional features approximating NCC-based image matching. Pattern Recogn Lett 32:1902-11.
Scharstein D, Szeliski R (2002). A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int J Comput Vision 47:7-42.
Scharstein D, Szeliski R (2015). Middlebury stereo evaluation - Version 2. http://vision.middlebury.edu/stereo/eval/references. Accessed: 2015 May 29.
Schmid K, Tomic T, Ruess F, Hirschmuller H, Suppa M (2013). Stereo vision based indoor/outdoor navigation for flying robots. In: Proc IEEE/RSJ Int Conf Intel Robot Syst (IROS), 2013 Nov 3-7; Tokyo, Japan. 3955-62.
Sharma K, Jeong KY, Kim SG (2011). Vision based autonomous vehicle navigation with self-organizing map feature matching technique. In: Proc 11th Int Conf Control Autom Syst (ICCAS), 2011 Oct 26-29; Gyeonggi-do, South Korea. 946-9.
Tan P, Monasse P (2014). Stereo disparity through cost aggregation with guided filter. Image Process On Line (IPOL) 4:252-75.
Tippetts BJ, Lee DJ, Archibald JK, Lillywhite KD (2011). Dense disparity real-time stereo vision algorithm for resource-limited systems. IEEE T Circ Syst Vid 21:1547-55.
Vijayanagar KR, Loghman M, Kim J (2013). Real-time refinement of kinect depth maps using multi-resolution anisotropic diffusion. Mobile Netw Appl 19:414-25.
von Gioi RG, Jakubowicz J, Morel JM, Randall G (2012). LSD: a line segment detector. Image Process On Line (IPOL) 2:35-55.
Wang L, Liu Z, Zhang Z (2014). Feature based stereo matching using two-step expansion. Math Probl Eng 2014:1-14.
Wang HQ, Wu M, Zhang YB, Zhang L (2013b). Effective stereo matching using reliable points based graph cut. In: Proc Vis Comm Image Proc (VCIP), 2013 Nov 17-20; Kuching, Malaysia. 1-6.
Wang YC, Tung CP, Chung PC (2013a). Efficient disparity estimation using hierarchical bilateral disparity structure based graph cut algorithm with a foreground boundary refinement mechanism. IEEE T Circ Syst Vid 23:784-801.
Xiang X, Zhang M, Li G, He Y, Pan Z (2012). Real-time stereo matching based on fast belief propagation. Mach Vision Appl 23:1219-27.
Yang R, Pollefeys M (2003). Multi-resolution real-time stereo on commodity graphics hardware. In: Proc IEEE Conf Comput Vision Pattern Recogn (CVPR), 2003 Jun 18-20; Wisconsin, USA. 211-7.
Yang Q (2012a). Recursive bilateral filtering. In: Proc 12th Eur Conf Comput Vision (ECCV), 2012 Oct 7-13; Florence, Italy. 399-413.
Yang Q (2012b). A non-local cost aggregation method for stereo matching. In: Proc IEEE Conf Comput Vision Pattern Recogn (CVPR), 2012 Jun 16-21; Rhode Island, USA. 1402-9.
Yang Q, Tan KH, Ahuja N (2009). Real-time O(1) bilateral filtering. In: Proc IEEE Conf Comput Vision Pattern Recogn (CVPR), 2009 Jun 20-25; Miami, USA. 557-64.
Yang Q, Ji P, Li D, Yao S, Zhang M (2014). Fast stereo matching using adaptive guided filtering. Image Vision Comput 32:202-11.
Yoon KJ, Kweon IS (2006). Adaptive support-weight approach for correspondence search. IEEE T Pattern Anal 28:650-6.
Zhang K, Lu J, Yang Q, Lafruit G, Lauwereins R, Van GL (2011). Real-time and accurate stereo: a scalable approach with bitwise fast voting on CUDA, rank transform and disparity calibration. IEEE T Circ Syst Vid 21:867-78.