Image Anal Stereol 2010;29:173-180
Original Research Paper

CONTENT-BASED AUTOFOCUSING IN AUTOMATED MICROSCOPY

Peter Hamm1, Janina Schulz2 and Karl-Hans Englmeier1
1 Institute for Biological and Medical Imaging, Helmholtz Zentrum München German Research Center for Environmental Health, Ingolstädter Landstr. 1, D-85764 Neuherberg, Germany; 2 Carl Zeiss Microimaging GmbH, Kistlerhofstr. 75, D-81379 Munich, Germany
e-mail: peter.hamm@helmholtz-muenchen.de, schulz@zeiss.de, englmeier@helmholtz-muenchen.de
(Accepted September 7, 2010)

ABSTRACT

Autofocusing is the fundamental step in image acquisition and analysis with automated microscopy devices. Despite all the effort that has been put into developing a reliable autofocus system, current methods still lack robustness towards different microscope modes and distracting artefacts. This paper presents a novel automated focusing approach that is generally applicable to different microscope modes (bright-field, phase contrast, Differential Interference Contrast (DIC) and fluorescence microscopy). The main innovation consists in a Content-based focus search that makes use of a priori knowledge about the observed objects by employing local object features and boosted learning. Hence, this method turns away from common autofocus approaches that rely solely on whole-image frequency measurements to obtain the focus plane. Thus, it is possible both to exclude artefacts from the focus calculation and to locate the in-focus layer of specific microscopic objects.

Keywords: autofocus, classification, fluorescence microscopy, object detection, phase contrast.

INTRODUCTION

Being the most important imaging tool in biomedical research, microscopy plays a significant role in the research advances in proteomics, genomics, biochemistry, and molecular biology. The achievements in these fields are the driving force for continuous improvements in medical diagnostics, drug development, and drug targeting. The technical progress of recent years, namely in computer and imaging sensor technologies, provides the opportunity to automate microscopy tasks to a large degree. In this context, automated focusing is a fundamental problem that must be solved to enable High-Throughput and High-Content Screenings.

The common basic approach to the automated focus task is to look for the optical plane along the z-axis of the specimen that contains the highest image contrast. To obtain such contrast measurements, previous works mainly applied image frequency calculations such as simple gradient filters (Santos et al., 1997) or, more sophisticated, the wavelet transform (Forster et al., 2004; Widjaja and Jutamulia, 1998) and the discrete cosine transform (DCT) (Feng et al., 2007). Statistical analysis via image variance (Groen et al., 1985) and autocorrelation (Vollath, 1987) has also proved to work well for distinguishing between out-of-focus and in-focus images. Additionally, improvements have been accomplished by selecting only areas of interest within an image and by further enhancements of existing methods, e.g., modelling the focus curve (Brazdilova and Kozubek, 2009). Although numerous image-based autofocus methods have been proposed and some have found their way into commercial products, they all suffer from limited applicability when either the observation of specific objects is desired or when objects in different optical layers need to be in focus.
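To make this conventional baseline concrete, the following minimal sketch (ours, not from the paper) implements two of the cited whole-image measures, normalized variance (Groen et al., 1985) and Vollath's autocorrelation (Vollath, 1987), together with the naive search that returns the z-layer maximizing such a score; the exact normalizations are assumptions.

```python
# Minimal sketch of conventional whole-image autofocus (assumed
# normalizations; not the cited authors' reference implementations).
import numpy as np

def normalized_variance(img):
    """Image variance normalized by the mean intensity (Groen et al., 1985)."""
    img = img.astype(np.float64)
    mu = img.mean()
    return ((img - mu) ** 2).mean() / mu

def vollath_autocorrelation(img):
    """Vollath's F4 measure: difference of autocorrelations at
    shifts 1 and 2 along the row axis (Vollath, 1987)."""
    img = img.astype(np.float64)
    return float((img[:-1] * img[1:]).sum() - (img[:-2] * img[2:]).sum())

def whole_image_focus_layer(stack, measure=normalized_variance):
    """Score every layer of a (z, y, x) stack and return the best z-index.
    This is the whole-image strategy discussed above: one global maximum,
    regardless of which objects produce the contrast."""
    return int(np.argmax([measure(layer) for layer in stack]))
```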
A patent-pending method that handles these difficulties by taking a completely new approach to the problem is described in this paper. The main idea is to refrain from the hitherto common whole-image analysis and to take the actual image content into account. Our approach uses prior class training to gain knowledge of the research object. With this a priori information we are able to direct closer focus measurements only to regions of interest and thus exclude unwanted objects and artefacts from the observation.

METHODS

The novelty of the approach presented here lies not in the development of yet another focus measurement function, but in explicitly using the knowledge of the examined object to restrict the focus search solely to regions of interest. The general procedure can be formulated as a three-step task (see Fig. 1). In the first step, a captured image is scanned for regions holding strong object hypotheses, assuming that the microscope is in some initial position within the sample and that the captured image contains a most likely blurred state of the observed specimen. In the second step, focus measurements are processed only upon the detected regions along the z-axis to yield the respective in-focus layer. In the event of multiple objects lying in different z-layers - as is the case for thick and non-monolayer preparations - multiple layers will consequently be returned as the result. In the final third step, all found layers are verified more closely to ensure that the focused images do in fact hold the desired object. Thereby, layers that turn out to contain unwanted objects (e.g., artefacts) when surveyed in the focused state are rejected.

In order to conduct the steps described above, it is necessary to feed the focusing system with information on the subject of study. By means of classification tasks known from the field of pattern recognition - that is, feature extraction from known sample images and subsequent classifier training - a decision function is built to locate potentially interesting image regions. For this, we apply a well-known object detection framework that makes use of Haar features and boosted classifier learning (Viola and Jones, 2001). The key elements of this system and its application to our problem are explained in the following section.

Fig. 1. Schematic focus working steps. I: From a random z-position (Layer A) within the specimen, a layer is captured upon which object hypotheses are found. II: Focus search is limited to the determined areas, yielding one or multiple in-focus layer(s). III: Within the in-focus layer (Layer B) a more detailed classification is processed to verify the focused areas.

FEATURE EXTRACTION AND CLASSIFIER TRAINING

To prepare for the classifier training, a sample image set is needed. This sample set is generated by asking the microscopist to manually mark representative samples in the in-focus layer. The number of samples does not need to exceed a dozen, keeping user interaction within reasonable limits, though the selection has to cover the variety of the considered object category. The sample set is furthermore complemented by blurred and rotated instances to add defocused states and to ensure rotation invariance, which is not supported by the selected features described below. In our experiments we use simple Gaussian filtering for blurring and a linearly interpolated image rotation transform that produces instances with an angle offset of 30° each, while mirrored background padding inhibits negative border effects.
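As a concrete illustration, here is a minimal sketch of this augmentation step using SciPy; the 30° offset and the mirrored padding follow the text, while the Gaussian blur strengths are assumed values.

```python
# Sketch of the training-sample augmentation described above
# (blur sigmas are assumed; 30-degree rotation steps and mirrored
# background padding follow the text).
import numpy as np
from scipy import ndimage

def augment_sample(patch, blur_sigmas=(1.0, 2.0, 4.0), angle_step=30.0):
    variants = [patch]
    # Defocused states via simple Gaussian filtering.
    variants += [ndimage.gaussian_filter(patch, sigma=s) for s in blur_sigmas]
    # Rotated instances every 30 degrees; order=1 selects linear
    # interpolation, mode='mirror' pads with mirrored background to
    # inhibit negative border effects.
    variants += [ndimage.rotate(patch, angle, reshape=False, order=1,
                                mode='mirror')
                 for angle in np.arange(angle_step, 360.0, angle_step)]
    return variants
```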
The step of feeding the training set with samples is supposed to happen only once, while it remains possible to add samples in further observations to refine the classifier.

Fig. 2. The user marks positive and negative areas in an initial process to provide a training image database for the boosted training with Haar features.

Fig. 3. Haar wavelet types and their exemplarily depicted positions and scales within an image.

The feature extraction uses the Haar wavelet set (Fig. 3) introduced by Papageorgiou (Papageorgiou et al., 1998), which has been successfully used for a variety of detection tasks such as face (Viola and Jones, 2001; Lienhart and Maydt, 2002), object, and even cell recognition (Smith et al., 2009). This choice of features has been made for two reasons. On the one hand, Haar features are generally applicable to any kind of probe; since we have to consider unsharp representations of objects when operating on heavily blurred images, features of, e.g., a morphological kind cannot be taken into account. On the other hand, the detection routine with Haar features is unbeatably fast compared to, for example, scale-invariant keypoint features (Lowe, 2004).

An extracted image feature corresponds to the response of convolving a particular Haar filter with an image area. Each Haar filter varies in type, position within the image, and size, as displayed in Fig. 3. The number of features N per Haar filter type can be computed from the image width W_I and height H_I and the initial size of the specific filter (width w_Haar and height h_Haar) as

N = XY \left( W_I + 1 - w_{Haar}\,\frac{X+1}{2} \right) \left( H_I + 1 - h_{Haar}\,\frac{Y+1}{2} \right), \qquad (1)

where X = \lfloor W_I / w_{Haar} \rfloor and Y = \lfloor H_I / h_{Haar} \rfloor denote the maximum horizontal and vertical scale factors.

The amount of possible Haar features is by far over-complete and large even for small images. Keeping Fig. 3 in sight and using the above equation, a rather small image (region) of 30 x 30 pixels (W_I = H_I = 30) already produces almost N = 105 000 features for only one Haar filter type with an initial size of 2 x 1 pixels (w_Haar = 2; h_Haar = 1). Only a few of these many Haar filters can contribute to classification and, moreover, a single Haar filter would obviously be a weak decision-maker. Thus, classifier training is needed to select those of the thousands of features that discriminate best between the object classes. As in Viola and Jones' recognition approach, the method of choice is adaptive boosting (AdaBoost) (Freund and Schapire, 1995) in combination with cascading of the classifiers as introduced by Viola and Jones (2001). AdaBoost selects and concatenates Haar filters by weighting training images according to how difficult they are to classify correctly. Thereby, each Haar filter (i.e., each possible weak classifier) that passes through the iterative training process has to focus on hard-to-classify samples. The training result is a combination of several weak decision-makers that together form a strong classifier. Additionally, trained classifiers are arranged into a cascade of stages, such that each stage fulfils minimum detection and maximum false positive requirements. Stages are added subsequently until a specified overall classification performance is reached. This cascading of classifiers achieves a tremendous speedup during the detection process: a positive image has to pass all stages to be classified as such, whereas a rejection by any one stage instantly discards the examined image (see Fig. 4) without it passing the complete cascade.
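Eq. 1 above is easy to verify numerically; the short snippet below (the function name is ours) reproduces the worked example of the 30 x 30 region with a 2 x 1 base filter:

```python
# Numerical check of Eq. 1 for one Haar filter type.
def haar_feature_count(W_I, H_I, w_haar, h_haar):
    X = W_I // w_haar  # maximum horizontal scale factor
    Y = H_I // h_haar  # maximum vertical scale factor
    return int(X * Y * (W_I + 1 - w_haar * (X + 1) / 2)
                     * (H_I + 1 - h_haar * (Y + 1) / 2))

# 30 x 30 region, 2 x 1 base filter -> 104625, i.e., almost 105 000.
print(haar_feature_count(30, 30, 2, 1))
```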
FINDING THE FOCUS LAYER

In the routine of finding the focus, we assume an initial position in the z-direction (image I in Fig. 5) that lies within the upper and lower bounds of the specimen, though it most likely does not represent the in-focus layer. Based on the previously trained classifier, the detection process scans the entire image by sliding a search window across it and returns the regions classified as holding an object hypothesis (red squares in image II of Fig. 5). The minimum and maximum sizes of the search window are chosen according to the size variation of the observed object class. Then the system applies the first-order Gaussian derivative focus measurement (Geusebroek et al., 2000) with a sigma value of 1.5 solely to the areas of the hypotheses, along the z-direction in parallel.

Fig. 4. Cascade of stages containing a varying number of weak classifiers.

Fig. 5. Focus detection of mitotic cells.

Since the first step of hypothesis generation is performed with a rather coarse detection, it assures a high rate of true positive detections while deliberately accepting a high rate of false positive identifications, too. In order to reject these false positives, all found regions are finally verified in their respective in-focus layers (green squares in image III of Fig. 5). This is achieved by a second classifier, which has been trained only on non-blurred samples.
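A minimal sketch of this second step, i.e., evaluating the first-order Gaussian derivative measure with sigma = 1.5 only inside the detected regions, might look as follows; the ROI format and function names are our assumptions, not the paper's implementation.

```python
# Per-ROI focus search along z with a first-order Gaussian derivative
# measure (sigma = 1.5); ROI format (x, y, w, h) is an assumption.
import numpy as np
from scipy import ndimage

def gaussian_derivative_measure(img, sigma=1.5):
    """Mean squared response of first-order Gaussian derivatives in x and y
    (in the spirit of Geusebroek et al., 2000)."""
    img = img.astype(np.float64)
    gx = ndimage.gaussian_filter(img, sigma, order=(0, 1))  # d/dx
    gy = ndimage.gaussian_filter(img, sigma, order=(1, 0))  # d/dy
    return float((gx ** 2 + gy ** 2).mean())

def in_focus_layers(stack, rois):
    """For a (z, y, x) stack, return one in-focus z-index per ROI, so that
    objects lying in different optical layers each get their own layer."""
    layers = []
    for (x, y, w, h) in rois:
        scores = [gaussian_derivative_measure(layer[y:y + h, x:x + w])
                  for layer in stack]
        layers.append(int(np.argmax(scores)))
    return layers
```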
RESULTS

The proposed method was tested on data sets captured with the inverted microscope Carl Zeiss Axio Imager.Z1 equipped with an AxioCam HR3 and a motorised stage in the z-direction. The autofocus method was implemented as a macro plugin to work online within the AxioVision software environment. However, for better reproducibility of the results and with fading fluorophores in fluorescent samples in mind, the probes were captured as z-stacks and the autofocus routine was conducted upon these. All tests were processed on a notebook with a 1.80 GHz Intel Core2Duo and 3 GByte of RAM. The dimensions of the captured stacks were 1388 x 1040 pixels per image with 16 bits per pixel and up to 75 layers per stack. The test examples contained several challenges for the task of autofocusing, such as non-monolayer specimens and bright artefacts that usually bias conventional autofocus techniques.

As a starting point for observations of new specimen types, the user marks image regions within an in-focus sample image according to their class affiliation. We again make a point of using as few samples as possible for the training input (at most one to three dozen per class) to keep user interaction at a minimum. Finally, we compared the results of the Content-based Autofocus system with the best-known autofocus routines from the literature (Geusebroek et al., 2000; Sun et al., 2004) (see the complete list in the labels of Figs. 11a and 6). Furthermore, a biologist with profound knowledge in cell biology was consulted to give an expert opinion regarding the subjective focus position for every probe.

Fig. 6. Focus curves obtained by applying the most common focus measurements to the sample shown in Fig. 7.

The majority of the evaluation tests were performed on CACO-2 preparations (human colon adenocarcinoma), nuclear-stained with Hoechst 33258 and showing cells in different stages of cellular division. Cell proliferation and cycle analyses of CACO-2 cell lines are widely used in clinical cancer research and the pharmaceutical industry. In our test case with altogether 22 stacks, we trained the classifier to detect cells in metaphase and anaphase, when the spindle apparatus is notably expressed. The z-stacks were captured with a step size of 0.30 µm, since an oil immersion objective with a magnification of 63x and a numerical aperture of 1.4 was used. The number of layers between the lowest and highest in-focus layer averaged 2.53 (0.76 µm), which points out the existence of more than one focus layer. In fact, some stacks contained cells of interest in up to three different focal layers, spatially separated by 1.50 µm. All in all, 16 positive and 34 negative samples plus their blurred and rotated instances were used as the sample set for the preceding classifier training.

The results shown in Fig. 7a are exemplary of our tests with the CACO-2 preparations, which hold cells in different focal planes. While the mainly unimodal focus curves of the best-performing known methods Autocorrelation, First-Order Gaussian Derivatives and Normalized Variance (see Fig. 6) point to layers #26, #25 and #27, the Content-based Autofocus approach is able to bring out the focus of all planes at z-positions #22, #24 and #26/#27, respectively. From Fig. 7b it can also readily be seen why the common methods aim at the layer around #26: apart from the rightmost dividing cell, most of the non-mitotic cells are located at this layer, thus giving the greatest overall structural input to standard focus functions. While this example shows an excellent result, the current performance of the presented method can be summarised as follows: on the one hand, it is truly able to bring desired objects into focus while excluding the vast majority of unwanted objects, which is confirmed by a sensitivity of 0.836; on the other hand, the performance is still limited by some false positive detections.

Fig. 7. Focus evaluation of mitotic cells in three different focal planes. (a) Result of the hypotheses generation in a defocused layer; (b) Identification of the focal planes at each ROI (denoted with the respective layer number) and the verification that successfully rejects two hypotheses containing non-mitotic cells (red numbers). The displayed image shows layer #26, in which only the rightmost detection is in focus.