Elektrotehniški vestnik 74(3): 107-112, 2007
Electrotechnical Review, Ljubljana, Slovenija

An implementation of a two-layered SVM classifier in Condor

Mira Trebar1, Nigel Steele2

1 University of Ljubljana, Faculty of Computer and Information Science, Tržaška 25, Ljubljana, Slovenija
2 Department of Mathematical Sciences, Coventry University, Priory Street, Coventry, CV1 5FB, UK
E-mail: mira.trebar@fri.uni-lj.si

Received 23 May 2006. Accepted 26 February 2007.

Abstract. Condor, as a high-throughput distributed computing system, is used in a two-layered Support Vector Machine (SVM) implementation of the classification problem. The complexity of the SVM training algorithm grows rapidly with the number of training samples. The data are therefore split into subsets, and the solution described here reduces the training time by optimizing the first-layer SVMs separately on a cluster of computers. A much smaller set of support vectors obtained from these partial results is then used to optimize the second-layer SVM. In experiments on a large data set (Forest data), the distributed implementation of two-layered SVMs in Condor shows a significant improvement in training time while maintaining, or even improving, the error performance of a single SVM classifier.

Key words: Support Vector Machine, classification, distributed computing, Condor

Izvedba dvo-nivojskega SVM klasifikatorja v sistemu Condor

Povzetek. We present the Condor distributed workload management system, used for a two-layered implementation of the Support Vector Machine (SVM) model for solving classification problems. The complexity of the SVM optimization algorithm grows very quickly with the number of training samples. The training samples are randomly assigned to equally sized subsets, which are used for distributed training of the first-layer SVM models on a cluster of computers. The result of the first layer is a considerably smaller set of training samples, called support vectors, which is used to train the second-layer SVM model. Experiments with the two-layered SVM model and the Condor system show that for large sample sets (Forest data) the training time can be substantially reduced while the same classification accuracy as that of a single SVM model is achieved or even improved.

Ključne besede: SVM, classification, distributed computing, Condor

1 Introduction

For many research projects it is common to have problems that require days or weeks of computation on a personal computer to solve. Different solutions exist; one of them is to use a cluster of computers, workstations or all available resources in an organization connected by a local-area network (LAN). This can be done with the Condor software system, which enables a High Throughput Computing (HTC) environment [1] delivering large amounts of computing power over long periods of time. Condor saves time when a job must be run many times with hundreds of different data sets, and it streamlines the scientist's tasks by allowing the submission of many jobs at the same time. In this way, tremendous amounts of computation can be done with very little intervention from the user [2]. Moreover, Condor allows users to take advantage of idle machines that they would not otherwise have access to. Another important feature for users is that the source code of their applications does not have to be modified in any way.
Condor is a specialized workload management system for compute-intensive jobs, developed as a product of the Condor Research Project at the University of Wisconsin-Madison, and is currently available as a free download from the Internet [3].

Support Vector Machines (SVMs) are a machine learning method for classification and also regression problems [9]. However, they require the solution of a quadratic optimization problem, which means that with larger data sets the complexity of SVMs increases. The original Support Vector Machine (SVM) is a linear classifier [6]; when the input or data space is mapped into a higher-dimensional feature space, a nonlinear classifier results. SVMs have the property of encapsulating all the relevant information in a small number of training samples called support vectors. In the last few years, several attempts to classify large data sets with various SVM models have been published: the large quadratic programming (QP) problem is broken down into a series of smaller QP sub-problems [7], a parallel mixture of SVMs [4], and the fast SVMlight implementation [5]. All the experiments showed a significant time improvement from using several expert SVMs trained on different subsets of the data.

We introduce here a parallel implementation of a two-layered model of SVMs. The training data set is randomly partitioned into data subsets of size N/M (N is the number of input patterns and M is the number of subsets). The first layer consists of M Support Vector Machines (SVMs) optimized in parallel; the result is a collection of much smaller sets of support vectors (SVs), which together form the training set of the second-layer SVM. Although the proposed approach is presented as a two-layered model of SVMs, only the second-layer SVM is used for the classification of new input data. The experimental results described in the paper include traditional job execution on one machine as well as different ways of processing jobs on Condor's dedicated cluster. Comparisons of results obtained under different conditions of the cluster are given, along with notes on the use of the Condor software.

2 Condor software system

Condor is a specialized software system for compute-intensive jobs which effectively utilizes workstations, dedicated clusters of workstations and personal computers that communicate over a network. As a batch system, Condor provides a job queueing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management. The three powerful mechanisms used in Condor are:

• The Classified Advertisement (ClassAd) mechanism, an extremely flexible framework for matching resource requests (jobs) with resource offers (machines).

• The Remote System Calls mechanism, which makes it possible to run applications on any remote machine of a given architecture.

• The Checkpointing mechanism, which transparently records the current state of the executing program in a checkpoint file in such a way that the program can later be restarted from that state. It also permits a job to migrate from one machine to another.

Users submit their serial or parallel jobs to Condor; Condor places them into a queue and chooses when and where to run them based upon the ClassAd mechanism. While the jobs are running, Condor carefully monitors their progress, and ultimately informs the user upon completion.
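As a small illustration of the ClassAd mechanism, a job states its requests as ClassAd expressions in its submit description file. The sketch below uses standard Condor machine attributes (OpSys, Arch, Memory, KFlops); the particular values are hypothetical:

    # Hypothetical ClassAd expressions in a submit description file.
    # The job will only match machines satisfying the requirements:
    requirements = (OpSys == "LINUX") && (Arch == "INTEL") && (Memory >= 512)
    # Among matching machines, prefer faster floating-point performance:
    rank = KFlops

The machine side advertises its own ClassAd with the corresponding attributes, and the central manager pairs the two.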
The use of Remote System Calls offers the added benefit that a user submitting jobs to Condor does not need an account on the remote machine where the job runs. When the job performs a system call, for example an input or output operation, the data are maintained on the machine from which the job was submitted.

The basic Condor pool shown in Fig. 1 is comprised of a single machine acting as the central manager and an arbitrary number of other machines which can act as submit machines or as execution machines. There is only one central manager; it is a very important part of the pool, should be reliable, and is expected to be online all the time. If it crashes, no further matchmaking can be performed.

Figure 1. Condor pool: the central manager (running the Condor_Collector and Condor_Negotiator daemons), a submit machine (with its controlling daemons and the Condor_Shadow process) and an execution machine.

Conceptually, the role of Condor is to match waiting requests with available resources, which is done by matchmaking between available machines and submitted jobs. Jobs want to find machines upon which they can execute. The central manager acts as a collector of information and as the negotiator between resources and resource requests; it performs these services with two separate programs, called daemons (Condor_Collector, Condor_Negotiator). Machines have specific resources available, such as the platform and the amount of available memory. When a job is submitted, a separate ClassAd is produced for each job and each machine, listing all attributes. Condor acts as a matchmaker between the jobs and the machines by pairing the ClassAd of a job with the ClassAd of a machine, ensuring that the requirements in both ClassAds are satisfied. Any machine in the pool, including the central manager, can be configured to execute Condor jobs and also to act as a machine from which jobs can be submitted. It is important to mention that the resource requirements for a submit machine are actually much greater than those for an execute machine. Both kinds of machines perform their services with controlling programs (controlling daemons) that communicate with the central manager. The execution machine running the user's job also performs its system calls through the Condor_SysCall_Library; each call is sent back to the submit machine, where it is handled by the Condor_Shadow process running on the machine from which the job was submitted.

A submitted job may also prefer to execute on a machine with better floating-point facilities, or on a specific set of machines. These preferences are also expressed in the ClassAd. Further, a machine owner has great control over which jobs are executed on the machine and under what circumstances. The owner writes a configuration file that specifies both requirements and preferences for the jobs. Alternatively, any of these may be expressed as a preference, for example where the machine prefers the jobs of a select group, but will accept the jobs of others if there are no jobs from the select group.

To illustrate simply how Condor works, we present here the four steps needed to prepare and run a job:

• Prepare the code: the application must run in the background, without interactive input and output.

• Select the runtime environment, also called the Condor universe (standard, vanilla, ...).

• Write a submit description file with all the information about running the job: the files to use and how many times to run the job (an example file is sketched below).

• Submit the job with the condor_submit command, which takes as an argument the name of the submit description file.

For programs running in a sequential and parallel way, where the input, output or execution of one or more programs depends on one or more other programs, a directed acyclic graph (DAG) is used. The programs are nodes in the graph and are linked by parallel or serial dependencies. The Directed Acyclic Graph Manager (DAGMan) is a meta-scheduler that submits jobs to Condor in an order defined by the DAG and returns the results. The DAG is described in an input file which lists all the Condor submit description files together with their execution dependencies and parameters. Each node in a DAG is a unique executable file and each has a unique Condor submit description file. The jobs used in a DAG are submitted using the program condor_submit_dag, which takes as an argument the name of the input file.
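As an illustration of the last two steps, a minimal submit description file for the first-layer training jobs of the model described in Section 3 might look as follows; the executable and file names are hypothetical, and $(Process) is Condor's built-in index of the queued job:

    # svm_layer1.sub - a hypothetical submit description file
    universe   = vanilla
    executable = svm_train
    # $(Process) runs from 0 to 7, so each job trains on its own subset
    arguments  = subset_$(Process).dat model_$(Process).dat
    output     = svm_$(Process).out
    error      = svm_$(Process).err
    log        = svm_layer1.log
    queue 8

Such a file is submitted with condor_submit svm_layer1.sub. A DAG input file tying the two layers together could then declare the first-layer nodes as parents of the second-layer node (again with hypothetical file names, and with only two first-layer nodes shown):

    # two_layer.dag - a hypothetical DAGMan input file
    JOB TRAIN0 svm_layer1_0.sub
    JOB TRAIN1 svm_layer1_1.sub
    JOB LAYER2 svm_layer2.sub
    # the second layer starts only after all first-layer jobs have finished
    PARENT TRAIN0 TRAIN1 CHILD LAYER2

Running condor_submit_dag two_layer.dag then submits the whole graph.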
3 Two-layered Support Vector Machine

Support Vector Machines (SVMs) [9], [6] have been applied to many classification problems, generally yielding good performance and acceptable execution times for smaller data sets. The original SVM is a linear classifier where the input vectors are separated by a hyperplane, while nonlinear SVMs map the input vectors from the original space into a high-dimensional feature space, where they become linearly separable and an optimal hyperplane can be constructed there [9]. For a classification problem with training data given as $x_i \in \mathbb{R}^n$, $y_i \in \{+1, -1\}$, the output function of a nonlinear SVM is defined as

$$ y = \sum_{i=1}^{N} \alpha_i y_i K(x, x_i) + b \qquad (1) $$

where $K(x, x_i)$ is a kernel function, $x$ is the input vector of a test example, $y_i$ is a class label, $x_i$ is the input vector of the $i$-th training example, $N$ is the number of training samples, and $\alpha = \{\alpha_1, \ldots, \alpha_N\}$ and $b$ are the parameters of the model. To find the coefficients $\alpha_i$, $i = 1, \ldots, N$, it is sufficient to solve the optimization problem in the dual space by finding the minimum of the objective function

$$ W(\alpha) = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j) - \sum_{i=1}^{N} \alpha_i. \qquad (2) $$

Because training vectors will often repeat, the data may not be perfectly separable, so the soft-margin optimal hyperplane is usually determined with an upper bound $U$ on the parameters $\alpha_i$, and the optimization from (2) is computed subject to the constraints

$$ \sum_{i=1}^{N} \alpha_i y_i = 0 \quad \text{and} \quad 0 \le \alpha_i \le U, \; i = 1, \ldots, N. $$
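To make the two-layered scheme concrete, the following sketch expresses it in Python. It assumes scikit-learn's SVC as the underlying SVM solver, which is not what the paper uses; in the paper the M first-layer optimizations run as separate Condor jobs, whereas here they are written as a simple loop:

    import numpy as np
    from sklearn.svm import SVC

    def two_layer_svm(X, y, M=8, C=10.0, gamma=0.1, seed=0):
        """A sketch of the two-layered SVM scheme described above."""
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(X))        # random split into M subsets
        sv_X, sv_y = [], []
        # Layer 1: one SVM per subset of size ~N/M; in the paper these M
        # optimizations run in parallel as separate Condor jobs.
        for part in np.array_split(idx, M):
            svm = SVC(C=C, kernel="rbf", gamma=gamma).fit(X[part], y[part])
            sv_X.append(X[part][svm.support_])   # keep only the support vectors
            sv_y.append(y[part][svm.support_])
        # Layer 2: a single SVM trained on the union of the first-layer
        # support vectors; only this model classifies new input data.
        return SVC(C=C, kernel="rbf", gamma=gamma).fit(
            np.vstack(sv_X), np.concatenate(sv_y))

The hyperparameters C (playing the role of the upper bound $U$ in (2)) and gamma are placeholders; in practice they would be chosen to match the single-SVM baseline being compared against.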