Volume 38, Number 4, December 2014, ISSN 0350-5596

Informatica
An International Journal of Computing and Informatics

Editorial Boards

Informatica is a journal primarily covering intelligent systems in the European computer science, informatics and cognitive community; scientific and educational as well as technical, commercial and industrial. Its basic aim is to enhance communications between different European structures on the basis of equal rights and international refereeing. It publishes scientific papers accepted by at least two referees outside the author's country. In addition, it contains information about conferences, opinions, critical examinations of existing publications and news. Finally, major practical achievements and innovations in the computer and information industry are presented through commercial publications as well as through independent evaluations.

Editing and refereeing are distributed. Each editor from the Editorial Board can conduct the refereeing process by appointing two new referees or referees from the Board of Referees or Editorial Board. Referees should not be from the author's country. If new referees are appointed, their names will appear in the list of referees. Each paper bears the name of the editor who appointed the referees. Each editor can propose new members for the Editorial Board or referees. Editors and referees inactive for a longer period can be automatically replaced. Changes in the Editorial Board are confirmed by the Executive Editors.

The coordination necessary is made through the Executive Editors who examine the reviews, sort the accepted articles and maintain appropriate international distribution. The Executive Board is appointed by the Society Informatika. Informatica is partially supported by the Slovenian Ministry of Higher Education, Science and Technology.

Each author is guaranteed to receive the reviews of his article. When accepted, publication in Informatica is guaranteed in less than one year after the Executive Editors receive the corrected version of the article.

Executive Editor - Editor in Chief
Anton P. Železnikar, Volaričeva 8, Ljubljana, Slovenia
s51em@lea.hamradio.si, http://lea.hamradio.si/~s51em/

Executive Associate Editor - Managing Editor
Matjaž Gams, Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia
Phone: +386 1 4773 900, Fax: +386 1 251 93 85
matjaz.gams@ijs.si, http://dis.ijs.si/mezi/matjaz.html

Executive Associate Editor - Deputy Managing Editor
Mitja Luštrek, Jožef Stefan Institute
mitja.lustrek@ijs.si

Executive Associate Editor - Technical Editor
Drago Torkar, Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia
Phone: +386 1 4773 900, Fax: +386 1 251 93 85
drago.torkar@ijs.si

Contact Associate Editors
Europe, Africa: Matjaž Gams
N. and S. America: Shahram Rahimi
Asia, Australia: Ling Feng
Overview papers: Maria Ganzha

Editorial Board
Juan Carlos Augusto (Argentina), Vladimir Batagelj (Slovenia), Francesco Bergadano (Italy), Marco Botta (Italy), Pavel Brazdil (Portugal), Andrej Brodnik (Slovenia), Ivan Bruha (Canada), Wray Buntine (Finland), Zhihua Cui (China), Hubert L. Dreyfus (USA), Jozo Dujmović (USA), Johann Eder (Austria), Ling Feng (China), Vladimir A. Fomichov (Russia), Maria Ganzha (Poland), Sumit Goyal (India), Marjan Gušev (Macedonia),
N. Jaisankar (India), Dariusz Jacek Jakóbczak (Poland), Dimitris Kanellopoulos (Greece), Samee Ullah Khan (USA), Hiroaki Kitano (Japan), Igor Kononenko (Slovenia), Miroslav Kubat (USA), Ante Lauc (Croatia), Jadran Lenarčič (Slovenia), Shiguo Lian (China), Suzana Loskovska (Macedonia), Ramon L. de Mantaras (Spain), Natividad Martinez Madrid (Germany), Sanda Martinčić-Ipšić (Croatia), Angelo Montanari (Italy), Pavol Návrat (Slovakia), Jerzy R. Nawrocki (Poland), Nadia Nedjah (Brazil), Franc Novak (Slovenia), Marcin Paprzycki (USA/Poland), Ivana Podnar Žarko (Croatia), Karl H. Pribram (USA), Luc De Raedt (Belgium), Shahram Rahimi (USA), Dejan Raković (Serbia), Jean Ramaekers (Belgium), Wilhelm Rossak (Germany), Ivan Rozman (Slovenia), Sugata Sanyal (India), Walter Schempp (Germany), Johannes Schwinn (Germany), Zhongzhi Shi (China), Oliviero Stock (Italy), Robert Trappl (Austria), Terry Winograd (USA), Stefan Wrobel (Germany), Konrad Wrona (France), Xindong Wu (USA), Yudong Zhang (China), Rushan Ziatdinov (Russia & Turkey)

Editorial: "Michie-Turing" IS2014 Award Recipient: Janez Grad

It is my honour and privilege to introduce the 2014 "Donald Michie and Alan Turing" prize recipient, Janez Grad, awarded for Life Achievements in the Slovenian Information Society (http://is.ijs.si/is_awards.html), aka the "Michie-Turing" Award. The name of the prize was proposed by Stephen Muggleton when we were discussing Slovenian international information-society events. Following long discussions with colleagues, we established a nomination procedure inspired by the way Nobel prizes are awarded. Of course, our award cannot compete with the best world-wide awards such as the Turing Award, the »Nobel prize« of computing. Nevertheless, we wanted to highlight and emphasise the contributions of Donald Michie and Alan Turing on the one hand, and our achievements in the fields of the information society, computing and informatics on the other. Fine-tuning the procedure and obtaining approval from the descendants of Donald Michie and Alan Turing took quite some time. At this point I would like to express my gratitude to all the colleagues involved.

The name "Michie-Turing" Award was coined from Alan Turing, the »Einstein« of computing, and Donald Michie, Turing's contemporary, who often visited Slovenia and, together with Ivan Bratko, helped establish the Slovenian AI school. Both names are highly appreciated in our societies - the artificial intelligence society SLAIS, the society for computer science and informatics Informatika, and ACM Slovenia, among others. Donald was a very welcome and frequent guest in Slovenia, and I well remember him staying in a room five doors from mine. His room still bears the name »Donald Michie room« and mine »Alan Turing room«. We celebrated their names in a series of events, e.g. at the conference itself and with a special issue of Informatica, Vol. 37, No. 1, 2013. The editors' introduction to the special issue on "100 Years of Alan Turing and 20 Years of SLAIS" was written by Dunja Mladenić, Stephen Muggleton, and Ivan Bratko. It consisted of the following contributions: C. Sammut: The Child Machine vs. the World Brain; M. Gams: Alan Turing, Turing Machines and Stronger; A. Bifet: Mining Big Data in Real Time; J. Gama: Data Stream Mining: the Bounded Rationality; D. Mladenić and M. Grobelnik: Automatic Text Analysis by Artificial Intelligence; N. Lavrač and P. Kralj Novak: Relational and Semantic Data Mining for Biomedical Research; I. Kononenko, E. Štrumbelj, Z. Bosnič, D. Pevec, M. Kukar, and M.
Robnik Šikonja: Explanation and Reliability of Individual Predictions; M. Bohanec, M. Žnidaršič, V. Rajkovič, I. Bratko, and B. Zupan: DEX Methodology: Three Decades of Qualitative Multi-Attribute Modeling; J. Demšar and B. Zupan: Orange: Data Mining Fruitful and Fun - A Historical Perspective.

In 2014, the Slovenian societies supporting Informatica decided to introduce the Michie-Turing prize recipient in a special editorial of this journal. The prize winner, Janez Grad, provided an overview of his early years in computing, titled: A Survey of Computer Usage in Education and Research at the University of Ljubljana, with Emphasis on the RCC and the UCC.

Prof. Grad has been engaged in computer usage, computer science and informatics since 1960, when he started working at the (Nuclear) Institute »Jožef Stefan« (NIJS, later IJS), Ljubljana. At that time, computers were first being introduced into some Slovenian business companies as well as education and research institutions, in particular the University of Ljubljana (ULj) and NIJS. In the following, the evolution of computer usage and the supporting activities at these institutions is described.

In the 1960s, the main players in informatics were, on one side, the INTERTRADE enterprise, Ljubljana, which played the main role in most business companies by supplying them with IBM technology, and on the other side ULj and NIJS, with NIJS as the main initiator and driving force. For its research, NIJS first used the IBM 705 computer installed at the Federal Statistical Office in Belgrade, Yugoslavia (the computer programmer at NIJS was Prof. Janez Grad). Later, in 1962, ULj together with NIJS (from here on jointly referred to as ULj) transferred their computer usage to the ZUSE Z-23 computer system installed in Ljubljana with the help of the Composite Organization of Associated Labour ISKRA (COAL ISKRA), Kranj, which had started licensed production of this type of computer in cooperation with the German company ZUSE (the employees responsible for running the system were Prof. Janez Grad and Mr. Cveto Trampuž). With financial support from ISKRA, the researchers at ULj programmed mathematical algorithms for use on the Z-23 and the licensed computers.

The use of computers at ULj, which started in 1961, expanded very rapidly in the following years. In addition to research, it spread for educational purposes to all the faculties and also into the administration departments. All these activities demanded a new, more powerful computer system. But as the total amount of university work on computers was too small to justify the rational exploitation of a big computer system, and as ULj did not have adequate financial means to purchase a new computer on its own, ULj, and in particular IJS, took the initiative for a joint big computer investment shared by ULj, the electronic equipment producer COAL ISKRA, Kranj, and Slovenian government institutions, Ljubljana. A CDC 3300 computer of an appropriate configuration was thus installed in 1968. It ran in batch mode with no terminals. The site of the installed CDC 3300 computer, named the »Republic Computer Centre« (RCC), was on the outskirts of Ljubljana, in Ljubljana-Stegne, a couple of kilometres away from ULj and the other founders' sites, which resulted in poor communication between the RCC and its users. Prof.
Janez Grad was appointed head of the RCC. For these reasons, the managers of computer usage at ULj bought another, smaller computer, an IBM 1130, intended for student use, and installed it in the vicinity of the faculties at the Institute of Mathematics, Physics and Mechanics. Part of the research work was processed on this computer as well.

Meanwhile, the expansion of computer usage at ULj continued. Besides the usual batch processing, the need for computers in process control appeared at the Faculty of Mechanical Engineering and IJS, and hybrid computers were developed by the Faculty of Electrical Engineering. To meet these demands, ULj bought four new small computers: one IBM 1130, one IBM System/7 and two CDC 1700 computers. To cope with all the emerging activities and the problems accompanying them, ULj formed a group of experts, »The professional committee for computerization«, and named Prof. Dr. Jernej Virant as its head. Shortly after that, in 1971, ULj also founded the »University Computer Centre« (UCC) and named Prof. Janez Grad as its head. In the years 1971-73 they jointly designed a plan for a university terminal system oriented towards the central computer system at the RCC, which was accomplished by the middle of 1975.

In 1971, the RCC decided to buy and operate a more powerful computer system, the CDC CYBER 70, which enabled multiprogramming and remote job entry. Dr. Edo Pirkmajer, Mr. Cveto Trampuž and finally Dr. Desan Justin were successively named (acting) directors of the RCC. As the new computer system was too big for the workload of the RCC founders, they invited additional companies and institutions to join as partners of the extended RCC. Quite a few of them did join, for example the Educational Community of Slovenia, the Research Community of Slovenia, the Republic Road Community, the National Bank, Ljubljana Dairies, the Building Enterprise Obnova, the Forestry of Slovenia, etc.

Due to the ever increasing number of students at the University of Ljubljana, which exceeded 20,000 and who needed simultaneous computer access, UCC introduced timesharing and interactive work in 1981. The network embraced all the faculties and other higher-education schools of ULj, and also the Republic Centre for CAI (Computer Aided Instruction) at the Faculty of Natural Sciences and Technology, headed by Prof. Dr. Aleksandra Kornhauser. This required a terminal system with over 100 display and/or teletype terminals and personal computers. As the CYBER computer at the RCC did not quite suit these requirements, UCC decided to install and use a DECsystem-10 with DELTA node computers and KOPA 1000 and LA 34 terminals. The system was further extended in the following years, and Mr. Franc Mandelc was appointed head of the UCC.

ULj took a new step in computer usage in the years 1987-88, when the DEC-10 was replaced by two VAX 8550 computers and a network of over 200 terminals and personal computers. The JUPAK network of the Yugoslav Post Office enabled the UCC computer system to be connected to other computers in Slovenia, Yugoslavia and Europe. In this way, direct information exchange between ULj and universities in Europe became possible through e-mail (BITNET, COSINE), and joint (common) databases could be built up and used. Nowadays, the UCC serves as the ULj information centre, supplying faculties and other users with information via the university IT network, while the faculties use their own computer equipment.
In the meantime, the RCC has been transformed into a centre used by Slovenian companies for solving application problems in their business processes.

IT development at Slovenian universities, research institutes and in Slovenian society as a whole, for instance in the Slovenian Society Informatika, the Ljubljana Economists Society, COAL ISKRA, the ISKRA DELTA enterprise, the INTERTRADE enterprise, etc., was generally accompanied by intensive endeavours in research, teaching and development, all of which helped introduce computer science and informatics programmes into the study and research programmes of most higher and secondary schools, as well as into the business sphere.

Some activities of the ULj and IJS researchers and teaching staff within the process of computer system development and usage at ULj and IJS were realized within the framework of professional associations, while others were performed solely within ULj and IJS. For instance:
- Organization and realization of professional gatherings (conferences, symposia, consultations) such as FCIP, IFIP CONGRESS 71, INFORMATICA, DSI, SOR and the IS International Multiconferences (IJS, organizers Prof. Dr. Matjaž Gams, Prof. Dr. Vladislav Rajkovič, ...);
- Publishing of periodical professional journals such as: (1) INFORMATICA, An International Journal of Computing and Informatics (IJS, Executive Editors Dr. Anton P. Železnikar, Prof. Dr. Matjaž Gams); (2) The Applied INFORMATICS; etc.;
- Publishing research papers, monographs, textbooks and other books (such as: (1) The Electronic Computers, editor Franc Spiller-Muys, Iskra, 26 authors, published on the occasion of IFIP CONGRESS 71, Ljubljana; (2) Introduction into Computer Science, DZS, Dr. Ivan Bratko, Dr. Vladislav Rajkovič);
- Carrying out many research projects (such as: The Foundation of an Information Centre I-III, Dr. Janez Grad and 13 co-authors) and computer network planning projects (such as: The University Computer Network, UCC, chief editor Dr. Janez Grad, executive editor Mr. Franc Mandelc);
- Analysing, describing and publishing the professional profiles in the field of computer usage and informatics (governmental department, presided over by Prof. Dr. Jernej Virant, 20 co-authors);
- Publishing dictionaries: (1) Business Informatics Dictionary (Prof. Dr. Ivan Turk and 37 co-workers); (2) Dictionary of Computing (CZ, technical-professional editor Dr. Matjaž Gams, 9 national co-ordinators and 80 co-workers).

Matjaž Gams
Managing Editor, Informatica

Automatic Android-based Wireless Mesh Networks

Paul Wong, Vijay Varikota, Duong Nguyen and Ahmed Abukmail
School of Science and Computer Engineering
University of Houston - Clear Lake, Houston, TX 77058, USA
Email: wongp7342@uhcl.edu, varikotav6521@uhcl.edu, nguyend4081@uhcl.edu, abukmail@uhcl.edu

Keywords: Android, wireless mesh networks, Wi-Fi Direct, relay

Received: August 14, 2014

Android devices are portable, equipped with powerful processors, and can act as both mesh routers and clients through Wi-Fi Direct, making them a suitable platform from which to deploy a wireless mesh network. To allow Android devices to automatically establish and participate in a wireless mesh network, we circumvent two key problems of Android's Wi-Fi APIs: the lack of direct support for ad-hoc networking / many-to-one Wi-Fi Direct connections, and Wi-Fi Direct WPS authentication, which limits applications to those in which intermittent user interaction is possible.

Povzetek: Opisana je metoda za izboljševanje vključevanja naprav Android v omrežja.
1 Introduction

Devices running Android [1], such as smartphones and tablets, are ubiquitous, power efficient, and inexpensive. In addition, most Android devices come with a suite of sensors which, in a network, may be used to collect physical data simultaneously in different locations; this gives a wireless mesh network containing Android devices the potential to be immediately useful in a variety of applications, such as in a wireless sensor network [2, 3]. Android devices acting as soft access points can extend the coverage of the network while simultaneously functioning in other capacities (such as that of a sensor node), reducing the amount of infrastructure required for deployment of the network over a given area. While being comparable in size to many microcontroller-based boards, Android devices benefit from an open-source platform and a popularity that have produced an enormous number of applications and algorithm implementations. Android devices are capable of executing complex algorithms directly on board, giving an Android mesh network a facility for distributed computing.

This paper demonstrates how wireless mesh networks can be constructed by Android devices automatically with Wi-Fi Direct [4] using a technique that establishes connections without requiring user interaction. By allowing these devices to connect to each other automatically, we make their use practical in a number of different applications in which they would not otherwise be, namely, ones which require them to be permanently or semi-permanently installed - as network infrastructure or as part of a larger stationary sensor network, for example. Used Android devices are easy to obtain and inexpensive; they will only become more so in the future, making them potentially cost-efficient alternatives to more traditional types of infrastructure.

In Section 2, we review the concept of a wireless mesh network, acknowledge existing, similar approaches, and discuss Wi-Fi Direct on Android. In Section 3, we discuss the existing network implementations for Android. Sections 4, 5, and 6 outline our implementation in detail, describing both methodology and architecture. Section 7 shows the structure of the software, and Section 8 discusses our testing and results.

2 Background

2.1 About Wireless Mesh Networks

Wireless Mesh Networks are dynamic, self-organizing networks made up of devices which are either mesh clients or mesh routers. Wireless Mesh Networks are similar to ad-hoc wireless networks in structure, but differ in that for a Wireless Mesh Network, there is still a central gateway through which most network traffic will ultimately pass [5]. Wireless Mesh Networks are an efficient and cost-effective way of providing wireless network infrastructure, because of the ease and low cost of installation [5].

2.2 About Android

The key strengths of the Android platform in the context of automatically constructing Wireless Mesh Networks are:
- Android is open. The source code for the operating system is freely available, and, as a mobile platform, applications may be developed and distributed for it without any significant limitation [6].
- An application which will run on one Android device will run on another irrespective of its architecture.
- Android devices have a host of built-in sensors. Cameras, GPS, accelerometers, microphones, compasses, barometers, EM field, RGB light sensors, and gyros are all commonly available and programmer-accessible through the Android API.
- Android devices are able to communicate over 2.5-4G networks, Wi-Fi a/b/g/n, and Bluetooth.
- Android devices are power efficient, small, and portable.
- Android devices are typically run by powerful, multi-core ARM processors, giving them the ability to perform complex tasks and computations on board.
- The market for Android devices is directly consumer driven. Development of new hardware is rapid; models of perfectly serviceable devices are succeeded almost yearly, causing the prices of the older models to drop along with the demand for them, making them a prime inexpensive target for building wireless mesh networks.
- Android applications can be configured to start automatically after device boot-up.

2.3 Wi-Fi Direct on Android

Since Android 4.0, Wi-Fi Direct has been available on Android devices through the API's Wi-Fi P2P framework. A major problem with the implementation is that, if used in any of the ways shown in the documentation or sample code [7], many-to-one connections will not work; as of Android 4.4, attempting to connect to any third device will cause the existing connection to drop. We describe the method we have used to bypass this issue in Section 4.2.1.

3 Existing mesh/ad-hoc network implementations for Android

3.1 A multilayer application for multi-hop messaging on Android devices

Ryan Berti, Abinand Kishore, and Jay Huang of the University of Southern California developed software for Android which can send data through a daisy-chain of phones linked through transient Wi-Fi Direct connections. Connections between devices are one-to-one, and each phone in the network acts as both a relay and a client. The application computes routes through the devices, and packs the route data into every message so that each relaying device knows where to send the data [8].

3.2 The Serval project

The Serval Project aims to be a backup telecommunications system that can function outside a normal cellular network. The Serval Project's software can use Android's Wi-Fi in ad-hoc mode to establish its network; however, using Android's ad-hoc mode presently requires root permissions on the device (root permissions can only be acquired on a device by hacking it, which usually voids the warranty and may be illegal if the device is a contract phone). If root permissions are not available on a device, a wireless access point is needed for the device to communicate using the software [9].

3.3 Open Garden

Open Garden is proprietary software for Android, Windows, and Mac OS X that allows users to create and take part in wireless mesh networks to access the Internet. It does not require root access on a phone and can use Wi-Fi Direct or Bluetooth to establish connections [10].

4 Establishing Android Wi-Fi Direct connections automatically

4.1 Motivation for connection automation

Android devices, such as smartphones, are designed to be user-driven devices. That is, the Android stack, which sits on top of Linux, is geared towards facilitating communication, productivity, and entertainment applications in which user interaction is central. It is in this user-centric context that Android's Wi-Fi Direct framework was created. The primary connection methods of the framework, and, indeed, the only documented methods for establishing peer-to-peer connectivity between Android devices, are all user driven; they must have a user's input and confirmation in order to proceed with connection and authentication.
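For reference, the documented, user-driven connection path looks roughly like the following minimal sketch. The Wi-Fi P2P calls are the public Android API; the surrounding class and method names are hypothetical, and the WPS confirmation dialog that the framework raises as a side effect of connect() is exactly the user interaction discussed above.

    import android.app.Activity;
    import android.content.Context;
    import android.net.wifi.WpsInfo;
    import android.net.wifi.p2p.WifiP2pConfig;
    import android.net.wifi.p2p.WifiP2pDevice;
    import android.net.wifi.p2p.WifiP2pManager;

    public class UserDrivenConnectExample extends Activity {
        // Connects to a peer previously found with discoverPeers(); the system then shows
        // the WPS confirmation dialog referred to in the text above.
        void connectTo(WifiP2pDevice peer) {
            WifiP2pManager manager = (WifiP2pManager) getSystemService(Context.WIFI_P2P_SERVICE);
            WifiP2pManager.Channel channel = manager.initialize(this, getMainLooper(), null);

            WifiP2pConfig config = new WifiP2pConfig();
            config.deviceAddress = peer.deviceAddress; // MAC of the discovered peer
            config.wps.setup = WpsInfo.PBC;            // push-button WPS; a user must still confirm

            manager.connect(channel, config, new WifiP2pManager.ActionListener() {
                @Override public void onSuccess() { /* negotiation started */ }
                @Override public void onFailure(int reason) { /* retry or report */ }
            });
        }
    }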
While suitable for the application areas for which Android was primarily designed, the requirement of periodic user interaction to drive connection processes limits the ways the devices can be used. Permanent installation of these devices, for example, would be impractical if a user or administrator had to periodically press a button on an authentication dialog to keep a network running. By allowing Android devices to connect to each other automatically and without any interaction, we eliminate this requirement of user supervision and make the devices practical in applications where they may be difficult to access and/or frequent and periodic user oversight cannot or should not be guaranteed.

4.2 Implementation

In our implementation, a device in the mesh network is, at any given time, either a) a mesh router or b) a mesh client. Many clients may be connected to a single router simultaneously and can access the outside network through that router. If necessary to enlarge coverage, multiple routers may exist in the network. If two routers are outside of each other's ranges, a client that is within the range of both can act as a relay to ferry information from one router to the other. To accomplish this, the client acting as a relay temporarily disconnects from its original mesh router to broadcast and receive data to and from the other mesh router [5].

Unlike some implementations, ours does not need root access on any device. It can also run automatically and establish connections without user intervention, and it works on any device running Android 4.0 or later. It is also lightweight enough to be used for communications infrastructure. The two main obstacles overcome by our implementation are:
1. As mentioned previously in Section 2.3, as of Android 4.4 (API 19), the Wi-Fi Direct interface on Android may not be used directly to connect more than two devices at a time.
2. When two devices try to connect to each other using Wi-Fi Direct, manual, user-interactive authentication is required. A WPS confirmation dialog box will pop up on the screen during a connection attempt and must be answered by a user in order for the connection to be accepted. A device that has already connected once may be "remembered," which means that if it attempts to connect again no authentication dialog will be presented. However, remembered devices are forgotten after reboots.

4.2.1 Soft access point solution

Though the Android Wi-Fi Direct implementation does not work properly for many-to-one connections (see Section 2.3), it does allow for the creation of a soft access point (AP) that one or more devices can connect to simultaneously if they know its SSID and preshared key. Since neither the SSID nor the preshared key of the access point can be set through the framework (both are automatically and randomly generated), they must somehow be given to the devices which are to connect to the soft access point. We accomplish this by packing the SSID and preshared key into a network service discovery broadcast, which can be sent wirelessly from the soft access point to other devices in range using the Android Wi-Fi Direct framework. The SSID and the preshared key are both encrypted with a separate key that all the devices know beforehand (an encryption key, unlike an access point preshared key, can be set programmatically). All devices within range of one of these soft access points (APs) will receive this network service discovery broadcast containing the AP's SSID and PSK (see Figure 1).
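As an illustration of this technique, the sketch below registers the encrypted AP information as a local DNS-SD service once the group (soft AP) exists. The Wi-Fi P2P calls are the public Android API; the class name, the service name "_meshap._tcp", and the per-field packing into a TXT record are our assumptions (the implementation described here packs everything into a single encrypted string), and encrypt() stands for the pre-shared-key encryption discussed in Section 4.2.2.

    import android.net.wifi.p2p.WifiP2pManager;
    import android.net.wifi.p2p.nsd.WifiP2pDnsSdServiceInfo;
    import java.util.HashMap;
    import java.util.Map;

    public class ApInfoBroadcaster {
        private final WifiP2pManager manager;
        private final WifiP2pManager.Channel channel;

        public ApInfoBroadcaster(WifiP2pManager manager, WifiP2pManager.Channel channel) {
            this.manager = manager;
            this.channel = channel;
        }

        // Call this once the WIFI_P2P_CONNECTION_CHANGED_ACTION broadcast signals that
        // createGroup() has finished (step 2 of the mesh router procedure).
        public void advertiseGroupInfo() {
            manager.requestGroupInfo(channel, group -> {
                if (group == null || !group.isGroupOwner()) return;
                Map<String, String> record = new HashMap<>();
                record.put("ssid", encrypt(group.getNetworkName())); // e.g. "DIRECT-xy-..."
                record.put("psk",  encrypt(group.getPassphrase()));  // randomly generated by the framework
                WifiP2pDnsSdServiceInfo info =
                    WifiP2pDnsSdServiceInfo.newInstance("apinfo", "_meshap._tcp", record);
                manager.addLocalService(channel, info, new WifiP2pManager.ActionListener() {
                    @Override public void onSuccess() { /* AP info is now discoverable by clients */ }
                    @Override public void onFailure(int reason) { /* log and retry */ }
                });
            });
        }

        private String encrypt(String plaintext) {
            // Placeholder for the AES/GCM encryption with the device-stored key (Section 4.2.2).
            return plaintext;
        }
    }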
If a client device is within range of more than one soft access point, it will receive the AP information for all of them, and if it is to act as a relay, it can take turns connecting to each AP. Using this method, multiple clients can connect to a single device over Wi-Fi simultaneously (see Figure 2); a second and very important benefit of this method is that the authentication process can be completely automated.

4.2.2 Security

Soft access points created by Android's Wi-Fi Direct framework are WPA2 compliant, which means that all communications between the access point and authenticated clients are encrypted using AES-CCMP or TKIP [11]. The wireless AP password is encrypted inside the network service discovery broadcast with the javax.crypto.Cipher class using "AES/GCM/PKCS5PADDING", which is the AES algorithm in Galois/Counter mode with the cleartext padded into 8-byte blocks. The key used for encryption is stored on each device beforehand to enable them to create and join the P2P network without periodic user intervention.

The reason the preshared key of the soft access point is not simply stored beforehand is that it is randomly generated and subject to change each time the soft AP is created, at the discretion of the framework rather than the programmer; there is no way to change or modify it. If the preshared key were stored beforehand, then users would have to reconfigure all the devices whenever an Android device running a soft access point decided to change its preshared key. In other words, we cannot effectively "preshare" the preshared key because it changes at the whim of the Android OS. By broadcasting the AP's most recent preshared key each time it is created, we ensure that clients can always connect to it without having to be manually reconfigured. By encrypting the key before broadcasting it, we ensure that only the clients that we want to be able to connect to the network will be able to do so; we can store the encryption key used to encrypt the AP keys on the devices beforehand, without having to worry about it changing.

4.3 Mesh router procedure

1. Create a soft AP using Wi-Fi Direct. This is performed by making a group using the createGroup method of the WifiP2pManager class. The newly created access point:
(a) creates its SSID from its device name and a random string of letters.
(b) is secured using WPA2-PSK; its passphrase is randomly generated and cannot currently be changed or set by the programmer.
(c) runs a DHCP service on the device to assign local IP addresses to connecting client devices.
This device is known as the "group owner" afterwards [12].
2. Wait for the soft AP to finish starting up. The Android device will send a WIFI_P2P_CONNECTION_CHANGED_ACTION intent when the AP is up. Associate a BroadcastReceiver with this intent to catch it. This BroadcastReceiver will be called by the operating system:
(a) whenever a change in the Wi-Fi Direct connection is detected.
(b) after createGroup finishes creating the access point.
Receipt of this intent will act as a signal to proceed to the next step.
3. Pack the SSID, passphrase, MAC, and connectivity status into a string.
4. Encrypt the string. Use an encoding for this string that will not lose information when cast into Unicode, such as ISO-8859-1 [13].
Figure 1: A client device picking up an AP broadcast and connecting.

5. Create a WifiP2pDnsSdServiceInfo object and put the encrypted string into it. This can be done from inside the BroadcastReceiver.
6. Begin broadcasting the service information by calling WifiP2pManager's addLocalService method.
7. Start a netsock server on a separate thread and begin listening for clients attempting to connect. In our implementation, each connected client is given its own new thread for the sake of simplicity.

Router Pseudocode:

    CreateAP()
    en_ssid ← Encrypt(GetMySSID())
    en_psk ← Encrypt(GetMyPSK())
    MakeNSDBroadcast(en_ssid, en_psk)
    server_socket ← OpenServerSocket()
    while program_is_running
        client_socket ← BlockingAccept()
        StartNewClientIOThread(client_socket)

4.4 Client procedure

1. Make a callback object to receive network service discovery (NSD) broadcasts by creating a DnsSdServiceResponseListener. This will be called any time a service discovery broadcast is received. The DnsSdServiceResponseListener must:
(a) Receive the broadcast data as a String
(b) Decrypt the received broadcast string
(c) Parse out the SSID and passphrase from the decrypted string
(d) Store the parsed-out information in an instance of a class designed to hold it
(e) Push that instance with the data into a list.
2. Start listening for network service discovery broadcasts by calling WifiP2pManager's setDnsSdResponseListeners, addServiceRequest, and discoverServices methods to register, associate, and then start the service discovery with the DnsSdServiceResponseListener.
3. Begin a timer that will stop the NSD listening some time after the receipt of the first broadcast. This gives the network service discovery broadcast receiver time to pick up other broadcasts, if there are any.
4. Decide which access point among those heard from is the best to connect to once the time expires. The "best" may be the first AP in the list that is connected to the outside network, the AP with the strongest radio signal, or the only AP available.

Figure 2: Server broadcasting AP info and allowing multiple clients to connect simultaneously.

5. Connect to the "best" AP using WifiManager.
6. Once the device has connected to the mesh router over Wi-Fi, connect to the router's netsock server and begin exchanging information.
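To make client step 5 concrete, the following minimal sketch joins the advertised soft AP with the ordinary Wi-Fi stack once the SSID and preshared key have been received and decrypted. The WifiManager and WifiConfiguration calls are the public Android API; the class and method names are ours, and error handling and the decryption step are elided.

    import android.net.wifi.WifiConfiguration;
    import android.net.wifi.WifiManager;

    public class SoftApJoiner {
        private final WifiManager wifiManager;

        public SoftApJoiner(WifiManager wifiManager) {
            this.wifiManager = wifiManager;
        }

        public void join(String ssid, String presharedKey) {
            WifiConfiguration conf = new WifiConfiguration();
            conf.SSID = "\"" + ssid + "\"";                 // WifiConfiguration expects quoted strings
            conf.preSharedKey = "\"" + presharedKey + "\"";
            conf.allowedKeyManagement.set(WifiConfiguration.KeyMgmt.WPA_PSK); // soft APs are WPA2-PSK

            int netId = wifiManager.addNetwork(conf);
            wifiManager.disconnect();
            wifiManager.enableNetwork(netId, true);          // true: disable other networks while connecting
            wifiManager.reconnect();
            // The netsock client (step 6) can be started once a network-state change confirms
            // the association, typically connecting to the group owner at 192.168.49.1.
        }
    }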
4.5 The relay procedure

Clients which are to act as relays between access points remember all the access points they heard from during the initial broadcast listening period. For now, they connect to all the access points in range in a round-robin fashion, switching from one access point to the next after an arbitrary amount of time (see Figure 3); our future work will explore more sophisticated routing schemes, such as demand- and event-based relaying.
1. After connecting to the first access point in its list and receiving data, the relay client will disconnect from the AP.
2. After disconnecting, the relay client will connect to the next AP in its list.
3. While connected to this AP, the relay client will broadcast any information it knows that it needs to forward. The mesh router will have timestamped all messages it has queued as outgoing, and will check these timestamps against the last time this client disconnected. Messages which are more recent than this time will be sent to the client.
4. After all data is transmitted, upon instruction from its access point, or after an arbitrary amount of time, the client relay will disconnect from this access point device and connect to the next one (which is the first access point in its list if it has reached the end of the list).

Client Pseudocode:

    x ← 0
    while listening_for_NSD_broadcasts
        if received_new_broadcast
            AP[x].SSID ← Decrypt(Broadcast.SSID)
            AP[x].PSK ← Decrypt(Broadcast.PSK)
            x ← x + 1
    maxaps ← x
    i ← 0
    t ← arbitrary time interval
    server_ip_address ← "192.168.49.1"
    while program_is_running
        ConnectToAP(AP[i])
        csocket ← ConnectSock(server_ip_address)
        PerformIO(csocket)
        wait(t)
        DisconnectFromAP()
        if i < maxaps - 1
            i ← i + 1
        else
            i ← 0

5 Broadcasting a message

1. The message to be sent is prefixed with a universally unique identifier (UUID) used to identify it, and the MAC address of its origin. The UUID is randomly generated when the message is created and is immutable.
2. If a router or a node receives a message with a UUID that it has already received recently, the message is ignored and discarded, not to be passed on. An arbitrary number of UUIDs may be stored in a circular buffer to check against. The ideal size of the buffer will depend on the size of the network and the frequency of data exchange.
3. If the message is new, process the information, append the device's MAC to the header, then send the message out.

Figure 3: Relay procedure.

5.1 Message structure

The message may contain either data or commands, and may be, as described in the previous sections, broadcast to all devices or sent to a specific device only. The parts of the message are (see Figure 4):
1. Header:
(a) UUID: Uniquely identifies a message. Useful in determining whether or not a message has been received before.
(b) Message Type Data: Indicates whether the message contains one of the following:
- data intended for a single recipient
- data intended for all devices on the network
- a command intended for a single recipient
- a command intended for all devices on the network
(c) MAC address list: Used to construct route data and to determine the directions the message is to be passed in. (Note: the first two characters of a MAC address may change for the same device and same network interface, and should be ignored.)
2. Data: The payload of the message. May be anything from a command to raw binary data.
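As an illustration only, a message with this layout could be represented by a small class like the following sketch; the class, field, and method names are ours, and the wire encoding used by the netsock module is left open.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.UUID;

    public class MeshMessage {
        public enum Type { DATA_UNICAST, DATA_BROADCAST, COMMAND_UNICAST, COMMAND_BROADCAST }

        private final UUID id = UUID.randomUUID();      // fixed at creation, used for duplicate detection
        private final Type type;
        private final List<String> macRoute = new ArrayList<>(); // appended to by each device that relays it
        private final byte[] payload;                   // a command or raw binary data

        public MeshMessage(Type type, String originMac, byte[] payload) {
            this.type = type;
            this.macRoute.add(originMac);
            this.payload = payload;
        }

        public UUID getId() { return id; }
        public Type getType() { return type; }
        public byte[] getPayload() { return payload; }

        // Called by a relaying device before forwarding (step 3 of Section 5). The first two
        // characters of a MAC address may vary and should be ignored when comparing addresses.
        public void appendHop(String deviceMac) { macRoute.add(deviceMac); }
        public List<String> getRoute() { return macRoute; }
    }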
5.2 Sending a message intended for a single recipient

If the network is to send messages meant for particular devices, the steps for broadcasting a message may be used to construct a graph of the network. After step 5, you would check to see whether the recipient of the message can connect to any devices other than the mesh router to which it is already connected. If it cannot, the message has reached a "leaf" node, which sends the data back up to the source of the message. The header will contain, in order, the MAC addresses of each mesh router and mesh client which the message has passed over on its way to the leaf.

Depending on the needs of the network, it may be simpler to broadcast all messages, regardless of whether they are intended for a single recipient. The devices can all check to see if the message was intended for them, and ignore its contents if it was not (though still passing it along as needed). In this fashion, the message will eventually reach the device it is supposed to, though it may not take the shortest path.

Figure 4: Internal structure of a message.

6 Starting the software on device boot

An application can be configured to start after the Android OS has completed booting. Permission for android.permission.RECEIVE_BOOT_COMPLETED must be requested in the application's manifest file, and a receiver for the android.intent.action.BOOT_COMPLETED intent must be declared. The OS will start the application after it has completed booting and then send this intent to the specified receiver, after which the receiver can start up the main activity. Also, the lock screen can be disabled manually in the Security section of the device's settings; by default, it will be shown as soon as the device powers on.

7 The structure of the software

Our software can be installed on any Wi-Fi capable device running Android 4.0 or later, and can be divided into three main parts (see Figure 5):
1. The main activity, which provides a console for on-device logging and debugging, as well as basic UI functionality
2. The network service, which either runs the mesh router service or the mesh client service
3. The BOOT_COMPLETED broadcast receiver, which is registered with the system from the application manifest, and is the starting point of the program if it is started by the system

Figure 5: The overall architecture of the software.

7.1 Main activity / boot receiver

The first time the program is run on a device, the main activity will ask whether the device is to be a mesh client or a mesh router via a dialog box. This only needs to be done once, as the choice will be saved to a file and remembered the next time the program starts. The program can be manually set to either router or client mode at any time from an action bar menu. After being configured for the first time, the program will run automatically on device boot through its boot receiver, which is called by the operating system.

7.2 Network service

The network service started by the main activity is either the mesh router service or the mesh client service, both of which inherit from a common "network service" superclass and present an identical I/O interface, which can be used to query whether the service can talk to any other device, as well as to send and listen for messages. The services themselves are divided into two modules:
1. the Wi-Fi Direct and Wi-Fi module, which is responsible for a) advertising and listening for network service discovery broadcasts containing access point information, and b) establishing and maintaining Wi-Fi connections.
2.
the netsock module, which a) is started by the previous module after a Wi-Fi connection has been established, b) transfers data over TCP sockets, and c) is multi-threaded for efficiency and to keep the main activity thread from being killed by the operating system.

7.3 Attaching other components

Our software provides network communications, and may either be used as part of other software or potentially run on its own and used through Android's IPC mechanisms. Messages may arrive asynchronously depending on a device's location in the mesh network; this must be taken into account by the code that uses this service.

8 Testing and results

Our build environment was Eclipse with the Android Developer Tools plugin running on both Windows and Linux. The devices we deployed our software on included a pair of twin Samsung Galaxy Nexuses, the Motorola Moto, and the Nexus 7, which used Android versions 4.3, 4.4.2, and 4.4.3, respectively. All of the phones were running either OEM versions of Android or locked, contract-phone versions of it. The software is written entirely in Java using the Google Android APIs and does not require any of the devices to be rooted.

During testing, we were able to establish the network on up to four devices, with one acting as a soft AP tethered to the Internet, another acting as a soft AP with no direct Internet connection, a client acting as a relay between the two, and a client persistently connected to the same soft AP. We were also able to configure the devices so that three of them acted as soft APs, while the last worked as a relay between all of them. In both configurations, the phones were able to construct the network automatically after being turned on. One of the challenges we experienced was making a device's NSD broadcasts continuously visible to other devices. Sometimes, this required us to power-cycle the Wi-Fi radio, clear the phone's program cache, or restart the device several times. With a clear, unobstructed line of sight, the wireless radios on the Android devices we tested were capable of maintaining a direct connection to a device-created soft access point at distances of up to 32 to 35 meters. We found that obstructions such as walls, doors, etc. significantly reduced the operable range.

9 Conclusion

In this article, we discussed and demonstrated how Android's Wi-Fi Direct API can be used to automatically establish connections to other Wi-Fi Direct devices. By using network service discovery broadcasts over Wi-Fi Direct, we are able to transmit a soft AP's preshared key to other devices, then connect these devices to the AP using their regular Wi-Fi facilities without ever requiring a user to perform manual WPS authentication on any of them. We have also shown that, once freed from the requirement of manual user authentication, it becomes practical for Android devices to automatically construct and participate in a wireless mesh network, acting as mesh routers, mesh clients, and relays.

10 Acknowledgements

We thank Dr. Robert Ryan and Mary Pagnutti from Innovative Imaging and Research for their advice, guidance, and feedback on this paper. The material in this paper is based on work supported by NASA STTR Phase 2 contract No. NNX13CS14C.

References

[1] Google, "Android," Jan. 2014. [Online]. Available: http://www.android.com/
[2] W.-J. Yi, W. Jia, and J. Saniie, "Mobile Sensor Data Collector using Android Smartphone," Circuits and Systems (MWSCAS), 2012 IEEE 55th International Midwest Symposium on, no. 55, pp. 956-959, 2012.
[3] P. P.
Jayaraman, A. Zaslavsky, and J. Delsing, "Sensor Data Collection Using Heterogeneous Mobile Devices," Pervasive Services, IEEE International Conference on, pp. 161-164, 2007.
[4] "Wi-Fi Direct," Jan. 2014. [Online]. Available: http://www.wi-fi.org/discover-wi-fi/wi-fi-direct
[5] Wireless Mesh Networks: Architectures and Protocols. Springer, 2007.
[6] R. Meier, Professional Android 4 Application Development. Wrox, 2012.
[7] Google, "WifiP2pManager," Jan. 2014. [Online]. Available: http://developer.android.com/guide/topics/connectivity/wifip2p.html
[8] R. Berti, A. Kishore, and J. Huang, "A multilayer application for multi-hop messaging on Android devices," Oct. 2013. [Online]. Available: http://anrg.usc.edu/ee579_2012/Group02/Design.html
[9] "Serval mesh documentation," Oct. 2013. [Online]. Available: http://developer.servalproject.org/
[10] "Opengarden.com," Oct. 2013. [Online]. Available: https://opengarden.com/
[11] "IEEE 802.11i-2004: Part 11: Wireless LAN medium access control (MAC) and physical layer (PHY) specifications," Jul. 2004.
[12] D. Camps-Mur, A. Garcia-Saavedra, and P. Serrano, "Device-to-Device Communications with WiFi Direct: Overview and Experimentation," IEEE Wireless Communications, vol. 20, no. 3, pp. 96-104, 2013.
[13] "ISO-8859-1: Information technology - 8-bit single-byte coded graphic character sets - Part 1: Latin alphabet No. 1," 2003.

Visualization and Concept Drift Detection Using Explanations of Incremental Models

Jaka Demšar, Zoran Bosnić and Igor Kononenko
University of Ljubljana, Faculty of Computer and Information Science
Večna pot 113, 1000 Ljubljana, Slovenia
E-mail: jaka.demsar0@gmail.com, {zoran.bosnic, igor.kononenko}@fri.uni-lj.si

Keywords: data stream mining, concept drift detection, visual perception

Received: October 10, 2014

The temporal dimension that is ever more prevalent in data makes data stream mining (incremental learning) an important field of machine learning. In addition to accurate predictions, explanations of the models and examples are a crucial component, as they provide insight into the model's decisions and lessen its black-box nature, thus increasing the user's trust. Proper visual representation of data is also very relevant to the user's understanding - visualization is often utilised in machine learning since it shifts the balance between perception and cognition to take fuller advantage of the brain's abilities. In this paper we review visualisation in the incremental setting and devise an improved version of an existing visualisation of explanations of incremental models. Additionally, we discuss the detection of concept drift in data streams and experiment with a novel detection method that uses the stream of the model's explanations to determine the places of change in the data domain.

Povzetek: V članku predstavimo novo vizualizacijo razlage inkrementalnih klasifikacijskih modelov in posameznih napovedi. Predlagamo tudi metodo zaznave spremembe koncepta, temelječo na nadzorovanju toka razlag.

1 Introduction

Data streams are becoming ubiquitous. This is a consequence of the increasing number of automatic data feeds, sensor networks and the Internet of Things. The defining characteristics of data streams are their transient, dynamic nature and their temporal component. In contrast with static (tabular) datasets (used in batch learning), data streams (used in incremental learning) can be large, semi-structured, incomplete, irregular, distributed and possibly unlimited.
This poses a challenge for storage and processing, which should be done in constant time - for incremental models, the operations of increment (fast update of the model with a new example) and decrement (forgetting old examples) are vital. Concepts and patterns in the data domain can change (concept drift) - we need to adapt to this phenomenon or the quality of our predictions deteriorates. We discuss data streams and concept drift further in Section 2.1.

Bare prediction quality is not a sufficient property of a good machine learning algorithm. Explanation of individual predictions and of the model as a whole is needed to increase the user's trust in the decision and provide insight into the workings of the model, which can significantly increase the model's credibility. Model-independent methods of explanation have been developed for batch [12, 13] and incremental learning [2] (Section 2.2).

Data visualisation is a versatile tool in machine learning that serves for sense-making and communication, as it conveys abstract concepts in a form understandable to humans. In Section 2.3 we discuss the visual display of data in an incremental setting and describe the existing visualisation of explanations of incremental models and its shortcomings. The main goal of the article is its improvement, which is presented in Section 3. An additional goal (Section 4) was to devise a method of concept drift detection which monitors the stream of explanations. Finally, we test the improved visualization and the novel concept drift detection method on two datasets and evaluate the results (Section 5).

2 Related work

2.1 Data stream mining

In incremental learning, we observe a possibly infinite data stream of pairs (xi, Ci), where xi is the i-th instance and Ci is its true label. After the model makes a prediction Pi of Ci, the environment reacts with feedback which can be used to assess the model's performance (in the case of classifiers, the instance's true label becomes available). According to the PAC (probably approximately correct) learning model, if the distribution generating the instances is stationary, the error rate will, at least for sound machine learning algorithms, decline towards the Bayes error rate as the number of processed instances increases [10]. Consequently, when a statistically significant rise in the error rate is detected, we can suggest that there has been a change in the generating distribution - concept drift. The nature of change is described with two dimensions: the cause of change (changes in data, hidden variables, changes in the context of learning...) and the rate of change.

Change detection can also be left out completely (using windows and blindly forcing the operations of increment and decrement). The other approach is to actively detect change, either by monitoring the evolution of performance measures or by comparing distributions over two time windows. In the explanation methodology (Sections 2.2 and 4) we use two methods from the former group. The basis of monitoring the learning process using statistical process control (SPC) [4] is detecting a significant rise in the error rate using the central limit theorem. After each processed instance, the system is in one of three possible states - in control, out of control, or warning, when the system is in between the former two states. When in control, the current model is incrementally updated with the current instance, since the error rate is stable.
In the warning state, a buffer is filled with incoming instances - it serves as an optimally sized container for the data that is relevant for the new model if the drift occurs (out of control state) - the buffer is then used to construct a new learning model from the instances in it. If the system goes from warning back to in control, the buffer is emptied, since we deemed the rise in error to be a false alarm. The other method, the Page-Hinkley test [8], is simpler in nature. It was devised to detect a change in a Gaussian signal and is commonly used in signal processing. The method's behaviour can be controlled with the parameters λ (the threshold for raising an alarm) and δ, which corresponds to the allowed magnitude of changes.

2.2 Explanation of models and individual predictions

Explanation of individual predictions and prediction models is used to gain additional insight into the model's decisions and to lessen the black-box nature of most predictions. An adequately explained prediction increases the user's trust in and understanding of the model and extracts additional information. Although some models are already transparent (e.g. decision trees, Bayesian networks) and model-dependent methods of explanation exist for most others, these methods do not meet the requirement for consistency of explanation, which enables comparison of explanations across different models.

IME (Interactions-based Method for Explanation) [13], with its efficient adaptation [12], is a model-independent method of explanation which also addresses interactions of features and therefore successfully tackles the problem of redundant and disjunctive concepts in data. The explanation of the prediction pi for instance xi is defined as a vector of contributions of individual feature values (φ1, φ2, ..., φn), where φj is the contribution of the value of the j-th feature. A positive φj implies that the feature value contributed positively to the prediction (and vice versa), while the absolute value |φj| is proportional to the magnitude of its influence on the decision. The sum of all contributions is equal to the difference between the prediction using all feature values and a prediction using no features (the prediction difference). The explanation quality is highly correlated with prediction quality [13]. Consequently, for very good models, the explanation gives us an insight not only into the workings of the built model, but also into the concepts behind the domain itself.

In order to discover dependencies among feature values, 2^n subsets of features are considered. An efficient sampling-based approximation method exists [12], where the explanation of a prediction is modeled with game theory as a cooperative game (N, Δ), where the players N are the features and Δ is the prediction difference. The payoff vector divides the prediction difference among the feature values as contributions that correctly explain the prediction¹. The explanation of a single prediction can be expanded to the whole model by iterating through all possible feature values and using contributions of randomly generated instance pairs to compute the contributions of each attribute value.

An adaptation of the existing explanation methodology to the incremental setting is proposed in [2]. In addition to the efficiency requirements, the key feature of incremental learning to consider is the possibility of concept drift. The explanation in incremental learning should therefore itself be a data stream.
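Before turning to the incremental adaptation, the following minimal sketch illustrates the sampling approximation of a single feature value's contribution in the spirit of [12]; the Classifier interface and all names are ours, a stored pool of instances stands in for the data distribution, and f is taken to be the model's output for the predicted class.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Random;

    public class ContributionSampler {
        public interface Classifier { double f(double[] instance); }

        private final Random rnd = new Random();

        // Approximates phi_j for instance x using m samples drawn from a pool of stored instances.
        public double contribution(Classifier model, double[] x, int j, List<double[]> pool, int m) {
            int n = x.length;
            double sum = 0.0;
            for (int k = 0; k < m; k++) {
                // Random permutation of feature indices and a random reference instance y.
                List<Integer> perm = new ArrayList<>();
                for (int i = 0; i < n; i++) perm.add(i);
                Collections.shuffle(perm, rnd);
                double[] y = pool.get(rnd.nextInt(pool.size()));

                // Features preceding j in the permutation come from x, the rest from y.
                double[] withJ = y.clone();
                double[] withoutJ = y.clone();
                for (int idx : perm) {
                    if (idx == j) break;
                    withJ[idx] = x[idx];
                    withoutJ[idx] = x[idx];
                }
                withJ[j] = x[j]; // the only difference: feature j is also taken from x

                sum += model.f(withJ) - model.f(withoutJ);
            }
            return sum / m; // phi_j; summed over all j, these approximate the prediction difference
        }
    }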
The adaptation is developed by taking the existing model-independent method from batch learning [12] and equipping it with drift detection and adaptation methods. In the incremental explanation methodology, the basic concept is slightly modified by introducing the parameter max_window. It acts as a limit on the maximum size of the learning model, narrowing the model down on the FIFO principle if necessary. The batch explanation is integrated at key points in SPC² (changes of states), resulting in a granular stream of explanations which reflect local areas of static distributions and intermediate areas of concept drift. An optional parameter w determines the triggering of periodic explanations independent of change indicators. Aside from periodic explanations, the granularity of the stream is completely correlated with change detection - the explanation of the model occurs only when the error rate significantly increases, indicating a change in the distribution generating the instances³. Explaining an individual prediction follows the same process as in batch explanation, except that the local learning model is used.

¹ The concept of the Shapley value is used as the solution.
² SPC meets the requirement of model independence - it works as a wrapper for an arbitrary classifier. In addition to the occurrence of a concept drift, it also detects its rate (the smaller the buffer is between warning and out of control, the faster the drift occurs).
³ The resulting stream of explanations can itself be a subject of analysis - if the model is good, patterns found in the explanation stream may reflect the patterns behind the domain.

2.3 Data visualisation

Data visualization is often utilised in machine learning since it shifts the balance between perception and cognition to take fuller advantage of the brain's abilities [3]. A data stream can be seen as a collection of observations made sequentially in time. Sampling-based research shows that the majority of visualizations depict data that has a temporal component [9]. In this context, visualization acts as a form of summarization. The challenge lies in representing the temporal component, especially if we are limited to two-dimensional, non-interactive visualisations. Concept drift is the other property of data streams that makes creating a good visualization a hard task. Even if we manage to effectively summarize and display patterns in data at some point, we are still left with the task of displaying the change in them.

The main goal of this paper is improving the existing methodology for visualising explanations of incremental models [2]. The feature value contributions are represented with customised bar charts. When explaining a static model, all possible values are listed along the axis with the mean negative and mean positive contributions of each feature. When plotting multiple such visualisations (as is the case when locally explaining incremental models based on change detection), they become very difficult to read as a whole because of the large number of visual elements that have to be compared (the macro view is sacrificed completely in favour of the micro view). To consolidate these images and address the change blindness [7] phenomenon, the charts are stacked into a single plot, where the age and size of the explanation are represented with transparency (older and "smaller" explanations fade out). The resulting visualisation is not tainted by first impressions (as it is only one image) and is adequately dense and graphically rich.
However, the major flaw of this approach shows in situations where the columns representing newer explanations override older ones and thus obscure the true flow of changing explanations, for example when the concept drift causes the attribute value contributions to grow without changing sign. Concepts can therefore become not only hidden; worse, the visualisation can be deceiving, which we consider worse than merely being too sparse. We therefore need to clarify the presentation of the concept drift, along with an accurate depiction of each explanation's contributions, while maintaining the macro visual value that enables us to detect patterns and get a sense of the true concepts and the flow of changes behind the model.
3 Improved visualisation
When visualising explanations of individual predictions, horizontal bar charts are a fitting method in the incremental setting as well - we plot the mean positive and negative contribution of each attribute value and the mean of each attribute as a whole. Individual examples are always explained according to the current model, which, in our case, can change. This is not an obstacle, since the snapshot of the current model is in fact the model that classified the example, so we can proceed with the same explanation methodology as in the static environment⁴. This approach fails with explanations of incremental models, as we need a new figure for each local explanation (Subsection 2.3). To successfully represent the temporal component of incremental models, we use two variations of a line plot where the x axis contains the time stamps of examples and the plotted splines are various representations of contributions (y axis). The first type of visualisation (examples in Figures 2 and 4) has one line plot for each attribute. Contributions of the values of the individual attribute are distinguished by line styles. The mean positive and mean negative contribution of the attribute as a whole are represented with two thick faded lines. Solid vertical lines indicate the spots where an explanation of the model was triggered (and therefore become the joints of the plotted splines), while dashed vertical lines mark the places where the actual concept drift occurs in the data. The second type is an aggregated version (examples in Figures 3 and 5), where the mean positive and mean negative contributions of all attributes are visualised in one figure. In these two ways we condense the visualisation of incremental models without a significant loss of information while still providing quality insight into the model. Exact values of contributions along with timestamps of changes can be read off (micro view), while general patterns and trends can be recognised in the shapes of the lines, which are intuitive representations of flowing time (macro view). The resulting visualisations are dense with information, easily understandable (conventional plotting of the independent variable, time, on the x axis) and presented in a grey-scale palette, making them more suitable for print.
4 Detecting concept drift using the stream of explanations
When explaining incremental models, the resulting explanations are, in themselves, a data stream. This gives us the option to process them with all the methods used in incremental learning. In our case, we devise a method to detect outliers in the stream of explanations and declare such points as places of concept drift.
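Before the details given in the remainder of this section, here is a rough, self-contained sketch of the idea (ours, not the authors' implementation): each new explanation vector is compared to the previously stored ones by mean cosine distance, and a simple Page-Hinkley test raises an alert when that distance drifts upward. All class and parameter names below are ours.
import numpy as np

class PageHinkley:
    """Minimal Page-Hinkley test for an upward change in a monitored signal."""
    def __init__(self, delta=0.005, lam=0.5):
        self.delta, self.lam = delta, lam
        self.mean, self.n, self.cum, self.cum_min = 0.0, 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n          # running mean of the signal
        self.cum += x - self.mean - self.delta         # cumulative deviation
        self.cum_min = min(self.cum_min, self.cum)
        return (self.cum - self.cum_min) > self.lam    # True = alert

def monitor_explanations(explanation_stream, delta=0.005, lam=0.5):
    """Yield the indices of explanations at which a drift alert is raised."""
    stored, ph = [], PageHinkley(delta, lam)
    for i, phi in enumerate(explanation_stream):
        phi = np.asarray(phi, dtype=float)
        if stored:
            # mean cosine distance to all explanations of the current model
            dists = [1 - phi @ s / (np.linalg.norm(phi) * np.linalg.norm(s) + 1e-12)
                     for s in stored]
            if ph.update(float(np.mean(dists))):
                yield i                                   # drift detected here
                stored, ph = [], PageHinkley(delta, lam)  # reset for the new model
        stored.append(phi)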
The reasoning behind this is the notion that if the model does not change, then the explanation of the whole model will not change either. When an outlier is detected, we consider this an indicator of a significant change in the model and thus also in the underlying data. In addition, the method provides us with a stream of explanations that is continuous up to a certain degree of granularity, and so enables us to overview the concepts behind the data at more frequent intervals than the existing explanation methodology.
⁴ This means that, without major modifications, we can only explain instances at the time of their classification - the model changes with the arrival of the next instance.
We use a standard incremental learning algorithm [4] (we learn by incrementally updating the model with each new example, decrement the model if it becomes too big, or rebuild the model if we detect change [5]) and introduce the granularity parameter, which determines how often the explanation of the current model is triggered. This generates a stream of explanations (vectors of feature value contributions) that are compared using the cosine distance. For each new explanation, the average cosine distance to all other explanations belonging to the current model is calculated. These values are monitored using the Page-Hinkley test. When an alarm is triggered, the current average cosine distance to the other explanations has risen significantly, which we interpret as a change in the data domain - concept drift. The last granularity window of examples is then used to rebuild the model, and the Page-Hinkley statistic and the local explanation storage are reset (to monitor the new model). The cosine distance is chosen because, in the case of explanations, we consider the direction of the vector of contributions to be more important than its size, which is very influential in the traditional Minkowski distances. It also carries an intuitive meaning. The Page-Hinkley test is used instead of SPC because of its superior drift detection times [10] and because it needs no buffer - the examples are already buffered according to the granularity. The method is therefore model-independent. Algorithm 1 describes the process in high-level pseudocode.
Algorithm 1 Detecting concept drift using the stream of explanations
Require: h {classifier}
Require: (x_i, y_i)_t {data stream}
Require: m {number of samples for IME}
Require: g {explanation granulation}
Require: max_window {maximum size of the model}
Require: λ {Page-Hinkley threshold}
  local_explanations ← []
  Φ ← h {incremental classifier}
  buf ← [] {buffer}
  for (x_i, y_i) ∈ (x_i, y_i)_t do
    Φ ← Φ.increment(x_i, y_i)
    if len(Φ) > max_window then
      Φ ← Φ.decrement()
    end if
    buf.append((x_i, y_i))
    if len(buf) > g then
      buf ← buf[1:] {maintain buffer size}
    end if
    if i mod g = 0 then
      φ_i ← IME(Φ) {explain the current model}
      dist_φ_i ← mean(cos_dist(φ_i, φ') for φ' ∈ local_explanations)
      if PageHinkley(dist_φ_i) = ALERT then
        Φ ← h(buf) {rebuild}
        local_explanations ← []
      end if
      local_explanations ← local_explanations ∪ {φ_i}
    end if
  end for
5 Results
5.1 Testing methodology and datasets
We test the novel visualisation method and the concept drift detection method on two synthetic datasets, both containing multiple concepts with various degrees of drift between them. These datasets were also used in previous work [2], so a direct assessment of the visualisation quality and the drift detection performance can be made. From the results of this testing, we may conclude whether the new methods are improvements. The naive Bayes classifier and the k-nearest-neighbour classifier are used. They yield very similar results in all tests, so only the results obtained with naive Bayes are presented. SEA concepts [11] is a data stream comprising 60,000 instances with continuous numeric features x_i ∈ [0, 10], where i = 1, 2, 3. Features x_2 and x_3 are the relevant features that determine the target concept x_2 + x_3 < β, where the threshold β ∈ {7, 8, 9, 9.5}. The points are arranged into four blocks with β = 8, β = 9, β = 7 and β = 9.5, consecutively. Although the changes between the generated concepts are abrupt, 10% class noise is inserted into each block. The second dataset, STAGGER, is generated with the MOA (Massive Online Analysis) data mining software [1].
The instances represent geometrical shapes which are described in the feature space by the discrete features size, color and shape. The binary class variable is determined by one of three target concepts: (size = small) ∧ (color = red); (color = green) ∨ (shape = square); and (size = medium) ∨ (size = large). We generate 4500 instances, which are divided into blocks belonging to a particular concept (presented in Table 1). Concept drift is applied by specifying an interval of a certain length between blocks - there, the target concepts of the instances mix according to a sigmoid function. The dataset therefore includes gradual drift, disjunction in concepts and redundant features.
Interval       Concept   Width of drift
[0, 749]       1         50
[750, 1799]    2         50
[1800, 3599]   3         150
[3600, 4500]   1         /
Table 1: STAGGER dataset used for evaluation.
5.2 Improved visualizations
Concept drifts in the STAGGER dataset (we use max_window = 500) are correctly detected and adapted to, as reflected in Figure 2. The defined concepts can be easily recognised from the explanations triggered by the SPC algorithm - the change in the explanation follows the change in the concept. The windows delimited by the vertical lines give us insight into local explanations of the model (where the concept is deemed to be constant). Disjunct concepts (2 and 3) and redundant feature values are all explained correctly (e.g. the redundancy of "shape" and the disjunction of "size" values in concept 3). Figure 1 demonstrates how the classifications of two instances with the same feature values can be explained completely differently at different times - adapting to change is crucial in the incremental setting. This is also evident in the aggregated visualization (Figure 3), which can be used to quickly determine the importance of each attribute.
Figure 1: Explanations of individual predictions (instances at t = 1500 and t = 2350).
Figure 2: Visualization of explanations triggered at change detection (STAGGER).
Figure 3: Aggregated visualization of explanations triggered at change detection (STAGGER).
 0.3">
Figure 4: Visualization of periodic explanations (SEA).
For the SEA dataset, the feature values were obtained by equidistant discretization into 10 intervals for each feature. Since the blocks containing each concept are rather long and all concept drifts were successfully detected by SPC, the explanations obtained by change detection and those triggered periodically do not differ greatly (we use max_window = 4000, w = 5000). The explanations of individual instances correspond tightly to the explanations of the model. As can be seen in Figure 4, the shape of the contributions of features x_2 and x_3 reflects the target concept x_2 + x_3 < β; lower values increase the likelihood of positive classification and vice versa. Feature x_1 is correctly explained as irrelevant, with its only contributions being the result of noise. The succession of target concepts β ∈ {8, 9, 7, 9.5} is even more recognisable in the aggregated visualization (Figure 5). The changes in the data are not as significant as those in the STAGGER dataset, but the drift can still be observed, the dip around t = 40000 being a notable example (the concept drifts from β = 7 to β = 9.5).
Figure 5: Aggregated visualization of periodic explanations (SEA).
5.3 Concept drift detection
Evaluating the concept drift detection using the stream of explanations on the STAGGER dataset yielded positive results. As depicted in Figure 6, the method correctly detects the concept drifts without false alarms and is in that regard similar to the SPC method. The stream of explanations itself fits the patterns seen in testing with other successful drift detection methods (Figure 2). Choices of larger granulations yielded similar results, but the change detection was correspondingly delayed. The concept drift was, however, never missed, provided that the granulation was smaller than the spacing between sequential changes in the data. The delays in concept drift detection are correlated with the magnitude of the change; e.g. the last concept drift (t = 3600) was detected with a significant delay. In this regard the proposed method is inferior to the SPC algorithm - the concept drift detection is noticeably delayed and we also depend on two parameters, the granulation and the alert threshold, so the generality of the method is diminished.
Figure 6: Loss of prediction and concept drift detection with several methods on STAGGER and SEA - no drift detection (static classifier), drift detection with SPC [2] and drift detection using the stream of explanations. The thick vertical lines indicate the occurrence of true change in the data and the dashed vertical lines mark drift detection.
When testing with the SEA dataset, the concept drift was not correctly detected. Changing the granulation, the Page-Hinkley alert threshold and the max_window parameter resulted in varying degrees of false alarms or non-reaction to change (see Figure 6). This behaviour can be attributed to the small magnitude of the change that occurs in the data - the difference between the concepts is quite small and continuous.
However, when explaining this (incorrectly adapted) model, we still recognise the true underlying concepts thanks to the max_window property, which causes the model to automatically decrement when it becomes too big⁵. We conclude that, in this form, the presented method is not a viable alternative to the existing concept drift detection methods. Its downsides include a high level of parametrisation (maximum size of the model, granularity and alert threshold level), which requires a significant amount of prior knowledge and can also become improper if the model changes drastically. Consequently, another assessment of the data is needed - the required manual supervision and the lack of adaptability in this regard can be very costly and go against the requirements of a good incremental model. The concept drift detection is also not satisfactory - it is delayed in the best case, while concepts can be missed or falsely alerted in the worst case. Another downside is the time complexity - the finer the granularity, the more frequent the explanations, which provides a good stream of explanations but is very costly time-wise. The method is therefore not feasible in environments where quick incremental operations are vital. However, if we can afford such delays, we get a granular stream of explanations which gives us insight into the model at roughly any given time. A note at the end: we should always remember that we are explaining the models and not the concepts behind the model. Only if the model performs well can we claim that our explanations truly reflect the data domain. This can be tricky in incremental learning, as the quality of the model deteriorates at the time of a concept drift.
⁵ Considering prior domain knowledge.
6 Conclusion
The new visualisation of explanations of incremental models is indeed an improvement over the old one. The overriding nature of the old visualisation was replaced with an easy-to-understand timeline, while the general concepts (macro view) can still be read from the shapes of the lines. The micro view is also improved, as we can determine the contributions of attribute values for any given time. The detection of concept drift using the stream of explanations did not prove suitable for general use in the initial experiments. It was shown to be hindered by delayed detection times, missed concept drift occurrences, false alarms, a high level of parametrisation and potentially high time complexity. This provides motivation for further experiments in this field, especially because the stream of explanations provides good insight into the model in accordance with the chosen granulation. The main goal of future research is finding a true adaptation of the IME explanation methodology to the incremental setting, i.e. efficient incremental updates of the explanation at the arrival of each new example. A truly incremental explanation methodology would provide us with a stream of explanations of the finest granularity. In addition to this result (for each timestamp we get an accurate explanation of the current model), a number of new possibilities for visualisation would emerge, particularly those that rely on finely granular data, such as ThemeRiver [6].
References
[1] A. Bifet, G. Holmes, R. Kirkby, B. Pfahringer, and M. Braun. MOA: Massive online analysis.
[2] J. Demšar. Explanation of predictive models and individual predictions in incremental learning (in Slovene). B.S. Thesis, Univerza v Ljubljani, 2012.
[3] S. Few. Now You See It: Simple Visualization Techniques for Quantitative Analysis. Analytics Press, USA, 1st edition, 2009.
[4] J. Gama. Knowledge Discovery from Data Streams. Chapman & Hall/CRC, 1st edition, 2010.
[5] D. Haussler. Overview of the probably approximately correct (PAC) learning framework, 1995.
[6] S. Havre, B. Hetzler, and L. Nowell. ThemeRiver: Visualizing theme changes over time. In Proc. IEEE Symposium on Information Visualization, pages 115-123, 2000.
[7] L. Nowell, E. Hetzler, and T. Tanasse. Change blindness in information visualization: a case study. In Information Visualization, 2001. INFOVIS 2001. IEEE Symposium on, pages 15-22, 2001.
[8] E. S. Page. Continuous inspection schemes. Biometrika, 41(1):100-115, 1954.
[9] C. A. Ratanamahatana, J. Lin, D. Gunopulos, E. Keogh, M. Vlachos, and G. Das. Mining time series data. In Data Mining and Knowledge Discovery Handbook, pages 1049-1077. Springer US, 2010.
[10] R. Sebastiao and J. Gama. A study on change detection methods. In Progress in Artificial Intelligence, 14th Portuguese Conference on Artificial Intelligence, EPIA 2009, Aveiro, Portugal, October 12-15, 2009. Proceedings, pages 353-264. Springer, 2009.
[11] W. Nick Street and Y. Kim. A streaming ensemble algorithm (SEA) for large-scale classification. In Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '01, pages 377-382, New York, NY, USA, 2001. ACM.
[12] E. Štrumbelj and I. Kononenko. An efficient explanation of individual classifications using game theory. The Journal of Machine Learning Research, 11:1-18, 2010.
[13] E. Štrumbelj, I. Kononenko, and M. Robnik Šikonja. Explaining instance classifications with interactions of subsets of feature values. Data Knowl. Eng., 68(10):886-904, October 2009.
Foreground-Background Separation by Feed-forward Neural Networks in Old Manuscripts
Abderrahmane Kefali, Toufik Sari and Halima Bahi
LabGED Laboratory, Computer Science Department, Badji Mokhtar University, BP-12 Annaba, 23000, Algeria
E-mail: {kefali, sari, bahi}@labged.net
Keywords: foreground-background separation, binarization, ANN, MLP, image fusion, PSNR
Received: February 16, 2014
Artificial Neural Networks (ANNs) are widely used techniques in image processing and pattern recognition. Despite their power in classification tasks for pattern recognition, they show limited applicability in earlier stages such as foreground-background separation (FBS). In this paper, a novel FBS technique based on an ANN is applied to old documents with a variety of degradations. The idea is to train the ANN on a set of pairs of original images and their respective ideal black and white ones, relying on global and local information. We ran several experiments on benchmark and synthetic data and obtained better results than state-of-the-art methods.
Povzetek: This paper proposes a new binarization technique based on an artificial neural network for old documents with various levels of degradation.
1 Introduction
Important documentary collections currently exist in libraries, museums and other institutions for pedagogic or sociopolitical purposes. Historical documents of old civilizations and public archives are typical examples of this abundance, representing the patrimony and history of nations. In the last two decades, the scientific community has renewed its interest in the restoration of old documents. Several R&D projects were initiated, and several research papers and PhD theses have consequently focused on ancient document processing and analysis.
The pattern recognition and image processing researchers are at the core of this interest group. In most cases, document image processing has to deal with foreground-background separation¹ (FBS). The goal of an FBS algorithm is to extract the relevant information (text, figures, tables, etc.), i.e. "the foreground", from the page, i.e. "the background". Image binarization is critical in the sense that a bad separation will cause the loss of pertinent information and/or add useless information. FBS is more difficult for images of old documents, which suffer various types of degradations from the digitization process itself, aging effects, humidity, marks, fungus, dirt, etc. FBS is generally done using a cut-off value, "the threshold" [3][11]. The literature is rich in methods for document image binarization based on thresholding [1][2][3][11]. These methods use different ways to calculate the threshold and are grouped into two main classes: global methods, where a single threshold is used for the whole image, and local methods, where one threshold per image area is computed. Hybrids can be built by combining global and local information to find optimal threshold values.
¹ FBS is also called "binarization", meaning producing black and white (BW) images from grayscale or color ones.
ANNs have contributed, since their introduction in the fifties, to solving complex problems in many areas including classification, prediction, approximation, pattern recognition, etc. After a learning stage, ANNs are able to generalize their behavior to new data they have not "seen" before. In spite of the ANNs' success in various image processing applications, their contribution to image binarization is still limited and has not generated much research [3][4]. In this paper, we exploit the generalization ability of ANNs to devise a novel binarization approach that relies not on thresholding, but on classification using a Multilayer Perceptron (MLP). This choice is justified by the observation that binarization is in fact a classification process in which the set of image pixels should be divided into two classes, "black" and "white", and at this ANNs can excel. The benefit of using a supervised neural network for FBS is the reduction of the complexity of conventional thresholding methods and the availability of large databases for training and validation. The remainder of this paper is organized as follows. We first give a brief description of the kinds of thresholding techniques and focus on ANN-based methods. Next, we describe and detail our proposed approach. After that, we present the evaluation results against several well-known methods, and finish with a conclusion.
2 Related works
Numerous surveys and competitions on document image binarization have been carried out, not only for comparison purposes but also for categorization, evaluation, and of course for the benefit of scholars. According to [3], the binarization techniques can be divided into six groups:
- Histogram-based methods: these methods perform thresholding based on the shape of the histogram [20][21];
- Clustering-based methods: these methods assign the image pixels to one of two clusters, object and background [22][23];
- Entropy-based methods: these algorithms use information theory to obtain the threshold [24][25];
- Object attribute-based methods: these find a threshold value based on some similarity measure between the original and the binary image [26][27];
- Spatial binarization methods: these find the optimal threshold value taking spatial measures into account [28];
- Locally adaptive methods: these methods are designed to give a new threshold for every pixel. Several kinds of adaptive methods exist; we find methods based on the local gray range [29], local variation [30], etc.
Recently, ANNs have been used for binarization. As stated in [16], most of these works dealt with noisy handwritten document images and non-document images, but did not inspect the special case of historical documents. M. Egmont-Petersen et al. [4] reviewed more than 200 works on image processing with ANNs, of which only three were on image binarization. N. Papamarkos et al. [5] proposed a multithresholding approach using Principal Component Analysis (PCA) and a Kohonen Self-Organized Feature Map (SOFM) neural network. The input layer of the SOFM coded the 255 elements of the gray-level histogram, and the output layer is a 1D map in which only one winner neuron is activated for each input vector. In [6], the authors used a feed-forward neural network to find the "clean" pixel corresponding to the noisy one given as input, in conjunction with the median filter value and the Rank-Ordered Absolute Differences (ROAD) of the same pixel. We found that the same approach was proposed earlier in [7]. In [31], a PNN network is used, but the histogram is compacted. On the other hand, in [32] every pixel is used to obtain the thresholded image. A combination of a thresholding technique and an artificial neural network for leaf vein extraction is carried out in [12]. Badekas and Papamarkos [13] combined several binarization techniques using a Self-Organizing Map, for the binarization of normal and degraded printed documents for character visualization and recognition. Yang et al. [14] used a text image segmentation method with a neural network for document image processing. In [15], a multi-layer perceptron is trained on noisy document images to produce enhanced document images. In [16], the authors combined a global thresholding method, namely Mass-Difference (MD) thresholding, with a supervised neural network for selecting a globally optimal threshold value. The latter is then used to binarize the degraded document image. MD thresholding is first applied to calculate several locally optimal threshold values. Afterwards, a supervised neural network is trained using these optimal threshold values as its inputs and a single globally optimal threshold value as its output. The neural network consists of an input layer of 256 neurons receiving the local threshold values, one hidden layer containing 22 neurons, and a single neuron in the output layer, which represents the global threshold value. Lazaro et al. [17] proposed to obtain an optimal threshold for each image using a semantic description of the histogram and a general regression neural network, for high-precision OCR algorithms over a limited number of document types. The histogram of the input image is first smoothed in order to eliminate false minima and maxima, and its discrete derivative is computed. Using a polygonal version of the derivative and the smoothed histogram, a new description of the histogram is calculated, from which a semantic description is inferred. The semantic description is then used by a general regression neural network to obtain the optimal threshold value.
In [18], the authors used a multi-layer perceptron (MLP) as a threshold transfer to select a visually satisfactory threshold, trained with a modified back-propagation algorithm. The technique starts with the histogram of the contrasted image. The histogram is scaled to [0, 1] and divided into 128 bins. The three-layer MLP has an input layer of 128 neurons, two hidden layers of 256 and 512 neurons respectively, and an output layer with 128 neurons. In [19], three neural-based binarization techniques are introduced. The three approaches start with a Self-Organizing Map (SOM) trained on part of the image in order to extract its most representative grey levels or colors. The SOM inputs are the pixels of the image. The classification is then done differently in the three approaches. In the first approach, the neurons of the SOM are segmented into N regions (for N classes) using the K-means algorithm [9]. Once the pixels are clustered, the SOM is used to classify the pixels of the entire image: each pixel is given as input to the SOM in order to be classified. The second approach is applied to color images. Here an MLP is used whose inputs are the SOM neurons and whose outputs are the classes of the same neurons. After training, the MLP is applied to the whole image to classify each pixel. In the third approach, Sauvola or Niblack thresholds are applied to the neurons of the trained SOM and a global threshold is extracted; this global threshold is applied to the entire image. Although the binarization record is long and will continue to lengthen (see the ICDAR 2013 competitions), few comparison studies were found. A large discussion of the performance of the works mentioned above is not possible at this stage. A good comparison should, in our opinion, be done on standard benchmark databases of document images and using well-established performance metrics.
3 Proposed method
In this section we describe a new method for binarizing images of old documents. The proposed method may be placed in the hybrid class. In this method, the separation of image pixels into "black" or "white" is performed by a Multilayer Perceptron (MLP) trained with back-propagation (see [49] for more details). In this approach, the MLP does not compute or learn any threshold but performs a direct binarization by classifying the image pixels into two classes. As described above, binarization is a kind of two-class classification problem. The MLP is an ANN with supervised learning dedicated especially to classification purposes. It has the ability to separate classes of patterns that are not linearly separable. We have chosen to use MLPs because the gray levels of pixels may overlap in some cases. Even if this is not always true, MLPs surely go further than many other classifiers [19]. The motivation for using a learning-based approach for FBS is that, as illustrated in Figure 1, the pages of one and the same volume have similar defects, as they are written on the same type of paper, in the same period of time, and are conserved in the same conditions. As a result, one binarization algorithm will succeed equally well on all the pages of the same manuscript. Therefore, we can train the algorithm on a limited number of pages, or on some specific areas from different pages, and it can perform well on the other pages. For other volumes, we can retrain the classifier to adapt to the new data.
The same scheme is also possible for a single page: we can train the algorithm only on the data from some areas of the page, and it generalizes its behavior to the remainder of the page.
Figure 1: (a) a collection of documents kept in the same conservation conditions; (b) & (c) different pages of the same volume with similar degradations.
For each pixel, global and local information from its neighborhood is used to train the MLP. The local information consists of the grey value of the pixel p together with those of its neighbors in an N x N window centered on p. In addition, we introduce global information, namely the global mean (M) and the global standard deviation (σ) of the whole image. Indeed, as noted in [18], these two and other statistical parameters of the image are widely used by most binarization methods, [22][23] for instance, to compute the thresholds. The MLP should then output 0 for black or 1 for white (see Figure 2).
Figure 2: MLP for classifying a pixel p according to its value and those of its neighbors in a 3x3 window.
The proposed approach consists of four steps: image preprocessing, MLP definition, MLP training, and finally the binarization step. The block diagram of the approach is shown in Figure 3.
Figure 3: Block diagram of the proposed method.
3.1 MLP definition
The first thing we need to do is to set an "adequate" structure for the neural network (the number of hidden layers and the number of neurons per layer). As stated above, we prefer to work with the Multilayer Perceptron, which has proven abilities in classification problems. Although the optimal topology of an MLP for a given problem is important, it is not always easy to devise and not always necessary. Kolmogorov [50] showed that all continuous functions of n variables have an exact representation in terms of finite superpositions and compositions of a small number of functions of one variable [8]. In terms of networks, this means that every continuous function of many variables can be represented by a network with two hidden layers [52]. In addition, according to G. Cybenco [8], "an MLP network that has only one hidden layer is able to approximate almost any type of non-linear mapping". We therefore used a trial-and-error procedure to find a suitable topology for our MLP. We devised an MLP with a single output neuron corresponding to the binary value of the pixel (0 or 1). The number of inputs is the number of pixels in the window (9 for a 3x3 window, 25 for 5x5, etc.), plus the mean (M) and the standard deviation (σ) of the whole image. The activation function chosen is the sigmoid function, defined by f(x) = 1 / (1 + e^(-x)). However, one question remains: which window size will give the best results? To answer this question, we tested several window sizes with neural networks of varying numbers of layers and neurons per layer; we trained all these configurations and evaluated the performance of each on the test set using several evaluation measures. The obtained results show that the optimal configuration is:
• a window size of 3x3, and therefore 11 inputs,
• one hidden layer of 11 neurons,
• one neuron as output.
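As an illustration only (the authors implemented their network in Java with the Neuroph framework, see Section 4), the selected 11-11-1 topology with sigmoid activations could be sketched with scikit-learn as follows; the variable names and hyper-parameter values below are ours.
from sklearn.neural_network import MLPClassifier

# A sketch of the chosen topology: 11 inputs (3x3 window + global mean and
# standard deviation), one hidden layer of 11 neurons, one output neuron.
# This is an illustrative stand-in, not the authors' Neuroph implementation.
mlp = MLPClassifier(hidden_layer_sizes=(11,),
                    activation="logistic",     # sigmoid activation
                    solver="sgd",              # plain back-propagation-style training
                    learning_rate_init=0.1,
                    early_stopping=True,       # stop when the validation error rises
                    validation_fraction=0.2,
                    max_iter=5000)

# X is an (n_samples, 11) array of feature vectors scaled to [0, 1],
# y holds the desired outputs (0 = black, 1 = white).
# mlp.fit(X, y)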
3.2 Image preprocessing
The goal of this phase is the preparation and extraction of the training (and validation) data from a sample of training images; it runs offline.
3.2.1 Training and validation data
To guarantee high accuracy, the MLP should be trained on a huge set of patterns in order to determine its free parameters (the link weights) with high precision. In practice, using only a training set produces, in most cases, the phenomenon of over-learning, where the ANN learns the training data but is not able to generalize to other data not seen during the training phase. To overcome this problem, we use another set of data called the validation set. During the training phase, at every epoch we check, in addition to the learning error, the validation error and compare it to its previous values to determine the moment at which performance drops. The training and validation sets are composed of several input vectors with the corresponding desired outputs. The size of the input vector is the number of input neurons of the MLP. In our case, each vector contains the gray value of a pixel and those of its neighborhood, in addition to the global mean (M) and global standard deviation (σ) of the whole image (Figure 4). The corresponding expected output is the binary value of the related pixel in the ground truth image (the black and white image). As the pixel values are in [0, 255], the data should be normalized or scaled to the interval [0, 1] (0 for black and 1 for white) to ensure the proper functioning of the neurons in the sensitive regions of the activation function.
<242, 173, 173, 245, 173, 173, 223, 233, 36, 185.66, 64.58>
Figure 4: Input vector for one pixel in a 3x3 window.
3.2.2 Images of training and validation
We used two sets of images for creating the training and validation data. The first is a public set composed of document images from the four collections proposed within the context of the competitions DIBCO 2009², H-DIBCO 2010³, DIBCO 2011⁴ and H-DIBCO 2012⁵. These four collections contain a total of 50 real document images (37 handwritten and 13 printed) coming from the collections of several libraries, with the associated ground truth images. All the images contain representative degradations which appear frequently (e.g. variable background intensity, shadows, smear, smudge, low contrast, bleed-through). Figure 5 shows some images from these collections.
Figure 5: Images taken from the collections of: (a) DIBCO 2009, (b) H-DIBCO 2010, (c) DIBCO 2011, (d) H-DIBCO 2012.
² http://users.iit.demokritos.gr/~bgat/DIBCO2009/benchmark/
³ http://www.iit.demokritos.gr/~bgat/H-DIBCO2010/benchmark
⁴ http://utopia.duth.gr/~ipratika/DIBCO2011/benchmark
⁵ http://utopia.duth.gr/~ipratika/HDIBCO2012/benchmark
The second set of images is a synthetic collection prepared in order to include most of the degradations found in old documents. It is created by applying the image mosaicing by superimposing (blending) technique [33]. The idea is as follows: we start with some document images in black & white, which represent the ground truth, and with some backgrounds extracted from old documents, and we apply a fusion procedure to obtain as many different images of old documents as needed. P. Stathis et al. [10] proposed two different blending techniques: maximum intensity and image averaging. We adopt the image averaging technique in order to obtain a more natural result. The mosaicing process used can be summarized by the following pseudo-code:
Input:  GT: the ground truth image
        BG: the background
Output: R: the resulting image
For each pixel (i, j)
    If BG(i, j) is darker than GT(i, j) Then
        R(i, j) = BG(i, j)
    Else
        R(i, j) = (GT(i, j) + BG(i, j)) / 2
End For
We note here that we used a large number of backgrounds with different varieties of degradations (transparency effects, holes, stains, etc.) to allow the MLP to learn from a vast diversity of possible cases (Figure 6).
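The averaging-based fusion above is straightforward to vectorize; the following sketch (our own, assuming 8-bit grayscale images loaded as NumPy arrays of equal size) reproduces the pseudo-code.
import numpy as np

def fuse(gt: np.ndarray, bg: np.ndarray) -> np.ndarray:
    """Blend a black & white ground-truth image with an old-document background:
    keep the background where it is darker, otherwise average the two images."""
    gt = gt.astype(np.float32)
    bg = bg.astype(np.float32)
    darker = bg < gt                          # background darker than the ground truth
    result = np.where(darker, bg, (gt + bg) / 2.0)
    return result.astype(np.uint8)

# Example: synthetic = fuse(ground_truth_page, old_background)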
Figure 6: Images of documents obtained by fusing binary images and backgrounds: (a) backgrounds from old documents, (b) ground truth binary images, (c) the resulting synthetic images.
3.3 MLP training
In this phase, the MLP is trained using the training and validation data prepared before. For that, we used the back-propagation algorithm. It is a supervised learning algorithm in which the system is provided with samples of inputs and the corresponding expected, or desired, output values. The training process is done as follows: we feed the training vectors to the MLP and calculate the error between the outputs of the MLP (estimated outputs) and the real data (desired outputs). Next, we update the weights of all neurons. After that, we feed the validation vectors to the MLP and calculate the validation error (as before), without updating the neuron weights. The process is repeated in order to minimize both the learning and the validation error. The training is interrupted when the validation error begins to increase. After training, the neural network has (usually) learned to provide the correct output value when it is presented with the input values only.
3.4 Binarization
Once learning is achieved, the last phase of the proposed method is the use of the trained MLP to binarize different document images. This is done by running the trained neural network in one forward pass using the final training weights. For each pixel of the processed image, we provide the MLP with the feature vector, as in the training phase, and it outputs a binary value for the pixel.
4 Experimentation and results
Our application was developed in Java with the Neuroph framework⁶. As stated before, we used two sets of documents for training and validation. The first set is a public collection of a total of 50 real document images (37 handwritten and 13 printed) coming from the collections of several libraries, with the associated ground truth. The second set is composed of 240 synthetic document images constructed by our fusion algorithm from 20 different backgrounds and 12 binary images. As a first attempt, we used 15 images of the first set and 70 images of the second set for training, 10 images of the first set and 30 images of the second set for validation, and the remainder for testing. We also considered using all the pixels of all images in the training and validation sets. This was not practicable, because the images are large (on average 783,000 pixels), which would result in a massive training set of about 66,555,000 vectors. We therefore selected a portion of the vectors to use in training and validation: 500 randomly chosen vectors per training and validation image, resulting in 42,500 vectors for training and 20,000 other vectors for validation. During training we calculate the training and validation errors at each epoch. As is common, the training error decreases continuously and gradually, but the validation error decreases at the beginning and then starts to deteriorate, which means that over-learning has occurred and the learning process should be stopped. In our case, this happened at epoch #25,382 (Figure 7).
Figure 7: Errors of training and validation (training and validation error vs. epochs).
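For illustration, the per-pixel binarization pass could look as follows; this is our own sketch (function and variable names are ours), assuming a trained model with a scikit-learn-style `predict` method and an 8-bit grayscale input image.
import numpy as np

def binarize(image: np.ndarray, model) -> np.ndarray:
    """Classify every pixel as black (0) or white (255) using the trained MLP.
    Each feature vector holds the 3x3 neighborhood plus the global mean and
    standard deviation, scaled to [0, 1] as in the training phase."""
    img = image.astype(np.float32)
    mean, std = img.mean(), img.std()
    padded = np.pad(img, 1, mode="edge")                 # replicate borders
    h, w = img.shape
    features = np.empty((h * w, 11), dtype=np.float32)
    k = 0
    for y in range(h):
        for x in range(w):
            window = padded[y:y + 3, x:x + 3].ravel()    # 9 neighborhood values
            features[k, :9] = window
            features[k, 9], features[k, 10] = mean, std  # global statistics
            k += 1
    features /= 255.0                                    # scale to [0, 1]
    labels = model.predict(features)                     # 0 = black, 1 = white
    return (labels.reshape(h, w) * 255).astype(np.uint8)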
To assess the generalization ability of our neural network, we tested it on a collection of 165 images of old documents that were not presented to the MLP (25 from the first set and 140 from the second set) and compared the resulting binary images with the corresponding ground truth ones. The comparison is done quantitatively using standard measures that have been widely used for evaluation purposes, especially in the DIBCO 2009 [2], H-DIBCO 2010 [1], DIBCO 2011 [53] and H-DIBCO 2012 [54] competitions. These measures are: F-Measure, PSNR, NRM, MPM and DRD. In the following, TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively.
⁶ http://neuroph.sourceforge.net
i) F-Measure. The F-Measure was first introduced by Chinchor in [55]:
FM = (2 × Recall × Precision) / (Recall + Precision),   (1)
where Recall = TP / (TP + FN) and Precision = TP / (TP + FP).   (2)
ii) PSNR (Peak Signal to Noise Ratio). PSNR is a similarity measure between two images; the higher the value of PSNR, the higher the similarity of the two images [2][34]:
PSNR = 10 · log(C² / MSE),   (3)
where MSE = (1 / (M·N)) · Σ_{i=1}^{M} Σ_{j=1}^{N} (I_1(i, j) − I_2(i, j))², I_1 and I_2 are the two matched images, M and N are their height and width, and C is the difference between foreground and background (here 255).
iii) NRM (Negative Rate Metric). NRM is based on the pixel-wise mismatches between the ground truth and the binarized image [51]. It combines the false negative rate NR_FN and the false positive rate NR_FP:
NRM = NR_FN + NR_FP,   (4)
with NR_FN = FN / (FN + TP) and NR_FP = FP / (FP + TN). Better binarization quality is indicated by a lower NRM.
iv) MPM (Misclassification Penalty Metric). The misclassification penalty metric evaluates the binarization result against the ground truth on an object-by-object basis [51]:
MPM = MP_FN + MP_FP,   (5)
where MP_FN = Σ_i d_FN^i / D and MP_FP = Σ_j d_FP^j / D. Here d_FN^i and d_FP^j denote the distance of the i-th false negative and the j-th false positive pixel from the contour of the ground truth segmentation, and the normalization factor D is the sum over all pixel-to-contour distances of the ground truth object. A low MPM score denotes that the algorithm is good at identifying an object's boundary.
v) DRD (Distance Reciprocal Distortion Metric). DRD is an objective distortion measure for binary document images, proposed by Lu et al. in [34]. This measure correlates well with human visual perception and measures the distortion over all S flipped pixels as follows:
DRD = (Σ_{k=1}^{S} DRD_k) / NUBN,   (6)
where NUBN is the number of non-uniform 8x8 blocks in the GT image and DRD_k is the distortion of the k-th flipped pixel at coordinates (x, y), calculated using a 5x5 normalized weight matrix W_Nm defined in [34] as W_Nm(i, j) = W_m(i, j) / Σ_{i=1}^{m} Σ_{j=1}^{m} W_m(i, j), with W_m(i, j) = 0 for i = i_c and j = j_c, and W_m(i, j) = 1 / sqrt((i − i_c)² + (j − j_c)²) otherwise, where m = 5 and i_c = j_c = (1 + m) / 2. DRD_k is given by:
DRD_k = Σ_{i=−2}^{2} Σ_{j=−2}^{2} |I_GT(x + i, y + j) − I_B(x, y)| × W_Nm(i + 2, j + 2).   (7)
We also compared our method with several well-known state-of-the-art document binarization methods [3][10][11]. We reimplemented the local methods; all parameter values used are those indicated in the original papers. The average results obtained with all compared methods over each test set are summarized in Table 1. The average results over the two sets are shown in Table 2.
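As a quick reference, the pixel-level measures above (F-Measure, PSNR and NRM) can be computed from a binarized image and its ground truth as in the following sketch of ours, assuming both images are boolean NumPy arrays in which True marks foreground (text) pixels.
import numpy as np

def evaluate(binarized: np.ndarray, ground_truth: np.ndarray, c: float = 255.0):
    """Compute F-Measure, PSNR and NRM for a binarization result.
    Both inputs are boolean arrays; True = foreground (text) pixels."""
    tp = np.sum(binarized & ground_truth)
    tn = np.sum(~binarized & ~ground_truth)
    fp = np.sum(binarized & ~ground_truth)
    fn = np.sum(~binarized & ground_truth)

    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * recall * precision / (recall + precision)

    # MSE over the two 0/255 images, then PSNR as in eq. (3)
    mse = np.mean((binarized.astype(np.float64) - ground_truth) ** 2) * c ** 2
    psnr = 10 * np.log10(c ** 2 / mse)

    nrm = fn / (fn + tp) + fp / (fp + tn)    # NR_FN + NR_FP, as in eq. (4)
    return f_measure, psnr, nrm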
The final ranking of the compared methods is shown in Table 3, which also summarizes the partial ranks of each method according to each evaluation measure and the sum of ranks. In addition, Figure 8 shows graphs of the average performance of the compared methods in terms of F-Measure and PSNR. From the tables and Figure 8, our proposed technique is ranked second, after the method of Sauvola and Pietikainen, for both test sets, and it performs well in terms of F-Measure and PSNR. This result is due to the generalization ability of the MLP despite the characteristics of the degradations and the variation of the noise types present in the documents.
Method                                      First test set                          Second test set
                                            FM     PSNR   NRM    MPM    DRD         FM     PSNR   NRM    MPM    DRD
Otsu [22]                                   0.8571 15.886 0.0628 0.1915 1.755       0.6245 15.101 0.1590 2.8783 4.6768
ISODATA [37]                                0.8544 15.734 0.0632 0.1938 1.833       0.6153 15.059 0.1666 2.9310 4.7320
Kittler and Illingworth [23]                0.4673 6.2878 0.1630 9.7698 26.13       0.4239 8.3621 0.2163 17.727 28.682
Kapur et al. [24]                           0.8432 14.989 0.0509 0.2673 2.161       0.3618 11.493 0.3096 3.3474 7.6726
Mello and Lins [35]                         0.6496 12.708 0.2055 0.3465 3.051       0.3574 13.713 0.3314 0.7512 4.3837
Tsallis Entropy Based Algorithms [36]       0.5203 13.092 0.2712 0.0901 3.092       0.1387 11.532 0.4547 0.3561 3.9472
Iterative global thresholding (IGT) [38]    0.7622 13.683 0.1162 1.7347 4.215       0.6049 13.637 0.1711 2.3889 4.0455
Niblack [30]                                0.5321 7.7542 0.1259 8.3005 17.85       0.3996 7.5398 0.1937 9.4157 17.059
Sauvola and Pietikainen [39]                0.8360 15.537 0.0938 0.2486 1.634       0.6718 16.130 0.1835 0.7630 2.2288
Nick [40]                                   0.7764 13.882 0.0977 1.5542 4.116       0.6778 15.477 0.1656 0.9653 2.5658
Feng and Tan [41]                           0.7798 13.826 0.0820 2.2608 5.016       0.6744 15.148 0.1576 1.0867 2.7634
Bernsen [29]                                0.5150 8.1056 0.1807 10.936 19.62       0.4081 8.1610 0.2069 8.7728 14.517
Mean-Gradient (Leedham et al.) [42]         0.7914 13.872 0.1009 0.2631 2.834       0.6023 12.832 0.1858 1.2630 3.7058
Cavalcanti et al. [43]                      0.4926 11.928 0.3117 0.0807 4.255       0.3443 12.815 0.3454 0.8949 4.0059
Tabatabei and Bohlool [44]                  0.8002 14.744 0.0671 2.4013 5.309       0.6377 15.660 0.1572 2.2360 3.6973
Gangamma et al. [45]                        0.6714 12.879 0.1765 4.1804 6.549       0.6037 14.317 0.2179 1.0963 2.4942
Improved IGT [46]                           0.7530 13.695 0.1364 1.3784 3.751       0.5980 13.486 0.1921 1.8568 3.7286
Using Local Max and Min [47]                0.7891 14.342 0.0971 2.2913 5.439       0.5713 11.622 0.1835 1.3603 4.5757
Sari et al. [48]                            0.8362 15.628 0.0719 0.7248 1.970       0.4979 9.4425 0.1764 4.3272 10.974
Proposed method                             0.8436 15.555 0.0599 0.7468 2.597       0.6608 15.396 0.1456 1.806  2.9860
Table 1: Average results from different binarization methods on each test set.
Method                                      FM     PSNR    NRM    MPM     DRD
Otsu [22]                                   0.7408 15.4939 0.1109 1.5349  3.2159
ISODATA [37]                                0.7348 15.3966 0.1149 1.5624  3.2825
Kittler and Illingworth [23]                0.4456 7.3250  0.1897 13.7486 27.406
Kapur et al. [24]                           0.6025 13.2417 0.1803 1.8073  4.9168
Mello and Lins [35]                         0.5035 13.2112 0.2684 0.5488  3.71735
Tsallis Entropy Based Algorithms [36]       0.3295 12.3124 0.3629 0.2231  3.5196
Iterative global thresholding (IGT) [38]    0.6835 13.6609 0.1437 2.0618  4.13025
Niblack [30]                                0.4658 7.6470  0.1598 8.8581  17.4545
Sauvola and Pietikainen [39]                0.7539 15.8343 0.1386 0.5058  1.9314
Nick [40]                                   0.7271 14.6801 0.1316 1.2597  3.3409
Feng and Tan [41]                           0.7271 14.4877 0.1198 1.6737  3.8897
Bernsen [29]                                0.4615 8.1333  0.1938 9.8547  17.0685
Mean-Gradient (Leedham et al.) [42]         0.6969 13.3522 0.1433 0.7630  3.2699
Cavalcanti et al. [43]                      0.4185 12.3722 0.3286 0.4878  4.13045
Tabatabei and Bohlool [44]                  0.7190 15.2025 0.1122 2.3186  4.50315
Gangamma et al. [45]                        0.6376 13.5984 0.1972 2.6384  4.5216
Improved IGT [46]                           0.6755 13.5911 0.1643 1.6176  3.7398
Using Local Max and Min [47]                0.6802 12.9824 0.1403 1.8258  5.00735
Sari et al. [48]                            0.6670 12.5353 0.1241 2.5260  6.472
Proposed method                             0.7522 15.4759 0.1028 1.2764  2.7915
Table 2: Average results from different binarization methods between the two test sets.
Rank  Method                                      FM  PSNR  NRM  MPM  DRD  Sum of ranks
1     Sauvola and Pietikainen [39]                1   1     8    3    1    14
2     Proposed method                             2   3     1    7    2    15
3     Otsu [22]                                   3   2     2    8    3    18
4     ISODATA [37]                                4   4     4    9    5    26
5     Nick [40]                                   5   6     7    6    6    30
6     Mean-Gradient (Leedham et al.) [42]         8   11    10   5    4    38
7     Feng and Tan [41]                           6   7     5    11   10   39
8     Tabatabei and Bohlool [44]                  7   5     4    15   13   44
9     Improved IGT [46]                           11  10    13   10   9    53
10    Iterative global thresholding (IGT) [38]    9   8     11   14   12   54
11    Mello and Lins [35]                         15  13    18   4    8    58
12    Using Local Max and Min [47]                10  14    9    13   16   62
13    Tsallis Entropy Based Algorithms [36]       20  17    20   1    7    65
14    Sari et al. [48]                            12  15    6    16   17   66
15    Kapur et al. [24]                           14  12    14   12   15   67
16    Cavalcanti et al. [43]                      19  16    19   2    11   67
17    Gangamma et al. [45]                        13  9     17   17   14   70
18    Niblack [30]                                16  19    12   18   19   84
19    Bernsen [29]                                17  18    16   19   18   88
20    Kittler and Illingworth [23]                18  20    15   20   20   93
Table 3: Final ranking of the compared methods on the two test sets.
Figure 8: Graphs showing the performance of the compared binarization methods in terms of (a) F-Measure and (b) PSNR.
5 Conclusion
Document binarization is an important and crucial step in all systems of document analysis and recognition. This task becomes more difficult for historical documents containing various types of noise and degradation. In this paper, we proposed a new ANN-based method for old document binarization. The purpose of using ANNs, and especially Multilayer Perceptrons, for image binarization is to address the lack of soft computing and machine learning techniques applied to this problem, and to take advantage of the generalization abilities of the MLP. Indeed, as confirmed by the experimental results, the MLP showed reliable behaviour on the complex task of foreground/background separation from significantly degraded document images. Many extensions are possible and we will continue to enhance the proposed method. Further experimentation is also needed in order to assess the applicability of the proposed solution on mobile devices in real-life use.
References
[1] I. Pratikakis, B. Gatos, K. Ntirogiannis (2010). H-DIBCO 2010 - Handwritten Document Image Binarization Competition. In Proceedings of the International Conference on Frontiers in Handwriting Recognition ICFHR, pp. 727-732.
[2] B. Gatos, K. Ntirogiannis, I. Pratikakis (2009). DIBCO 2009: Document image binarization contest. International Journal of Document Analysis and Recognition IJDAR, Vol. 14, No. 1, pp. 35-44.
[3] M. Sezgin, B. Sankur (2004). Survey over image thresholding techniques and quantitative performance evaluation. Journal of Electronic Imaging, Vol. 13, No. 1, pp. 146-165.
[4] M. Egmont-Petersen, D. de Ridder, H. Handels (2002). Image processing with neural networks - a review. Pattern Recognition, Vol. 35, No. 10, pp. 2279-2301.
[5] N. Papamarkos, C. Strouthopoulos, I. Andreadis (2000). Multithresholding of color and gray-level images through a neural network technique. Image and Vision Computing, Vol. 18, No. 3, pp. 213-222.
[6] G. Kaliraj, S. Baskar (2010). An efficient approach for the removal of impulse noise from the corrupted image using neural network based impulse detector. Image and Vision Computing, Vol. 28, No. 3, pp. 458-466.
[7] H. Kong, L. Guan (1998). A noise-exclusive adaptive filtering framework for removing impulse noise in digital images. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, Vol. 45, No. 3, pp. 422-428.
[8] G. Cybenco (1989). Approximation by superposition of a sigmoidal function. Mathematics of Control, Signals, and Systems, Vol. 2, No. 4, pp. 303-314.
[9] J. B. MacQueen (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281-297.
[10] P. Stathis, E. Kavallieratou, N. Papamarkos (2008). An Evaluation Survey of Binarization Algorithms on Historical Documents. In Proceedings of the 19th International Conference on Pattern Recognition ICPR, pp. 742-745.
[11] A. Kefali, T. Sari, M. Sellami (2010). Evaluation of several binarization techniques for old Arabic documents images. In Proceedings of the First International Symposium on Modeling and Implementing Complex Systems MISC, Constantine, Algeria, pp. 88-99.
[12] H. Fu, Z. Chi (2006). Combined Thresholding and Neural Network Approach for Vein Pattern Extraction from Leaf Images. IEE Proceedings - Vision, Image and Signal Processing, Vol. 153, No. 6, pp. 881-892.
[13] E. Badekas, N. Papamarkos (2007). Optimal Combination of Document Binarization Techniques Using a Self Organizing Map Neural Network. Engineering Applications of Artificial Intelligence, Vol. 20, No. 1, pp. 11-24.
[14] Y. Yang, K. Summers, M. Turner (2004). A Text Image Enhancement System Based on Segmentation and Classification Method. In Proceedings of the First ACM Workshop on Hardcopy Document Processing HDP, pp. 33-40.
[15] J.L. Hidalgo, S. Espana, M.J. Castro, J.A. Perez (2005). Enhancement and Cleaning of Handwritten Data by Using Neural Networks. Lecture Notes in Computer Science, Vol. 3522, Springer-Verlag, Berlin Heidelberg New York, pp. 376-383.
[16] A. Khashman, B. Sekeroglu (2007). Global Binarization of Document Images Using a Neural Network. Third International IEEE Conference on Signal-Image Technologies and Internet-Based System SITIS, Shanghai, pp. 665-672.
[17] J. Lazaro, J.L. Martin, J. Arias, A. Astarloa, C. Cuadrado (2010). Neuro Semantic Thresholding for High Precision OCR Applications. Image and Vision Computing, Vol. 28, No. 4, pp. 571-578.
[18] T. Chen, M. Takagi (1993). Image Binarization by Back Propagation Algorithm. International Archives of Photogrammetry and Remote Sensing, Vol. 29, pp. 345-345.
[19] H. Hamza, E. Smigiel, A. Belaid (2005). Neural based binarization techniques. In Proceedings of the 8th International Conference on Document Analysis and Recognition ICDAR, pp. 317-321.
[20] M.I. Sezan (1990). A peak detection algorithm and its application to histogram-based image data reduction. Computer Vision, Graphics, and Image Processing, Vol. 49, No. 1, pp. 36-51.
[21] A. Rosenfeld, P. de la Torre (1983). Histogram concavity analysis as an aid in threshold selection. IEEE Transactions on Systems, Man, and Cybernetics, Vol. 13, No. 2, pp. 231-235.
[22] N. Otsu (1979). A thresholding selection method from grayscale histogram. IEEE Transactions on Systems, Man, and Cybernetics, Vol. 9, pp. 62-66.
[23] J. Kittler, J. Illingworth (1986). Minimum error thresholding. Pattern Recognition, Vol. 19, No. 1, pp. 41-47.
[24] J.N. Kapur, P.K. Sahoo, A.K.C. Wong (1985). A new method for gray-level picture thresholding using the entropy of the histogram. Computer Vision, Graphics, and Image Processing, Vol. 29, No. 3, pp. 273-285.
[25] H.D. Cheng, J.R. Chen, J. Li (1998). Threshold selection based on fuzzy c-partition entropy approach. Pattern Recognition, Vol. 31, No. 7, pp. 857-870.
[26] L. Hertz, R.W. Schafer (1988). Multilevel thresholding using edge matching. Computer Vision, Graphics, and Image Processing, Vol. 44, No. 3, pp. 279-295.
[27] L.K. Huang, M.J.J. Wang (1995). Image thresholding by minimizing the measures of fuzziness. Pattern Recognition, Vol. 28, No. 1, pp. 41-51.
[28] A.S. Abutableb (1989). Automatic thresholding of gray-level pictures using two-dimensional entropy. Computer Vision, Graphics, and Image Processing, Vol. 47, No. 1, pp. 22-32.
[29] J. Bernsen (1986). Dynamic thresholding for gray-level images. In Proceedings of the 8th International Conference on Pattern Recognition ICPR, Paris, France, pp. 1251-1255.
[30] W. Niblack (1985). An Introduction to Digital Image Processing. Strandberg Publishing Company, Birkeroed, Denmark.
[31] K.L. Chung, W.Y. Chen (2003). Fast Adaptive PNN-Based Thresholding Algorithms. Pattern Recognition, Vol. 36, No. 12, pp. 2793-2804.
[32] C.Y. Chang, P.C. Chung (2001). Medical Image Segmentation Using a Contextual-Constraint-Based Hopfield Neural Cube. Image and Vision Computing, Vol. 19, No. 9-10, pp. 669-678.
[33] L.G. Brown (1992). A survey of Image Registration Techniques. ACM Computing Surveys, Vol. 24, No. 4, pp. 325-376.
[34] H. Lu, A.C. Kot, Y.Q. Shi (2004). Distance-Reciprocal Distortion Measure for Binary Document Images. IEEE Signal Processing Letters, Vol. 11, No. 2, pp. 228-231.
[35] C.A.B. Mello, R.D. Lins (2000). Image segmentation of historical documents. Visual 2000, Mexico City, Mexico, Vol. 30.
[36] C.A.B. Mello, A.L.I.D. Oliveira, A. Sanchez (2008). Historical Document Image Binarization. In Proceedings of the International Conference on Computer Vision Theory and Applications, Funchal, Portugal, Vol. 1, pp. 108-113.
[37] F.R.D. Velasco (1980). Thresholding using the isodata clustering algorithm. IEEE Transactions on Systems, Man and Cybernetics, Vol. 10, No. 11, pp. 771-774.
[38] E. Kavallieratou (2005). A Binarization Algorithm Specialized on Document Images and Photos. In Proceedings of the 8th International Conference on Document Analysis and Recognition ICDAR, pp. 463-467.
[39] J. Sauvola, M. Pietikainen (2000). Adaptive document image binarization. Pattern Recognition, Vol. 33, No. 2, pp. 225-236.
[40] K. Khurshid, I. Siddiqi, C. Faure, N. Vincent (2009). Comparison of Niblack inspired binarization methods for ancient documents. In Proceedings of the 16th Document Recognition and Retrieval DRR, USA.
[41] M.L. Feng, Y.P. Tan (2004). Adaptive binarization method for document image analysis. In Proceedings of the IEEE International Conference on Multimedia and Expo ICME, pp. 339-342.
[42] G. Leedham, C. Yan, K. Takru, J.H.N. Tan, L. Mian (2003). Comparison of Some Thresholding Algorithms for Text/Background Segmentation in Difficult Document Images. In Proceedings of the International Conference on Document Analysis and Recognition ICDAR, Vol. 2, pp. 859-864.
An Online Compression Algorithm for Positioning Data Acquisition
Wei Pan, Chunlong Yao, Xu Li and Lan Shen
School of Information Science and Engineering, Dalian Polytechnic University, Dalian 116034, China
E-mail: yaocl@dlpu.edu.cn
Keywords: location data acquisition systems, positioning data, trajectory compression, trajectory recovery
Received: September 17, 2014
Positioning data are usually acquired periodically and uploaded to the server via wireless network in the location data acquisition systems. Huge communication overheads between the terminal and the server and heavy loads of storage space are needed when a large number of data points are uploaded.
To this end, an online compression algorithm for positioning data acquisition is proposed, which compresses data by reducing the number of uploaded positioning points. Error threshold can be set according to users' needs. Feature points are extracted to upload real-timely by considering the changes of direction and speed. If necessary, an approximation trajectory can be obtained by using the proposed recovery algorithm based on the feature points on the server. Positioning data in three different travel modes, including walk, non-walk and mixed mode, are acquired to validate the efficiency of the algorithm. The experimental results show that the proposed algorithm can get appropriate compression rate in various road conditions and travel modes, and has better adaptability. Povzetek: Predstavljen je nov algoritem za zajemanje podatkov o realnem času, uporaben za sisteme za določanje položaja. 1 Introduction With advances in tracking technologies and the rapid improvement in location-based services (LBS), a larger amount of location data needs to be collected [1]. These positioning data are linked up in chronological order to form a trajectory which contains a sequence of positioning data with latitude, longitude and timestamp. Trajectory can be used for positioning, navigation, providing some location-based services and expressing user geospatial history behaviour. For location acquisition systems, a large number of positioning data not only generate lots of communication overheads but also bring heavy burden for the storage system [2]. A major reason for this problem is that the uploaded data points are far more than necessary points [3]. Therefore, trajectory compression and simplification have received increasing concern. Trajectory compression can be classified into two categories, offline compression and online compression. In the offline compression, all data points collected (usually on the server) are considered for compression regardless of the data transfer process. In the online compression, data points are processed before they are uploaded, and only feature points selected are transmitted to the server. For the offline compression, Zhang et al. [2] proposed a trajectory data compression algorithm based on the spatial-temporal characteristic. In order to extract accurately trajectory information, the algorithm calculates the distance standard to judge the feature point according to the three-dimensional spatial-temporal characteristic of GPS data points. Given a trajectory Traj and error threshold, the Doulas-Peucker Algorithm [4] constructs a new trajectory Traj by adding points from Traj repeatedly until the maximum spatial error of Traj becomes smaller than error threshold. The Douglas-Peucker algorithm has the limitation of ignoring temporal data. Top-Down Time Ratio (TD-TR) [5] overcomes this limitation by using Synchronized Euclidean Distance (SED) [6] instead of spatial error. The characteristics of the vehicle trajectory are described in the proposed algorithm [7], in which the advantages and disadvantages of judgment method with point by point and judgment method with multi-point joint are analysed. This algorithm puts forward the description method for trajectory based on turning point judgment method. Chen et al. [8] proposed a trajectory simplification algorithm (TS), which considered both the shape skeleton and the semantic meanings of a GPS trajectory. 
The heading change degree of a GPS point and the distance between this point and its adjacent neighbours in this algorithm are used to weight the importance of the point. By studying the model of temporal data in vehicle monitoring system, Wang et al. [9] proposed a modelling idea for establishing trajectory version. To reduce storage and improve temporal queries, spatial-temporal cube is segmented to form unit spatial-temporal cube by this modelling idea. With discussing some problems of trajectory data compression about road net, Guo [10] proposed a non-linear compression algorithm of moving objects trajectories based on road net. The above offline algorithms have played a positive role in reducing amount of positioning points. However, they are only applied to the case that the start and end of the trajectory are clear. That is to say, compression begins only after obtaining all of the points from the input trajectory. That is why communication overheads between the terminal and the server are huge and burdens of storage space are heavy. In recent years, some online compression algorithms have been proposed. Similar to Douglas-Peucker, Opening Window algorithm [11] approximates each trajectory using an increasing number of points so that the resulting spatial error is smaller than the bound. This algorithm is a generalized online compression with a little delay, and it works without considering time data. Opening Window Time Ratio (OPW-TR) [5] is an extension to Opening Window which uses SED instead of spatial error. Compared to Opening Window algorithms, OPW-TR has the benefit of taking temporal data into account. However, just like Opening Window algorithm, OPW-TR also has a little delay. Both of them cannot real-timely determine whether current point needs to be compressed. An algorithm called SQUTSH (Spatial QUalIty Simplification Heuristic) [12] uses a priority queue where the priority of each point is defined as an estimate of the error that the removal of that point would introduce. SQUISH compresses each trajectory by removing points of the lowest priority from the priority queue until it achieves the target compression ratio. This algorithm is fast and tends to introduce small errors during compression. However, it cannot compress trajectories while ensuring that the resulting error is within a user-specified bound. It also exhibits relatively large errors when the compression ratio is high [13]. Yin et al. [14] proposed an algorithm and a restraining model for vehicle data stream transmission. It gives a formula for data recovery in the monitoring centre by discussing the determination condition of speed and direction change degree in the algorithm. However, this algorithm is suitable for the integrated system of monitoring and navigation, and must be matched to the map. Wang et al. [15] proposed an improved moving object model based on vehicle monitoring system of moving objects model. This model provides a dynamic time selection algorithm, which adjusts dynamically the time density according to object moving speed. Therefore, the number of communications is reduced effectively. However, this algorithm works regardless of data recovery and the issue that the moving direction affects the accuracy of the model. Han et al. [16] proposed an adaptive algorithm for vehicle GPS monitoring data. The speed function is expressed by using fitting polynomial with constraint based on endpoint speed and mileage. 
This algorithm considers the cases of a terminal with and without a map, in which the deviation distance is used to denote the deviation of the speed direction. At the same time, a data recovery algorithm, an algorithm evaluation index and an application mode of the algorithm are proposed. However, a speed threshold must be set in advance according to the road, and in practice, predetermining the speed threshold for different roads is very difficult. In this paper, an online compression algorithm for positioning data acquisition systems is proposed, which compresses the data points before transmission from the terminal to the server. In this algorithm, the feature points are extracted and uploaded by computing the change of speed and direction based on an error threshold set according to the users' requirements, and non-feature points are discarded to reduce unnecessary data transmission and bandwidth occupation. The remainder of this paper is organized as follows. Section 2 describes our new online compression algorithm for positioning data acquisition in detail. An experimental evaluation of the trajectory compression algorithm is provided in Section 3. The paper concludes with future work in Section 4.

2 The online compression algorithm
In order to compress an original trajectory, the key of our algorithm is to pick out feature points. For each point collected in the terminal, the current point will be uploaded and preserved if it is a feature point, else it will be discarded as a non-feature point. If needed, a recovery trajectory that approximates the original trajectory can be obtained by the recovery algorithm from the feature points on the server. Our algorithm involves feature points, non-feature points and lost points. A feature point is a positioning point that has a large change in speed or direction; a non-feature point is a positioning point that is discarded because of a small change in speed and direction; a lost point is a positioning point that is lost because of the communication delay caused by a positioning signal blind area. Generally, an original trajectory is a temporally ordered sequence of positioning points P = {p1, p2, ..., pn}, where for i = 1, 2, ..., n, pi = (lngi, lati, ti), lngi and lati represent the longitude and latitude of the i-th point and ti stands for the time stamp. For a point pi, pi.lng, pi.lat and pi.t denote the longitude, latitude and time stamp of pi respectively. For each positioning point pi in an original trajectory, the recovery point pi' of pi is a data point obtained by the recovery algorithm, which represents pi in the recovery trajectory.

2.1 Algorithms
In the compression phase, the error threshold and acquisition interval are set in advance according to accuracy requirements. The initial two collected points (these two points must be feature points) are directly uploaded to the server without judgment. Starting from the third positioning point, the corresponding recovery point is calculated for each currently collected point. The distance between the actual point pi and its corresponding recovery point pi' is compared with the error threshold. If it is larger than the error threshold, the current point is regarded as a feature point and uploaded. If not, the current point is abandoned as a non-feature point (see Fig. 1). The judgment is done for each point, and the last positioning point is uploaded at the end of the acquisition.

Figure 1: A schematic diagram for the compression process.

As shown in Fig. 1, p1, p2, p3, p4, p5, pk-1, pk and pi are the original positioning points, and each positioning point has a corresponding recovery point. p1, p2, p5 and pk are feature points; their recovery points are themselves. p3', p4' and pk-1' are the recovery points of p3, p4 and pk-1. p2 is the previous feature point of p3' and p1 is the previous point of p2. p3' is calculated from p1 and p2. The speed of p2 can be concluded from the distance and time d-value between p1 and p2. Supposing p2 and p3' have the same speed, the product of this speed and the interval T is the distance of p3'. p3' is set on the straight line on which p1 and p2 are located. According to the coordinates of p1 and p2, the coordinate of p3' can be calculated. In the same way, p2 is still the previous feature point of p4' and p5', and they are also calculated from p1 and p2. Differently, the distance of p4' is the product of the speed of p2 and twice the interval T, and the distance of p5' is the product of the speed of p2 and triple the interval T. The distance between the recovery point pi' and the actual point pi is called di, and di is compared with the error threshold D. If di >= D, the current point is uploaded as a feature point. If not, the current point is discarded as a non-feature point. d3 and d4 in Fig. 1 are smaller than D, so p3 and p4 are discarded. d5 is not less than D, so p5 is uploaded as a feature point. The recovery point pi' of the current point pi is calculated through the previous feature point pk of pi and the previous point pk-1' (it can be a feature point or not) of pk. The speed of pk is calculated according to pk and pk-1'. The product of this speed and the time d-value between pi and pk is the distance of pi'. pi' is set on the straight line on which pk and pk-1' are located. According to the coordinates of pk and pk-1' and the distance of pi', the coordinate of pi' is calculated. In the recovery phase, p1, p2, p5 and pk are recovered directly as feature points. p3, p4 and pk-1 are non-feature points; they are recovered to p3', p4' and pk-1' according to the above process. In the recovery phase, the first and second feature points received are recovered directly. Starting from the third feature point, determine whether there are non-feature points between the current point pi and the previous feature point pk. If there are non-feature points, they are recovered in order by calculating the corresponding recovery points according to pk. Then the current point pi is recovered after all non-feature points between pi and pk are recovered. The process cycles until the recovery of the last feature point.

To deal with the problem of lost points, the positioning point pi introduces a lost point count (Count) property and is given as pi = (lngi, lati, ti, ci). The design idea is to add a counter in the terminal. Positioning data are collected at an interval, and this interval is consistent for the counter and the data acquisition. When the time reaches an interval, if the signal is acquired, the counter is set to "0". If the signal is not acquired, that is, the signal is lost, the counter is increased by "1" until the signal is acquired. Fig. 2 shows a schematic diagram for the lost point.

Figure 2: A schematic diagram for the lost point.

As shown in Fig. 2, p1, p2 and p4 are original positioning points. After compression, p3 is discarded as a non-feature point and p3' is the corresponding recovery point of p3. There are three intervals between p3 and p4, so there should be two positioning points between them. However, due to signal loss, these two points are not collected. They are called the lost points. When determining whether p5 is a feature point or not, p3' and p4 are used for the calculation directly, without considering the lost points. When recovering p5', the lost points need to be considered. There are four intervals between the two feature points p2 and p4, so there should be three positioning points between them. Since c4 = 2, that is, two positioning points are lost, only the single non-feature point p3 needs to be recovered.
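To make the compression test concrete, the following minimal Python sketch reproduces the dead-reckoning style prediction of the recovery point and the error-threshold check described above. It assumes positions have already been converted to planar (x, y) coordinates, and the function names (predict_recovery_point, is_feature_point) are ours, not the paper's.

import math

def predict_recovery_point(pk, pk_prev, t_i):
    """Extrapolate the recovery point for time t_i from the previous feature
    point pk and the point pk_prev preceding it (each point is (x, y, t))."""
    xk, yk, tk = pk
    xp, yp, tp = pk_prev
    scale = (t_i - tk) / (tk - tp)          # speed and direction come from pk_prev -> pk
    return (xk + scale * (xk - xp), yk + scale * (yk - yp))

def is_feature_point(pi, pk, pk_prev, error_threshold):
    """Return True if the collected point pi deviates from its predicted
    recovery point by at least the error threshold D."""
    xi, yi, ti = pi
    rx, ry = predict_recovery_point(pk, pk_prev, ti)
    return math.hypot(xi - rx, yi - ry) >= error_threshold

# toy usage: a point continuing in a straight line is discarded,
# a point that turns sharply is uploaded as a feature point
p1, p2 = (0.0, 0.0, 0.0), (10.0, 0.0, 10.0)
print(is_feature_point((20.0, 0.5, 20.0), p2, p1, error_threshold=5.0))   # False
print(is_feature_point((12.0, 15.0, 20.0), p2, p1, error_threshold=5.0))  # True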
2.2 Algorithm description
The recovery point is used to determine whether the current point is a feature point in the compression phase, and it is the recovered non-feature point in the recovery phase. Therefore, the calculation of the recovery point is the key of the algorithm. The recovery point pi' of the current point pi is calculated by making use of the previous feature point pk of pi and the previous point pk-1' of pk (if i = 3, then pk = p2 and pk-1' = p1).

2.2.1 The compression algorithm
The recovery point of the current point is calculated from the third collected positioning point on. The actual point has attributes of latitude and longitude. For ease of calculation, latitude and longitude as spherical coordinates can be transformed into rectangular coordinates by the algorithm in [17]. That is to say, pi = (lngi, lati, ti) as a spherical coordinate is transformed into pi = (xi, yi, ti) as a rectangular coordinate. For any two points pi and pj in a trajectory with n points, their distance is Lij = Dist(pi, pj). Accordingly, the distance between pk and pk-1' is Lk. pk and pk-1' are consecutive points and their time d-value is pk.t - pk-1'.t (in the case of no signal loss, the d-value is the interval T). The speed Vi is calculated from Formula (1) and the interval T. The distance Si of pi' is calculated according to the time d-value of pi and pk and Formula (2).

L_k = Dist(p_{k-1}', p_k) = \sqrt{(p_k.x - p_{k-1}'.x)^2 + (p_k.y - p_{k-1}'.y)^2}, \quad 2 \le k \le n-1 \quad (1)

V_i = \frac{L_k}{p_k.t - p_{k-1}'.t} = \frac{\sqrt{(p_k.x - p_{k-1}'.x)^2 + (p_k.y - p_{k-1}'.y)^2}}{p_k.t - p_{k-1}'.t}, \quad 2 \le k \le n-1, \ 3 \le i \le n \quad (2)

S_i = V_i \times (p_i.t - p_k.t) = \frac{\sqrt{(p_k.x - p_{k-1}'.x)^2 + (p_k.y - p_{k-1}'.y)^2}}{p_k.t - p_{k-1}'.t} \times (p_i.t - p_k.t), \quad 2 \le k \le n-1, \ 3 \le i \le n \quad (3)

After calculating the distance Si of pi', the position coordinate of pi' is calculated according to the rectangular coordinates of pk and pk-1' and triangle similarity properties [18]. The formulas are as follows. The detailed compression process is shown in Algorithm 1.

p_i'.x = p_k.x + \frac{S_i}{L_k} \times (p_k.x - p_{k-1}'.x) = p_k.x + \frac{(p_i.t - p_k.t) \times (p_k.x - p_{k-1}'.x)}{p_k.t - p_{k-1}'.t}, \quad 2 \le k \le n-1, \ 3 \le i \le n \quad (4)

p_i'.y = p_k.y + \frac{S_i}{L_k} \times (p_k.y - p_{k-1}'.y) = p_k.y + \frac{(p_i.t - p_k.t) \times (p_k.y - p_{k-1}'.y)}{p_k.t - p_{k-1}'.t}, \quad 2 \le k \le n-1, \ 3 \le i \le n \quad (5)

Algorithm 1 for compression
Input: pi, pk and pk-1', D and T.
Output: A simplified trajectory Traj' with feature points.
1. Collect data points and open the counter, Traj' = 0;
2. Upload p1 and p2; // the first and second positioning points need no judgment and are uploaded directly
3. pk = p2, pk-1' = p1; // p2 is assigned to pk, p1 is assigned to pk-1'
4. Foreach pi (i >= 3); // cyclic judgment for pi
5. Transform pi = (lngi, lati, ti) to pi = (xi, yi, ti); // the spherical coordinate is transformed into a rectangular coordinate
6. Calculate Si; // calculate the distance of pi'
7. Calculate (xi', yi'); // calculate the rectangular coordinate of pi'
8. Transform pi' = (xi', yi', ti) to pi' = (lngi', lati', ti); // the rectangular coordinate is transformed back into a spherical coordinate
9. Dist(pi, pi') is di; // the distance between pi and pi' is di
10. Compare di and D; // D is the error threshold
11. If (di >= D), upload pi; // if di >= D, upload pi as a feature point
12. pk = pi, pk-1' = pi-1'; // pi is assigned to pk, pi-1' is assigned to pk-1'
13. Else (di < D), ignore pi; // if di < D, discard pi as a non-feature point
14. End foreach; // end of cycle
15. Return Traj' = {p1, p2, ..., pk, ..., pn}.

2.2.2 The recovery algorithm
Starting from the third received feature point, determine whether non-feature points exist between the current point pi and the previous feature point pk. The judgment is made by obtaining the number Ni of intervals, i.e., the time d-value of the two points divided by the interval T. By Formula (6) and the number Ci of lost points of pi, the number Mi of recovery points is calculated. Special attention should be paid to the fact that the interval number of two adjacent original positioning points is "1", while there is no recovery point between them; therefore, the number of recovery points should additionally be reduced by "1". If Mi is not equal to "0", recovery points exist, and the Mi recovery points are recovered in order. The speed of the recovery point is calculated by Formula (2), and the product of the speed and r intervals T is the distance Gr of the r-th recovery point.

M_i = N_i - C_i - 1, \quad G_r = V_i \times r \times T, \quad 3 \le i \le n, \ 1 \le r \le M_i \quad (6)

... 4. Foreach pi (i >= 3); // cyclic judgment for the current feature point pi
5. Calculate Mi; // calculate the number of recovery points between the current feature point and the previous feature point
6. Foreach r (1 <= r <= Mi)

J = \sum_{i=1}^{n} \sum_{j=1}^{c} u_{ij}^{m} \left\| x_i - v_j \right\|^2 \quad (1)

Here, u_{ij} is the degree of membership of x_i in the cluster j, which is defined as

u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \left\| x_i - v_j \right\| / \left\| x_i - v_k \right\| \right)^{2/(m-1)}} \quad (2)

where v_j is the center of the j-th cluster,

v_j = \frac{\sum_{i=1}^{n} u_{ij}^{m} x_i}{\sum_{i=1}^{n} u_{ij}^{m}} \quad (3)

x_i is the i-th data sample and n is the number of data points; m \in (1, \infty) is a weighting constant. The optimal clusters are produced by minimizing the objective function.
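As a concrete illustration of Eqs. (1)-(3), the short Python sketch below runs the standard fuzzy c-means update loop, alternating the centre and membership updates. It is a generic sketch with assumed parameter names and a random toy data set, not the authors' implementation.

import numpy as np

def fcm(X, c, m=2.0, iters=100, seed=0):
    """Basic fuzzy c-means: X is (n, d), c clusters, m > 1 is the weighting constant."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)                      # memberships sum to 1 per sample
    for _ in range(iters):
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]           # cluster centres, Eq. (3)
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1))).sum(axis=2)  # Eq. (2)
    J = ((U ** m) * d ** 2).sum()                          # objective function, Eq. (1)
    return U, V, J

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
U, V, J = fcm(X, c=2)
print(V.round(2), round(float(J), 2))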
2.2 Fuzzy association rules
For numerical data, fuzzy association rules [19] are easily understood by humans because of the fuzzy term sets. In order to mine fuzzy association rules, we apply FCM clustering to transform each of the numeric variables into fuzzy sets (fuzzy partitions) with its membership function u, and then these fuzzy partitions are used to generate fuzzy rules. Meanwhile, the centre of each fuzzy set and the maximum and minimum value for each partition of the input data points are determined by FCM. By finding the centre of each partition, we can label it very easily according to the data point at which the core occurs. The labelling of each partition is very important as it helps a lot in the eventual generation of fuzzy association rules. Given a set of records, n is the record number, and two itemsets are X = {x1, x2, ..., xp} and Y = {y1, y2, ..., yq}, where p is the length of the itemset X and q is the length of the itemset Y. The fuzzy set membership value of variable xj in the i-th record is denoted u(xij). The Apriori approach is used for extracting fuzzy itemsets from a fuzzy data set based on interestingness measures (support and confidence) and is able to generate fuzzy association rules. A fuzzy association rule is an implication of the form (X is A) => (Y is B), where A and B are fuzzy sets that characterize X and Y respectively. The measures of support, confidence and correlation have been fuzzified for the purpose of fuzzy association rules. The fuzzy support of X is defined as

f_{support}(X) = \frac{1}{n} \sum_{i=1}^{n} \prod_{j=1}^{p} u(x_{ij}) \quad (4)

where Z = {x_i = [x_{ij}, j = 1, ..., p]; i = 1, ..., n}. The fuzzy support of X => Y is defined as

f_{support}(X \Rightarrow Y) = \frac{1}{n} \sum_{i=1}^{n} \left( \prod_{j=1}^{p} u(x_{ij}) \cdot \prod_{j=1}^{q} u(y_{ij}) \right) \quad (5)

The fuzzy confidence of X => Y is defined as

f_{confidence}(X \Rightarrow Y) = \frac{f_{support}(X \Rightarrow Y)}{f_{support}(X)} \quad (6)

The fuzzy correlation measure of the association rule X => Y can be measured by computing

f_{correlation}(X, Y) = \frac{P(X \cup Y)}{P(X)P(Y)} = \frac{f_{confidence}(X \Rightarrow Y)}{f_{support}(Y)} \quad (7)

In order to mine fuzzy association rules, the definitions of fuzzy support and fuzzy confidence are used in Fuzzy Apriori instead of their crisp counterparts used in Apriori.

3 Multilevel fuzzy SVR network (MLFSVRN) modelling of incremental type
In order to determine a multilevel fuzzy SVR network model, we must determine the model structure and initial parameters. Based on an incremental type structure like the one shown in Fig. 1(a), it is quite possible to assign some influential variables to the first level, the less influential ones to the next level, and so on. To do so, we must find a set of candidate variables which play a significant role in determining the output.

3.1 Variable selection based on FCM clustering and fuzzy association rules
In this paper, fuzzy association rules are used to select more representative variables from the original ones to improve the later prediction performance. The different fuzzy confidence and fuzzy support values between input variables and output variables are examined by using fuzzy association rules. In this paper, cross-validation with a best-parameter grid search is adopted to obtain the best rules. We only report the best rules, which are based on the fuzzy confidence value = 1 and the fuzzy support value = 0.03. These rules are further ranked by their importance,

Importance(X, Y) = \log \frac{f_{confidence}(Y \Rightarrow X)}{f_{support}(X)} = \log \frac{P(X|Y)}{P(X)} \quad (8)

In particular, according to (8), the rules that have an importance value less than 0.8 are excluded. As a result, the extracted rules that relate to the important variables are obtained. In order to determine the most influential input variables, the influence degrees of the extracted variables are calculated. The term ID(xi) is used to represent the influence degree (ID) of xi, i.e.,

ID(x_i) = \frac{important(x_i) + correlation(x_i)}{SUM_{ID}}

Here, the term important(xi) is used to represent the best result of the degree of importance of the fuzzy association rules with xi, which is given by (8). In a similar way, the term correlation(xi) is used to represent the best result of the degree of correlation of the fuzzy association rules with xi, which is given by (7). For all variables extracted from the fuzzy association rules, these two items are added up and described as SUM_{ID} = \sum_{i} (important(x_i) + correlation(x_i)).
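The fuzzy measures of Eqs. (4)-(8) can be computed directly from a matrix of membership values. The sketch below is a minimal illustration under the assumption that each record already carries the membership degrees of the fuzzy items involved; the function names and toy data are ours, not the paper's.

import numpy as np

def f_support(U):
    """Fuzzy support of an itemset, Eqs. (4)/(5): U is (n_records, n_items)
    holding the membership degree of every item in every record."""
    return np.prod(U, axis=1).mean()

def f_confidence(U_x, U_y):
    """Fuzzy confidence of X => Y, Eq. (6)."""
    return f_support(np.hstack([U_x, U_y])) / f_support(U_x)

def f_correlation(U_x, U_y):
    """Fuzzy correlation of X => Y, Eq. (7)."""
    return f_confidence(U_x, U_y) / f_support(U_y)

def importance(U_x, U_y):
    """Rule importance, Eq. (8): log of confidence(Y => X) over support(X)."""
    return np.log(f_confidence(U_y, U_x) / f_support(U_x))

rng = np.random.default_rng(1)
U_x = rng.random((200, 2))                                   # memberships of a 2-item antecedent
U_y = np.clip(U_x.mean(axis=1, keepdims=True)
              + 0.1 * rng.standard_normal((200, 1)), 0, 1)   # correlated 1-item consequent
print(round(f_confidence(U_x, U_y), 3),
      round(f_correlation(U_x, U_y), 3),
      round(importance(U_x, U_y), 3))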
3.2 Constructing a MLFSVRN model with incremental architecture
Based upon the analysis method just described in Section 3.1, the influential input variables can be obtained and consequently a MLFSVRN model with incremental architecture can be constructed, as shown in Fig. 2. The construction algorithm of the MLFSVRN model with incremental architecture can be summarized as follows.

Step 1 - Initialization: The number of levels is h. The identified model is called model h and its output is denoted by y^{(h)}. All n input variables are put in a set S. Let SUM_{ID} = \sum_{i} (important(x_i) + correlation(x_i)). A threshold value T_{inc} is set to control the model structure (the number of levels). A large T_{inc} sets the combination of the representative input variables rigorously and hence generates complicated networks, while a small value sets the combination of the representative input variables loosely and generates networks with few sublevels. Such an arrangement is used to make the first level reserve enough system information and lets the first level contain at least two input variables.

Step 2 - Determination of Level 1: 1) Choose the n1 most influential inputs as the input variables of the first level and write them as x_i^{(1)}'s. In order to make the first level contain enough system information, the value of n1 is determined by

\sum_{i=1}^{n_1} ID(x_i^{(1)}) = \sum_{i=1}^{n_1} \frac{important(x_i^{(1)}) + correlation(x_i^{(1)})}{SUM_{ID}} \le T_{inc} \quad \text{and} \quad n_1 \ge 2 \quad (9)

According to (9), the first level of the model architecture contains at least two input variables. Here, the term important(x_i^{(1)}) represents the best result of the degree of importance of the fuzzy association rules with x_i^{(1)}, and the term correlation(x_i^{(1)}) represents the best result of the degree of correlation of the fuzzy association rules with x_i^{(1)}. 2) Set the level index h = 2. Remove these n1 input variables from S.

Step 3 - Recalculation: Recalculate SUM_{ID} and the influence degree values of the variables left in S, i.e., ID(x_i) = (important(x_i) + correlation(x_i)) / SUM_{ID}.

Step 4 - Determination of level h: Choose the n_h most influential input variables, i.e., x_i^{(h)}, from S and assign them to level h. The number n_h is determined by \sum_{i=1}^{n_h} ID(x_i^{(h)}) \le T_{inc} and n_h \ge 2.

Step 5 - Termination: Remove these n_h input variables from S. If S is empty, the algorithm terminates; otherwise go back to Step 3 and let h = h + 1.

Figure 2: The most parsimonious incremental architecture for a three-level-input system.
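The grouping procedure of Steps 1-5 amounts to a greedy pass over the variables ranked by influence degree. The following Python sketch shows one way such a pass could look; the influence dictionary of important(x) + correlation(x) scores is a made-up stand-in for values produced by an actual rule miner, and the function name is our own.

def assign_levels(influence, t_inc):
    """Greedily split variables into levels so that each level's summed
    (renormalized) influence degree stays within t_inc and holds >= 2 variables."""
    remaining = dict(influence)           # variable -> important(x) + correlation(x)
    levels = []
    while remaining:
        sum_id = sum(remaining.values())  # SUM_ID recomputed over the variables left in S (Step 3)
        ranked = sorted(remaining, key=remaining.get, reverse=True)
        level, acc = [], 0.0
        for var in ranked:
            id_x = remaining[var] / sum_id
            if len(level) >= 2 and acc + id_x > t_inc:
                break                     # adding another variable would exceed the threshold
            level.append(var)
            acc += id_x
        levels.append(level)
        for var in level:
            del remaining[var]
    return levels

scores = {'x0': 0.22, 'x1': 0.21, 'x2': 0.35, 'x3': 0.36, 'x4': 0.55, 'x5': 0.54}
print(assign_levels(scores, t_inc=0.6))   # e.g. [['x4', 'x5'], ['x3', 'x2'], ['x0', 'x1']]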
3.3 TSK type MLFSVRN model structure and learning

3.3.1 TSK fuzzy inference algorithm
In this section, the TSK fuzzy inference algorithm used in each single-level module of the MLFSVRN model is presented. Consider a TSK fuzzy system; the j-th fuzzy rule in level h has the form

Rule_j^{(h)}: IF z_1^{(h)} is A_{1j}^{(h)} and ... and z_{n_h+1}^{(h)} is A_{(n_h+1)j}^{(h)} THEN y_j^{(h)} = a_{0j}^{(h)} + \sum_{i=1}^{n_h+1} a_{ij}^{(h)} z_i^{(h)} \quad (11)

where n_h + 1 is the total number of inputs to the level (ordinary system inputs and the input from the previous level), and z_i^{(h)} (i = 1, ..., n_h) are the input variables determined by the proposed method in Section 3 and assigned to this level. The membership function is

A_{ij}^{(h)}(z_i^{(h)}) = \exp\left( - \frac{(z_i^{(h)} - m_{ij}^{(h)})^2}{(\sigma_{ij}^{(h)})^2} \right) \quad (12)

Here, m_{ij}^{(h)} and \sigma_{ij}^{(h)} correspond to the center and width of the fuzzy set respectively, which are determined by FCM clustering. The firing strength \mu_j^{(h)}(z) of rule j is calculated by

\mu_j^{(h)}(z) = \prod_{i=1}^{n_h+1} A_{ij}^{(h)}(z_i^{(h)}) = \exp\left( - \sum_{i=1}^{n_h+1} \frac{(z_i^{(h)} - m_{ij}^{(h)})^2}{(\sigma_{ij}^{(h)})^2} \right) \quad (13)

If the single-level fuzzy system contains r rules, then according to the simple weighted sum method [20], the output would be

y^{(h)} = \sum_{j=1}^{r} \mu_j^{(h)}(z)\, y_j^{(h)} = \sum_{j=1}^{r} \mu_j^{(h)}(z) \left( a_{0j}^{(h)} + \sum_{i=1}^{n_h+1} a_{ij}^{(h)} z_i^{(h)} \right) \quad (14)

where \tilde{a}_j^{(h)} = [a_{0j}^{(h)}, a_{1j}^{(h)}, ..., a_{(n_h+1)j}^{(h)}]. By setting

\tilde{z}^{(h)} = [1, z_1^{(h)}, ..., z_{n_h+1}^{(h)}]^T, \quad w_j^{(h)} = \tilde{a}_j^{(h)} \quad (15)

Eq. (14) becomes

y^{(h)} = \sum_{j=1}^{r} w_j^{(h)} \varphi_j^{(h)}(z) \quad (16)

By adopting the kernel trick, a TS-kernel is integrated as \varphi_j^{(h)}(z) = \mu_j^{(h)}(z)\, \tilde{z}^{(h)}, so that the data z^{(h)} is transformed to the vector \Phi^{(h)}(z) = [\varphi_1^{(h)}(z), ..., \varphi_r^{(h)}(z)], where \varphi_j^{(h)}(z) is the output of the j-th TS-kernel, and (16) can be further rewritten as

y^{(h)} = \sum_{j=1}^{r} w_j^{(h)} \varphi_j^{(h)}(z) + b^{(h)} \quad (17)

Eq. (17) is the output of the TSK-type fuzzy system, which is equivalent to the output of the SVR. The vector \Phi^{(h)}(z) is fed as input to a linear SVR, and the training data pairs are represented by

\{(\Phi^{(h)}(z_i), y_i)\}_{i=1}^{N} \quad (18)

The optimal linear SVR function is

f^{(h)}(\Phi) = \sum_{i=1}^{N} (\alpha_i^{(h)} - \alpha_i^{*(h)})\, \Phi^{(h)}(z_i) \cdot \Phi + b^{(h)} \quad (19)

where \alpha_i^{(h)} and \alpha_i^{*(h)} are solved by the SVR. Eq. (19) can be represented as

f^{(h)}(\Phi) = \sum_{j=1}^{r} w_j^{(h)} \varphi_j^{(h)}(z) + b^{(h)} \quad (20)

where

w_j^{(h)} = \sum_{i=1}^{N} (\alpha_i^{(h)} - \alpha_i^{*(h)})\, \varphi_j^{(h)}(z_i) \quad (21)

According to Eq. (21), the weighting parameters w_j^{(h)} are obtained. For each level module, the parameters m_{ij}^{(h)} and \sigma_{ij}^{(h)} in the TS-kernels and the number of rules are determined by FCM clustering, while the weighting parameters w_j^{(h)} and bias b^{(h)} are determined by the linear SVR.

3.3.2 Structure of single-level modules
The structure of each single-level FSVRN module is based on the four-layered fuzzy neural network shown in Fig. 3, which consists of the membership function, fuzzy rule, weighted consequent and output layers; their functions are briefly described below in our context. In the following descriptions, O_i^{(h,l)} represents the output of the i-th node in the l-th layer of level h, and u_i^{(h,l)} is the i-th input of a node in the current layer of level h.

Layer 1: Each node in this layer corresponds to a membership function of an ordinary system input or of the input from the previous level [see (22) and (23)]:

O_i^{(h,1)} = A_{ij}^{(h)}(u_i^{(h,1)}), \quad h = 1, ..., H \quad (22)

O^{(h,1)} = A_j^{(h)}(y^{(h-1)}), \quad h = 2, ..., H \quad (23)

Here, the outputs are the corresponding membership values of the input (or intermediate) variable connected to the node, given by (12). The number of nodes is equal to the total number of inputs to level h, including the membership values of the system inputs and the intermediate input. When h = 1, no intermediate input exists and only (22) is applied.

Layer 2: The output of each node in this layer represents the firing strength of a rule,

O_i^{(h,2)} = \mu_i^{(h)} = \prod_{l=1}^{r_i} u_l^{(h,2)}, \quad i = 1, ..., N_2^{(h)} \quad (24)

The determination of (24) is similar to (13). r_i denotes the number of preconditions in rule node i and N_2^{(h)} indicates the total number of rules in level h.

Layer 3: The output of each rule is computed in this layer. In level h = 1, where no intermediate variable appears, the function has the form

O_i^{(1,3)} = u_i^{(1,3)} \left( \sum_{j=0}^{n_1} a_{ji}^{(1)} z_j^{(1)} \right), \quad i = 1, 2, ..., N_3^{(1)} \quad (25)

and when h > 1

O_i^{(h,3)} = u_i^{(h,3)} \left( \sum_{j=0}^{n_h+1} a_{ji}^{(h)} z_j^{(h)} \right), \quad i = 1, 2, ..., N_3^{(h)} \quad (26)

where N_3^{(1)} and N_3^{(h)} correspond to the total number of nodes in the third layer for level 1 and level h, respectively, a_{ji}^{(h)} is the consequent parameter set in level h, and z_0^{(h)} = 1.

Layer 4: The final output is computed by summing the outputs of all rules,

O^{(h,4)} = y^{(h)} = \sum_{i=1}^{N_3^{(h)}} O_i^{(h,3)} \quad (27)

N_3^{(h)} is the total number of fuzzy rules in level h, which equals N_2^{(h)}. Eqs. (14), (16), (17), (18) and (20) determine the output in (27).

Figure 3: The structure of the four-layered network.

3.3.3 Learning algorithm
As found in Section 3.3.1, the consequent part of the TSK fuzzy inference rule is a linear combination of all consequent parameters, which can be reconstructed in the form of a linear SVR. So, the linear SVR algorithm is first applied to evaluate the optimal values of all consequent parameters of the TSK-type MLFSVRN model. Given a set of training data pairs, the desired output of each single-level module is set to be the same as the final system output. The linear SVR learning algorithm of a TSK-type MLFSVRN model with incremental architecture is listed as follows.

Step 1 - Initialization: 1) Divide the input variables into H subsets {z^{(h)} = [z_1^{(h)}, ..., z_{n_h}^{(h)}], h = 1, ..., H} according to the variable selection method proposed in Section 3, each of them attached to a single-level reasoning network module. 2) Set the level index h = 1 and initialize appropriate membership function parameters based on the FCM clustering method. The membership function parameters of all intermediate variables are fixed according to the final outputs.

Step 2 - Apply Linear SVR to Level h: In order to evaluate the optimal values of all unknown consequent parameters a_{ji}^{(h)}, (25), (26) and (27) can be rewritten in the form of a linear SVR according to Eqs. (14), (16), (17), (18) and (20). The consequent parameters can then be evaluated using the linear SVR method with (15) and (21). Here, the output value y is used as the desired value of y^{(h)}.

Step 3 - Forward Computation: The output of level h, y^{(h)}, can be computed using the evaluated a_{ji}^{(h)}'s.

Step 4 - Termination: Set h = h + 1. If h <= H, go to Step 2; otherwise, the training process stops.
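To illustrate how one single-level module of Section 3.3 could be trained, the sketch below builds Gaussian TS-kernel features phi_j(z) = mu_j(z)*[1, z] and fits the consequent parameters with a linear epsilon-SVR. Scikit-learn's LinearSVR stands in for the SVR solver, the rule centres are simply sampled data points instead of FCM centres, and all names are assumptions; it is a simplified sketch of the idea, not the authors' implementation.

import numpy as np
from sklearn.svm import LinearSVR

def ts_kernel_features(Z, centers, widths):
    """phi_j(z) = mu_j(z) * [1, z], stacked over all r rules (cf. Eqs. (13) and (17))."""
    diff = (Z[:, None, :] - centers[None, :, :]) / widths[None, :, :]
    mu = np.exp(-(diff ** 2).sum(axis=2))                  # firing strengths, shape (n, r)
    z_tilde = np.hstack([np.ones((Z.shape[0], 1)), Z])     # [1, z1, ..., zd]
    # per-sample outer product (n, r, d+1) flattened to the linear-SVR feature vector
    return (mu[:, :, None] * z_tilde[:, None, :]).reshape(Z.shape[0], -1)

rng = np.random.default_rng(0)
Z = rng.uniform(-1, 1, size=(300, 2))
y = np.sin(3 * Z[:, 0]) + 0.5 * Z[:, 1]                    # toy target for one level

centers = Z[rng.choice(len(Z), size=5, replace=False)]     # crude stand-in for FCM centres
widths = np.full_like(centers, 0.7)

Phi = ts_kernel_features(Z, centers, widths)
svr = LinearSVR(epsilon=0.01, C=10.0, max_iter=20000).fit(Phi, y)   # solves the consequent parameters
pred = svr.predict(Phi)
print("train RMSE:", round(float(np.sqrt(np.mean((pred - y) ** 2))), 4))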
Figure 4: MLFSVRN model with incremental architecture: (a) six-dimensional example, (b) Mackey-Glass time series.

4 Simulation results
In this section, the proposed method is evaluated for nonlinear system identification and Mackey-Glass chaotic time-series prediction. Section 4.1 discusses a six-dimensional example, which is used to validate the variable selection analysis method described in Section 3. Section 4.2 discusses the Mackey-Glass chaotic time series prediction, aiming at demonstrating the satisfactory learning behavior and good generalization ability of the MLFSVRN models.

4.1 Six-dimensional example
The six-dimensional nonlinear system is given by the following equation:

y = \cdots + 5\sin(x_2 + x_3) + \exp(1 + x_4 + x_5) \quad (28)

A data set of 1000 pairs was prepared by drawing the inputs uniformly from the six-dimensional unit hypercube. To construct the MLFSVRN with incremental architecture, using the variable selection analysis method proposed in Section 3, the input variables are grouped into three subsets {x_4, x_5}, {x_2, x_3}, {x_0, x_1}. The influence degree of each input variable is evaluated and listed as follows:

x_0      x_1      x_2      x_3      x_4      x_5
0.109    0.107    0.176    0.178    0.273    0.270

It can be seen that x_4 is the most influential input among the six, and this corresponds very well to what we can deduce from (28). Furthermore, the influence degrees of x_0 and x_1, of x_2 and x_3, and of x_4 and x_5 are about the same, and this again matches our expectation from (28). Thus, the effectiveness of the variable selection method is demonstrated by this example. Here, the threshold T_{inc} is chosen so that the incremental architecture can have x_4 and x_5 assigned to the first level as inputs. For the other four input variables, x_2 and x_3 are put to the second level, and x_0 and x_1 are put to the third level, as shown in Fig. 4(a).

The performance of the proposed TSK-type MLFSVRN models with incremental architecture has been evaluated with the Root Mean Square Error (RMSE). After the linear SVR learning phase with 400 training data pairs, the models were validated by the testing data set consisting of 600 points. For the purpose of comparison, the performance of the TSK-type MLFSVRN model, its single-level counterpart, i.e., the FSVRN model, and Jang's ANFIS model [21] with 400 training data are listed in Table 1.

Table 1: Comparison of the TSK-type MLFSVRN models with their single-level counterpart FSVRN and Jang's ANFIS in function prediction.

Model                                        Rules                                           Error-Train   Error-Test
TSK MLFSVRN with incremental architecture    Level 1: 4, Level 2: 8, Level 3: 8, Total: 20   0.0168        0.0135
FSVRN                                        48                                              0.0003        0.0126
Jang's ANFIS                                 64                                              0.0005        0.0157

Obviously, the TSK-type MLFSVRN model has 3 levels, in which the number of terms for each input is 2, 4 and 4, respectively. The total number of rules is 20, which uses much fewer fuzzy rules and adjustable parameters than the single-level FSVRN. Furthermore, although the single-level counterpart FSVRN has the smallest training RMSE and testing RMSE, its number of fuzzy rules is larger than that of our proposed model. The testing RMSE of Jang's ANFIS model is the biggest among these three models because it cannot respond correctly to unforeseen inputs when the training samples are limited. So, the TSK-type MLFSVRN model with incremental architecture shows relatively better generalization ability.

4.2 Prediction of a chaotic time-series
The Mackey-Glass chaotic differential delay equation is recognized as a benchmark problem for time-series prediction, frequently used in the study of chaotic dynamics, and is defined as follows [14]:

\frac{dx(t)}{dt} = \frac{0.2\, x(t-\tau)}{1 + x^{10}(t-\tau)} - 0.1\, x(t) \quad (29)

When \tau > 17, the equation shows chaotic behavior. In our simulations, we set \tau = 30. In this paper we used x(t-30), x(t-24), x(t-18), x(t-12), x(t-6) and x(t) as input variables to predict the value of x(t+6).
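For reproducibility, the Mackey-Glass series of Eq. (29) can be generated numerically, for example with a simple Euler scheme, and turned into the lagged input/output pairs used above. The step size, initial condition and helper names in the sketch below are our own assumptions, not taken from the paper.

import numpy as np

def mackey_glass(n_samples, tau=30, dt=1.0, x0=1.2, a=0.2, b=0.1, n_exp=10):
    """Euler integration of dx/dt = a*x(t-tau)/(1 + x(t-tau)^n) - b*x(t)."""
    history = int(tau / dt)
    x = np.full(n_samples + history, x0)
    for t in range(history, n_samples + history - 1):
        x_tau = x[t - history]
        x[t + 1] = x[t] + dt * (a * x_tau / (1.0 + x_tau ** n_exp) - b * x[t])
    return x[history:]

series = mackey_glass(1500, tau=30)

# lagged inputs x(t-30), ..., x(t) and target x(t+6), as used in Section 4.2
lags, horizon = [30, 24, 18, 12, 6, 0], 6
t_idx = np.arange(30, len(series) - horizon)
X = np.stack([series[t_idx - lag] for lag in lags], axis=1)
y = series[t_idx + horizon]
print(X.shape, y.shape)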
To construct the MLFSVRN with incremental architecture, using the variable selection method proposed in Section 3, the input variables are grouped into three subsets {x(t-24), x(t-30)}, {x(t-18), x(t-6)}, {x(t-12), x(t)}. The influence degree of each input variable is computed and listed as follows:

x(t-30)   x(t-24)   x(t-18)   x(t-12)   x(t-6)   x(t)
0.253     0.327     0.087     0.165     0.074    0.173

It is found that x(t-30) and x(t-24) are the two most influential input variables. Among the three clusters, the combined influence degree of x(t-30) and x(t-24) is the biggest and is less than the threshold T_{inc} = 0.6, which is consistent with the algorithm in Section 3. So the incremental architecture can have x(t-30) and x(t-24) assigned to the first level as inputs. For the other four input variables, x(t-18) and x(t-6) are put to the second level, and x(t-12) and x(t) are put to the third level, as shown in Fig. 4(b).

In order to evaluate the performance of the TSK-type MLFSVRN models with the Root Mean Square Error (RMSE), 200 points of the series from t = 501-700 and a comparatively larger set consisting of 700 points of the series from t = 130-829 are used as training data, and 500 points from t = 830-1329 are used as testing data. According to the variable selection method proposed in Section 3, we obtained 3 levels, and the number of fuzzy rules in each level of the incremental architecture is 9. For the purpose of comparison, the performance of the proposed TSK-type MLFSVRN model, its single-level counterpart, i.e., the FSVRN model, and Jang's ANFIS model with 700 training points are listed in Table 2.

Table 2: Comparison of the TSK-type MLFSVRN models with their single-level counterpart FSVRN and Jang's ANFIS in Mackey-Glass chaotic prediction.

Model                                        Rules                                           Error-Train   Error-Test
TSK MLFSVRN with incremental architecture    Level 1: 9, Level 2: 9, Level 3: 9, Total: 27   0.0253        0.0262
FSVRN                                        35                                              0.0258        0.0352
Jang's ANFIS                                 39                                              0.0275        0.0408

From the table, it can be seen that the proposed TSK-type MLFSVRN model uses much fewer fuzzy rules than the other models. Furthermore, it is found that the MLFSVRN model performs best among the three models in terms of training and testing RMSE. Fig. 5 shows that the TSK-type MLFSVRN prediction outputs are close to the real outputs, and the model achieves good generalization ability.

Figure 5: Test result of the TSK-type MLFSVRN model.

The above simulations show that our proposed MLFSVRN models can get rid of the dimensionality problem fundamentally. TSK-type MLFSVRN models consume much fewer fuzzy rules compared with their single-level counterpart FSVRN and with Jang's ANFIS, and they save both fuzzy rules and adjustable parameters significantly compared with Jang's ANFIS.

5 Conclusion
In this paper, a hierarchical TSK-type fuzzy system was proposed and its applications in system identification and time-series prediction were studied.
In the proposed method, the major characteristic of such model is that the consequence of a rule will be used as a fact to another rule from which the number of fuzzy rules resulted will no longer be an exponential function of the number of input variables. The proposed MLFSVRN model is constructed with incremental architecture. First, some influential input variables are arranged to different reasoning levels by analyzing the influence degree of each input variable based on FCM clustering and fuzzy association rules. Then, each level reasoning module can be realized by FSVRN model. Its consequent parameters are learned by a linear SVR with a new TS-kernel. The major advantage of using MLFSVRN model other than a single-level fuzzy system is that the number of fuzzy rules and parameters involved in modelling process can be reduced significantly and the generalization ability can be improved when compared with those required by the single-level FSVRN systems and Jang's ANFIS systems. The effectiveness of the MLFSVRN model has been demonstrated through two problems. It can generally be concluded that the proposed method has higher performance in identification and time-series prediction in comparison with the other methods. Funding This paper is supported by the Fundamental Research Funds for the Central Universities (No.FRF-SD-12-009B) and the State Scholarship Fund. References [1] G. V. S. Raju, J. Zhou, and R. A. Kisner (1991), "Hierarchical fuzzy control", Int. J. Contr., vol. 54 no. 5, pp. 1201-1216. [2] Cheong F (2007), "A hierarchical fuzzy system with high input dimensions for forecasting foreign exchange rates". IEEE Congress on Evolutionary Computation, CEC (Singapore), pp. 1642-1647. [3] Aja-Ferna'ndez S, Alberola-Lo'pez C (2008), "Matriz modeling of hierarchical fuzzy systems", IEEE Trans Fuzzy Syst, vol. 16, no. 3, pp. 585-599. [4] Zeng X, Goulermas J, Liatsis P, Wang D, Keane J (2008), "Hierarchical fuzzy systems for function approximation on discrete input spaces with application", IEEE Trans Fuzzy Syst, vol. 16 no. 5, pp. 1197-1215. [5] Benftez A, Casillas J (2009), "Genetic learning of serial hierarchical fuzzy systems for large-scale problems". Proceedings of Joint 2009 International Fuzzy Systems Association World Congress and 2009 European Society of Fuzzy Logic and Technology Conference (IFSA-EUSFLAT, Lisbon), pp. 1751-1756. [6] Zajaczkowski J, Verma B (2008), "Selection and impact of different topologies in multilayered hierarchical fuzzy systems", Appl Intell, vol. 36, no. 3, pp. 564-584. [7] Salgado P (2008), "Rule generation for hierarchical collaborative fuzzy system", Appl Math Modell Sci Direct, vol. 32, no. 7), pp. 1159-1178. [8] Joo M, Sudkamp T (2009), "A method of converting a fuzzy system to a two-layered hierarchical fuzzy system and its run-time efficiency", IEEE Trans Fuzzy Sys, vol. 17, no. 1, pp. 93-103. [9] Jelleli T, Alimi A (2010), "Automatic design of a least complicated hierarchical fuzzy system". IEEE International Conference on Fuzzy Systems (FUZZ), pp. 1-7. [10] S. Mitra and Y. Hayashi (2000), "Neuro-fuzzy rule generation: Survey in soft computing framework", IEEE Trans. Neural Net, vol. 11, no. 3, pp. 748768. [11] M. F. Azeem, M. Hanmandlu, N. Ahmad (2003), "Structure Identification of Generalized Adaptive Neuro-Fuzzy Inference Systems", IEEE Trans. On Fuzzy Systems, vol. 11, no. 5, pp. 666-681. [12] L. X. Wang (1999), "Analysis and design of hierarchical fuzzy systems", IEEE Transactions on Fuzzy systems, vol. 7, no. 5, pp. 617-624. [13] F. 
L Chung and J. C. Duan (2000), "On multistage fuzzy neural network modelling", IEEE Transactions on fuzzy systems, vol. 8, no. 2, pp. 125-142. [14] M. C. Mackey and L. Glass (1977), "Oscillation and chaos in physiological control systems", Sci., vol. 197, pp. 287-289. [15] J. Bezdek (1981), Pattern Recognition With Fuzzy Objective Function Algorithms.New York: Plenum. [16] W. Pedrycz (2002), "Collaborative Fuzzy Clustering", Pattern Recognition Letters, vol. 23, no. 14, pp. 1675-1686. [17] Agrawal R, Imielinski T, Swami A (1993), "Mining association rules between sets of items in large databases". In Proceedings of the ACM SIGMOD conference on management of data, pp. 207-216. [18] Kantardzic M,John Wiley and Sons (2003), Data mining -Concepts, models, methods, and algorithms. [19] Y. C. Lee, T. P. Hong and W. Y. Lin (2004), "Mining fuzzy association rules with multiple minimum supports using maximum constraints". The Eighth International Conference on Knowledge-Based Intelligent Information and Engineering Systems, Lecture Notes in Computer Science, pp. 1283-1290. [20] B. Schölkopf, A.J. Smola (2002), Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge MA. : MIT Press. [21] Jang, J.-S.R (1993). "ANFIS: adaptive-network-based fuzzy inference system". IEEE Transactions on systems, Man and Cybernetics, vol. 23, no. 3, pp. 665-685. [22] Tachibana K, Furuhashi T (2002). "A structure identification method of submodels for hierarchical fuzzy modeling using the multiple objective genetic algorithm", Int J Intell Syst, vol. 17, no. 5, pp. 495513. [23] C. T. Lin, S. F. Liang, C. M. Yeh, and K. W. Fan (2005), "Fuzzy neural network design using support vector regression for function approximation with outliers", IEEE International Conference on Systems, Man and Cybernetics, vol. 3, pp. 27632768. [24] J. M. Leski (2005), "TSK-fuzzy modeling based on -insensitive learning", IEEE Trans, Fuzzy Systems, vol. 13, no. 2, pp. 181-193. Machine Learning Approach for Emotion Recognition in Speech Martin Gjoreski and Kristijan Gjoreski Department of Intelligent Systems, Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia E-mail: {martin.gjoreski, hristijan.gjoreski}@ijs.si Andrea Kulakov Faculty of Computer Science and Engineering, Rugjer Boshkovikj 16, 1000 Skopje, Macedonia E-mail: andrea.kulakov@finki.ukim.mk Keywords: machine learning, emotions, speech, recognition, Auto-WEKA Received: December 11, 2014 This paper presents a machine learning approach to automatic recognition of human emotions from speech. The approach consists of three steps. First, numerical features are extracted from the sound database by using audio feature extractor. Then, feature selection method is used t^o select the most relevant features. Finally, a machine learning model is trained t^o recognize seven universal emotions: anger, fear, sadness, happiness, boredom, disgust and neutral. A thorough ML experimental analysis is performed for each step. The results showed that 300 (out of1582) features, as rankled by the gain rat^io, a^re sufficient for achieving 86% accuracy when evaluated with 10 fold cross-validat^ion. SVM achieved the highest accuracy when compared t^o KNN and Naive Bayes. We additionally compared the accuracy of the standard SVM (wit^h default parameters) and the one enhanced by Auto-WEKA (optimized algorithm parameters) using the leave-one-speaker-o^t technique. 
The results showed that the SVM enhanced with Auto-WEKA achieved significantly better accuracy than the standard SVM, i.e., 73% and 77% respectively. Finally, the results achieved with the 10 fold cross-validation are comparable and similar to the ones achieved by a human, i.e., 86% accuracy in both cases. Even more, low energy emotions (boredom, sadness and disgust) are better recognized by our machine learning approach compared to the human. Povzetek: Predstavljeno je prepoznavanje čustev iz govora s pomočjo strojnega učenja.

1 Introduction and related work
Human capabilities for perception, adaptation and learning about the surroundings are often three main compounds of the definition about what intelligent behaviour is. In the last few decades there are many studies suggesting that one very important compound is left out of this definition about intelligent behaviour. That compound is emotional intelligence. Emotional intelligence is the ability of one to feel, express and regulate his own, and to recognize and handle the emotional state of others. In psychology the emotional state is defined as a complex state that results in psychological and physiological changes that influence our behaving and thinking [1] .

With the recent advancements of the technology and the growing research areas like machine learning (ML), audio processing and speech processing, the emotional states will be an inevitable part of the human-computer interaction. There are more and more studies that are working on providing the computers with abilities like recognizing, interpreting and simulating emotional states.

Automatic emotion recognition is part of growing research areas such as the industry for robots [22], automobile industry, entertainment industry, marketing industry, and similar. Automatic emotion recognition can also be used for improving the accuracy in speech recognition. It is expected that automatic emotion recognition will change the human-computer communication [23].

The goal of the emotion recognition systems is to recognize the emotional state that the speaker is experiencing. The focus is usually on how something is said, and not what is said. Besides the approaches where only the speaker's voice is analysed, there are several different approaches for recognizing the emotional state. In some approaches the voice and the spoken words are analysed [2] . Some are focused only on the facial expressions [3] . Some are analysing the reactions in the human brain for different emotional states [4] . Also there are combined approaches where a combination of the mentioned approaches is used [5] .

In general, there are two approaches in human emotions analysis. In the first approach the emotions are represented as discrete and distinct recognition classes [6] . The other approach represents the emotional states in 2D or 3D space, where parameters like emotional distance, level of activeness, level of dominance and level of pleasure can be observed [7] .

In this research we present a ML approach for automatic recognition of emotions from speech. Our approach uses the discrete type of approach; therefore the emotional states are represented by seven classes: anger, fear, sadness, happiness, boredom, disgust and neutral. Even though ML approaches have been proposed in the literature, our approach improves upon them by performing a thorough ML analysis, including methods for: feature extraction, feature standardization, feature selection, algorithm selection, and algorithm parameters optimization.
With this analysis, we try to find the optimal ML configuration of: features, algorithms and parameters, for the task of emotion recognition in speech. The remainder of this paper is organized as follows. Section 2 is a brief overview of speech emotion analysis. In Section 3 our ML approach for emotion recognition is presented. Section 4 presents the experimental setup and the experimental results. Finally, the conclusion and a brief discussion about the results is given. 2 Speech emotion analysis Speech emotion analysis refers to usage of methods to extract vocal cues from speech as a marker for emotional state, mood or stress. The main assumption is that there are objectively measureable cues that can be used for predicting the emotional state of the speaker. This assumption is quite reasonable since the emotional states arouse physiological reactions that affect the process of speech production. For example, the emotional state of fear usually initiates rapid heartbeat, rapid breathing, sweating and muscle tension. As a result of these physiological activities there are changes in the vibration of the vocal folds and the shape of the vocal tract. All of this affects the vocal characteristics of the speech which allows to the listener to recognize the emotional state that the speaker is experiencing [8] . The basic speech audio features that are used for speech emotion recognition are: fundamental frequency (human perception for fundamental frequency is pith), power, intensity (human perception for intensity is loudness), duration features (ex. rate of speaking) and vocal perturbations. The main question is: Are there any objective feature profiles of the voice that can be used for speaker emotion recognition? A lot of studies are done for the sake of providing such feature profiles that can be used for representation of the emotions, but results are not always consistent. For some basic problems like distinguishing normal speech from angry speech or distinguishing normal speech from bored speech the experimental results converge [9] . For example such converging results are showing that compared to normal speech, when expressing fear or happiness human speak with higher pitch (fundamental frequency). Figure 1 shows an example of audio wave (top graphs) and pitch (bottom graphs) of normal speech (left) and angry speech (right). The missing parts of the pitch graphs are parts of the speech signals which would not have foundation in human perception. They relate to parameters (ex. silence threshold, voicing threshold) of the pitch analysis algorithms. The graphs show that by using the pitch as a feature, one can note the different characteristics of speech under different emotional states. On the left we have normal speech and on the right we have angry speech of the same words by the same person. On the left lower graph (normal speech) we can see that the pitch is around 120Hz and it is monotone. On the other hand the right lower graph (angry speech) we can see higher pitch (there are parts where the pitch goes up to 500Hz) and there is noticeable variability (there are parts where the pitch goes from 500Hz to 100Hz and vice versa). This simple analysis is just an example of how we can compare speech signals by using their physical characteristics. This simple approach cannot be used for speech emotion recognition. The problem arises when we have to distinguish emotional states like anger from happiness or fear from happiness. 
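The pitch behaviour just described can be approximated with a simple frame-wise autocorrelation estimator. The Python sketch below is only an illustration of the idea on synthetic tones (real toolkits use more robust algorithms with voicing and silence thresholds, as noted above); the signals and function names are our own assumptions.

import numpy as np

def autocorr_pitch(frame, sr, fmin=75.0, fmax=500.0):
    """Estimate F0 of one frame from the autocorrelation peak inside [fmin, fmax]."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

def pitch_track(signal, sr, frame_ms=40, hop_ms=10):
    frame, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    return np.array([autocorr_pitch(signal[i:i + frame], sr)
                     for i in range(0, len(signal) - frame, hop)])

# synthetic example: a flat 120 Hz tone vs. a tone sweeping roughly 120 -> 380 Hz
sr = 16000
t = np.arange(0, 1.0, 1 / sr)
calm = np.sin(2 * np.pi * 120 * t)
excited = np.sin(2 * np.pi * (120 * t + 130 * t ** 2))
print(round(float(pitch_track(calm, sr).std()), 1),
      round(float(pitch_track(excited, sr).std()), 1))   # variability differs strongly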
By using the basic speech audio features for describing these emotional states, the feature profiles are quite similar so distinguishing them is hard. Us 0.6 B 3 n Figure 1. Audio wave (top graphs) and pitch (bottom graphs) of normal speech (left) and angry speech (right). The missing parts of the bottom graphs are parts of the speech signals which would not have foundation in human perception. In the last few years, new method is introduced where static feature vectors are obtained by using so called acoustic Low-Level Descriptors (LLDs) and descriptive statistical functionals [10] . By using this approach a big number of large feature vectors is obtained. The downside is that not all of the feature vectors are of good value, especially not for emotion recognition. For that reason a feature selection method is often used. 3 ML Approach Figure 2 shows the whole process of the ML speech emotion recognition used in this study. First, an emotional speech database is used, which consists of simulated and annotated utterances. Next, feature extraction is performed by using open source feature extractor. Then, feature selection method is used for decreasing the number of features and selecting only the most relevant ones. Finally, the emotion recognition is performed by a classification algorithm. 3.1 Emotional speech database There are several emotional speech databases that are extensively used in the literature [11] : German, English, Japanese, Spanish, Chinese, Russian, Dutch etc. One main characteristic of an emotional speech database is the type of the emotions expressed in the speech: whether they are simulated or they are extracted from real life situations. The advantage of having a simulated speech is that the researcher has a complete control over the emotion that it is expressed and complete control over the quality of the audio. However, the disadvantage is that there is loss in the level of naturalness and spontaneity. On the other hand, the non-simulated emotional databases consist of a speech that is extracted from real life scenarios like call-centers, interviews, meetings, movies, short videos and similar situations where the naturalness and spontaneity is kept. The disadvantage is that in these databases there is not a complete control over the expressed emotions. Also the low quality of the audio can be problem. For this research the Berlin emotional speech database [12] is used, which is one of the most exploited databases for speech emotion analysis. It consists of 535 audio files, where 10 actors (5 male and 5 female) are pronouncing 10 sentences (5 short and 5 long). The sentences are chosen so that all 7 emotions that we are analyzing can be expressed. The database is additionally checked for naturalness by testing it with 20 human Emotional Speech Database Feature Extraction volunteers. The volunteers were supposed to recognize and rate the naturalness of the expressed emotion by listening to random utterance. The utterances that were rated with more than 60% naturalness and from which the expressed emotion was recognized with more than 80%, were included in the final database. In Figure 3 statistics for the Berlin emotional speech database is shown. We can see information about the number of instances per class and information about the human recognition rate obtained from the tests. Table 1. Statistics for the Berlin emotional speech database, including the human recognition rate. 
3.2 Feature Extraction

The feature extraction tool used in this research is openSMILE (Open Speech and Music Interpretation by Large Space Extraction) [13]. It is a commonly used tool for signal processing and feature extraction when an ML approach is applied to audio data. openSMILE provides configuration files that can be used for extracting predefined features. For this research the configuration file 'emobase2010' is used, with which 1582 features are extracted in total [14]. openSMILE computes LLDs from basic speech features (pitch, loudness, voice quality) or from representations of the speech signal (cepstrum, linear predictive coding). Functionals are then applied to these LLDs and static feature vectors are computed, so static classifiers can be used. The applied functionals are: extremes (position of the max/min value), statistical moments (first to fourth), percentiles (e.g., the first quartile), duration (e.g., percentage of time the signal is above a threshold) and regression (e.g., the offset of a linear approximation of the contour).

Figure 2. ML approach for emotion recognition.

After feature extraction, the feature vectors are standardized so that each feature has a mean of 0 and a standard deviation of 1. This way, all features are on a comparable scale, which prevents features with larger raw values from having more influence when the ML model is created. This is an important step in ML, especially for classification algorithms that do not have a built-in mechanism for feature standardization.

3.3 Feature Selection

Feature selection is the process of selecting a subset of relevant features for use in model construction. The central assumption when using a feature selection technique is that the data contains many redundant or irrelevant features. Redundant features are those which provide no more information than the currently selected features, and irrelevant features provide no useful information in any context [15]. To deal with this issue, we used a method for feature selection: features were ranked with a feature-ranking algorithm and experiments were performed with a varying number of top-ranked features. For ranking the features the well-known gain ratio [16] algorithm is used. Gain ratio is the ratio between the information gain of a feature and the entropy of that feature. It is used to avoid the overestimation of multi-valued features (the main drawback of information gain). The algorithm is used as implemented in the Orange ML toolkit [18].
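Gain ratio, as used here, is the information gain of a feature divided by the feature's own entropy. The sketch below is a minimal stand-in for the Orange implementation referenced above: the continuous openSMILE features are first discretized into equal-width bins (an assumption of this sketch; Orange applies its own discretization), and the features are then ranked by the resulting score.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(feature, y, bins=10):
    """Gain ratio of one continuous feature w.r.t. class labels y."""
    edges = np.histogram_bin_edges(feature, bins=bins)
    binned = np.digitize(feature, edges[1:-1])     # equal-width discretization
    h_y = entropy(y)
    h_cond = 0.0                                   # conditional entropy H(Y | bin)
    for b in np.unique(binned):
        mask = binned == b
        h_cond += mask.mean() * entropy(y[mask])
    info_gain = h_y - h_cond
    split_info = entropy(binned)                   # entropy of the feature itself
    return info_gain / split_info if split_info > 0 else 0.0

def rank_features(X, y, bins=10):
    """Return feature indices sorted from most to least relevant."""
    scores = np.array([gain_ratio(X[:, j], y, bins) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1], scores

# Example: keep the 300 top-ranked features of a standardized matrix X.
# order, scores = rank_features(X, y)
# X_top300 = X[:, order[:300]]
```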
3.4 Classification

Once the features are extracted, standardized and selected, they are used to form the feature vector database. Each data sample in the database is an instance, i.e., a feature vector, used for classification. Because each instance is labeled with the appropriate emotion, supervised classification algorithms are used. In our experiments three commonly used classification algorithms were tested: K-Nearest Neighbors (KNN) [19], Naive Bayes [21] and Support Vector Machine (SVM) [20]. KNN is an instance-based (lazy) classifier that does not learn a model, but uses a similarity metric (e.g., Euclidean distance) to find the K most similar training instances and applies the majority class value (emotion) of these K instances. Naive Bayes is a probability-based algorithm. It applies probability theory to the feature values in order to create a model that separates the instances according to their class values. The SVM is the most complex of the three algorithms. Its goal is to find hyperplanes in the attribute space that maximize the margin between instances belonging to distinct classes; it can use a kernel function to create non-linear classifiers. We performed thorough experiments with each of the classification models, and once we selected the one with the highest recognition accuracy, we further enhanced its accuracy with Auto-WEKA [25]. Auto-WEKA is an ML tool for parameter optimization of classification algorithms. It searches the huge space of algorithm parameters and, using intelligent optimization functions, finds a near-optimal parameter setting, which should increase the accuracy of the chosen algorithm. The problem is viewed as a single hierarchical parameter optimization in which even the classification algorithm is considered a parameter: the root-level parameter is the learning algorithm, and the rest of the search space depends on the algorithm chosen at the previous level. The main idea of Auto-WEKA is that searching the combined space of algorithms and parameters results in better-performing models than standard algorithm-selection and parameter-optimization methods. For searching the huge parameter space Auto-WEKA uses the Bayesian optimization methods TPE [23] and SMAC [24]. For the model-selection problem, Auto-WEKA splits the training data into k folds, so that each of the pre-selected learning algorithms is tested with k-fold cross-validation. The ultimate goal of the model selection is to find an algorithm with optimal generalization performance; the selection criterion is minimization of the misclassification rate. For the parameter-optimization problem Auto-WEKA uses a hierarchical parameter search, and again the goal is minimization of the misclassification rate.

4 Experiments

4.1 Experimental setup

The experiments were conducted in the following order. First, the openSMILE feature extraction tool extracted 1582 features. Next, all of the extracted features were ranked using the gain ratio algorithm [16]. Then, we tested the accuracy of the three algorithms using different numbers of top-ranked features; in particular, we used 50, 100, 200, 400, 500, 600, 750, 1000, and 1582 features. For evaluation we used 10-fold cross-validation, which is a gold-standard technique for evaluating datasets whose instances are not structured or time-dependent (e.g., time series); it usually gives a good estimate of the accuracy of an algorithm. In the next step, the algorithm with the highest accuracy was further evaluated with the Leave-One-Speaker-Out (LOSO) technique. Because in our case the data were collected from 10 individuals (speakers), the model was trained on the data recorded for nine people and tested on the remaining person; this procedure was repeated for each person (10 times). The LOSO evaluation approach is more reliable than using the same person's data for training and testing when the model is intended to be used by unknown people (not included in the training dataset). Finally, we applied the Auto-WEKA toolkit to optimize the parameters of the chosen algorithm and thereby improve its accuracy.
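The evaluation protocol just described, 10-fold cross-validation over a varying number of top-ranked features followed by leave-one-speaker-out evaluation of the selected model, can be mimicked with standard tooling. The sketch below uses scikit-learn as a stand-in for the Orange/WEKA setup used in the paper; `X`, `y`, `speakers` and the `order` ranking are assumed to come from the earlier sketches, and the classifier settings (K, kernel) are assumed defaults, not the paper's exact configuration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score, LeaveOneGroupOut
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

def evaluate(X, y, speakers, order,
             feature_counts=(50, 100, 200, 400, 500, 600, 750, 1000, 1582)):
    classifiers = {
        "KNN": KNeighborsClassifier(n_neighbors=5),   # K is an assumed value
        "NaiveBayes": GaussianNB(),
        "SVM": SVC(kernel="linear"),                  # kernel is an assumed choice
    }
    # 10-fold cross-validation for each classifier and feature count
    for k in feature_counts:
        Xk = X[:, order[:k]]
        for name, clf in classifiers.items():
            pipe = make_pipeline(StandardScaler(), clf)
            acc = cross_val_score(pipe, Xk, y, cv=10).mean()
            print("%-10s top-%4d features: %.1f%%" % (name, k, 100 * acc))

    # Leave-one-speaker-out for the selected model (here: SVM, top 300 features)
    pipe = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    acc = cross_val_score(pipe, X[:, order[:300]], y,
                          groups=speakers, cv=LeaveOneGroupOut())
    print("LOSO per-speaker accuracies:", np.round(100 * acc, 1))
```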
For each Auto-WEKA experiment the SMAC optimization method was chosen. The optimization timeout was set to 24 h, the training memory per experiment to 1000 MB, and the training run timeout to 150 min. For these experiments Auto-WEKA's feature selection module was turned off. For each comparison, the statistical significance of the results was tested using a paired Student's t-test with a significance level of 5%. Three evaluation metrics commonly used in ML were analyzed: recall, precision and accuracy. The following formulas define each of the metrics, where Q can be any emotion that we are trying to recognize (happiness, neutral, etc.):

recall = (No. of correctly recognized emotions labeled as Q) / (No. of all the emotions labeled as Q)   (1)

precision = (No. of correctly recognized emotions labeled as Q) / (No. of all the emotions recognized as Q)   (2)

accuracy = (No. of correctly recognized emotions of all types) / (No. of all the emotions)   (3)
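Equations (1)-(3) can be read directly off a confusion matrix whose rows are the true emotions and whose columns are the predicted ones. The short sketch below computes them for an arbitrary matrix; the small 3-class matrix in the example is purely illustrative and not data from this study.

```python
import numpy as np

def per_class_metrics(cm, labels):
    """cm[i, j] = number of instances of true class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    recall = np.diag(cm) / cm.sum(axis=1)        # Eq. (1): per true class
    precision = np.diag(cm) / cm.sum(axis=0)     # Eq. (2): per predicted class
    accuracy = np.trace(cm) / cm.sum()           # Eq. (3): overall
    for lab, r, p in zip(labels, recall, precision):
        print("%-10s recall %5.1f%%  precision %5.1f%%" % (lab, 100 * r, 100 * p))
    print("overall accuracy %.1f%%" % (100 * accuracy))

# Illustrative 3-class example (made-up counts):
per_class_metrics([[40, 5, 5],
                   [4, 30, 6],
                   [2, 3, 45]],
                  ["anger", "happiness", "neutral"])
```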
4.2 Experimental Results

Once the features were extracted and ranked by the gain ratio algorithm, we compared the accuracy of the three ML algorithms (KNN, SVM and Naive Bayes) for different numbers of top-ranked features (50, 100, 200, 400, 500, 600, 750, 1000, and 1582). The results presented in Figure 3 show that, as the number of features increases up to 400, the accuracy of each of the three algorithms also increases. After that, the accuracy drops for the SVM, and only small (statistically insignificant) improvements are noticed for the other two algorithms. The decrease in performance as the number of features increases is due to overfitting. This is especially notable for the SVM, which was expected in a way, because its model is more complex than those of KNN and Naive Bayes, and this complexity usually grows with the number of features. If too many irrelevant features are used, the model overfits the training data and the accuracy on the test data drops (which is the case in our experiments). We additionally analysed the results obtained by the SVM, which achieved the highest accuracy, i.e., 87% when the top-ranked 400 features are used. However, there was no statistically significant difference between the accuracy achieved with 300 and with 400 features. Therefore, for further analysis we used the top-ranked 300 features, which was a good tradeoff between accuracy and number of features.

Figure 3. SVM, KNN and Naive Bayes accuracy for 10-fold cross-validation with a varying number of features.

Figure 4 shows the confusion matrix of the SVM for the top-ranked 300 features, together with the recall and precision for each class (emotion) and the overall accuracy. The highest precision and recall are achieved for the class "sadness" and the lowest for the class "happiness". We can also see that the classes "anger" and "happiness" are often confused by the classifier, while the class "fear" is confused with all six other classes.

Figure 4. Confusion matrix for the SVM obtained with 10-fold cross-validation with the top-ranked 300 features. Per-class recall (%): anger 91, boredom 89, disgust 87, fear 77, happiness 72, neutral 87, sadness 98; per-class precision (%): 84, 89, 89, 90, 77, 85, 94; overall accuracy 86%.

In the next step, we evaluated the SVM algorithm with the LOSO technique. We compared the accuracy of the standard SVM (with default parameters) and the one enhanced by Auto-WEKA. The results are shown for each test subject (speaker) individually in Figure 5. They show that the SVM enhanced with Auto-WEKA achieved significantly better accuracy than the standard SVM, except for the first two speakers (S1 and S2). We can also see that the accuracy depends on the test subject: with Auto-WEKA, the lowest accuracy (64%) is obtained when speaker S2 is used as the test speaker and the highest accuracy (91%) when speaker S8 is used as the test speaker.

Figure 5. Auto-WEKA and SVM classification accuracy for LOSO with the top-ranked 300 features.

Figure 6 shows a more detailed analysis, i.e., the confusion matrix of the results achieved by the SVM enhanced with Auto-WEKA. The highest precision and recall are achieved for the class "sadness" and the lowest for the class "happiness". We can also see that the class "anger" is confused with the class "happiness" and vice versa, and that the class "boredom" is confused with the class "disgust". For the classes "fear" and "neutral" there is no single class that stands out as the confusing one; these classes are confused with several others.

LOSO confusion matrix (SVM enhanced with Auto-WEKA); rows are the true classes, columns the predicted classes:

                 A     B     D     F     H     N     S   Recall (%)
Anger (A)      105     0     1     1    20     0     0   83
Boredom (B)      0    63    11     0     0     4     3   78
Disgust (D)      2     1    38     1     1     3     0   83
Fear (F)         7     1     3    46     3     8     1   67
Happiness (H)   22     0     0     3    44     2     0   62
Neutral (N)      0     7     4     4     4    59     1   75
Sadness (S)      0     3     0     0     0     1    58   94
Precision (%)   77    84    67    84    61    77    92   Acc = 77%

Figure 6. Confusion matrix for the SVM enhanced with Auto-WEKA, obtained with LOSO cross-validation with the top-ranked 300 features.

Finally, we compared the recognition accuracy achieved by a human (manual recognition) and the two ML techniques: the SVM trained and tested with 10-fold cross-validation and the SVM trained and tested with LOSO (the results are shown in Figure 7). The human recognition rate was obtained from the tests for checking the naturalness of the database, in which 20 volunteers were asked to recognize the emotions from the audio files. The results show that the SVM trained and tested with 10-fold cross-validation achieved results similar to the human; on average, both achieve 86% accuracy. However, low-energy emotions (boredom, sadness and disgust) are recognized better by the ML approach than by humans, which suggests that for these emotions the human ear misses information that can easily be extracted with software tools. The comparison to LOSO shows that a model trained on subjects different from the ones used for testing achieves significantly lower accuracy, i.e., 77%.

Figure 7. Comparison between the recognition accuracy achieved by a human and the two ML techniques: 10-fold cross-validation and LOSO.
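Auto-WEKA's SMAC search is out of scope for a short example, but its effect here, tuning the SVM's hyperparameters under speaker-independent evaluation, can be approximated with a plain nested grid search. The sketch below is a simplified stand-in rather than the authors' Auto-WEKA setup: it tunes C, kernel and gamma with an inner 5-fold search and reports one LOSO accuracy per held-out speaker; the parameter grid is an assumption.

```python
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC

def loso_accuracy_tuned(X, y, speakers):
    """LOSO accuracy of an SVM whose hyperparameters are tuned per training fold."""
    pipe = Pipeline([("scale", StandardScaler()), ("svm", SVC())])
    grid = {                                  # assumed search ranges
        "svm__C": [0.1, 1, 10, 100],
        "svm__kernel": ["linear", "rbf"],
        "svm__gamma": ["scale", 0.01, 0.001],
    }
    # Inner 5-fold search selects parameters; outer LOSO estimates accuracy
    tuned = GridSearchCV(pipe, grid, cv=5)
    scores = cross_val_score(tuned, X, y, groups=speakers,
                             cv=LeaveOneGroupOut())
    return scores  # one accuracy value per held-out speaker

# Example: per-speaker accuracies with the top-ranked 300 features.
# print(loso_accuracy_tuned(X[:, order[:300]], y, speakers))
```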
5 Conclusion

Even though ML approaches have been proposed in the emotion recognition literature, our approach improves upon them by performing a thorough ML analysis, including methods for feature extraction and standardization, feature selection analysis, algorithm selection analysis, and algorithm parameter optimization. With this analysis, we tried to find the optimal ML configuration of features, algorithms and parameters for the task of emotion recognition in speech. The results showed that the top 300 features, as ranked by the gain ratio, are sufficient for achieving 86% accuracy in emotion recognition; adding more only causes overfitting. The SVM achieved the highest accuracy and significantly outperformed KNN and Naive Bayes. When trained and tested with 10-fold cross-validation, the SVM achieved 86% accuracy over all the emotions. The per-emotion analysis shows that the highest precision and recall were achieved for "sadness" and the lowest for "happiness". "Anger" and "happiness" were often confused by the classifier, and "fear" was confused with all six other emotions. In the next step, we evaluated the SVM algorithm with the LOSO technique, comparing the accuracy of the standard SVM (with default parameters) and the one enhanced by Auto-WEKA. The results showed that the SVM enhanced with Auto-WEKA achieved significantly better accuracy than the standard SVM: the overall accuracy was 73% for the standard SVM and 77% for the one enhanced with Auto-WEKA. The highest accuracy (94%) was achieved for the "sadness" emotion and the lowest (62%) for "happiness". The confusion matrix in Figure 6 shows that "anger" is confused with "happiness" and vice versa, and that "boredom" is confused with "disgust". The SVM trained and tested with 10-fold cross-validation achieves better accuracy than with LOSO: 86% versus 77%. The reason is that with 10-fold cross-validation the training and the testing data usually contain samples of the same speaker. On the other hand, the LOSO technique gives a better estimate when the system for speech emotion recognition is supposed to work in an environment where it has no information about the speaker. A hybrid approach that includes a calibration phase at the beginning of the system's use (for example, asking the user to record several data samples) is considered for future work. The recognition accuracy achieved by the SVM trained and tested with 10-fold cross-validation is similar to the one achieved by a human (manual recognition); in both cases the accuracy is 86%. Moreover, low-energy emotions (boredom, sadness and disgust) are recognized better by the ML approach than by humans, which suggests that for these emotions the human ear misses information that can easily be extracted with software tools. Auto-WEKA is a state-of-the-art approach for parameter optimization in ML. This is the first time it has been used in the field of speech analysis, and in particular in speech emotion recognition, where there is not yet a gold-standard approach widely accepted by the research community. For future work we plan to test our approach on other languages and to provide a language-independent model for emotion recognition. This should be possible since the emotions that we are trying to recognize are considered universal and the features that we are using are language-independent.
The ultimate goal would be a real-time, language-independent emotion recognition service that can be used as part of a human affect tracking system which promotes wellbeing.

References
[1] D. G. Myers. Theories of Emotion. Psychology: Seventh Edition. New York, NY: Worth Publishers, 2004.
[2] V. Perez-Rosas, R. Mihalcea. Sentiment Analysis of Online Spoken Reviews. In Proc. Interspeech, 2013.
[3] A. Halder, A. Konar, R. Mandal, A. Chakraborty. General and Interval Type-2 Fuzzy Face-Space Approach to Emotion Recognition. IEEE Transactions on Systems, Man, and Cybernetics, 43(3), 2013.
[4] R. Horlings, D. Datcu, L. J. M. Rothkrantz. Emotion recognition using brain activity. In Proc. of the 9th International Conference on Computer Systems and Technologies and Workshop for PhD Students in Computing (CompSysTech '08), 2008.
[5] A. Metallinou, S. Lee, S. Narayanan. Audio-Visual Emotion Recognition Using Gaussian Mixture Models for Face and Voice. In Proc. of the IEEE International Symposium on Multimedia (ISM 2008), 2008.
[6] P. Ekman. Emotion in the Human Face. 1982.
[7] J. A. Russell. A circumplex model of affect. 1980.
[8] P. N. Juslin, K. R. Scherer. Vocal expression of affect. In J. A. Harrigan, R. Rosenthal, K. R. Scherer (Eds.), The New Handbook of Methods in Nonverbal Behavior Research, pp. 65-135, 2004.
[9] K. R. Scherer. Vocal communication of emotion: A review of research paradigms. Speech Communication, 40:227-256, 2003.
[10] M. E. Mena. Emotion Recognition From Speech Signals, 2012.
[11] D. Ververidis, C. Kotropoulos. A review of emotional speech databases. In Proc. of the 9th Panhellenic Conference on Informatics (PCI 2003), pp. 560-574, 2003.
[12] F. Burkhardt, A. Paeschke, M. Rolfes, W. Sendlmeier, B. Weiss. A Database of German Emotional Speech. In Proc. Interspeech, pp. 1517-1520, 2005.
[13] F. Eyben, M. Wöllmer, B. Schuller. openSMILE - The Munich Versatile and Fast Open-Source Audio Feature Extractor. 2010.
[14] F. Eyben, F. Weninger, M. Wöllmer, B. Schuller. openSMILE Documentation, Version 2.0.0, 2013.
[15] I. Guyon, A. Elisseeff. An Introduction to Variable and Feature Selection. Journal of Machine Learning Research, 2003.
[16] H. Deng, G. Runger, E. Tuv. Bias of importance measures for multi-valued attributes and solutions. In Proc. of the 21st International Conference on Artificial Neural Networks (ICANN 2011), 2011.
[17] I. Kononenko, E. Šimec, M. Robnik-Šikonja. Overcoming the myopia of inductive learning algorithms with RELIEFF. Applied Intelligence, 1997.
[18] J. Demšar, B. Zupan. Orange: From experimental machine learning to interactive data mining. White Paper (http://www.ailab.si/orange), Faculty of Computer and Information Science, University of Ljubljana.
[19] D. Aha, D. Kibler. Instance-based learning algorithms. Machine Learning, 6:37-66, 1991.
[20] N. Cristianini, J. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, 2000.
[21] S. Russell, P. Norvig. Artificial Intelligence: A Modern Approach, Second Edition. Prentice Hall.
[22] I. Witten, E. Frank. Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann, 2005.
[23] J. Bergstra, R. Bardenet, Y. Bengio, B. Kégl. Algorithms for Hyper-Parameter Optimization. In Proc. of NIPS-11, 2011.
[24] F. Hutter, H. Hoos, K. Leyton-Brown. Sequential model-based optimization for general algorithm configuration. In Proc. of LION-5, pp. 507-523, 2011.
[25] C. Thornton, F. Hutter, H. Hoos, K. Leyton-Brown. Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms. In Proc. of KDD 2013, 2013.

The Inherent Context Awareness of Natural User Interfaces: a Case Study on Multitouch Displays

Bojan Blažica
XLAB Research, Pot za Brdom 100, 1000 Ljubljana, Slovenia
Jožef Stefan International Postgraduate School, Jamova 39, 1000 Ljubljana, Slovenia
E-mail: bojan.blazica@xlab.si
www.xlab.si

Thesis Summary

Keywords: natural user interfaces, context-awareness, multitouch displays, biometric identification

Received: November 11, 2014

In computer science, context-awareness refers to the capability of a computing device to sense, understand and react to contextual information, i.e. information that is not at the centre of an activity but is still relevant for that activity. A computing device does not necessarily interact with humans at a given moment, but when it does, its context-awareness has many implications for human-computer interaction. The thesis summarized in this paper looks at natural user interfaces from a context-awareness perspective.

Povzetek: V računalništvu se pojem kontekstna ozaveščenost nanaša na sposobnost računalniškega sistema, da zazna, razume in se odzove na informacije, ki izvirajo iz konteksta, v katerem se nahaja in deluje. Imenujemo jih kontekstne informacije in jih definiramo kot tiste informacije, ki sicer niso v centru neke aktivnosti, a so zanjo še vedno pomembne. V primeru, da računalniški sistem interagira s človekom, ima lahko kontekstna ozaveščenost sistema velik vpliv na samo komunikacijo človek-računalnik. Ta članek povzema disertacijo, ki z vidika kontekstne ozaveščenosti obravnava naravne uporabniške vmesnike.

1 Introduction

This paper summarizes a PhD thesis [1] that looks at natural user interfaces from a context-awareness perspective (Figure 1). On the one hand, we show that considering natural user interfaces as context-aware systems further increases the expressive power of these interfaces and, on the other hand, we show that natural user interfaces can also represent essential building blocks for context-aware systems and are therefore a viable way towards context-awareness. The research questions that arise from this perspective and are addressed in the thesis are: to what extent are natural user interfaces already inherently context-aware, how to increase the expressiveness of natural user interfaces through context-awareness, do natural user interfaces provide enough information to perform biometric user identification, and how to take advantage of information implicitly conveyed by the user during interaction with natural user interfaces. The specific natural user interfaces used are multitouch displays. The thesis first reviews the fields of natural user interfaces and context-awareness. Regarding natural user interfaces, as this is an emerging research field, special care is taken to survey all currently available definitions of the term. Similarly, multitouch displays and multitouch interaction are described in more detail, as they are considered in the case studies of the thesis. Other related fields such as ubiquitous/pervasive computing, ambient intelligence etc. are also briefly explained. The presented overview does not merely introduce the topic of the thesis, but also shows how interconnected these fields are and how natural user interfaces are indeed inherently context-aware.

Figure 1: Context in human-computer interaction.
2 MTi: a Method for user identification on multitouch displays We have shown how the increased amount and variety of data from natural user interfaces can be exploited to acquire contextual information by developing a biometric user identification method and a clustering algorithm for hand detection, both for multitouch displays. The method for user identification, named MTi, is based on features obtained only from the coordinates of the 5 touchpoints of one of the user's hands (Figure 2). This makes it applicable to (almost) all multitouch displays without requiring additional hardware and regardless of the display's underlying sensing technology. The method was tested on a dataset of 34 users and reported 94.69 % identification accuracy. The method also proved to scale well and has an above-average usability [2]. Figure 2: Illustration of types of features used for user identification on multitouch displays. 3 HDCMD: a clustering algorithm to support hand detection on multitouch displays Next, we address the problem of hand detection, i.e. detecting how many hands are currently on the surface and associating each touch point to its corresponding hand (Figure 3) [3]. The presented solution - a clustering algorithm with simple heuristics based on the anatomy of the human hand - is software-based and thus again applicable to all multitouch surfaces regardless of their construction. Along with these two, other related methods that increase the expressiveness of multitouch displays are surveyed in [2, 3]. 4 A personal perspective on photowork: implicit human computer interaction for photo collection management Finally, the thesis explores the possibility to use implicit human-computer interaction to aid personal photo collection management. The idea is that the way we interact with natural user interfaces can implicitly disclose additional (contextual) information, which helps a context-aware system to better understand the user. More specifically, we take into account the user's personal relationship with a single photo; whether the photo is of particular importance to the user. We call this personal relationship the user's affinity for a photo. Figure 3: Illustration of the problem of hand detection on multitouch displays. Experiments revealed that affinity is correlated with the time a user spends viewing a picture. Furthermore, by looking at viewing times, it is also possible to distinguish the task a user is currently performing [4]. 5 Conclusion The positive examples of context acquisition on multitouch displays presented confirm that natural user interfaces are inherently context-aware and show how their expressive power can be further increased by viewing them from a context-aware perspective. Acknowledgement The author wants to thank his supervisors Dunja Mladenič and Daniel Vladušič for their advice and support. Funded in part by the European Union, European Social Fund, Operational Program for Human Resources, Development for the Period 2007-2013 and by the Slovenian Research Agency. References [1] Blažica, B. The inherent contex awareness of natural user interfaces: a case study on multitouch displays : doctoral dissertation. Ljubljana, 2013 http://hci.si/?p=511 [2] Blažica, B.; Vladušic, D.; Mladenič, D. Mti: A method for user identification for multitouch displays. International Journal of Human-Computer Studies vol. 71, no. 6, p. 691-702 (2013). http://goo.gl/pcyUzl. [3] Blažica, B.; Vladušic, D.; Mladenic, D. 
HDCMD: a Clustering Algorithm to Support Hand Detection on Multitouch Displays. In: First International Conference, SouthCHI 2013, Maribor, Slovenia, July 1-3, 2013. Human Factors in Computing and Informatics: Proceedings (Lecture Notes in Computer Science), vol. 7946, p. 803-814 (2013).
[4] Blažica, B.; Vladušič, D.; Mladenič, D. A personal perspective on photowork: implicit human-computer interaction for photo collection management. Personal and Ubiquitous Computing, vol. 17, no. 8, p. 1787-1795 (2013). http://goo.gl/sGfvNh.

The Role Of Hubness in High-dimensional Data Analysis

Nenad Tomašev
Artificial Intelligence Laboratory, Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia
E-mail: nenad.tomasev@gmail.com

Thesis Summary

Keywords: machine learning, curse of dimensionality, hubness

Received: November 9, 2014

This article presents a summary of the doctoral dissertation of the author, which addresses the task of machine learning under hubness in intrinsically high-dimensional data.

Povzetek: Prispevek predstavlja povzetek doktorske disertacije avtorja, ki obravnava naloge strojnega učenja pri zvezdiščnosti visokodimenzionalnih podatkov.

1 Introduction

Machine learning in intrinsically high-dimensional data is known to be challenging, and this is usually referred to as the curse of dimensionality. Designing machine learning methods that perform well in many dimensions is critical, since high-dimensional data arises often in practical applications; typical examples include textual, image and multimedia feature representations, as well as time series and biomedical data. The hubness phenomenon [1] has recently come into focus as an important aspect of the curse of dimensionality that affects many instance-based machine learning systems. With increasing dimensionality, the distribution of instance relevance within the models tends to become long-tailed: a small number of hub points dominates the analysis and influences a disproportionate number of system predictions, while most remaining points are rarely or never retrieved in relevance queries, resulting in an information loss. High data hubness has been linked to poor system performance in many data domains. The dissertation [2] proposes several novel hubness-aware machine learning algorithms to improve the effectiveness of machine learning in intrinsically high-dimensional data. The proposed methods are based on modeling the influence of hub points on the training data. The article is organized as follows. Section 2 gives an overview of the proposed hubness-aware methods. Section 3 summarizes the evaluation results. The dissertation's scientific contributions are outlined in Section 4 together with plans for future work.
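The k-occurrence skew behind the hubness phenomenon described in the introduction is easy to observe directly: count how often each point appears among the k nearest neighbors of the other points and look at the skewness of that distribution. The sketch below does this with scikit-learn on random Gaussian data; the dimensionalities, sample size and k are arbitrary illustration values, not settings from the dissertation.

```python
import numpy as np
from scipy.stats import skew
from sklearn.neighbors import NearestNeighbors

def k_occurrences(X, k=10):
    """N_k(x): how many times each point occurs among other points' k-NN lists."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)            # idx[:, 0] is the query point itself
    counts = np.bincount(idx[:, 1:].ravel(), minlength=len(X))
    return counts

rng = np.random.default_rng(0)
for dim in (3, 30, 300):                 # illustrative dimensionalities
    X = rng.standard_normal((2000, dim))
    occ = k_occurrences(X, k=10)
    print("d=%4d  skewness of N_k: %.2f   max N_k: %d"
          % (dim, skew(occ), occ.max()))
# With growing dimensionality the skewness increases: a few hub points
# start to dominate the k-NN lists, as described in the introduction.
```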
2 Hubness-aware machine learning

2.1 Classification

The earlier approach of using hubness-based weighting in k-nearest neighbor classification was extended, and several hubness-aware classification algorithms based on class-conditional k-nearest neighbor occurrence models were proposed: h-FNN, dwh-FNN, HIKNN and NHBNN. The fuzzy hubness-aware k-nearest neighbor methods (h-FNN, dwh-FNN) utilize hubness-based fuzzy measures for voting in an extended fuzzy k-nearest neighbor (FNN) framework. HIKNN further extends dwh-FNN by taking the neighbor occurrence self-information into account and assigning higher relevance to less frequently occurring neighbor points. HIKNN also removes the need for special anti-hub handling mechanisms and reduces the number of parameters needed by dwh-FNN. Unlike the fuzzy approaches, the naive hubness-Bayesian k-nearest neighbor method (NHBNN) presents a Bayesian re-interpretation of the neighbor occurrence events and offers a novel probabilistic framework for kNN classification.

2.2 Clustering

It was demonstrated in the dissertation that the neighbor occurrence frequencies in intrinsically high-dimensional data tend to be correlated with local cluster centrality. This was used to propose several deterministic and stochastic extensions of the well-known K-means clustering framework: local K-hubs (LKH), global K-hubs (GKH), local hubness-proportional clustering (LHPC), global hubness-proportional clustering (GHPC) and global hubness-proportional K-means (GHPKM) [3]. The novelty of the methods is that they exploit hubs as cluster prototypes and use point-wise hubness for guiding the search for the optimal configuration.

2.3 Metric learning

A hubness-aware extension of the commonly used simcos_s shared-neighbor secondary similarity measure was proposed, simhub_s. The proposed approach uses hubness-based weights when calculating the secondary similarity scores.

3 Evaluation

The proposed classification, clustering and metric learning approaches were evaluated on many intrinsically high-dimensional datasets from various domains, as well as on specially generated, challenging synthetic datasets. They were compared with standard, hubness-unaware baselines, and statistically significant improvements were observed. The proposed approaches were shown to improve system performance in several highly challenging tasks, including learning under class imbalance and learning with feature or label noise. A disproportionate amount of misclassification in high-dimensional class-imbalanced data was determined to be caused by minority class points. Hubness-aware classifiers were found to be well suited for classification under this newly discovered curse of minority hubs [4].

[2] N. Tomašev (2013) The Role of Hubness in High-dimensional Data Analysis, IPS Jožef Stefan, Ljubljana, Slovenia.
[3] N. Tomašev et al. (2013) The Role of Hubness in Clustering High-dimensional Data, IEEE Transactions on Knowledge and Data Engineering, IEEE Computer Society, pp. 1041-4347.
[4] N. Tomašev et al. (2013) Class Imbalance and The Curse of Minority Hubs, Knowledge-Based Systems, Elsevier, pp. 157-172.

4 Conclusions

The dissertation addresses the problem of designing effective machine learning approaches under the assumption of hubness in intrinsically high-dimensional data. It proposes several novel hubness-aware methods for learning in many dimensions. The main contributions of the dissertation are:
- Novel hubness-aware kNN classification methods: h-FNN, dwh-FNN, HIKNN and NHBNN. The novelty lies in using class-conditional neighbor occurrences for hub modeling on the training data.
- Novel clustering approaches: LKH, GKH, LHPC, GHPC, GHPKM. These are the first hubness-based clustering approaches to be proposed.
- A novel secondary similarity measure, simhub_s. It is the first shared-neighbor similarity score to take hubness explicitly into account.
The proposed hubness-aware approaches achieved statistically significant improvements over the corresponding baselines in various tested experimental contexts. As future work, we wish to further extend the proposed hubness-aware approaches.
We intend to pay special attention to unsupervised and semi-supervised learning, as obtaining ground truth is often expensive in practical applications. We also intend to work on scalability in order to make the proposed approaches useful on large-scale datasets. References [1] M. Radovanovic (2011) Representations and Metrics in High-Dimensional Data Mining, Izdavacka kn-jižarnica Zorana Stojanovica, Novi Sad, Serbia. CONTENTS OF Informatica Volume 38 (2014) pp. 1-391 Papers Aissaoui, O. &, A. Amirat, F. Atil. 2014. A Model-Based Framework for Building Self-Adaptive Distributed Software. Informatica 38:289-306. Amirou, a. &, O.-A. Djaffar, Z. Zahia, a. Mohamed, M. Jean. 2014. Optimizing the Classification Cost using SVMs with a Double Hinge Loss. Informatica 38:125-133. BenedicCicC, L. &, M. Pesko, T. Javornik, P. Korošec. 2014. a Metaheuristic Approach for Propagation-Model Tuning in LTE Networks. Informatica 38:135-143. BlaŽica, B. & . 2014. The Inherent Context Awareness of Natural User Interfaces: a Case Study on Multitouch Displays. Informatica 38:385-386. Gams, M. & . 2014. Editorial: "Michie-Turing" IS2014 Award recipient: Prof. Dr. Janez Grad. Informatica 38:311-312. Gams, M. & . 2014. The Unexpected Hanging Paradox from an AI Viewpoint. Informatica 38:181-185. Gjoreski, M. &, H. Gjoreski. 2014. Machine Learning Approach for Emotion Recognition in Speech. Informatica 38:377-384. Janech, J. &, E. KršAk, Š. Toth. 2014. The Architecture of Distributed Database System in the VANET Environment. Informatica 38:205-211. Kaluža, B. & . 2014. A Unified Framework for Detection of Suspicious and Anomalous Beahvior from Spatio-Temporal Traces. Informatica 38:187-188. Boshkoska, B.M. & . 2014. From Qualitative to Quantitative Evaluation Methods in Multi-criteria Decision Models. Informatica 38:89-90. Chakraborty, T. & . 2014. Using Semantic Clustering for Detecting Bengali Multiword Expressions. Informatica 38:103-113. Kane, T.B. & . 2014. Using Cognitive Tunnels in a New Approach to Building Social Elevators in the Information Society. Informatica 38:263-271. Kazakovtsev, L.A. &, A.n. Antamoshkin. 2014. Genetic Algorithm with Fast Greedy Heuristic for Clustering and Location Problems. Informatica 38:229-240. Chojnacki, a. &, M. Grzenda, a. Kowalski, B. Macukow. 2014. Editors's Introduction to the Special Issue on "Frontiers in Network Systems and Applications". Informatica 38:191-192. Kefali, a. &, T. Sari, H. Bahi. 2014. Foreground-Background Separation by Feed-forward Neural Networks in Old Manuscripts. Informatica 38:329-338. Chovanec, M. &, M. Hodon, L. Cechovic. 2014. Tiny Low-Power WSN Node for the Vehicle Detection. Informatica 38:223-227. Klyuev, v. &, M. Mozgovoy. 2014. Editors's Introduction to the Special Issue on "Advances in Semantic Information Retrieval". Informatica 38:1-2. Craig, p. &, N. Roa-Sedler, M.M. DIaz, F.L. Rosano. 2014. a Cognitonics Approach to Computer Supported Learning in the Mexican State of Oaxaca. Informatica 38:241-248. Demšar, J. &, Z. Bosnicc, I. Kononenko. 2014. Visualization and Concept Drift Detection Using Explanations of Incremental Models. Informatica 38:321-327. Dimitrijevicc, D. &, V. Dimitrieski, N. Nedicc. 2014. Prototype Implementation of a Scalable Real-Time Dynamic Carpooling and Ride-Sharing Application. Informatica 38:213222. Fomichov, v.a. &, O.S. Fomichova. 2014. An Imperative of a Poorly Recognized Existential Risk: Early Socialization of Smart Young Generation in Information Society. Informatica 38:59-70. Kumar, Y. &, G. Sahoo. 2014. 
A Chaotic Charged System Search Approach for Data Clustering. Informatica 38:249-262. Li, L. &, Y. Wu, M. Ye. 2014. Multi-class Image Classification Based on Fast Stochastic Gradient Boosting. Informatica 38:145-153. Lin, L. &, Z. Xiaolong, Z. Kai, L. Jun. 2014. Bilinear Grid Search Strategy Based Support Vector Machines Learning Method. Informatica 38:51-58. Menai, M.B. & . 2014. Word Sense Disambiguation Using an Evolutionary Approach. Informatica 38:155-169. MihAAescu, M.C. &, M.G. Tacu, d.d. Burdescu. 2014. Use Case of Cognitive and HCI Analysis for an E-Learning Tool. Informatica 38:273-279. Fran.cois, M. &, D. Defour, C. Negre. 2014. a Fast Chaos-Based Pseudo-Random Bit Generator Using Binary64 Floating-Point Arithmetic. Informatica 38:115-124. Miljkovic, D. & . 2014. Semi-automated Knowledge Elicitation for Modelling Plant Defence Response. Informatica 38:307-308. Mirchevska, V. & . 2014. Classifier Generation by Combin- Zhou, X. &, K. Liu, Z. Jin, S. Tian, Y. Fu, L. Qin. 2014. ing Domain Knowledge and Machine Learning . Informatica Artificial Immune Based Cryptography Optimization Algorithm. 38:91-92. Informatica 38:43-50. Mostafa, M. &, E.M.F. El Houby, A. Salah. 2014. ZUPAN^I^, D. &, B. CVETKOVICC. 2014. Smart-Home Energy Semantic Searching of Biological Documents Using Gene Management in the Context of Occupants' Activity. Informatica Ontology. Informatica 38:71-80. 38:171-180. Munezero, M. &, C.S. Montero, T. Kakkonen, E. Suti-NEN, M. Mozgovoy, v. Klyuev. 2014. Automatic Detection of Antisocial Behaviour in Texts. Informatica 38:3-10. Pan, W. &, C. Yao, X. LI, L. Shen. 2014. An Online Compression Algorithm for Positioning Data Acquisition. Informatica 38:339-346. Parol, P. &, M. Pawlowski. 2014. Future Proof Access Networks for B2B Applications. Informatica 38:193-204. PURGINA, M. &, A. KUZNETSOV, E. PYSHKIN. 2014. Leveraging User Experience through Input Style Transformation to Improve Access to Music Search Services. Informatica 38:11-19. Pocs, J. & . 2014. On the Inverse Problem for Generalized One-Sided Concept Lattices. Informatica 38:95-101. Salah, H. &, B. Nacer, B. Mahmoud. 2014. Semantic Annotations for Workflow Interoperability. Informatica 38:347-366. ŠEVCECH, J. &, R. Miro, M. Holub, M. Bielikovä. 2014. User Annotations as a Context for Related Document Search on the Web and Digital Libraries. Informatica 38:21-30. Somrak, M. &, M. Luštrek, J. ŠušteršicC, T. Krivc, A. Mlinar, T. Travnik, L. Stepan, M. Mavsar, M. Gams. 2014. Tricorder: Consumer Medical Device for Discovering Common Medical Conditions. Informatica 38:81-88. Szwed, p. &, P. SkrzyiNski, G. Rogus, J. Werewka. 2014. SoARoAD: An ontology of Architectural Decisions Supporting Assessment of Service oriented Architectures. Informatica 38:31-42. Tomašev, N. & . 2014. The Role Of Hubness in High-dimensional Data Analysis. Informatica 38:387-388. Wang, L. &, D. Fu, L. Wu. 2014. Incremental Hierarchical Fuzzy model generated from Multilevel Fuzzy Support Vector Regression Network. Informatica 38:367-375. Wong, P. &, V. Varikota, D. Nguyen, A. Abukmail. 2014. Automatic Android-based Wireless Mesh Networks. Informatica 38:313-320. Zhou, H. &, H. Wang. 2014. A Database-Based Two-Phase Algorithm For Efficient and Complete Detection of siRNA Off-Target Homology. Informatica 38:281-287. JOŽEF STEFAN INSTITUTE Jožef Stefan (1835-1893) was one of the most prominent physicists of the 19th century. Born to Slovene parents, he obtained his Ph.D. 
at Vienna University, where he was later Director of the Physics Institute, Vice-President of the Vienna Academy of Sciences and a member of several scientific institutions in Europe. Stefan explored many areas in hydrodynamics, optics, acoustics, electricity, magnetism and the kinetic theory of gases. Among other things, he originated the law that the total radiation from a black body is proportional to the 4th power of its absolute temperature, known as the Stefan-Boltzmann law. The Jožef Stefan Institute (JSI) is the leading independent scientific research institution in Slovenia, covering a broad spectrum of fundamental and applied research in the fields of physics, chemistry and biochemistry, electronics and information science, nuclear science technology, energy research and environmental science. The Jožef Stefan Institute (JSI) is a research organisation for pure and applied research in the natural sciences and technology. Both are closely interconnected in research departments composed of different task teams. Emphasis in basic research is given to the development and education of young scientists, while applied research and development serve for the transfer of advanced knowledge, contributing to the development of the national economy and society in general. At present the Institute, with a total of about 900 staff, has 700 researchers, about 250 of whom are postgraduates, around 500 of whom have doctorates (Ph.D.), and around 200 of whom have permanent professorships or temporary teaching assignments at the Universities. industry, to promote joint ventures between university bodies, research institutes and innovative industry, to act as an incubator for high-tech initiatives and to accelerate the development cycle of innovative products. Part of the Institute was reorganized into several high-tech units supported by and connected within the Technology park at the Jožef Stefan Institute, established as the beginning of a regional Technology park "Ljubljana". The project was developed at a particularly historical moment, characterized by the process of state reorganisation, privatisation and private initiative. The national Technology Park is a shareholding company hosting an independent venture-capital institution. The promoters and operational entities of the project are the Republic of Slovenia, Ministry of Higher Education, Science and Technology and the Jožef Stefan Institute. The framework of the operation also includes the University of Ljubljana, the National Institute of Chemistry, the Institute for Electronics and Vacuum Technology and the Institute for Materials and Construction Research among others. In addition, the project is supported by the Ministry of the Economy, the National Chamber of Economy and the City of Ljubljana. Jožef Stefan Institute Jamova 39, 1000 Ljubljana, Slovenia Tel.:+386 1 4773 900, Fax.:+386 1 251 93 85 WWW: http://www.ijs.si E-mail: matjaz.gams@ijs.si Public relations: Polona Strnad In view of its activities and status, the JSI plays the role of a national institute, complementing the role of the universities and bridging the gap between basic science and applications. Research at the JSI includes the following major fields: physics; chemistry; electronics, informatics and computer sciences; biochemistry; ecology; reactor technology; applied mathematics. 
Most of the activities are more or less closely connected to information sciences, in particular computer sciences, artificial intelligence, language and speech technologies, computer-aided design, computer architectures, biocybernetics and robotics, computer automation and control, professional electronics, digital communications and networks, and applied mathematics. The Institute is located in Ljubljana, the capital of the independent state of Slovenia (or S^nia). The capital today is considered a crossroad between East, West and Mediterranean Europe, offering excellent productive capabilities and solid business opportunities, with strong international connections. Ljubljana is connected to important centers such as Prague, Budapest, Vienna, Zagreb, Milan, Rome, Monaco, Nice, Bern and Munich, all within a radius of 600 km. From the Jožef Stefan Institute, the Technology park "Ljubljana" has been proposed as part of the national strategy for technological development to foster synergies between research and INFORMATICA AN INTERNATIONAL JOURNAL OF COMPUTING AND INFORMATICS INVITATION, COOPERATION Submissions and Refereeing Please submit a manuscript to: http://www.informatica.si/Editors/ PaperUpload.asp. At least two referees outside the author's country will examine it, and they are invited to make as many remarks as possible from typing errors to global philosophical disagreements. The chosen editor will send the author the obtained reviews. If the paper is accepted, the editor will also send an email to the managing editor. The executive board will inform the author that the paper has been accepted, and the author will send the paper to the managing editor. The paper will be published within one year of receipt of email with the text in Informat-ica MS Word format or Informatica I^TeX format and figures in .eps format. Style and examples of papers can be obtained from http://www.informatica.si. Opinions, news, calls for conferences, calls for papers, etc. should be sent directly to the managing editor. QUESTIONNAIRE Send Informatica free of charge Yes, we subscribe Please, complete the order form and send it to Dr. Drago Torkar, Informatica, Institut Jožef Stefan, Jamova 39, 1000 Ljubljana, Slovenia. E-mail: drago.torkar@ijs.si Since 1977, Informatica has been a major Slovenian scientific journal of computing and informatics, including telecommunications, automation and other related areas. In its 16th year (more than twenty years ago) it became truly international, although it still remains connected to Central Europe. The basic aim of Infor-matica is to impose intellectual values (science, engineering) in a distributed organisation. Informatica is a journal primarily covering intelligent systems in the European computer science, informatics and cognitive community; scientific and educational as well as technical, commercial and industrial. Its basic aim is to enhance communications between different European structures on the basis of equal rights and international refereeing. It publishes scientific papers accepted by at least two referees outside the author's country. In addition, it contains information about conferences, opinions, critical examinations of existing publications and news. Finally, major practical achievements and innovations in the computer and information industry are presented through commercial publications as well as through independent evaluations. Editing and refereeing are distributed. 
Each editor can conduct the refereeing process by appointing two new referees or referees from the Board of Referees or Editorial Board. Referees should not be from the author's country. If new referees are appointed, their names will appear in the Refereeing Board. Informatica is free of charge for major scientific, educational and governmental institutions. Others should subscribe (see the last page of Informatica). ORDER FORM - INFORMATICA Name: ............................... Title and Profession (optional): ......... Home Address and Telephone (optional): Office Address and Telephone (optional): E-mail Address (optional): ............. Signature and Date: ................... Informatica WWW: http://www.informatica.si/ Referees from 2008 on: A. Abraham, S. Abraham, R. Accornero, A. Adhikari, R. Ahmad, G. Alvarez, N. Anciaux, R. Arora, I. Awan, J. Azimi, C. Badica, Z. Balogh, S. Banerjee, G. Barbier, A. Baruzzo, B. Batagelj, T. Beaubouef, N. Beaulieu, M. ter Beek, P. Bellavista, K. Bilal, S. Bishop, J. Bodlaj, M. Bohanec, D. Bolme, Z. Bonikowski, B. Boškovic, M. Botta, P. Brazdil, J. Brest, J. Brichau, A. Brodnik, D. Brown, I. Bruha, M. Bruynooghe, W. Buntine, D.D. Burdescu, J. Buys, X. Cai, Y. Cai, J.C. Cano, T. Cao, J.-V. Capella-Hernändez, N. Carver, M. Cavazza, R. Ceylan, A. Chebotko, I. Chekalov, J. Chen, L.-M. Cheng, G. Chiola, Y.-C. Chiou, I. Chorbev, S.R. Choudhary, S.S.M. Chow, K.R. Chowdhury, V. Christlein, W. Chu, L. Chung, M. Ciglaric, J.-N. Colin, V. Cortellessa, J. Cui, P. Cui, Z. Cui, D. Cutting, A. Cuzzocrea, V. Cvjetkovic, J. Cypryjanski, L. CCehovin, D. CCerepnalkoski, I. CCosic, G. Daniele, G. Danoy, M. Dash, S. Datt, A. Datta, M.-Y. Day, F. Debili, C.J. Debono, J. Dedic, P. Degano, A. Dekdouk, H. Demirel, B. Demoen, S. Dendamrongvit, T. Deng, A. Derezinska, J. Dezert, G. Dias, I. Dimitrovski, S. Dobrišek, Q. Dou, J. Doumen, E. Dovgan, B. Dragovich, D. Drajic, O. Drbohlav, M. Drole, J. Dujmovic, O. Ebers, J. Eder, S. Elaluf-Calderwood, E. Engström, U. riza Erturk, A. Farago, C. Fei, L. Feng, Y.X. Feng, B. Filipic, I. Fister, I. Fister Jr., D. Fišer, A. Flores, V.A. Fomichov, S. Forli, A. Freitas, J. Fridrich, S. Friedman, C. Fu, X. Fu, T. Fujimoto, G. Fung, S. Gabrielli, D. Galindo, A. Gambarara, M. Gams, M. Ganzha, J. Garbajosa, R. Gennari, G. Georgeson, N. Gligoric, S. Goel, G.H. Gonnet, D.S. Goodsell, S. Gordillo, J. Gore, M. Grcar, M. Grgurovic, D. Grosse, Z.-H. Guan, D. Gubiani, M. Guid, C. Guo, B. Gupta, M. Gusev, M. Hahsler, Z. Haiping, A. Hameed, C. Hamzagebi, Q.-L. Han, H. Hanping, T. Härder, J.N. Hatzopoulos, S. Hazelhurst, K. Hempstalk, J.M.G. Hidalgo, J. Hodgson, M. Holbl, M.P. Hong, G. Howells, M. Hu, J. Hyvärinen, D. Ienco, B. Ionescu, R. Irfan, N. Jaisankar, D. Jakobovic, K. Jassem, I. Jawhar, Y. Jia, T. Jin, I. Jureta, D. Juricic, S. K, S. Kalajdziski, Y. Kalantidis, B. Kaluža, D. Kanellopoulos, R. Kapoor, D. Karapetyan, A. Kassler, D.S. Katz, A. Kaveh, S.U. Khan, M. Khattak, V. Khomenko, E.S. Khorasani, I. Kitanovski, D. Kocev, J. Kocijan, J. Kollär, A. Kontostathis, P. Korošec, A. Koschmider, D. Košir, J. Kovac, A. Krajnc, M. Krevs, J. Krogstie, P. Krsek, M. Kubat, M. Kukar, A. Kulis, A.P.S. Kumar, H. Kwasnicka, W.K. Lai, C.-S. Laih, K.-Y. Lam, N. Landwehr, J. Lanir, A. Lavrov, M. Layouni, G. Leban, A. Lee, Y.-C. Lee, U. Legat, A. Leonardis, G. Li, G.-Z. Li, J. Li, X. Li, X. Li, Y. Li, Y. Li, S. Lian, L. Liao, C. Lim, J.-C. Lin, H. Liu, J. Liu, P. Liu, X. Liu, X. Liu, F. Logist, S. Loskovska, H. Lu, Z. Lu, X. Luo, M. Luštrek, I.V. Lyustig, S.A. 
Madani, M. Mahoney, S.U.R. Malik, Y. Marinakis, D. Marincic, J. Marques-Silva, A. Martin, D. Marwede, M. Matijaševic, T. Matsui, L. McMillan, A. McPherson, A. McPherson, Z. Meng, M.C. Mihaescu, V. Milea, N. Min-Allah, E. Minisci, V. Mišic, A.-H. Mogos, P. Mohapatra, D.D. Monica, A. Montanari, A. Moroni, J. Mosegaard, M. Moškon, L. de M. Mourelle, H. Moustafa, M. Možina, M. Mrak, Y. Mu, J. Mula, D. Nagamalai, M. Di Natale, A. Navarra, P. Navrat, N. Nedjah, R. Nejabati, W. Ng, Z. Ni, E.S. Nielsen, O. Nouali, F. Novak, B. Novikov, P. Nurmi, D. Obrul, B. Oliboni, X. Pan, M. Pancur, W. Pang, G. Papa, M. Paprzycki, M. Paralic, B.-K. Park, P. Patel, T.B. Pedersen, Z. Peng, R.G. Pensa, J. Perš, D. Petcu, B. Petelin, M. Petkovšek, D. Pevec, M. Piculin, R. Piltaver, E. Pirogova, V. Podpecan, M. Polo, V. Pomponiu, E. Popescu, D. Poshyvanyk, B. Potocnik, R.J. Povinelli, S.R.M. Prasanna, K. Pripužic, G. Puppis, H. Qian, Y. Qian, L. Qiao, C. Qin, J. Que, J.-J. Quisquater, C. Rafe, S. Rahimi, V. Rajkovic, D. Rakovic, J. Ramaekers, J. Ramon, R. Ravnik, Y. Reddy, W. Reimche, H. Rezankova, D. Rispoli, B. Ristevski, B. Robic, J.A. Rodriguez-Aguilar, P. Rohatgi, W. Rossak, I. Rožanc, J. Rupnik, S.B. Sadkhan, K. Saeed, M. Saeki, K.S.M. Sahari, C. Sakharwade, E. Sakkopoulos, P. Sala, M.H. Samadzadeh, J.S. Sandhu, P. Scaglioso, V. Schau, W. Schempp, J. Seberry, A. Senanayake, M. Senobari, T.C. Seong, S. Shamala, c. shi, Z. Shi, L. Shiguo, N. Shilov, Z.-E.H. Slimane, F. Smith, H. Sneed, P. Sokolowski, T. Song, A. Soppera, A. Sorniotti, M. Stajdohar, L. Stanescu, D. Strnad, X. Sun, L. Šajn, R. Šenkerfk, M.R. Šikonja, J. Šilc, I. Škrjanc, T. Štajner, B. Šter, V. Štruc, H. Takizawa, C. Talcott, N. Tomasev, D. Torkar, S. Torrente, M. Trampuš, C. Tranoris, K. Trojacanec, M. Tschierschke, F. De Turck, J. Twycross, N. Tziritas, W. Vanhoof, P. Vateekul, L.A. Vese, A. Visconti, B. Vlaovic, V. Vojisavljevic, M. Vozalis, P. Vracar, V. Vranic, C.-H. Wang, H. Wang, H. Wang, H. Wang, S. Wang, X.-F. Wang, X. Wang, Y. Wang, A. Wasilewska, S. Wenzel, V. Wickramasinghe, J. Wong, S. Wrobel, K. Wrona, B. Wu, L. Xiang, Y. Xiang, D. Xiao, F. Xie, L. Xie, Z. Xing, H. Yang, X. Yang, N.Y. Yen, C. Yong-Sheng, J.J. You, G. Yu, X. Zabulis, A. Zainal, A. Zamuda, M. Zand, Z. Zhang, Z. Zhao, D. Zheng, J. Zheng, X. Zheng, Z.-H. Zhou, F. Zhuang, A. Zimmermann, M.J. Zuo, B. Zupan, M. Zuqiang, B. Žalik, J. Žižka, Informatica An International Journal of Computing and Informatics Web edition of Informatica may be accessed at: http://www.informatica.si. Subscription Information Informatica (ISSN 0350-5596) is published four times a year in Spring, Summer, Autumn, and Winter (4 issues per year) by the Slovene Society Informatika, Litostrojska cesta 54, 1000 Ljubljana, Slovenia. The subscription rate for 2014 (Volume 38) is - 60 EUR for institutions, - 30 EUR for individuals, and - 15 EUR for students Claims for missing issues will be honored free of charge within six months after the publication date of the issue. Typesetting: Borut Žnidar. Printing: ABO grafika d.o.o., Ob železnici 16, 1000 Ljubljana. Orders may be placed by email (drago.torkar@ijs.si), telephone (+386 1 477 3900) or fax (+386 1 251 93 85). The payment should be made to our bank account no.: 02083-0013014662 at NLB d.d., 1520 Ljubljana, Trg republike 2, Slovenija, IBAN no.: SI56020830013014662, SWIFT Code: LJBASI2X. 
Informatica is published by Slovene Society Informatika (president Niko Schlamberger) in cooperation with the following societies (and contact persons): Robotics Society of Slovenia (Jadran Lenarcic) Slovene Society for Pattern Recognition (Janez Perš) Slovenian Artificial Intelligence Society (Dunja Mladenic) Cognitive Science Society (Urban Kordeš) Slovenian Society of Mathematicians, Physicists and Astronomers (Andrej Likar) Automatic Control Society of Slovenia (Sašo Blažic) Slovenian Association of Technical and Natural Sciences / Engineering Academy of Slovenia (Vojteh Leskovšek) ACM Slovenia (Andrej Brodnik) Informatica is financially supported by the Slovenian research agency from the Call for co-financing of scientific periodical publications. Informatica is surveyed by: ACM Digital Library, Citeseer, COBISS, Compendex, Computer & Information Systems Abstracts, Computer Database, Computer Science Index, Current Mathematical Publications, DBLP Computer Science Bibliography, Directory of Open Access Journals, InfoTrac OneFile, Inspec, Linguistic and Language Behaviour Abstracts, Mathematical Reviews, MatSciNet, MatSci on SilverPlatter, Scopus, Zentralblatt Math Volume 38 Number 4 December 2014 ISSN 0350-5596 Informatica An International Journal of Computing and Informatics Editorial: "Michie-Turing" IS2014 Award recipient: Prof. Dr. Janez Grad Automatic Android-based Wireless Mesh Networks Visualization and Concept Drift Detection Using Explanations of Incremental Models Foreground-Background Separation by Feed-forward Neural Networks in Old Manuscripts An Online Compression Algorithm for Positioning Data Acquisition Semantic Annotations for Workflow Interoperability Incremental Hierarchical Fuzzy model generated from Multilevel Fuzzy Support Vector Regression Network Machine Learning Approach for Emotion Recognition in Speech The Inherent Context Awareness of Natural User Interfaces: a Case Study on Multitouch Displays The Role Of Hubness in High-dimensional Data Analysis M. Gams 311 P. Wong, V. Varikota, 313 D. Nguyen, A. Abukmail J. Demšar, Z. Bosnic, 321 I. Kononenko A. Kefali, T. Sari, H. Bahi 329 W. Pan, C. Yao, X. Li, 339 L. Shen H. Salah, B. Nacer, 347 B. Mahmoud L. Wang, D. Fu, L. Wu 367 M. Gjoreski, H. Gjoreski, 377 A. Kulakov B. Blažica 385 N. Tomašev 387 Informatica 38 (2014) Number 4, pp. 311-391