Zbornik 19. mednarodne multikonference INFORMACIJSKA DRUŽBA – IS 2016, Zvezek C
Proceedings of the 19th International Multiconference INFORMATION SOCIETY – IS 2016, Volume C

Sodelovanje, programska oprema in storitve v informacijski družbi
Collaboration, Software and Services in Information Society

Uredil / Edited by Marjan Heričko

http://is.ijs.si
10. oktober 2016 / 10 October 2016, Ljubljana, Slovenia

Urednik: Marjan Heričko, University of Maribor, Faculty of Electrical Engineering and Computer Science
Založnik: Institut »Jožef Stefan«, Ljubljana
Priprava zbornika: Mitja Lasič, Vesna Lasič, Lana Zemljak
Oblikovanje naslovnice: Vesna Lasič
Dostop do e-publikacije: http://library.ijs.si/Stacks/Proceedings/InformationSociety
Ljubljana, oktober 2016

CIP - Kataložni zapis o publikaciji
Narodna in univerzitetna knjižnica, Ljubljana
004.77(082)(0.034.2)
MEDNARODNA multikonferenca Informacijska družba (19 ; 2016 ; Ljubljana)
Sodelovanje, programska oprema in storitve v informacijski družbi [Elektronski vir] : zbornik 19. mednarodne multikonference Informacijska družba - IS 2016, 10. oktober 2016, [Ljubljana, Slovenija] : zvezek C = Collaboration, software and services in information society : proceedings of the 19th International Multiconference Information Society - IS 2016, 10 October 2016, Ljubljana, Slovenia : volume C / uredil, edited by Marjan Heričko. - El. zbornik. - Ljubljana : Institut Jožef Stefan, 2016
ISBN 978-961-264-099-6 (pdf)
1. Gl. stv. nasl. 2. Vzp. stv. nasl. 3. Dodat. nasl. 4. Heričko, Marjan
287010304

PREDGOVOR MULTIKONFERENCI INFORMACIJSKA DRUŽBA 2016

Multikonferenca Informacijska družba (http://is.ijs.si) je z devetnajsto zaporedno prireditvijo osrednji srednjeevropski dogodek na področju informacijske družbe, računalništva in informatike. Letošnja prireditev je ponovno na več lokacijah, osrednji dogodki pa so na Institutu »Jožef Stefan«.

Informacijska družba, znanje in umetna inteligenca so spet na razpotju tako same zase kot glede vpliva na človeški razvoj. Se bo eksponentna rast elektronike po Moorovem zakonu nadaljevala ali stagnirala? Bo umetna inteligenca nadaljevala svoj neverjetni razvoj in premagovala ljudi na čedalje več področjih in s tem omogočila razcvet civilizacije, ali pa bo eksponentna rast prebivalstva zlasti v Afriki povzročila zadušitev rasti? Čedalje več pokazateljev kaže v oba ekstrema – da prehajamo v naslednje civilizacijsko obdobje, hkrati pa so planetarni konflikti sodobne družbe čedalje težje obvladljivi.

Letos smo v multikonferenco povezali dvanajst odličnih neodvisnih konferenc. Predstavljenih bo okoli 200 predstavitev, povzetkov in referatov v okviru samostojnih konferenc in delavnic. Prireditev bodo spremljale okrogle mize in razprave ter posebni dogodki, kot je svečana podelitev nagrad. Izbrani prispevki bodo izšli tudi v posebni številki revije Informatica, ki se ponaša z 39-letno tradicijo odlične znanstvene revije. Naslednje leto bo torej konferenca praznovala 20 let in revija 40 let, kar je za področje informacijske družbe častitljiv dosežek.
Multikonferenco Informacijska družba 2016 sestavljajo naslednje samostojne konference:
• 25-letnica prve internetne povezave v Sloveniji
• Slovenska konferenca o umetni inteligenci
• Kognitivna znanost
• Izkopavanje znanja in podatkovna skladišča
• Sodelovanje, programska oprema in storitve v informacijski družbi
• Vzgoja in izobraževanje v informacijski družbi
• Delavnica »EM-zdravje«
• Delavnica »E-heritage«
• Tretja študentska računalniška konferenca
• Računalništvo in informatika: včeraj za jutri
• Interakcija človek-računalnik v informacijski družbi
• Uporabno teoretično računalništvo (MATCOS 2016)

Soorganizatorji in podporniki konference so različne raziskovalne institucije in združenja, med njimi tudi ACM Slovenija, SLAIS, DKZ in druga slovenska nacionalna akademija, Inženirska akademija Slovenije (IAS). V imenu organizatorjev konference se zahvaljujemo združenjem in inštitucijam, še posebej pa udeležencem za njihove dragocene prispevke in priložnost, da z nami delijo svoje izkušnje o informacijski družbi. Zahvaljujemo se tudi recenzentom za njihovo pomoč pri recenziranju.

V 2016 bomo četrtič podelili nagrado za življenjske dosežke v čast Donalda Michija in Alana Turinga. Nagrado Michie-Turing za izjemen življenjski prispevek k razvoju in promociji informacijske družbe bo prejel prof. dr. Tomaž Pisanski. Priznanje za dosežek leta bo pripadlo prof. dr. Blažu Zupanu. Že šestič podeljujemo nagradi »informacijska limona« in »informacijska jagoda« za najbolj (ne)uspešne poteze v zvezi z informacijsko družbo. Limono je dobilo ponovno padanje Slovenije na lestvicah informacijske družbe, jagodo pa informacijska podpora Pediatrične klinike. Čestitke nagrajencem!

Bojan Orel, predsednik programskega odbora
Matjaž Gams, predsednik organizacijskega odbora

FOREWORD – INFORMATION SOCIETY 2016

In its 19th year, the Information Society Multiconference (http://is.ijs.si) remains one of the leading conferences in Central Europe devoted to the information society, computer science and informatics. In 2016 it is organized at various locations, with the main events at the Jožef Stefan Institute.

The pace of progress of the information society, knowledge and artificial intelligence is speeding up, but it seems we are again at a turning point. Will the progress of electronics continue according to Moore's law or will it start stagnating? Will AI continue to outperform humans at more and more activities and in this way enable the predicted unseen human progress, or will the growth of the human population, in particular in Africa, cause global decline? Both extremes seem more and more likely – fantastic human progress, and planetary decline caused by humans destroying our environment and each other.

The Multiconference is running in parallel sessions with 200 presentations of scientific papers at twelve conferences, round tables, workshops and award ceremonies. Selected papers will be published in the Informatica journal, which has 39 years of tradition of excellent research publication. Next year, the conference will celebrate 20 years and the journal 40 years – a remarkable achievement.
The Information Society 2016 Multiconference consists of the following conferences:
• 25th Anniversary of the First Internet Connection in Slovenia
• Slovenian Conference on Artificial Intelligence
• Cognitive Science
• Data Mining and Data Warehouses
• Collaboration, Software and Services in Information Society
• Education in Information Society
• Workshop Electronic and Mobile Health
• Workshop »E-heritage«
• 3rd Student Computer Science Research Conference
• Computer Science and Informatics: Yesterday for Tomorrow
• Human-Computer Interaction in Information Society
• Middle-European Conference on Applied Theoretical Computer Science (MATCOS 2016)

The Multiconference is co-organized and supported by several major research institutions and societies, among them ACM Slovenia, i.e. the Slovenian chapter of the ACM, SLAIS, DKZ and the second national engineering academy, the Slovenian Engineering Academy. In the name of the conference organizers we thank all the societies and institutions, and particularly all the participants for their valuable contribution and their interest in this event, and the reviewers for their thorough reviews.

For the fourth year, the award for life-long outstanding contributions will be delivered in memory of Donald Michie and Alan Turing. The Michie-Turing award will be given to Prof. Tomaž Pisanski for his life-long outstanding contribution to the development and promotion of the information society in our country. In addition, an award for current achievements will be given to Prof. Blaž Zupan. The information lemon goes to another fall in the Slovenian international ratings on the information society, while the information strawberry is awarded for the information system at the Pediatric Clinic. Congratulations!

Bojan Orel, Programme Committee Chair
Matjaž Gams, Organizing Committee Chair

KONFERENČNI ODBORI / CONFERENCE COMMITTEES

International Programme Committee:
Vladimir Bajic, South Africa; Heiner Benking, Germany; Se Woo Cheon, South Korea; Howie Firth, UK; Olga Fomichova, Russia; Vladimir Fomichov, Russia; Vesna Hljuz Dobric, Croatia; Alfred Inselberg, Israel; Jay Liebowitz, USA; Huan Liu, Singapore; Henz Martin, Germany; Marcin Paprzycki, USA; Karl Pribram, USA; Claude Sammut, Australia; Jiri Wiedermann, Czech Republic; Xindong Wu, USA; Yiming Ye, USA; Ning Zhong, USA; Wray Buntine, Australia; Bezalel Gavish, USA; Gal A. Kaminka, Israel; Mike Bain, Australia; Michela Milano, Italy; Derong Liu, Chicago, USA; Toby Walsh, Australia

Organizing Committee:
Matjaž Gams, chair; Mitja Luštrek; Lana Zemljak; Vesna Koricki; Mitja Lasič; Robert Blatnik; Aleš Tavčar; Blaž Mahnič; Jure Šorn; Mario Konecki
Programme Committee:
Bojan Orel, chair; Nikolaj Zimic, co-chair; Franc Solina, co-chair; Viljan Mahnič, co-chair; Cene Bavec, co-chair; Tomaž Kalin, co-chair; Jozsef Györkös, co-chair; Tadej Bajd; Jaroslav Berce; Mojca Bernik; Marko Bohanec; Ivan Bratko; Andrej Brodnik; Dušan Caf; Saša Divjak; Tomaž Erjavec; Bogdan Filipič; Andrej Gams; Matjaž Gams; Marko Grobelnik; Nikola Guid; Marjan Heričko; Borka Jerman Blažič Džonova; Gorazd Kandus; Urban Kordeš; Marjan Krisper; Andrej Kuščer; Jadran Lenarčič; Borut Likar; Janez Malačič; Olga Markič; Dunja Mladenič; Franc Novak; Vladislav Rajkovič; Grega Repovš; Ivan Rozman; Niko Schlamberger; Stanko Strmčnik; Jurij Šilc; Jurij Tasič; Denis Trček; Andrej Ule; Tanja Urbančič; Boštjan Vilfan; Baldomir Zajc; Blaž Zupan; Boris Žemva; Leon Žlajpah

KAZALO / TABLE OF CONTENTS

Sodelovanje, programska oprema in storitve v informacijski družbi / Collaboration, Software and Services in Information Society ... 1
PREDGOVOR / FOREWORD ... 3
PROGRAMSKI ODBORI / PROGRAMME COMMITTEES ... 5
Information Privacy and Information Technology Outsourcing / Verber Domen ... 7
A Survey on Geolocation Data Anonymization / Heričko Matija, Palanisamy Balaji, Welzer Tatjana, Hölbl Marko, Krishnamurthy Prashant, Zadorozhny Vladimir I. ... 11
Analysis of Techniques for Managing Data on Mobile Devices / Sagadin Klemen, Šumak Boštjan ... 15
Can We Predict Software Vulnerability with Deep Neural Networks? / Çatal Çağatay, Akbulut Akhan, Karakatič Sašo, Pavlinek Miha, Podgorelec Vili ... 19
Exhaustive Key Search of DES Using Cloud Computing / Drevenšek Aleks, Hölbl Marko ... 23
From a New Paradigm to Consistent Representation / Rakić Gordana, Kolek Jozef, Budimac Zoran ... 27
Comparison of Agile Methods: Scrum, Kanban and Scrumban / Brezočnik Lucija, Majer Črtomir ... 31
Introduction to Case Management Model and Notation / Kocbek Mateja, Polančič Gregor ... 35
Indeks avtorjev / Author index ... 39

Sodelovanje, programska oprema in storitve v informacijski družbi / Collaboration, Software and Services in Information Society
Uredil / Edited by Marjan Heričko
10. oktober 2016 / 10 October 2016, Ljubljana, Slovenia

PREDGOVOR

Konferenco "Sodelovanje, programska oprema in storitve v informacijski družbi" organiziramo v sklopu multikonference Informacijska družba že šestnajstič. Kot običajno, tudi letošnji prispevki naslavljajo aktualne teme in izzive, povezane z razvojem sodobnih programskih in informacijskih rešitev ter storitev.
Sprejem in uspešna uporaba na informacijskih tehnologijah temelječih storitev je v veliki meri odvisna od njihove kakovosti, kar vključuje tudi skrb za zaščito zasebnosti in zaupnosti osebnih podatkov, ki se uporabljajo pri zagotavljanju uporabnikom prilagojenih storitev. Agilni pristopi in uporabniško naravnan razvoj dodatno prispevata k boljši uporabniški izkušnji.

Prispevki, zbrani v tem zborniku, omogočajo vpogled v izzive in rešitve na področjih, kot so:
- varovanje zasebnosti pri zunanjem izvajanju v informatiki;
- metode in tehnike anonimizacije geolokacijskih podatkov;
- hranjenje in obdelava podatkov na mobilnih napravah;
- kulturni, sociološki in formalni izzivi pri integraciji podatkovnih virov;
- analiza in napovedovanje ranljivosti v programski opremi;
- kriptografski algoritmi in računalništvo v oblaku;
- statična analiza kakovosti kode na osnovi konsistentne predstavitve programskih sistemov;
- izbira primernih agilnih pristopov in metod;
- nestrukturirano modeliranje primerov in notacija CMMN.

Upamo, da boste v zborniku prispevkov, ki povezujejo teoretična in praktična znanja, našli koristne informacije za svoje nadaljnje delo tako pri temeljnem kot aplikativnem raziskovanju.

FOREWORD

This year, the Conference "Collaboration, Software and Services in Information Society" is being organised for the sixteenth time as a part of the "Information Society" multi-conference. As in previous years, the papers in this year's proceedings address current challenges and best practices related to the development of advanced software and information solutions.

The acceptance and success of advanced ICT-based services depend heavily on their quality, including their ability to protect the privacy and confidentiality of personal data that are used to provide better services to end-users. User-centric and agile development approaches can also contribute significantly to an improved user experience, whereas efficient quality assurance should not be limited to specific programming paradigms and platforms.

Papers in these proceedings provide insight into and/or propose solutions to challenges related to:
- Information privacy in IT/IS outsourcing;
- Methods and techniques for geolocation data anonymization;
- Data storage and processing on mobile devices;
- Cultural, social and legal issues in data integration;
- Software vulnerability prediction;
- Cryptography issues caused by cloud computing;
- Consistent representation of software systems for static software quality analysis;
- Selection of suitable agile method(s);
- Case management modelling and notation.

We hope that these proceedings will be beneficial for your reference and that the information in this volume will be useful for further advancements in both research and industry.

Prof. Dr. Marjan Heričko
CSS 2016 – Collaboration, Software and Services in Information Society Conference Chair

PROGRAMSKI ODBOR / PROGRAMME COMMITTEE

Dr. Marjan Heričko, University of Maribor, Faculty of Electrical Engineering and Computer Science
Dr. Ivan Rozman, University of Maribor, Faculty of Electrical Engineering and Computer Science
Dr. Lorna Uden, Staffordshire University, Faculty of Computing, Engineering and Technology
Dr. Gabriele Gianini, University of Milano, Faculty of Mathematical, Physical and Natural Sciences
Dr. Hannu Jaakkola, Tampere University of Technology, Information Technology (Pori)
Dr. Mirjana Ivanović, University of Novi Sad, Faculty of Science, Department of Mathematics and Informatics
Dr. Zoltán Porkoláb, Eötvös Loránd University, Faculty of Informatics
Dr. Aleš Živkovič, Innopolis University, Faculty of Computer Science
Dr. Boštjan Šumak, University of Maribor, Faculty of Electrical Engineering and Computer Science
Dr. Gregor Polančič, University of Maribor, Faculty of Electrical Engineering and Computer Science
Dr. Luka Pavlič, University of Maribor, Faculty of Electrical Engineering and Computer Science

Information Privacy and Information Technology Outsourcing

Domen Verber
Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
+386 2 220 7434, domen.verber@um.si

ABSTRACT
In this paper we discuss the issue of information privacy in the regime of Information Technology Outsourcing (ITO). Nowadays, privacy in general, and information privacy in particular, is a very important and much debated issue. With ITO, companies contract out IT infrastructure and/or IT related services, such as programming, to other companies. By doing this, the responsibility for information privacy is shared by several parties. The owner of the data must protect the privacy of their customers even in the case of ITO. However, this may be in contradiction with the need for efficient utilization of data and may hinder proper software development, testing and maintenance. The paper contains a short introduction to information privacy and presents some real-world case studies related to this topic.

Categories and Subject Descriptors
K.4.1 [Computers and Society]: Public Policy Issues – Privacy; D.2.9 [Software]: Management – Programming teams

General Terms
Management, Performance, Security, Human Factors, Legal Aspects

Keywords
Information privacy, data protection, anonymization, information technology outsourcing.

1. INTRODUCTION
The rapid growth of the Internet and the ever increasing number of mobile phones and smart devices, coupled with new business practices, have raised far-reaching questions about the future of privacy. Computers and applications track us in almost everything we do. Data are collected when we click on some link in the web browser. With the support of our loyalty cards, the grocery store collects information about what we are buying. Data are stored about us in medical records, financial records, school records, etc. In most cases, those data can be beneficial to us: modern personal recommender systems can be very efficient and helpful, data records can speed up the utilization of services, doctors can devise better diagnostics, etc. However, some of those data are intimate to us and we do not want to reveal them to unauthorized companies and persons.

The paper discusses the issue of information privacy in the modern IT landscape [1]. Most companies today employ some sort of outsourcing (and offshoring) to reduce costs. With Information Technology Outsourcing, companies contract out their IT infrastructure and/or IT related services, such as programming, to other companies. As a consequence, at least some of the data stored in the datacentres of the primary company must be shared with the other IT providers. This raises the question of responsibilities for information protection and increases the complexity of assuring it significantly.

The term information privacy is strongly related to data protection. The latter represents a much wider concept and is discussed only briefly. In the first part of the paper, a basic introduction to information privacy is given. In the second part, some challenges are presented for assuring information privacy in the context of ITO, software development and maintenance. In the third part, two case studies related to the practical implementation of information privacy are demonstrated.

2. INFORMATION PRIVACY
2.1 Introduction to information privacy
Information privacy (or data privacy) considers the relationship between the collection and dissemination of data. Information privacy is a part of a broader term, data protection, which is the process of safeguarding important information from corruption, unwanted exploitation and/or loss. In most cases, information privacy is related to personally identifiable information in combination with other attributes, such as financial records, medical history, religion and beliefs, shopping habits, web surfing behavior, etc. There are other sensitive data, related to the business processes of companies, that must also be protected: trade contracts, financial transactions and similar records made with other companies and persons.
Information privacy involves data storage and data processing technologies, and the public, legal and political issues surrounding them. Privacy concerns extend through the entire life-span of information: how it is collected, stored, used and destroyed, either in digital or some other form.

Most countries have derived strict laws that protect personal privacy [3]. The EU Data Protection Regulation [4] promotes two main principles of data privacy: privacy by design and privacy by default. Privacy by design means that each new service or product that makes use of personal data must take the protection of such data into consideration; IT developers must take privacy into account during the whole life cycle of the system or process development. Privacy by default means that the strictest privacy settings apply automatically once a customer acquires a new product or service. No manual change to the privacy settings should be required on the part of the user. There is also a temporal element to this principle, as personal information must, by default, only be kept for the amount of time necessary to provide the product or service. Slovenia adopted most of the EU Regulations and has very progressive policies regarding information privacy.

2.2 Information privacy and IT solutions
The main challenge of data privacy is how to maximize the utilization of data while protecting personally identifiable information. For example, the end user may wish to have access to the list of customers with their Personal Identification Numbers, which is very convenient for the unique identification of a person. However, Personal Identification Numbers are considered private information because they can reveal a person's birthdate and gender.

To maintain information privacy, we first need to assure data security. All potential measures to protect privacy are useless if the data can be accessed by unauthorized parties. We need to consider all software, hardware and human resources to address this issue and implement the proper actions. The human resources are the most difficult to consider: a frustrated data administrator may expose the data to the public or even to some criminal group. We must also contemplate the employees and the end-users, who may, unintentionally or intentionally, expose private information for no justifiable reason. It is the responsibility of the IT company to minimize such risks.

Nowadays, it is taken for granted that data can be accessed everywhere: from the web, from mobile devices and remotely from personal computers. This presents an additional challenge, as we must contemplate different scenarios to maintain information privacy and data security. Devices can be lost or stolen, communication channels can be eavesdropped, a badly implemented web application can be hacked, etc. All sensitive data (not only the private ones) should be stored, and transferred over communication channels, in encrypted form. By this, the data is protected if it is stolen or accessed unwarrantedly by administrative personnel.
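To make the last point concrete, the following is a minimal sketch of encrypting a sensitive attribute before it is stored or transmitted, using the standard Java cryptography API with AES in GCM mode. The class, its method and the sample plaintext are our illustration, not code from the paper, and a real deployment would still have to solve key storage and distribution, which the sketch deliberately leaves out.

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.security.SecureRandom;

/* Minimal sketch: encrypting a sensitive attribute before storage or transfer. */
public class FieldEncryption {
    public static byte[] encrypt(SecretKey key, byte[] plaintext) throws Exception {
        byte[] iv = new byte[12];                     // 96-bit nonce, unique per record
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ct = cipher.doFinal(plaintext);        // ciphertext incl. 128-bit auth tag
        byte[] out = new byte[iv.length + ct.length];
        System.arraycopy(iv, 0, out, 0, iv.length);   // store the nonce with the ciphertext
        System.arraycopy(ct, 0, out, iv.length, ct.length);
        return out;
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);                                 // AES-256 key
        SecretKey key = kg.generateKey();
        byte[] stored = encrypt(key, "personal id: 0101985500123".getBytes());
        System.out.println("stored " + stored.length + " bytes of ciphertext");
    }
}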
This However, the Personal Identification Numbers are considered as would prevent any unnecessary and unwarranted access of the data. private information because they can reveal his or her birthdate and To enter a reason every time the data is accessed is not always gender. practical and would slow-down the business process scientifically To maintain information privacy, first, we need to assure the data in most cases. Again, this can be avoided with proper authentication security. All potential measures to protect the privacy are useless if and authorization. the data can be accessed by unauthorized parties. We need to Data export and external access to the data with sensitive consider all software, hardware and human resources to address this information should be made only with trusted parties and with clear issue and implement the proper actions. The human resources are and justifiable intention. We cannot track what happens to the data the most difficult to consider. A frustrated Data Administrator may outside of our system. The printing of sensitive data should be expose the data to the public or even to some criminal group. We forbidden or limited to obligatory documents. All such operations must also contemplate the employees and the end-users, who may, must be audited properly. unintentionally or intentionally, expose the private information for Both employees and the end-user should be educated about no justifiable reason. It is the responsibility of the IT company to information privacy and the data security. Most confidentiality minimize such risks. breaches are made unintentionally by the users who were not aware Nowadays, it is taken for granted that the data can be accessed of the Regulations and the significance of information privacy. everywhere: From the web, from mobile devices and remotely from personal computers. This presents an additional challenge. We 3. INFORMATION PRIVACY WITHIN must contemplate different scenarios to maintain information SOFTWARE DEVELOPMENT AND privacy and data security. The devices can be lost or stolen, the MAINTENANCE IN ITO communication channels can be eavesdropped, a badly implemented web application can be hacked, etc. All sensitive data 3.1 Information Technology Outsourcing (not only the private ones) should be stored and transferred on the In general, outsourcing involves the contracting out of a business communication channels in encrypted form. By this, the data is process and/or the assets to another party or company. One kind of protected if it is stolen or accessed unwarrantedly by the business outsourcing is Information Technology Outsourcing administrative personnel. (ITO), which is a company's outsourcing of its IT infrastructure The laws and regulations related to information privacy and data and/or IT related services, such as programing, to other companies. protection are changing constantly. Therefore, the IT solution With the expansion of Cloud Computing in recent years it has providers must reassess the compliance with information privacy become more and more popular for the companies to transfer their and other security regulations continually. This may be difficult for IT infrastructure to the Cloud. This would reduce the costs of applications that have already been in use for a long time and hardware and administrative personnel. Nowadays, the Cloud cannot be replaced or adapted easily. suppliers provide strong data protection. 
3. INFORMATION PRIVACY WITHIN SOFTWARE DEVELOPMENT AND MAINTENANCE IN ITO
3.1 Information Technology Outsourcing
In general, outsourcing involves the contracting out of a business process and/or assets to another party or company. One kind of business outsourcing is Information Technology Outsourcing (ITO), which is a company's outsourcing of its IT infrastructure and/or IT related services, such as programming, to other companies.

With the expansion of Cloud Computing in recent years, it has become more and more popular for companies to transfer their IT infrastructure to the Cloud, reducing the costs of hardware and administrative personnel. Nowadays, the Cloud suppliers provide strong data protection. However, several controversial cases, in which other parties and even governments have had access to the data, have slowed down the migration. Most medium and large sized companies today still try to maintain their own datacentres.

The trend of ITO is also observed with in-house software development. A lot of companies have reduced or even eliminated their IT departments and contracted them out to third parties.

There exist several models of ITO. In some cases, the company has no IT department at all; the IT solution provider serves as the sole service provider and maintains the software and the hardware, performs all the backups, trains the end-users, etc. In most cases, the companies maintain a small IT department which is responsible for the smooth running of software and hardware in-house, while the IT solution provider, in addition to maintaining the software, is responsible for all off-premise assets. Some companies have large IT departments which maintain the hardware, and may run their own applications in parallel with several outsourced solution providers.

3.2 Data sharing and information privacy
Most business processes today rely on the acquisition and processing of data. In the most common scenario, some sort of relational database is used. When software development is outsourced, some of this data must be available to the IT solution providers. This may be in conflict with the requirements for data protection and information privacy. Data protection can be sustained with well-established techniques (e.g. firewalls, VPN communication channels, etc.), which is in the main interest of both the owner of the data and the IT company. On the other hand, it is much more difficult to maintain information privacy. For proper software development, the IT company possesses an elevated privilege level to access the data directly, and some security measures can be avoided easily with customized versions of the applications. The owner of the data has no guarantee that the information will not be used improperly or even sold off.
Data anonymization and data obfuscation can be used to tackle this problem [5]. It is common practice that a copy of the database is used for development, testing and training, in which all sensitive data are replaced with some arbitrary content. For better data protection, the copying of the database and the anonymization are performed by the owner; the outsourced solution providers have no direct access to the originals. However, there are some drawbacks to this scheme. Firstly, some faults in the application can be related to the content of the original data and may not be duplicated easily in the test environment. Secondly, in comparison to copying the database as a whole, data anonymization can be a challenging and time-consuming task, especially for large data sets; the data size required for some tables can be doubled if the database system integrates some sort of change tracing. Thirdly, it is almost impossible to assure complete anonymization: from the secondary attributes, and with some social engineering, it is possible to reconstruct the identity of a subject.

4. CASE STUDIES
In this section we present real-world examples of IT solutions where information privacy plays a significant role. For more than 25 years we were employed as an external solution provider to small and medium sized companies in this part of Europe. In most cases we were the sole solution provider for almost all software solutions, with full administrative access to the data, and we have also envisioned the role of information privacy and implemented all the measures ourselves. At first, these were "common-sense" rules; later on, we tried to comply with all the suggestions and requirements of the personal data protection legislation of Slovenia and the EU. At first, this was also true for the two examples presented here. However, several years ago, our clients became more aware of the issues of information privacy and we upgraded our cooperation to a higher level.

4.1 Academic Information System
The Academic Information System is responsible for the smooth implementation of academic activity in the university. It allows the academic community, university staff and the public to access a wide range of information. Among other things, it keeps all the records of the students and their marks. Some personal attributes and the marks are considered personal data and should be protected from unwarranted disclosure.

Each student has access to his or her own data and may change some of the secondary personal attributes and some preferences. The teaching staff (subject leaders and their assistants) have direct access to the application forms of the exams for which they are responsible. For security reasons, their data are recorded separately and transferred into the student records at the end of the exam; only the subject leader can do that. The university staff have full access and can modify all personal attributes for the students of the related Faculty, and have read-only access to some of the attributes of the other students.

At first, the unique personal number was one of the primary keys that identified a student as a person. It was (and still is) one of the attributes shown in the list user interfaces. There was an idea to hide this attribute entirely; however, this would slow down some processes. Recently, the unique personal number of the students was replaced with a synthetic student identification number, and the identification procedure was automated with RFID cards.

Once entered, the private attributes of a student can be changed only explicitly, and the reason must be presented. All such changes are audited. In addition, all print-outs and exports where personal data of the students are presented are also recorded. The users have access to the audit trails; however, only the administrators can retrieve the details (e.g., to see which attributes were changed).

The development and testing of the applications are performed on a testing database with some of the personal data obfuscated.

[Figure 1: Obfuscated list of students in the test database.]

The same database is used for the training and the testing performed by the end-users. The primary intention of the obfuscation is to prevent unintentional exposure of personal data, not to isolate the IT provider from the client; if necessary, the tests can still be performed on the primary database.
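A sketch of the kind of obfuscation used for such a test copy is shown below. It is our illustration rather than the system's actual code: sensitive values are replaced with arbitrary but format-preserving content, so the test database keeps the shape of the real one. The sample name pools and method names are invented.

import java.util.Random;

/* Format-preserving obfuscation of personal attributes in a test database copy. */
public class Obfuscator {
    private final Random rnd = new Random();
    private static final String[] FIRST = {"Ana", "Jan", "Eva", "Tim", "Maja", "Luka"};
    private static final String[] LAST  = {"Novak", "Horvat", "Kovac", "Zupan", "Krajnc"};

    /* Replace a real name with a randomly generated one. */
    public String obfuscateName() {
        return FIRST[rnd.nextInt(FIRST.length)] + " " + LAST[rnd.nextInt(LAST.length)];
    }

    /* Replace every digit with a random digit, keeping length and punctuation,
       e.g. for a personal identification number shown in list user interfaces. */
    public String obfuscateDigits(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            sb.append(Character.isDigit(c) ? (char) ('0' + rnd.nextInt(10)) : c);
        }
        return sb.toString();
    }
}

Note that, as the paper itself observes, such random replacement can hide data-dependent faults, which is why tests may occasionally still have to run against the primary database.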
4.2 Financial information subsystem of a bank
The financial information subsystem is used for tracking financial and other related transactions of a company. It provides bookkeeping, reporting, data analysis and other information on the financial records of the clients of the company. A bank, as any other company, is also obliged to keep these records; furthermore, there are some unique functionalities specific to banks. Here, the private information is not the personal attributes of persons but the identities of the financial clients.

For obvious reasons, the banks have very strict data protection measures. As in the example above, all the development and testing are done on a separate database with some obfuscated attributes. Despite being a testing database, outside access to it is protected heavily with time-changing encryption keys and firewalls. We are also isolated from the primary database and have direct access to it only in some exceptional circumstances. Because of the specifics of the database management system, it is not very easy to prepare a copy: there are a lot of database triggers that must be switched off during the copying of the data. Because all this is time consuming, the secondary database is updated only occasionally. The primary database also has direct access to some other information subsystems of the bank, which are not available in the test database. Instead, the client provided us with a working development environment inside the company that can be used on the production data if needed, under the supervision of their IT staff.

5. CONCLUSION
Privacy is a much debated issue today. IT solutions are obliged to maintain information privacy for the data they process. In the case of IT Outsourcing, the complexity of achieving this is much higher. The requirements imposed by information privacy sometimes contradict the requirements for effective data processing and good user experience. While strict information privacy is possible, in real-life scenarios it is sometimes better to make some compromises. Furthermore, strict information privacy cannot be achieved with IT means alone; probably the biggest role lies with the persons involved with the solutions.

6. REFERENCES
[1] Solove, D.J., Schwartz, P.M. 2011. Privacy, Information, and Technology. Aspen Publishers.
[2] Cullen, S., Lacity, M., Willcocks, L.P. 2014. Outsourcing: All You Need To Know. White Plume Publishing.
[3] Solove, D.J. 2014. Information Privacy Law. Wolters Kluwer Law & Business.
[4] EU data protection regulation. 2016. http://www.eudataprotectionregulation.com/. Visited on: 2016/09/18.
[5] Ferrer, J.D., Sanchez, D., Comas, J.S. 2016. Database Anonymization: Privacy Models, Data Utility, and Microaggregation-based Inter-model Connections. Morgan & Claypool Publishers.
A Survey on Geolocation Data Anonymization

Matija Heričko, Tatjana Welzer, Marko Hölbl
Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
matija.hericko@student.um.si, tatjana.welzer@um.si, marko.holbl@um.si

Balaji Palanisamy, Prashant Krishnamurthy, Vladimir I. Zadorozhny
University of Pittsburgh, School of Information Sciences, Pittsburgh, USA
bpalan@pitt.edu, prashant@sis.pitt.edu, vladimir@sis.pitt.edu

ABSTRACT
The advancements in positioning technologies and mobile devices have made it possible for location-based services to become very popular, since they provide contextualized information for users depending on their position. Despite the big number of users that use these services, many are wary of their risks and have concerns about their privacy. Data anonymization plays an important part in location-based services. Since the services do not have strict regulations, it is up to the data anonymization methods and techniques to protect the users' privacy. In this paper, we present a survey of data anonymization in the context of geolocation and location-based services. We provide an overview of recent work in the research field, summarise the methods, architectures and configurations used in the research, and present some open problems, challenges and directions for further research.

Categories and Subject Descriptors
H.4 [Information Systems Applications]: Miscellaneous; D.2.8 [Software Engineering]: Metrics – complexity measures, performance measures

General Terms
Theory

Keywords
data anonymization, data generalization, location-based services, geolocation data

1. INTRODUCTION
The Internet of Things (IoT) paradigm, where everything and everyone is connected, enables us to witness significant advances in wireless network communication and positioning technologies, such as Wi-Fi, NFC, RFID, 3G/4G networks, Bluetooth, etc. Additionally, the new paradigm facilitates devices that support network communication and geo-positioning [1].

These advancements, together with the growth of the network infrastructure, provide an excellent platform for applications which make use of the devices' geolocating ability. We are, therefore, witnessing an increase in location-based services, which use geo-spatial location information to deliver on-line location-enhanced information [2].

Location-based services require users to submit their geolocation along with their query, so that the service can contextualize the response based on the user's location. Examples of some frequently used location-based services are navigation, point-of-interest applications ("where is the nearest ATM?"), traffic alerts, weather information, location-based games, etc. [3, 4].

However, the convenience of these services is accompanied by some security concerns, because of the sensitive nature of the users' location information. If a user wishes to use location-based services, he/she must send his/her location along with his/her request (or query) to an untrusted third-party server, so his/her privacy can be intruded upon easily. If the server has malicious intent, it can easily use the location information for its own malicious actions, or the data can be forwarded or sold to some other third party. Users should be aware of the risks that accompany location-based services and should take steps to protect their privacy [1, 5, 6].

In this paper we conduct a survey of data anonymization, which is one of the ways to protect user privacy. We survey the field of data anonymization in the context of geolocation data; more specifically, we look into geolocation data that are used by location-based services. With this review of data anonymization we wish to determine which different methods are available to achieve the required data anonymity level.
We also briefly review the different metrics for measuring the achieved level of anonymity and examine the environment in which each is used.

The rest of the paper is structured as follows: In Section 2 we overview and discuss related work, in Section 3 we present current work, as well as methods for achieving user privacy, in Section 4 we discuss some open issues and future research directions for data anonymization in location-based services, and in Section 5 we conclude the paper.

2. RELATED WORK
Data anonymization plays a big role in preserving the privacy of users and is, therefore, often an important security requirement in many different technological areas. Due to the importance of data anonymization, many researchers tackle the problem in different application areas. The difference between the areas lies in the techniques used to achieve data anonymity and the environment in which they are used [7, 8].

In [7], Parmar and Jinwala surveyed the area of wireless sensor networks and observed the approaches to data aggregation. The objectives of data aggregation in wireless sensor networks are end-to-end privacy preservation and aggregation at intermediate nodes. The technique most used in wireless sensor networks is privacy homomorphism and its variants, which assure privacy and help with data aggregation, but affect integrity and freshness negatively. They concluded that data aggregation could possibly be used in cloud computing and that there is a need for more protocols that provide integrity and freshness.
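To give a flavour of what privacy homomorphism means here, the following toy sketch (ours, not from the surveyed work) uses an additive concealment in the style of Castelluccia, Mykletun and Tsudik: each sensor adds a secret key to its reading modulo a public modulus, intermediate nodes sum ciphertexts without learning any individual reading, and the sink, which knows the sum of the keys, recovers the aggregate.

import java.math.BigInteger;
import java.security.SecureRandom;

/* Toy additively homomorphic concealment: c_i = (m_i + k_i) mod M.
   Intermediate nodes aggregate ciphertexts only; the sink removes the key sum. */
public class ConcealedAggregation {
    static final BigInteger M = BigInteger.valueOf(1_000_003); // public modulus > sum of readings

    public static void main(String[] args) {
        SecureRandom rnd = new SecureRandom();
        long[] readings = {17, 42, 5};                 // plaintext sensor values
        BigInteger aggregate = BigInteger.ZERO, keySum = BigInteger.ZERO;

        for (long reading : readings) {
            BigInteger key = new BigInteger(M.bitLength() - 1, rnd); // per-sensor secret < M
            BigInteger c = BigInteger.valueOf(reading).add(key).mod(M);
            aggregate = aggregate.add(c).mod(M);       // aggregation on ciphertexts only
            keySum = keySum.add(key).mod(M);
        }
        // The sink knows the key sum and recovers the sum of readings: 64
        System.out.println(aggregate.subtract(keySum).mod(M));
    }
}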
Dhand and Tyagi in [9] further reviewed the techniques for achieving data aggregation in wireless sensor networks. They identified several cluster-based approaches which minimise communication requirements and, at the same time, maximise network lifetime. They divided the protocols into homogeneous and heterogeneous, and each of those groups further into single-hop and multi-hop protocols. The authors concluded that data aggregation extends the network resources, since it lowers the amount of data that needs to be transmitted.

Data anonymization is also an important topic in the field of big data. A survey on big data privacy was done by Vennila and Priyadarshini in [8], where big data sets are sent to a cloud. They observed that traditional privacy models and data anonymization approaches are not applicable to big data sets.

In [10] the authors surveyed the field of location-based wireless services and classified the services based on various attributes. They analysed the usage trends of the services, the technologies, protocols and standards used, and the architecture, and mapped the requirements to the technical aspects with the purpose of increasing awareness.

3. DATA ANONYMIZATION METHODS AND TECHNIQUES
Data anonymization in location-based services is used to protect user privacy. User location information is anonymized in such a way that a service cannot infer the user's identity, interests, or any other specific information; rather, the data is generalised to the point that it can describe a multitude of different users. On the other hand, the data should still be specific enough to allow the user to enjoy the benefits and convenience of services contextualized to his/her location [1, 6].

With the wide spread of location-based services that are used daily by many users, data privacy became a big concern, as the services come with many hidden risks and threats to user privacy. Threats to privacy arise from a multitude of actions, such as the collection of personal information, unauthorised use of personal information, improper access to personal information, bad storage of personal information, and other actions similar to or derived from these [11, 12].

Privacy is the user's right to have control over how information about him/her is collected, maintained, used, disclosed or shared [13]. We can classify location privacy into microscopic and macroscopic levels [14], where microscopic refers to a single user query and macroscopic to a whole journey with multiple queries. [15] further divides the macroscopic level into journey-level and long-term location privacy. Techniques for achieving location privacy can be divided into three major groups: anonymity-based schemes, obfuscation-based schemes and false location or dummy generation-based schemes [15, 16]. The difference between the schemes is that the anonymity-based and obfuscation-based ones can only provide location privacy at the microscopic level, while the dummy generation-based ones can provide it at both the microscopic and the macroscopic levels. We can also classify the techniques into two categories based on the involved actors [15]: anonymization server-based (or centralised) schemes and mobile device-based (or decentralised) schemes [1]. As the names imply, the server-based schemes use a trusted server for the anonymization of the data, while the mobile device-based schemes do not use a server to achieve anonymization, but rely on the sharing of information between users [1].

Centralised schemes make use of a trusted anonymizing server to anonymize the query of the user. The server first removes sensitive information about the user (such as the name, age, etc.) and then anonymizes the location information by cloaking, using dummy locations or confusing the path. The biggest disadvantage of a centralised scheme is that the data is gathered in a single location and, if the server is compromised, all the data that it holds is compromised as well [15, 17, 18, 19, 20]. Decentralised schemes do not use a trusted server to anonymize the queries; instead they use other methods. The most prevalent method uses peer-to-peer communication, in which the user's device searches for neighbouring devices and uses their locations to anonymize its own location information in the query. The biggest disadvantage of these schemes is that a user has to rely on neighbouring devices and, if there are not enough devices nearby, the location information cannot be anonymized. Another drawback is that the computational overhead may be too much for some smaller devices [1, 6].

Anonymity-based techniques try to preserve the user's privacy by making his/her query anonymous with the use of different methods. The most popular and important method among these techniques is cloaking, which is divided further into spatial and spatio-temporal cloaking. Both of these methods make use of a metric called k-anonymity, which was first proposed by Sweeney [21]. K-anonymity means that a user cannot be distinguished from (k-1) other users whose data is also in the same data set.
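A minimal sketch of spatial cloaking with k-anonymity follows. It is our illustration, not an algorithm from the cited papers: the reported area around the querying user is widened until at least k users fall inside it, treating coordinates as plain degrees and ignoring road networks, timing, and the attacks discussed in the literature.

import java.util.List;

/* Spatial cloaking sketch: widen the reported area until it covers >= k users. */
public class SpatialCloak {
    public record Point(double lat, double lon) {}
    public record Box(double minLat, double minLon, double maxLat, double maxLon) {
        boolean contains(Point p) {
            return p.lat() >= minLat && p.lat() <= maxLat
                && p.lon() >= minLon && p.lon() <= maxLon;
        }
    }

    /* Grows a square box around the querying user until it contains at least k
       users (the querying user included). Returns null if the area limit is hit
       first, mirroring how decentralised schemes fail when too few neighbours
       are nearby. */
    public static Box cloak(Point user, List<Point> others, int k,
                            double step, double maxHalf) {
        for (double half = step; half <= maxHalf; half += step) {
            Box box = new Box(user.lat() - half, user.lon() - half,
                              user.lat() + half, user.lon() + half);
            long inside = 1 + others.stream().filter(box::contains).count();
            if (inside >= k) {
                return box; // the service only ever sees this box, not the exact point
            }
        }
        return null;
    }
}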
Two other metrics that have gained traction recently are the entropy-based metric and the l-diversity metric. The basis for the entropy-based metric is information theory, where entropy is a measure of uncertainty or unpredictability; the entropy-based metric thus measures with what level of certainty the real location can be identified among a group of locations. The l-diversity metric is based on graph theory: it examines the l-neighbourhood graphs and the connections between neighbours to try to determine the user [22].
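The entropy-based metric can be stated in a few lines. In this sketch of ours, probs[i] is the probability an observer assigns to candidate location i being the real one; the anonymity set is as good as its entropy, which peaks at log2(n) bits when all n candidates are equally plausible.

/* Entropy-based anonymity metric: observer's uncertainty over candidate locations. */
public final class EntropyMetric {
    /* probs[i] = observer's probability that candidate i is the real location. */
    public static double entropyBits(double[] probs) {
        double h = 0.0;
        for (double p : probs) {
            if (p > 0) {
                h -= p * Math.log(p) / Math.log(2); // Shannon entropy in bits
            }
        }
        return h; // maximal (log2 n) when all n candidates are equally likely
    }

    public static void main(String[] args) {
        // Four dummies plus the real location, all equally plausible: ~2.32 bits
        System.out.println(entropyBits(new double[] {0.2, 0.2, 0.2, 0.2, 0.2}));
        // A poor dummy set that an observer can largely rule out: ~1.12 bits
        System.out.println(entropyBits(new double[] {0.8, 0.05, 0.05, 0.05, 0.05}));
    }
}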
Anonymity-based techniques are the most researched area, and there are many different variants. Some researchers focus on the cloaking of mobile users, where the issues are the continuous queries of users and their movements [4, 5, 23, 24, 25, 26, 27, 28, 29]. Others research centralised schemes with a focus on microscopic or snapshot queries, where every query stands alone [15, 17, 18, 19, 20, 30, 31]. Less researched are some combinations, such as the hybrid approach to cloaking, which uses both a centralised and a decentralised scheme [6], or a scheme that uses middleware [32] to provide privacy preservation [33, 34].

Obfuscation-based techniques try to prevent services from identifying the user, whether by adding some noise to his/her location information or by shifting the original location. The idea behind location obfuscation is that the real location is transformed into another space in which the spatial relationships are maintained, so that location-based queries can still be answered. Obfuscation is not as frequent as the other two techniques, perhaps because it is similar to dummy generation; maybe these two methods will be known under one name in the future. Nevertheless, it is a very active research area [35, 36, 37, 38, 39].

And, while obfuscation-based techniques do their best to conceal the user's real location information, false location-based techniques do not try to conceal the location information, but rather hide the user's location information in plain sight. False location-based techniques protect user privacy either by reporting false locations to the location-based services, or by generating some dummy locations which are added to the real location and packaged into a query, so that the service does not know which location is the real one. In these false location-based methods there is a choice of using a random [40] or a carefully planned generator [41, 42], where the generator uses other principles and techniques to generate the dummy data, such as the soft computing techniques in [15].
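A random dummy generator of the kind cited above [40] can be sketched as follows. The sketch is ours and deliberately naive, since carefully planned generators [41, 42] choose dummies that are plausible with respect to geography and user behaviour rather than uniformly at random.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/* Dummy generation sketch: hide the real position among k-1 plausible fakes. */
public class DummyGenerator {
    private final Random rnd = new Random();

    /* Returns k points (the real one included, at a random index) within
       `radius` degrees of the real position. */
    public List<double[]> query(double lat, double lon, int k, double radius) {
        List<double[]> points = new ArrayList<>();
        for (int i = 0; i < k - 1; i++) {
            points.add(new double[] {
                lat + (rnd.nextDouble() * 2 - 1) * radius,
                lon + (rnd.nextDouble() * 2 - 1) * radius });
        }
        points.add(rnd.nextInt(points.size() + 1), new double[] {lat, lon});
        return points; // the service cannot tell which of the k points is real
    }
}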
location based services: a hybrid approach,” Another open problem that is interesting, but has not seen much GeoInformatica, vol. 13, no. 2, pp. 159–182, Jun. 2009. research, is the problem of setting the desired level of privacy [7] K. Parmar and D. C. Jinwala, “Concealed data aggregation protection dynamically. This problem is interesting because it in wireless sensor networks: A comprehensive survey,” would give the users the power to choose what level of security Comput. Netw. , vol. 103, pp. 207–227, Jul. 2016. they want for themselves. So far, there has been little done in the [8] S. Vennila and J. Priyadarshini, “Scalable Privacy way of allowing users to choose their desired level of privacy. Preservation in Big Data a Survey,” Procedia Comput. Sci. , Often, users have to accept the implicit demands or terms of use of vol. 50, pp. 369–373, 2015. location-based services. On the other hand, if users take advantage of one of the methods discussed in this paper, they also simply have [9] G. Dhand and S. S. Tyagi, “Data Aggregation Techniques in to accept the level of privacy protection that method is designed WSN: Survey,” Procedia Comput. Sci. , vol. 92, pp. 378– with, which leaves users with two absolute options, either ‘full’ 384, 2016. exposure or ‘full’ protection. So we expect some research to go in [10] D. Mohapatra and S. S.B, “Survey of location based wireless that direction in the future. services,” 2005, pp. 358–362. [11] R. P. Minch, “Privacy issues in location-aware mobile devices,” 2004, p. 10 pp. 13 [12] J. V. Chen, W. Ross, and S. F. Huang, “Privacy, trust, and IEEE Trans. Parallel Distrib. Syst. , vol. 23, no. 10, pp. justice considerations for location‐based mobile 1805–1818, Oct. 2012. telecommunication services,” info, vol. 10, no. 4, pp. 30–45, [28] M. Y. Mun, D. H. Kim, K. Shilton, D. Estrin, M. Hansen, Jun. 2008. and R. Govindan, “PDVLoc: A Personal Data Vault for [13] S. Saravanan and B. Sadhu Ramakrishnan, “Preserving Controlled Location Data Sharing,” ACM Trans. Sens. privacy in the context of location based services through Netw. , vol. 10, no. 4, pp. 1–29, Jun. 2014. location hider in mobile-tourism,” Inf. Technol. Tour. , vol. [29] X. Pan, X. Meng, and J. Xu, “Distortion-based anonymity 16, no. 2, pp. 229–248, Jun. 2016. for continuous queries in location-based mobile services,” [14] R. Shokri, J. Freudiger, and J. Hubaux, “A Unified 2009, p. 256. Framework for Location Privacy,” 2010. [30] F.-Y. Leu, “A novel network mobility handoff scheme using [15] F. Tang, J. Li, I. You, and M. Guo, “Long-term location SIP and SCTP for multimedia applications,” J. Netw. privacy protection for location-based services in mobile Comput. Appl. , vol. 32, no. 5, pp. 1073–1091, Sep. 2009. cloud computing,” Soft Comput. , vol. 20, no. 5, pp. 1735– [31] A. Samanta, F. Zhou, and R. Sundaram, “SamaritanCloud: 1747, May 2016. Secure infrastructure for scalable location-based services,” [16] H. Lu, C. S. Jensen, and M. L. Yiu, “PAD: privacy-area Comput. Commun. , vol. 56, pp. 1–13, Feb. 2015. aware, dummy-based location privacy in mobile services,” [32] G. Myles, A. Friday, and N. Davies, “Preserving privacy in 2008, p. 16. environments with location-based applications,” IEEE [17] Baik Hoh and M. Gruteser, “Protecting Location Privacy Pervasive Comput. , vol. 2, no. 1, pp. 56–64, Jan. 2003. Through Path Confusion,” 2005, pp. 194–205. [33] J. Meyerowitz and R. Roy Choudhury, “Hiding stars with [18] R. Cheng, Y. Zhang, E. Bertino, and S. 
Prabhakar, fireworks: location privacy through camouflage,” 2009, p. “Preserving User Location Privacy in Mobile Data 345. Management Infrastructures,” in Privacy Enhancing [34] B. Niu, Xiaoyan Zhu, Xiaosan Lei, Weidong Zhang, and Hui Technologies, vol. 4258, G. Danezis and P. Golle, Eds. Li, “EPS: Encounter-Based Privacy-Preserving Scheme for Berlin, Heidelberg: Springer Berlin Heidelberg, 2006, pp. Location-Based Services,” 2013, pp. 2139–2144. 393–412. [35] C. Ardagna, M. Cremonini, S. De Capitani di Vimercati, and [19] B. Gedik and Ling Liu, “Location Privacy in Mobile P. Samarati, “An Obfuscation-Based Approach for Systems: A Personalized Anonymization Model,” 2005, pp. Protecting Location Privacy,” IEEE Trans. Dependable 620–629. Secure Comput. , vol. 8, no. 1, pp. 13–27, Jan. 2011. [20] M. Gruteser and D. Grunwald, “Anonymous Usage of [36] C. A. Ardagna, M. Cremonini, E. Damiani, S. De Capitani Location-Based Services Through Spatial and Temporal di Vimercati, and P. Samarati, “Location Privacy Protection Cloaking,” 2003, pp. 31–42. Through Obfuscation-Based Techniques,” in Data and [21] L. Sweeney, “k-ANONYMITY: A MODEL FOR Applications Security XXI, vol. 4602, S. Barker and G.-J. PROTECTING PRIVACY,” Int. J. Uncertain. Fuzziness Ahn, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, Knowl.-Based Syst. , vol. 10, no. 05, pp. 557–570, Oct. 2002. 2007, pp. 47–60. [22] B. Zhou and J. Pei, “The k-anonymity and l-diversity [37] G. Ghinita, P. Kalnis, A. Khoshgozaran, C. Shahabi, and K.- approaches for privacy preservation in social networks L. Tan, “Private queries in location based services: against neighborhood attacks,” Knowl. Inf. Syst. , vol. 28, no. anonymizers are not necessary,” 2008, p. 121. 1, pp. 47–77, Jul. 2011. [38] A. Khoshgozaran and C. Shahabi, “Blind Evaluation of [23] I. Bilogrevic, M. Jadliwala, V. Joneja, K. Kalkan, J.-P. Nearest Neighbor Queries Using Space Transformation to Hubaux, and I. Aad, “Privacy-Preserving Optimal Meeting Preserve Location Privacy,” in Advances in Spatial and Location Determination on Mobile Devices,” IEEE Trans. Temporal Databases, vol. 4605, D. Papadias, D. Zhang, and Inf. Forensics Secur. , vol. 9, no. 7, pp. 1141–1156, Jul. G. Kollios, Eds. Berlin, Heidelberg: Springer Berlin 2014. Heidelberg, 2007, pp. 239–257. [24] C.-Y. Chow, M. F. Mokbel, J. Bao, and X. Liu, “Query- [39] M. L. Yiu, G. Ghinita, C. S. Jensen, and P. Kalnis, aware location anonymization for road networks,” “Outsourcing Search Services on Private Spatial Data,” GeoInformatica, vol. 15, no. 3, pp. 571–607, Jul. 2011. 2009, pp. 1140–1143. [25] C.-Y. Chow, M. F. Mokbel, and X. Liu, “A peer-to-peer [40] H. Kido, Y. Yanagisawa, and T. Satoh, “An anonymous spatial cloaking algorithm for anonymous location-based communication technique using dummies for location- service,” 2006, p. 171. based services,” 2005, pp. 88–97. [26] H. Lee, B.-S. Oh, H. Kim, and J. Chang, “Grid-based [41] B. Niu, Q. Li, X. Zhu, G. Cao, and H. Li, “Achieving k- cloaking area creation scheme supporting continuous anonymity in privacy-aware location-based services,” 2014, location-based services,” 2012, p. 537. pp. 754–762. [27] M. M. E. A. Mahmoud and X. Shen, “A Cloud-Based [42] B. Niu, Q. Li, X. Zhu, G. Cao, and H. Li, “Enhancing Scheme for Protecting Source-Location Privacy against privacy through caching in location-based services,” 2015, Hotspot-Locating Attack in Wireless Sensor Networks,” pp. 1017–1025. 
Analysis of techniques for managing data on mobile devices

Klemen Sagadin
Faculty of Electrical Engineering and Computer Science, University of Maribor
Maribor, Slovenia
+386 51 242 244
klemen.sagadin@gmail.com

Boštjan Šumak
Faculty of Electrical Engineering and Computer Science, University of Maribor
Maribor, Slovenia
+386 2 220 7378
bostjan.sumak@um.si

ABSTRACT
In this paper, we conducted a study of techniques for managing data on mobile devices and defined the needs for local data storage and data processing. Based on the results of a preliminary analysis, a set of data management techniques was chosen for a detailed analysis and comparison in terms of usability, performance and complexity in the software development process. The set of analysed data management techniques included the relational database SQLite, the object database Realm and the object-relational mapper OrmLite. The results of this study showed that there are significant differences among the chosen techniques in their usability for the developer, their performance and the complexity of developing software solutions with them.

Categories and Subject Descriptors
D.2.8 [Software Engineering]: Metrics – complexity measures and performance measures
H.2.4 [Database Management]: Systems – Multimedia databases, Object-oriented databases, Query processing, Relational databases, Transaction processing

General Terms
Algorithms, Measurement, Performance, Design, Reliability, Experimentation, Languages, Theory

Keywords
Techniques for processing and storing data on mobile devices, performance analysis, complexity of development using data storage, functionality of data storage techniques, object database Realm, relational database SQLite, mapper OrmLite.
1. INTRODUCTION
Globally, the number of mobile device users exceeded the number of desktop computer users in 2014. In 2016, the number of mobile device users increased to 1,900 million, and more than 50% of users spent their time on mobile devices when searching for and using digital media. On average, users of Android and iOS mobile devices spend 32% of their time playing games and 68% on applications that need Internet access [1].

Because an Internet connection is not always available, it is of great importance for devices to work efficiently in off-line mode. For that to be possible, devices need to process data obtained from a network, save it locally and present it to the user even when the mobile device has no Internet connection. Because of the increasing use of mobile applications that depend on information gathered from the Internet, applications need to be developed in such a way that they are able to work in off-line mode [2].

For developers of mobile applications that run on systems with limited resources, there are various mechanisms for data management. There are techniques for storing data that are specific to a mobile operating system and the corresponding programming environment, and there are techniques that are supported in multiple programming environments and mobile operating systems. The large number of mechanisms and techniques for data storage makes choosing the most suitable mechanism for the development of a specific mobile application a challenging task. In this paper, we have conducted an analysis and comparison of mechanisms for managing and storing data on mobile devices, with emphasis on newer tools and concepts that allow more contemporary approaches. In this study, we have taken into account the importance of presenting information in programming solutions in the form of a domain object-oriented programming model, while considering the fact that, for efficient data storage, such objects often need to be converted into a form suitable for permanent storage.

2. TECHNIQUES FOR MANAGING DATA ON MOBILE DEVICES
In this section, the requirements for processing and storing data on mobile devices are presented, together with the groups of techniques for data storage.

2.1 Requirements for Processing and Storing Data on Mobile Devices
Applications on mobile devices need information which, in contemporary mobile solutions, can be obtained from various data sources in order to operate and ensure a good user experience. We identified two domains where the use of techniques for managing and storing data is of crucial importance.

2.1.1 Work in off-line mode
In contemporary IT solutions, embedded and other mobile devices are becoming more connected to the World Wide Web and can access remote data, which are necessary to ensure a good user experience. Despite better connectivity, and regardless of the mobile device's location, uninterrupted Internet access is not always possible. This is the reason for an increasing demand for the undisturbed functioning of mobile solutions in off-line mode, meaning that the mobile solution can work without an Internet connection. This applies particularly to mobile solutions which depend on data obtained from remote sources. In order to ensure the functioning of a mobile solution in off-line mode, a suitable data storage technique must be used on the mobile device for the local managing and storing of information [3][4].

2.1.2 Big data on mobile devices
With the arrival of more advanced smart mobile devices, which use sensors for capturing information from the environment, the amount of data is increasing constantly. Consequently, there is a growing need for managing large amounts of data and transmitting the analysed results to the user. With the use of sensors, mobile devices can generate large amounts of data. Furthermore, advancements in multimedia technology (improved cameras, sound recorders etc.) have enabled the capture of increased amounts of multimedia information (pictures, sounds and video). This kind of data needs to be processed and stored properly for it to be available for further use. Therefore, contemporary data storage mechanisms must be used to support such data [5][6].

2.2 Mobile Data Storage Techniques
Based on an overview of the possibilities for storing data on the Android, iOS and Windows Phone mobile operating systems, we can divide data storage techniques into three groups:
- Key-value data storage,
- File data storage, and
- Local database storage.

Key-value data storage is a database management system which offers a set of basic functions for the manipulation of unstructured data objects, where each value has its own unique identifier [7][8]. File data storage means saving files of a specific data format in the mobile device's file system, where information is presented in the form of files [9]. Local database storage is used for saving structured and unstructured information. For storing data on mobile devices we used local databases, which are mostly independent libraries without a server component, without administration needs and with smaller demands on system resources [10][11].

3. ANALYSIS OF MOBILE DATA STORAGE TECHNIQUES
In this study we focused on techniques for storing complex data structures and, based on a systematic literature review, we chose the most researched and used local database for mobile devices, which is SQLite. Because SQLite is a relational database, data need to be mapped from the domain programming model to a relational model. Based on the results of a preliminary survey, we chose the techniques Realm and OrmLite, which automate this process, and compared them with the relational database SQLite. Realm is an object database that enables direct storing of the domain model to the database. OrmLite is an ORM (Object Relational Mapper), which maps objects to the relational database. We analysed in detail the influence of the chosen storage techniques on performance, usability for the developer, and the complexity of development with each technique. Figure 1 shows the conceptual research model.

Figure 1. Conceptual research model

3.1 Usability of Techniques from the Aspect of the Developer
We defined the functionalities important for a developer of programming solutions and their influence on the final usability of the individual techniques. For each technique we observed: (1) its tool support for managing the database, (2) the possibilities of automatic mapping of domain objects to the database, (3) the support for different types of relations between saved data, (4) the support for managing various data types, (5) the support for advanced data queries, (6) the support for multithreaded operation, (7) the support for saving data to physical locations in memory, (8) the support for transactions and ACID properties, (9) the support for migrating data upon data scheme changes, and (10) the support for built-in data encryption.

We came to the conclusion that the programming interface of OrmLite has no support for database management tools. Most functionalities for the automatic mapping of domain objects to the database are supported by the Realm database and the OrmLite programming interface, and are not supported by the SQLite database. Support for data management functionalities and for relations between data is best in the Realm and SQLite databases. All data storage techniques enable advanced data querying. The number of supported functionalities for multithreaded operation is biggest in the Realm database, and all data storage techniques have the same support for saving data to physical locations in memory. The Realm database has the most supported functionalities for transactions, ACID features and migrating data upon data scheme changes. Database encryption is supported in the Realm and SQLite databases and is not supported in the OrmLite mapper. Most of the defined functionalities are supported by the Realm database, due to its object orientation and good support for multithreaded operation on mobile devices. The SQLite database supports the fewest of the defined functionalities because of the lack of support for the automatic mapping of objects to the database, which the OrmLite technique tries to substitute.

Based on the analysis results, a chi-square statistical test was conducted at a significance level of 1%; we accepted the alternative hypothesis stating that there are significant differences in the number of supported functionalities between the Realm and SQLite techniques and between the Realm and OrmLite techniques. We were not able to reject the null hypothesis that there are no significant differences between SQLite and OrmLite; therefore, we cannot accept its alternative hypothesis.
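For illustration, the sketch below shows the kind of computation behind such a test on a 2x2 table of supported vs. unsupported functionalities for two techniques. The counts are hypothetical, since the paper reports only the test outcomes, and 6.635 is the standard chi-square critical value for one degree of freedom at the 1% level.

```java
/**
 * Minimal sketch of a chi-square test of independence on hypothetical counts
 * of supported vs. unsupported functionalities for two storage techniques.
 */
public class ChiSquareSketch {

    /** Pearson's chi-square statistic for a 2x2 contingency table. */
    static double chiSquare(long[][] o) {
        long total = o[0][0] + o[0][1] + o[1][0] + o[1][1];
        double chi2 = 0;
        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 2; j++) {
                double rowSum = o[i][0] + o[i][1];
                double colSum = o[0][j] + o[1][j];
                double expected = rowSum * colSum / total;
                chi2 += (o[i][j] - expected) * (o[i][j] - expected) / expected;
            }
        }
        return chi2;
    }

    public static void main(String[] args) {
        // Rows: technique A, technique B; columns: supported, unsupported.
        long[][] counts = { {9, 1}, {4, 6} };   // hypothetical numbers
        double chi2 = chiSquare(counts);
        double critical = 6.635;                // df = 1, 1% significance level
        System.out.printf("chi2 = %.3f, significant at 1%%: %b%n",
                          chi2, chi2 > critical);
    }
}
```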
3.2 Complexity Analysis of the Development
We researched the complexity of development with the use of the data storage techniques through an experiment in which we developed three functionally equivalent software solutions, each using one of the analysed data storage techniques. The software solutions are based on a domain object-oriented model, whereby the data from the entity classes must be converted into a form suitable for the local database. We defined 7 groups of software solution functionalities, which include database configuration (F0), defining the database scheme (F1), creating new data inserts (F2), updating data values (F3), deleting already existing data (F4), selecting stored data based on different criteria and aggregate functions (F5), and executing asynchronous transactions (F6).

Regarding the development of the software solutions, we measured the time needed for the development of the individual functionalities. Figure 2 presents the average measured times of the performed experiments. For the development of the software solution using the Realm database, we needed less than half the time needed when using the SQLite database; the OrmLite solution took the longest, because of its more demanding configuration and the effort needed to establish a working setup. For software solution development with the SQLite database we needed more time for the implementation of the individual operations on the data: because we had to implement proper methods for mapping data from the object model to the entity-relational model ourselves, each operation on the data took longer to implement.

Figure 2. Time needed for development with the use of data storage techniques
We analysed the developed software solutions with tools for the calculation of software code metrics. Following Gerlec and Heričko (2010), who evaluated refactoring with a quality index, we chose the set of metrics WMC, DIT, CBO and LCOM, based on which we calculated an index of software code quality. The chosen metrics are non-complementary and non-correlated with each other. The results of the code analysis are shown in Table 1 [12].

Table 1. Results of software code metrics and quality index
Software solution | WMC | DIT | CBO | LCOM | Qi
Realm | 1 | 4 | 3.8 | 4 | 3.5
SQLite | 1 | 2 | 3.75 | 3.38 | 3
OrmLite | 3 | 2 | 5.67 | 3.22 | 3.25

Due to better results in the DIT metric, which assesses the depth of the class inheritance hierarchy, the Realm database reached a higher index because of its decomposition and inheritance hierarchy tree when using the database software constructs. The OrmLite mapper achieved a higher result than the Realm database in the WMC metric, which assesses the sum of the complexity of class methods, because of a higher number of classes with smaller numbers of methods. The results of the OrmLite mapper in the DIT and CBO metrics are less successful, because classes which use its libraries are not well decomposed and are more tightly coupled with each other; consequently, they constitute a more complex software solution. The software solution using the database constructs of SQLite reached lower results due to bad results in the WMC and DIT metrics: there are multiple classes that help with easier writing of SQL commands and with creating the database, and these classes are therefore bigger and poorly structured. All three software solutions achieved comparable results in the LCOM metric, because it concerns smaller software solutions that we divided well, based on the dependencies between individual class attributes.

Based on the time needed for software solution development, the use of the OrmLite mapper was the most complex. However, we must take into consideration the less complex final implemented software code in comparison with the SQLite database. The fastest development and the highest quality software code were achieved with the Realm database, which is why we consider it the least complex of the compared techniques from the developer's point of view.

Based on the obtained average times needed for solution development, we ran Levene's test of homogeneity of variances and, based on the results, concluded that the homogeneity of variances was violated. Therefore, we decided to use Welch's statistical test for hypothesis testing at a significance level of 5%. A significant difference was established between the average times measured for the Realm and SQLite techniques and for the Realm and OrmLite techniques. We were not able to reject the null hypothesis which states that there are no significant differences between the SQLite and OrmLite techniques; therefore, we cannot confirm that significant differences exist in the measured times between the SQLite and OrmLite techniques.

Based on the calculated quality indexes, we can confirm the hypothesis that there are differences in the code produced with each of the data storage techniques.

3.3 Performance Analysis
To analyse the performance of the data storage techniques, we developed a mobile application which conducts transactions on data storage operations and with which we can measure the time needed for each individual transaction's execution. The mobile application, developed for analysing the complexity of development and explained in the previous chapter, tests the existing software code. During the analysis of each technique, we varied the amount of processed data in the individual transactions, with the purpose of understanding the impact of the amount of processed data on the performance of the individual techniques. We performed multiple tests for the performance analysis and divided them into several groups, based on the chosen data storage technique, the tested functionality and the group of processed data. We tested the performance of inserting entities and relations, updating data, deleting data, obtaining data based on data relations, obtaining data based on arithmetic operators and calculating values with aggregate functions.

Based on a systematic overview of the literature, we chose a metric for measuring performance: the time needed for the execution of individual transactions. Times were measured in nanoseconds with the built-in methods of the Java programming language. We implemented each tested operation in a DAO class, where all data storage techniques use the same software interface; therefore, we can ensure the equivalence and comparability of the performed tests. The tests were run on an LG G3 mobile device with the Android 6.0 operating system installed. With each data storage technique we conducted 8 groups of tests, and for each test we increased the amount of stored data on a logarithmic scale with base 10, ranging from 1 to 10,000.
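The following simplified sketch illustrates the measurement setup described above: every technique is wrapped behind the same DAO interface and a transaction is timed with System.nanoTime(). The interface, the class names and the in-memory stand-in are ours, not the authors' actual Android implementation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified sketch of the measurement harness: every storage technique is
 * wrapped in the same DAO interface, so the timed code path is identical.
 */
public class TimingHarness {

    interface PersonDao {                 // hypothetical common interface
        void insertAll(List<String> names);
    }

    /** Stand-in for a Realm-, SQLite- or OrmLite-backed DAO. */
    static class InMemoryDao implements PersonDao {
        private final List<String> store = new ArrayList<>();
        public void insertAll(List<String> names) { store.addAll(names); }
    }

    /** Times one transaction of the given capacity, in nanoseconds. */
    static long timeInsert(PersonDao dao, int capacity) {
        List<String> batch = new ArrayList<>();
        for (int i = 0; i < capacity; i++) batch.add("person-" + i);
        long start = System.nanoTime();   // measurement starts just before the call
        dao.insertAll(batch);
        return System.nanoTime() - start; // ... and stops right after it
    }

    public static void main(String[] args) {
        PersonDao dao = new InMemoryDao();
        // Capacities grow on a base-10 logarithmic scale, as in the experiment.
        for (int n = 1; n <= 10_000; n *= 10) {
            System.out.printf("insert %5d rows: %d ns%n", n, timeInsert(dao, n));
        }
    }
}
```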
We came to the conclusion that the data storage technique influences the performance of executing individual transactions. We performed different types of tests, which showed that the Realm database was the most powerful; this was confirmed by statistical tests proving that there are significant differences in operational capabilities in comparison with the SQLite database and the OrmLite mapper. The OrmLite and SQLite techniques achieved comparable results, confirming that it is not possible to prove significant differences between them. In certain tests with smaller amounts of data, the techniques reached extremely comparable results, although the difference in operational capability increased as the amount of data grew.

Based on the results of a one-way ANOVA statistical test at a significance level of 5%, we confirmed that there are significant differences in performance between the OrmLite and Realm techniques and between the Realm and SQLite techniques. We cannot reject the null hypothesis for the remaining pair; therefore, we cannot confirm that there are significant differences in performance between the OrmLite and SQLite techniques.

4. CONCLUSION AND FUTURE WORK
We analysed the area of data storage techniques on mobile devices and came to the conclusion that data storage techniques can be divided into three groups, based on their characteristics. Based on preliminary research, we chose the SQLite, OrmLite and Realm techniques and compared them in terms of their usability, complexity of development and operational performance. The results provided proof that the data storage technique has an impact on the analysed concepts. Based on the performed comparative analysis and experiments, we found that, for the development of mobile solutions, the use of the Realm data storage technique is more efficient than the SQLite and OrmLite data storage techniques, because the Realm technique supports most of the analysed functionalities. Consequently, Realm's execution is more efficient, its implemented software code is less complex, and less time is needed for development. We were not able to prove significant differences between the SQLite and OrmLite techniques in operational capabilities and in the times needed for development. However, we did confirm that the OrmLite mapper, in comparison with the SQLite database, supports more functionalities and that its implemented software solutions are less complex. We confirmed that picking the right data storage technique has an impact on the efficiency of software solution development. Techniques which enable automated mapping from the domain model to the data storage have proven to be more effective, and the Realm object database even more capable.

In future work we will expand the existing research with an analysis of the techniques' consumption of energy and other resources on mobile devices. This concept could not be analysed in detail due to the limitations of this research; however, it does have an impact on the experience of the final user and the mobile solution developer.

5. REFERENCES
[1] Bosomworth, D. 2016. Mobile marketing statistics 2016. http://www.smartinsights.com/mobile-marketing/mobile-marketing-analytics/mobile-marketing-statistics/.
[2] Whitney, L. 2012. Offline Capabilities: Native Mobile Apps vs. Mobile Web Apps. http://www.sitepoint.com/offline-capabilities-native-mobile-apps-vs-mobile-web-apps/.
[3] Elgan, M. 2014. The hottest trend in mobile: going offline! Computerworld. http://www.computerworld.com/article/2489829/mobile-wireless/the-hottest-trend-in-mobile--going-offline-.html.
[4] Mahemoff, M. 2013. 'Offline': What does it mean and why should I care? http://www.html5rocks.com/en/tutorials/offline/whats-offline/.
[5] Liebowitz, J. 2016. Big Data and Business Analytics. CRC Press.
[6] Walls, T. A. and Schafer, J. L. Models for Intensive Longitudinal Data.
[7] Basescu, C., Cachin, C., Eyal, I., Haas, R., Sorniotti, A., Vukolic, M. and Zachevsky, I. 2012. Robust data sharing with key-value stores. IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012), 1–12.
[8] Aerospike Inc. What is a Key-Value Store? http://www.aerospike.com/what-is-a-key-value-store/.
[9] Sadaqat, J., Maozhen, L., Ghaidaa, A. and Hamed, A. 2010. File annotation and sharing on low-end mobile devices. Seventh International Conference on Fuzzy Systems and Knowledge Discovery, 6, 2973–2977.
[10] Roukounaki, K. 2014. Five popular databases for mobile. Developer Economics. http://www.developereconomics.com/five-popular-databases-for-mobile/.
[11] Ouarnoughi, H., Boukhobza, J., Olivier, P., Plassart, L. and Bellatreche, L. 2013. Performance analysis and modeling of SQLite embedded databases on flash file systems. Des. Autom. Embed. Syst., 17, 3–4, 507–542.
[12] Gerlec, C. and Heričko, M. 2010. Evaluating refactoring with a quality index. World Acad. Sci. Eng. Technol., 63, 3, 76–80.
Can we predict software vulnerability with deep neural network?

Cagatay Catal, Akhan Akbulut
Department of Computer Engineering, Istanbul Kültür University
Istanbul, Turkey
c.catal@iku.edu.tr, a.akbulut@iku.edu.tr

Sašo Karakatič, Miha Pavlinek, Vili Podgorelec
Faculty of Electrical Engineering and Computer Science, University of Maribor
Maribor, Slovenia
saso.karakatic@um.si, miha.pavlinek@um.si, vili.podgorelec@um.si

ABSTRACT
In this paper, we present an alternative approach to software vulnerability prediction with modern machine learning methods, namely deep learning methods. Deep learning methods are techniques where features (in our case software metrics) are processed and sent through multiple layers in which transformations and computations are done in sequence to form a prediction model. Deep learning methods have not been used for software vulnerability prediction so far and could provide a new and potentially competitive alternative to the existing techniques. In the paper we give an overview of existing solutions on the subject and compare them to the proposed system with deep learning. The deep learning techniques are presented in detail and a proposition for the prediction system is made.

Categories and Subject Descriptors
H.4 [Information Systems Applications]: Miscellaneous; D.2.5 [Testing and Debugging]: Testing tools, Code inspections; D.2.8 [Software Engineering]: Metrics – complexity measures, performance measures

General Terms
Algorithms, Measurement, Reliability, Theory

Keywords
Software vulnerability prediction, machine learning, deep learning

1. INTRODUCTION
Software security vulnerabilities are still very common, and new alerts and reports from several agencies are published every day. One such incident was published on May 13, 2015, when the US Food and Drug Administration (FDA) reported an alert about computerized infusion pumps which can be programmed remotely, so that malicious Internet users can modify the dosage of therapeutic drugs. The FDA suggested several actions for the hospitals which are using these systems to secure them. As we see in this recent incident, software security vulnerabilities are quite dangerous for software-intensive systems.

If vulnerable components of software systems can be detected prior to the deployment of the software, verification resources can be assigned effectively. This research area is known as software security vulnerability prediction, and researchers have developed several prediction models so far. Although some researchers showed the benefit of those models, we need much better models in terms of prediction accuracy, precision, and recall. Some companies do not adopt these models yet due to their inefficient prediction performance [1].

Software developers apply static code analysis tools [2] and code reviews [3] to avoid security vulnerabilities. For large-scale software systems and systems of systems, it is not practical to review all the code against possible threats.
Therefore, a good vulnerability prediction model is indispensable.

Although there are many research papers on this topic, companies like Microsoft still do not adopt Vulnerability Prediction Models (VPM) [4]. The reason is related to the low prediction performance on the source code level in terms of the recall and precision evaluation parameters. In that study, Morrison et al. (2015) reported that state-of-the-art models do not provide accurate predictions and that security-specific metrics could be utilized in later studies to achieve acceptable performance.

According to our literature survey, we did not encounter any research paper which applied deep learning. Deep learning is being used by many high-tech companies such as Facebook, Microsoft, and Google to solve challenging problems such as facial recognition, real-time translation, and speech recognition, respectively. We aim to apply advanced machine learning techniques such as deep learning to predict vulnerable components accurately.

In this paper we make an initial review of the field and propose a new outlook on the problem. In the next chapter, the review of the related work is presented. Then, we follow up with the presentation of the modern machine learning technique of deep learning, or deep neural networks. We continue with the proposed novel approach to software vulnerability prediction with deep neural networks.

2. RELATED WORK
Shin and Williams (2008) [5], [6] reported that complexity metrics correlate with security vulnerabilities. They worked on the Mozilla JavaScript Engine. Shin et al. (2011) applied the logistic regression technique and analyzed the relationship of developer activity, complexity, and code churn with software security vulnerabilities [7]. Chowdhury and Zulkernine (2011) used decision trees to predict vulnerabilities by using complexity, cohesion, and coupling metrics [8]; the mean accuracy was 72.85% on Mozilla Firefox data.
Zimmermann et al. (2010) reported that traditional metrics such as complexity, code churn, and organizational measures have a weak correlation with vulnerabilities for Windows Vista [9]. Although complexity, cohesion, and coupling metrics have been studied in detail in previous studies, security-specific metrics should be determined and applied in the models. Shin and Williams (2013) investigated whether fault prediction models can be used for vulnerability prediction or not [10]. They built both fault prediction and vulnerability prediction models and concluded that fault prediction models provide similar results as vulnerability prediction models, but both of them must be improved to reduce the number of false positives.

Recent studies on vulnerability prediction have started to focus on machine learning techniques. Scandariato et al. (2014) presented a model based on machine learning to predict vulnerabilities [11]. Terms in the source code are taken into account and their associated frequencies are noted. Twenty Android applications were used for the validation of the prediction approach. During the experiments, they analyzed the performance of the Naive Bayes and Random Forest algorithms on this problem. They reported that Random Forest provides better performance than the Naive Bayes algorithm. Walden et al. (2014) prepared a vulnerability dataset which has 223 vulnerabilities [12]. They used the Drupal, Moodle, and PHPMyAdmin projects to analyze vulnerabilities. As the machine learning algorithm, they applied the Random Forests algorithm and reported that models using text mining are better than models using metrics in terms of the recall parameter. They used 3-fold cross-validation, and the experiments were performed 10 times. Mokhov et al. (2015) showed that a machine learning approach is effective for detecting vulnerabilities and implemented a tool called MARFCAT for fast code analysis [13]. The tool works on the source code level, binary level, and bytecode level. Shar et al. (2015) applied static and dynamic code attributes to detect vulnerabilities in web applications [14]. They used not only supervised machine learners but also semi-supervised algorithms to analyze the prediction performance. They reported that semi-supervised learning is preferable when vulnerability data is limited. Last (2016) described research on the development of Vulnerability Discovery Models to forecast zero-day vulnerabilities [15]. He stated that the research created two approaches based on machine learning and one approach based on a regression technique. Grieco et al. (2016) implemented a tool called VDiscover which applies machine learning techniques for the prediction of vulnerabilities in test cases [16]. Experimental results showed that the proposed approach effectively predicts the programs which contain dangerous memory corruptions. Medeiros et al. (2015) used taint analysis in conjunction with data mining [17]. Candidate vulnerabilities are detected with taint analysis, and false positives are identified by using a data mining technique. In addition to the detection of vulnerabilities, automatic corrections are performed by adding fixes to the source code automatically. The approach has been validated on a large set of PHP applications and compared to well-known PHP tools for static code analysis. The performance was 5% better than PhpMinerII and 45% better than Pixy in terms of accuracy and precision.

3. DEEP LEARNING
Deep learning is a term that combines techniques of machine learning that result in complex models, where each model is composed of multiple processing layers. For the sake of simplicity and understandability, we will focus our research on deep neural networks, which are a subset of deep learning methods. Deep learning approaches have dramatically improved the state-of-the-art results in several fields which have traditionally been dominated by ensemble machine learning techniques or other approaches. These fields, with their state-of-the-art solutions, are shown in Table 1.

Table 1. Applied deep learning on different problems
Problem | References
Image recognition | [18]–[21]
Speech recognition | [22]–[24]
Prediction of drug molecule activity | [25]
Analyzing particle accelerator data | [26], [27]
Natural language processing and understanding | [28]
Language translation | [29], [30]

Deep neural networks are composed of several layers, where the first layers have the goal of representation learning. With representation learning we can feed in raw data, and the method discovers a proper problem representation on its own. Each layer in the network represents a non-linear module that transforms and represents the data in a different way [31].

Figure 1. Example of deep neural network with two hidden layers

Deep neural networks always start with the first layer of inputs, the raw data. For image data, the input layer can be the intensity levels of each pixel on each of the color channels. The following layer can transform the raw data in such a way that only edges at different angles and orientations are highlighted. The next layer can detect round shapes, corners or other intensity transitions in the image. The following layers usually combine the outputs of the edge, corner and roundness detection layers and detect motifs and shapes composed of edges, corners and other elements. The next layer can combine the output of the motif and shape detection layers into even higher-level figures, where familiar shapes start to form: rectangles, triangles, circles and other shapes or parts of these shapes. The output of each layer is then fed into further layers, and the process can be repeated as long as necessary, with every next layer moving from abstractions towards real shapes and figures [32].
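To make the layered processing concrete, the following plain-Java sketch runs a forward pass through a small network with two hidden layers, taking a vector of software metrics as input and producing a vulnerability probability. The weights are random placeholders rather than trained values, and the example is purely illustrative of the mechanics described above.

```java
import java.util.Random;

/**
 * Minimal sketch of a forward pass through a deep neural network with two
 * hidden layers. The input vector stands for software metrics of a component
 * and the single output for the predicted probability that it is vulnerable.
 * Weights are random placeholders; a real model would learn them with
 * backpropagation.
 */
public class ForwardPassSketch {
    static final Random RND = new Random(1);

    /** One fully connected layer: out = activation(W * in + b). */
    static double[] layer(double[][] w, double[] b, double[] in, boolean sigmoid) {
        double[] out = new double[b.length];
        for (int i = 0; i < out.length; i++) {
            double sum = b[i];
            for (int j = 0; j < in.length; j++) sum += w[i][j] * in[j];
            out[i] = sigmoid ? 1.0 / (1.0 + Math.exp(-sum))  // output layer
                             : Math.max(0, sum);             // ReLU hidden layer
        }
        return out;
    }

    static double[][] randomMatrix(int rows, int cols) {
        double[][] m = new double[rows][cols];
        for (double[] row : m)
            for (int j = 0; j < cols; j++) row[j] = RND.nextGaussian() * 0.1;
        return m;
    }

    public static void main(String[] args) {
        double[] metrics = {12, 3, 0.4, 57};          // hypothetical metric values
        double[] h1 = layer(randomMatrix(8, 4), new double[8], metrics, false);
        double[] h2 = layer(randomMatrix(8, 8), new double[8], h1, false);
        double[] out = layer(randomMatrix(1, 8), new double[1], h2, true);
        System.out.printf("predicted vulnerability probability: %.3f%n", out[0]);
    }
}
```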
The main thing to note here is that these layers are not designed by hand, but are usually learned through the process of backpropagation on the whole neural network, through all of the layers. Instead of the backpropagation process, some heuristic approaches can be used, such as genetic algorithms and simulated annealing, but this is out of the scope of this paper.

As with other machine learning techniques, we can use deep neural networks for different types of problems, and different kinds of deep neural networks are used for different kinds of problems. The following list is divided by machine learning problem, with the specific neural network designs used for each problem:
- Supervised deep learning: deep convolutional networks, recurrent neural networks
- Unsupervised deep learning: autoencoders, restricted Boltzmann machines, deep belief networks
- Semi-supervised deep learning: ladder networks
4. DNNs FOR VULNERABILITY PREDICTION SYSTEM
We propose a system that utilizes the deep neural network machine learning technique for the prediction of software vulnerabilities. This can be done in a number of ways. If there is previous vulnerability data, supervised learning models can be applied. If there is no previous data, unsupervised deep learning algorithms can be used. If there is very limited vulnerability data, semi-supervised deep learning techniques can be investigated. We will analyze all three of these problems in the project in order to solve them efficiently.

There are a number of implementations of deep neural networks that can be used in the proposed system. The following is a list of libraries, packages and software that are mainly used in industry and in research:
- TensorFlow [33] – an open source library developed by Google and written in Python and C++, which can be used with other Python and C++ software through the provided API.
- Theano [34] – an open source Python library for DNNs developed by the University of Montreal.
- Torch [35] – an open source machine learning library written in C, maintained by Facebook and Google engineers and used by the Google DeepMind and Facebook AI research teams.
- Deeplearning4j – an open source C and C++ implementation of deep neural networks developed by Skymind that provides a Java API.
- Caffe [36] – implemented in C++ and Python; provides APIs for C++, Python and Matlab.
- Keras [37] – a Python library which utilizes TensorFlow or Theano and provides an easy-to-use API.

One or multiple libraries can be used in the vulnerability prediction software; it depends on the programming language used. All of the above deep neural network libraries contain basic convolutional and recurrent layers. The performance of a specific type of neural network will have to be determined by experiment; a configuration sketch follows below.
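As a rough indication of how such a supervised model could be put together in Java, the sketch below uses the builder-style API of Deeplearning4j, the library from the list above that targets Java. This is our assumption about a plausible setup, not the authors' implementation, and exact class and builder names may differ between library versions.

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

/**
 * Hypothetical Deeplearning4j configuration for a supervised vulnerability
 * predictor: software metrics in, probability of being vulnerable out.
 */
public class Dl4jSketch {
    public static void main(String[] args) {
        int numMetrics = 20;   // hypothetical number of software metrics
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .list()
            .layer(new DenseLayer.Builder().nIn(numMetrics).nOut(64)
                    .activation(Activation.RELU).build())
            .layer(new DenseLayer.Builder().nIn(64).nOut(32)
                    .activation(Activation.RELU).build())
            // Binary output: vulnerable vs. not vulnerable.
            .layer(new OutputLayer.Builder(LossFunctions.LossFunction.XENT)
                    .nIn(32).nOut(1).activation(Activation.SIGMOID).build())
            .build();
        MultiLayerNetwork net = new MultiLayerNetwork(conf);
        net.init();
        // net.fit(features, labels) would then train on labelled components.
        System.out.println(net.summary());
    }
}
```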
5. CONCLUSION
During our literature review we recognized the lack of usage of the modern machine learning technique of deep neural networks for software vulnerability prediction. Deep neural networks represent the state of the art on multiple optimization, prediction and pattern recognition problems, so there is a surprising lack of their application to software engineering topics.

Our paper serves to persuade researchers that this problem is worth tackling while the topic still remains under-researched. Multiple deep neural network types could be used for this kind of problem, but the performance of each on vulnerability prediction is yet to be determined.

6. REFERENCES
[1] C. F. Kemerer and M. C. Paulk, "The impact of design and code reviews on software quality: An empirical study based on PSP data," IEEE Trans. Softw. Eng., vol. 35, no. 4, pp. 534–550, 2009.
[2] A. G. Bardas and others, "Static Code Analysis," J. Inf. Syst. Oper. Manag., vol. 4, no. 2, pp. 99–107, 2010.
[3] M. V. Mäntylä and C. Lassenius, "What types of defects are really discovered in code reviews?," IEEE Trans. Softw. Eng., vol. 35, no. 3, pp. 430–448, 2009.
[4] P. Morrison, K. Herzig, B. Murphy, and L. Williams, "Challenges with applying vulnerability prediction models," in Proceedings of the 2015 Symposium and Bootcamp on the Science of Security, 2015, p. 4.
[5] Y. Shin and L. Williams, "An empirical model to predict security vulnerabilities using code complexity metrics," in Proceedings of the Second ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, 2008, pp. 315–317.
[6] Y. Shin and L. Williams, "Is complexity really the enemy of software security?," in Proceedings of the 4th ACM Workshop on Quality of Protection, 2008, pp. 47–50.
[7] Y. Shin, A. Meneely, L. Williams, and J. A. Osborne, "Evaluating complexity, code churn, and developer activity metrics as indicators of software vulnerabilities," IEEE Trans. Softw. Eng., vol. 37, no. 6, pp. 772–787, 2011.
[8] I. Chowdhury and M. Zulkernine, "Using complexity, coupling, and cohesion metrics as early indicators of vulnerabilities," J. Syst. Archit., vol. 57, no. 3, pp. 294–313, 2011.
[9] T. Zimmermann, N. Nagappan, and L. Williams, "Searching for a needle in a haystack: Predicting security vulnerabilities for Windows Vista," in 2010 Third International Conference on Software Testing, Verification and Validation, 2010, pp. 421–428.
[10] Y. Shin and L. Williams, "Can traditional fault prediction models be used for vulnerability prediction?," Empir. Softw. Eng., vol. 18, no. 1, pp. 25–59, 2013.
[11] R. Scandariato, J. Walden, A. Hovsepyan, and W. Joosen, "Predicting vulnerable software components via text mining," IEEE Trans. Softw. Eng., vol. 40, no. 10, pp. 993–1006, 2014.
[12] J. Walden, J. Stuckman, and R. Scandariato, "Predicting vulnerable components: Software metrics vs text mining," in 2014 IEEE 25th International Symposium on Software Reliability Engineering, 2014, pp. 23–33.
[13] S. A. Mokhov, J. Paquet, and M. Debbabi, "MARFCAT: Fast code analysis for defects and vulnerabilities," in Software Analytics (SWAN), 2015 IEEE 1st International Workshop on, 2015, pp. 35–38.
[14] L. K. Shar, L. C. Briand, and H. B. K. Tan, "Web application vulnerability prediction using hybrid program analysis and machine learning," IEEE Trans. Dependable Secur. Comput., vol. 12, no. 6, pp. 688–707, 2015.
[15] D. Last, "Forecasting Zero-Day Vulnerabilities," in Proceedings of the 11th Annual Cyber and Information Security Research Conference, 2016, p. 13.
[16] G. Grieco, G. L. Grinblat, L. Uzal, S. Rawat, J. Feist, and L. Mounier, "Toward Large-Scale Vulnerability Discovery using Machine Learning," in Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy, 2016, pp. 85–96.
[17] I. Medeiros, N. Neves, and M. Correia, "Detecting and removing web application vulnerabilities with static analysis and data mining," IEEE Trans. Reliab., vol. 65, no. 1, pp. 54–69, 2016.
[18] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[19] C. Farabet, C. Couprie, L. Najman, and Y. LeCun, "Learning hierarchical features for scene labeling," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1915–1929, 2013.
[20] J. J. Tompson, A. Jain, Y. LeCun, and C. Bregler, "Joint training of a convolutional network and a graphical model for human pose estimation," in Advances in Neural Information Processing Systems, 2014, pp. 1799–1807.
[21] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.
[22] T. Mikolov, A. Deoras, D. Povey, L. Burget, and J. Černocký, "Strategies for training large scale neural network language models," in Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on, 2011, pp. 196–201.
[23] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and others, "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82–97, 2012.
[24] T. N. Sainath, A. Mohamed, B. Kingsbury, and B. Ramabhadran, "Deep convolutional neural networks for LVCSR," in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 8614–8618.
[25] J. Ma, R. P. Sheridan, A. Liaw, G. E. Dahl, and V. Svetnik, "Deep neural nets as a method for quantitative structure-activity relationships," J. Chem. Inf. Model., vol. 55, no. 2, pp. 263–274, 2015.
[26] T. Ciodaro, D. Deva, J. M. De Seixas, and D. Damazio, "Online particle detection with neural networks based on topological calorimetry information," in Journal of Physics: Conference Series, 2012, vol. 368, no. 1, p. 12030.
[27] C. Adam-Bourdarios, G. Cowan, C. Germain, I. Guyon, B. Kegl, and D. Rousseau, "Learning to discover: the Higgs boson machine learning challenge," http://higgsml.lal.in2p3.fr/documentation, 2014.
[28] A. Bordes, S. Chopra, and J. Weston, "Question answering with subgraph embeddings," arXiv preprint arXiv:1406.3676, 2014.
[29] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," in Advances in Neural Information Processing Systems, 2014, pp. 3104–3112.
[30] S. Jean, K. Cho, R. Memisevic, and Y. Bengio, "On Using Very Large Target Vocabulary for Neural Machine Translation," 2015.
[31] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[32] J. Schmidhuber, "Deep learning in neural networks: An overview," Neural Networks, vol. 61, pp. 85–117, 2015.
[33] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, and others, "TensorFlow: Large-scale machine learning on heterogeneous distributed systems," arXiv preprint arXiv:1603.04467, 2016.
[34] J. Bergstra, F. Bastien, O. Breuleux, P. Lamblin, R. Pascanu, O. Delalleau, G. Desjardins, D. Warde-Farley, I. Goodfellow, A. Bergeron, and others, "Theano: Deep learning on GPUs with Python," in NIPS 2011, BigLearning Workshop, Granada, Spain, 2011.
[35] N. Léonard, S. Waghmare, and Y. Wang, "RNN: Recurrent library for Torch," arXiv preprint arXiv:1511.07889, 2015.
[36] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," in Proceedings of the 22nd ACM International Conference on Multimedia, 2014, pp. 675–678.
[37] F. Chollet, "Keras: Deep learning library for Theano and TensorFlow," 2015.

Exhaustive key search of DES using cloud computing

Aleks Drevenšek
Faculty of Electrical Engineering and Computer Science, University of Maribor
Maribor, Slovenia
aleks.drevensek@gmail.com

Marko Hölbl
Faculty of Electrical Engineering and Computer Science, University of Maribor
Maribor, Slovenia
marko.holbl@um.si

ABSTRACT
In this paper we present the time complexity of an exhaustive key search for the DES algorithm using modern cloud computing. We demonstrate that it is possible to perform a brute force attack on a known encryption algorithm in practice using commercially available cloud computing services. We also discuss previous attempts at exhaustive key searches, and explain the methods and preparations for the experiment. The time complexity is still very high; the time needed for finding a key can be improved using cloud computing, but not with the available free resources.
Categories and Subject Descriptors
E.3 [Data Encryption]: Data Encryption Standard (DES)

General Terms
Algorithms, Measurements, Performance, Experimentation, Security

Keywords
Cloud computing, exhaustive key search, DES, Microsoft Azure

1. INTRODUCTION
One of the goals of modern cryptography is the assurance of confidentiality, which is achieved through the use of encryption. Encryption algorithms, referred to as ciphers, are classified into two types: asymmetrical and symmetrical. Symmetrical ciphers use one key for both encryption and decryption [1]; this paper focuses on these types of ciphers. Additionally, symmetric ciphers are classified into block and stream ciphers [2].

An exhaustive key search, or brute force attack, on a modern symmetric cipher is a method of trying every single key in the known key space to identify the key used to encrypt a selected plain text [3]. Modern cloud services are a perfect option for executing such heavy tasks [4].

The purpose of our research was to demonstrate that it is possible to find an encryption key using cloud computing in reasonable time, which could open new questions about the security of modern algorithms. In this paper we will answer two questions: Is it possible to successfully execute an exhaustive key search for an algorithm using cloud computing? How long would such an exhaustive key search take?

The block cipher chosen for the experiment was DES [5]. It is a symmetrical block cipher. Due to its short key length of only 56 bits, it is prone to brute force attacks [6].

2. PREVIOUS DES CHALLENGES
In the past, several competitions were carried out by RSA Security with the intention of finding a key for DES. The data provided to the competitors was: a known algorithm, part of a plain text and the full cipher text. The competitors were provided with 192 bits of the plain text, the method of converting plain text to a hexadecimal value and separating it into blocks of 64 bits, and the method of padding to the full block, since DES is a block cipher and operates with 64-bit blocks of data. The encryption mode was CBC [5].

2.1 Competition DES-I
The first DES competition was held in 1997 and was won by a group of three called DESCHALL. They tackled the problem by building a distributed network of applications for executing the exhaustive key search [7]. They used a client-server architecture, where the server determined which key space was to be searched next and which keys had already been checked [7]. Their clients were physical computers owned by volunteers. The fastest computer they used was a PowerPC 604e with a processor speed of 250 MHz and a search speed of 1.5 million keys per second [7]. Improvements to the search algorithm were made while the search was being performed: near the end of the competition, a team developed a new technique called bit slicing that allowed it to search 32 or 64 keys simultaneously, depending on the CPU architecture – a 32-bit CPU was able to calculate 32 keys simultaneously and a 64-bit CPU 64 keys. With this improvement the fastest speed, measured on a 167 MHz UltraSPARC computer, was 2.4 million keys per second [7].

The first competition was completed successfully in 96 days, with 51.8% of the key space searched. They recorded more than 78,000 unique IP addresses on the server and had around 14 thousand concurrently searching computers at the peak.
2.2 Competitions DES-II-1 and DES-II-2
The second DES competition was held at the beginning of 1998. The winning organisation was distributed.net, which used a similar infrastructure to DESCHALL. The key was found after 39 days. The highest search speed in this competition was 32,430 million keys per second, and 90% of the whole key space had to be searched. The organisation distributed.net estimated that their computing power was equivalent to 22 thousand computers with an Intel Pentium II at 333 MHz, which was about double the power of the best DES-I competitors' resources [8].

The third competition was announced in the same year. It was won by the EFF [9], with a dedicated supercomputer named Deep Crack created specifically for this purpose. This supercomputer was using an advanced hardware implementation of DES, which was faster than the equivalent software implementations. The average speed was 88,804 million keys per second and the total time of the search was 56.05 hours. The share of the key space that needed to be searched to find the key was around 24.8% [10].

2.3 Competition DES-III
The last competition was held in January 1999. In this competition the highest prize money was awarded if the search was completed within 24 hours; if the search took more than 56 hours, no reward would be given [11].

The winner of the competition was a team consisting of distributed.net [12] and the EFF [12]. The search was finished in 22 hours and 15 minutes. The average search speed was 199,000 million keys per second, which is more than double the speed of Deep Crack (88,804 million keys per second). They also needed to check only around 22.2% of the key space, which was the lowest share of keys searched in all the competitions [13].

3. CLOUD COMPUTING
Cloud computing simplifies access to ready-to-use computer resources. Its main feature is the availability of computing power, which is necessary for an exhaustive key search [14].

We identified the resources necessary to execute an exhaustive key search, focusing on providers that offer a cloud computing service. The first resource that was considered was computing power: the number of CPU cores and the amount of available memory. Computing power is defined by the type of virtual machine. The second resource was storage, for storing the search application and the results. In contrast to CPU resources, we did not need a huge amount of storage [15].

While most cloud services offered a similar type and amount of resources, only Microsoft Azure offered a dedicated service for high performance computing, which is referred to as Azure Batch. With that in mind, we decided to use the Microsoft Azure cloud service. The Batch service is designed to execute computations that require up to 10 thousand processor cores [4].

The computers used in the Azure Batch service are of the same type as Microsoft Azure virtual machines. They are divided into three groups: A, D and D version 2, with Dv2 being the fastest regarding CPU resources. Virtual machines of type A use the Intel Xeon E5-2670 processor with a speed of 2.6 GHz, while type Dv2 machines use Intel Xeon E5-2673 v3 CPUs with a speed of 2.4 GHz that can be boosted up to 3.2 GHz. The instances that we used in our experiment are shown in Table 1 [16].

Table 1: Microsoft Azure types of virtual machines
Instance | Number of cores | Memory (GB)
A1 | 1 | 1.75
A2 | 2 | 3.5
A3 | 4 | 7
A4 | 8 | 14
A5 | 2 | 14
A6 | 4 | 28
D1v2 | 1 | 3.5
D2v2 | 2 | 7
D11v2 | 2 | 14

4. USING CLOUD COMPUTING TO PERFORM AN EXHAUSTIVE KEY SEARCH OF DES
Our experiment was conducted in an on-line environment, in a specific research context.

4.1 Experimental variables
The independent variables were tied to the chosen cloud service: they come bundled in packages called virtual machines, so we could not change them separately. Our independent variables were the CPU speed, the number of CPU cores and the memory size. The CPU speed was a continuous variable with a value range of 0 GHz to 3.3 GHz. The number of cores was a discrete variable with possible values of 1, 2, 4, 8, 16 or 20. The memory size was a continuous variable with a range from 0.75 GB to 140 GB.

We defined two dependent variables: the search speed and the time required to find the key. The first was defined by dividing the number of all searched keys by the required search time. The second variable was continuous as well and was calculated from the search speed and the number of all keys.

4.2 Experimental plan
First we needed to prepare the environment. This step included the log-in procedure with a valid Azure Batch account; for our experiment we used a trial account. Then we created a pool of virtual machines with up to 20 cores. Finally, we uploaded the program for the exhaustive key search.

The following step was performing the exhaustive key search. We used a speed evaluation mode to be able to measure the speed and calculate the results. In this mode the search program checked 2^30 keys and then finished (exited).

We measured the time for each instance of a virtual machine separately. The time needed to prepare the instances, transfer files and perform other auxiliary tasks was ignored; for the measured time we considered only the time needed to execute the key search.

We repeated all steps for each different type of virtual machine, iterating through all the available types. The procedure of the experiments is shown in Figure 1. The same procedure could be conducted with different variables, and different results would be expected.

Figure 1: Representation of the experiment process
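A back-of-the-envelope sketch of the speed-evaluation computation described above: the program checks a fixed batch of 2^30 keys, and the first dependent variable is obtained by dividing the batch size by the elapsed time. The elapsed duration below is a placeholder chosen so that the result roughly matches the D2v2 speed reported later in Table 2.

```java
/**
 * Sketch of the speed-evaluation computation: the search program checks a
 * fixed batch of 2^30 keys and the search speed is the batch size divided
 * by the elapsed time (the elapsed seconds below are a placeholder value).
 */
public class SpeedEvaluationSketch {
    public static void main(String[] args) {
        long batch = 1L << 30;            // keys checked in one evaluation run
        double elapsedSeconds = 29.32;    // hypothetical measured duration
        double keysPerSecond = batch / elapsedSeconds;
        System.out.printf("search speed: %.0f keys/s%n", keysPerSecond);
    }
}
```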
5. KEY SEARCH IMPLEMENTATION
The task of an exhaustive key search is highly time and resource consuming. Keeping that in mind, we were forced to use performance improvements. Our software was written in the C++ programming language, and we used updated versions of the tools that were used in past competitions.

The software employs the method of bit slicing and is intended to be used on 64-bit processors for the best performance. Since we wanted to execute the search on multiple keys simultaneously in one search cycle, we had to transform the input data. We had to convert the starting data into hexadecimal values. The next operation was to transform the data into a bit-slicing-compatible format, where each bit that was marked as 1 was transformed into the highest possible value of the data type unsigned __int64.

After the data was prepared, we executed the search using the method keySearch(). The task of this method was to prepare the candidate keys and the variables for multithreaded mode, if that would be optimal, and to execute another method, deseval(), for the current set of keys. If a key was returned, we had found the correct key; otherwise deseval() was repeated with a new set of keys. Before starting the search, the deseval() method was run once with the first set of keys; the purpose of this was to load it into memory and save time during the actual search. The measuring process started just before the first execution of deseval() and stopped after the last execution finished.

The main method, deseval(), uses modified S-boxes for deciphering with multiple keys at once. The method runs 14 rounds of the algorithm before it is possible to check whether all keys are incorrect; only in 1.6% of all executions did the method continue and proceed to do multiple checks over the last 2 rounds. Compared to the normal DES deciphering procedure, we were thus able to check the correctness of the keys after only 14 rounds instead of the usual 16. Another improvement that was included was the possibility of checking 64 keys at once instead of just one.
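The following toy sketch, in Java rather than the authors' C++, and with an 8-bit key and a trivial boolean function instead of the DES round function, illustrates the bit-slicing idea itself: the 64 candidate keys are transposed so that bit j of every key lives in a single 64-bit word, after which one sequence of bitwise operations evaluates the function for all 64 keys at once.

```java
/**
 * Toy illustration of bit slicing: 64 candidate keys are transposed so that
 * bit j of every key lives in one 64-bit word, and a boolean function of the
 * key bits is then evaluated for all 64 keys with a single sequence of
 * bitwise operations. The real search applies the same idea to the DES
 * round function; the tiny function f below is only a stand-in.
 */
public class BitSliceSketch {
    static final int KEY_BITS = 8;          // toy key length (DES uses 56)

    /** slices[j] holds bit j of all 64 keys: key k contributes to bit k. */
    static long[] slice(int[] keys) {
        long[] slices = new long[KEY_BITS];
        for (int k = 0; k < 64; k++)
            for (int j = 0; j < KEY_BITS; j++)
                if (((keys[k] >> j) & 1) == 1) slices[j] |= 1L << k;
        return slices;
    }

    /** Toy boolean function of a single key: f = (b0 & b1) ^ b2. */
    static int f(int key) {
        return ((key & (key >> 1)) ^ (key >> 2)) & 1;
    }

    /** Same function evaluated for all 64 keys at once on the sliced form. */
    static long fSliced(long[] s) {
        return (s[0] & s[1]) ^ s[2];
    }

    public static void main(String[] args) {
        int[] keys = new int[64];
        for (int k = 0; k < 64; k++) keys[k] = (k * 37 + 5) % (1 << KEY_BITS);

        long sliced = fSliced(slice(keys));   // one pass over 64 keys
        for (int k = 0; k < 64; k++) {        // cross-check key by key
            int expected = f(keys[k]);
            int got = (int) ((sliced >> k) & 1);
            if (expected != got) throw new AssertionError("mismatch at " + k);
        }
        System.out.println("bit-sliced result matches per-key evaluation");
    }
}
```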
After the data was prepared, we executed the search using the method keySearch(). The task of this method was to prepare the candidate keys and the variables for multithreaded mode, if that would be optimal, and to execute another method, deseval(), for the current set of keys. If a key was returned, the correct key had been found; otherwise deseval() was repeated with a new set of keys. Before starting the search, the deseval() method was loaded first, using the first set of keys; the purpose of this was to load it into memory and save time during the actual search. The measuring process started just before the first execution of deseval() and stopped after the last execution finished.

The main method, deseval(), uses modified S-boxes for deciphering with multiple keys at once. The method runs 14 rounds of the algorithm before it is possible to check whether all keys are incorrect; in 1.6% of all executions, the method continued and proceeded to do multiple checks over the last 2 rounds. Compared to the normal DES deciphering procedure, we were thus able to check the correctness of a key after only 14 rounds instead of the 16 that the process would normally take. Another improvement was the possibility to check 64 keys at once instead of just one.
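The control flow described above can be summarised in the following runnable sketch. Only the names keySearch() and deseval() and their roles are taken from the paper; the bodies below are simplified stand-ins (the real deseval() evaluates 14 DES rounds over bit-sliced S-boxes):

  #include <chrono>
  #include <cstdint>
  #include <iostream>
  #include <vector>

  static const uint64_t SECRET_KEY = 123456789;  // hypothetical target key

  // Stand-in for deseval(): checks a set of 64 candidate keys at once and
  // returns the matching key, or 0 if no key in the set is correct.
  uint64_t deseval(const std::vector<uint64_t>& keys) {
      for (uint64_t k : keys)
          if (k == SECRET_KEY) return k;
      return 0;
  }

  // Prepares the next set of 64 candidate keys.
  std::vector<uint64_t> next_candidates() {
      static uint64_t next = 0;
      std::vector<uint64_t> keys;
      for (int i = 0; i < 64; ++i) keys.push_back(next++);
      return keys;
  }

  uint64_t keySearch() {
      deseval(next_candidates());  // warm-up: load the method into memory

      // Measurement starts just before the first timed execution of
      // deseval() and stops after the last execution has finished.
      auto start = std::chrono::steady_clock::now();
      uint64_t found = 0;
      while (found == 0)                      // repeat with a new set of keys
          found = deseval(next_candidates());
      auto stop = std::chrono::steady_clock::now();

      std::chrono::duration<double> t = stop - start;
      std::cout << "found " << found << " in " << t.count() << " s\n";
      return found;
  }

  int main() { keySearch(); }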
6. RESULTS
During the execution of the experiment we noticed that not all Microsoft Azure virtual machines were available. This may be due to the fact that Microsoft Azure Batch is a new service and may still have some imperfections. We were able to perform searches on the following 9 instances: A1–A6, D1v2, D2v2 and D11v2. For each instance we normalized the speed to one core.

Table 2: Search speed of available instances

  Instance   Cores   Speed (keys/s)   Speed per core (keys/s)
  D2v2       2       36,616,485       18,308,242
  D1v2       1       17,724,361       17,724,361
  D11v2      2       33,172,943       16,586,471
  A1         1       14,573,834       14,573,834
  A5         2       27,506,451       13,753,225
  A3         4       52,123,389       13,030,847
  A2         2       22,510,310       11,255,155
  A6         4       41,688,997       10,422,249
  A4         8       82,671,837       10,333,979

The instances of type D version 2 were faster than all the instances of type A, which was expected, since they use newer CPUs. We also observed that instances with fewer cores were mostly faster per core than those with more cores.

The fastest instance was D2v2, with 2 cores of an Intel Xeon E5-2673 v3 at 2.4 GHz and 7 GB of memory; its search speed per core was about 18.3 million keys per second. The second fastest instance was D1v2, which performed about 500 thousand keys per second slower. The third instance type, D11v2, was slower than the first by over 1.7 million keys per second. Since all three use the same hardware, we assume the cause could lie in the overhead of virtualization.

We also compared instance type A with type Dv2. The average speed of type A was around 12,228,215 keys per second, while the average speed of type Dv2 was 17,539,682 keys per second, which means that type Dv2 instances were, on average, faster by 43.4%. The fastest instance, D2v2, was 25.6% faster than the fastest type A instance, A1.

6.1 Time Complexity
We calculated the time required to successfully finish an exhaustive key search for a randomly generated DES key using the fastest instances. According to the rules of the previous DES competitions, we generated a key randomly and used it to encrypt an arbitrary plain text. To perform an exhaustive key search for this key successfully, 34.26% of all keys would have to be searched. Based on this, we calculated the times required by the search.

Table 3: Time required to find a random key with D2v2

  Number of cores   Total search speed (keys/s)   Required time
  20                366,164,856                   26.01 months
  400               7,323,297,120                 39 days
  6,700             122,665,226,760               56 hours
  10,000            183,082,428,000               37.45 hours
  17,000            311,240,127,600               22 hours

The results for the fastest instance, D2 version 2, indicate that with the limited number of cores available in the trial version of Azure the search would take more than 26 months. To achieve the winning time of the DES-II challenge, 39 days, we would need 400 cores; to lower the time to the 56 hours of the next DES competition, 6,700 cores would be needed. With the maximal number of cores allowed by Microsoft, 10,000, the search would take 37.45 hours. To get faster results than those of the competitions, we would have to use more cores than the cloud allows, namely 17,000.
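The entries of Table 3 can be reproduced with a short computation (our sketch; the per-core speed and the 34.26% share are the figures reported above, and the times are printed in hours – e.g. 20 cores give roughly 18,700 hours, i.e. about 26 months):

  #include <cstdio>
  #include <initializer_list>

  int main() {
      const double per_core  = 18308242.0;           // D2v2 keys/s per core
      const double key_space = 72057594037927936.0;  // 2^56 DES keys
      const double share     = 0.3426;               // expected share to search

      for (double cores : {20.0, 400.0, 6700.0, 10000.0, 17000.0}) {
          double speed   = per_core * cores;          // total keys per second
          double seconds = share * key_space / speed;
          std::printf("%7.0f cores: %10.2f hours\n", cores, seconds / 3600);
      }
  }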
6.2 Worst case scenario
The worst case for an exhaustive key search occurs when the randomly generated key is the last key in the set of keys to be searched – the entire key space then has to be searched. We recalculated our results to fit this scenario; the instance type used in this calculation was D2v2.

Table 4: Time required to find the last key with D2v2

  Number of cores   Total search speed (keys/s)   Required time
  20                366,164,856                   75.9 months
  1,168             21,384,027,590                39 days
  10,000            183,082,428,000               4.5 days
  19,523            357,431,824,184               56 hours
  49,695            909,828,125,946               22 hours

Using instance D2v2 and searching through every key, the search with 20 cores would take almost 76 months. To beat the best competition time of 39 days, we would need 1,168 cores. To beat the 56 hours of the winner of the second competition, 19,523 cores would be required, which already exceeds the maximum number of cores allowed by the Microsoft Azure cloud service. Using the maximum number of cores, 10,000, we would need 4.5 days. To find a key faster than in all previous competitions, we would have to use almost 50,000 cores.

6.3 Virtualization overhead
Since modern cloud computing is powered by virtualization technology, we also investigated this aspect. While virtualization may have numerous advantages, it also has drawbacks; one of them is performance loss. To measure how much performance is lost through virtualization, we ran our search algorithm on an ordinary computer (Intel i5-3570K CPU, 3.4 GHz). The personal computer was able to search 30 million keys per second per single core, which is around 64% faster than the cloud instance D2v2. If we subtract the part of the difference attributable to the CPU clock speed (3.4 GHz versus 2.4 GHz, roughly 42%), we can assume that the loss in performance due to virtualization is around 22%.

7. CONCLUSION
In this paper we presented the use of the Microsoft Azure cloud services with the new Azure Batch service for high-performance computing, namely for an exhaustive key search of the DES algorithm. We used a brute-force attack approach and estimated the computing power needed for a successful attack.

We used different instance types of the Microsoft Azure cloud platform (A1, A2, A3, A4, A5, A6, D1v2, D2v2 and D11v2). According to our measurements, the fastest instance type was D version 2. The maximal number of virtual machine cores that could be run per account was 10,000. Since cloud computing is based on virtualization, there is some loss of performance: we calculated it to be around 22%, while the documentation of the cloud provider estimates the loss at around 15–20% [17].

It can be concluded that an exhaustive key search can be performed successfully; the required time depends on the number of activated cores. Even in the worst-case scenario, using the maximum of 10,000 cores, it is possible to find the key in 4.5 days. If we wanted to improve this time, we would need more cores, which would require multiple accounts.

Another way of speeding up the process would be to optimize the software used for searching: its S-boxes are outdated, and updating them to the newest versions could lower the time complexity.

Finally, it has to be noted that these results are based on the Microsoft Azure cloud and could differ if another cloud provider were used. In the future we could investigate the search speed for other encryption algorithms, especially those which are still considered secure.

8. REFERENCES
[1] G. J. Simmons, 'Symmetric and asymmetric encryption', ACM Comput. Surv., vol. 11, no. 4, pp. 305–330, 1979.
[2] A. J. Menezes, P. C. Van Oorschot, and S. A. Vanstone, Handbook of Applied Cryptography. Boca Raton: CRC Press, 1997.
[3] F. Rubin, 'Foiling an Exhaustive Key-Search Attack', Cryptologia, vol. 11, no. 2, pp. 102–107, Apr. 1987.
[4] 'Azure Batch feature overview | Microsoft Azure'. [Online]. Available: https://azure.microsoft.com/en-us/documentation/articles/batch-api-basics/. [Accessed: 17-May-2016].
[5] 'RSA Laboratories – Contest Rules'. [Online]. Available: http://www.emc.com/emc-plus/rsa-labs/historical/contest-rules.htm. [Accessed: 12-Apr-2016].
[6] FIPS PUB, Data Encryption Standard (DES), 1999.
[7] M. Curtin and J. Dolske, 'A Brute Force Search of DES Keyspace'.
[8] D. McNett, '[RC5] [ADMIN] The secret message is...', 24-Feb-1998.
[9] Electronic Frontier Foundation, Ed., Cracking DES: Secrets of Encryption Research, Wiretap Politics & Chip Design, 1st ed. San Francisco, CA: Electronic Frontier Foundation, 1998.
[10] 'EFF DES Cracker Press Release, July 17, 1998'. [Online]. Available: https://w2.eff.org/Privacy/Crypto/Crypto_misc/DESCracker/HTML/19980716_eff_descracker_pressrel.html. [Accessed: 12-Apr-2016].
[11] 'distributed.net: Project DES'. [Online]. Available: http://www.distributed.net/DES. [Accessed: 12-Apr-2016].
[12] 'RSA Laboratories – DES Challenge III'. [Online]. Available: http://www.emc.com/emc-plus/rsa-labs/historical/des-challenge-iii.htm. [Accessed: 12-Apr-2016].
[13] 'Brute force attacks on cryptographic keys'. [Online]. Available: http://www.cl.cam.ac.uk/~rnc1/brute.html. [Accessed: 12-Apr-2016].
[14] S. Srinivasan, Cloud Computing Basics. Springer, 2014.
[15] 'Azure infrastructure services implementation guidelines'. [Online]. Available: https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-infrastructure-service-guidelines/. [Accessed: 13-Apr-2016].
[16] 'Pricing – Virtual Machines (VMs) | Microsoft Azure'. [Online]. Available: https://azure.microsoft.com/en-us/pricing/details/virtual-machines/. [Accessed: 13-Apr-2016].
[17] 'Optimizing Performance on Hyper-V'. [Online]. Available: https://msdn.microsoft.com/en-us/library/cc768529(v=bts.10).aspx. [Accessed: 23-May-2016].

From a New Paradigm to Consistent Representation

Gordana Rakić, Jozef Kolek, Zoran Budimac
University of Novi Sad, Trg D. Obradovića 4, 21000 Novi Sad, Serbia
goca@dmi.uns.ac.rs, jkolek@gmail.com, zjb@dmi.uns.ac.rs

ABSTRACT
In this paper, a method for mapping between language constructs that belong to different programming paradigms is provided. The method is based on a universal source code representation used by the Set of Software Quality Static Analyzers (SSQSA) platform and is motivated by the need to consistently support different paradigms by static analysis. The method is illustrated by an example of the integration of support for the functional paradigm.

Categories and Subject Descriptors
D.3.3 [Programming Languages]: Language Constructs and Features
General Terms
Languages

Keywords
SSQSA, eCST, Functional Languages, Scheme

1. INTRODUCTION
Static analysis of computer programs is analysis that is performed without actually executing the programs. It is mostly performed on the source code of a program or on some intermediate representation of it (e.g. an intermediate code, a tree, a graph, a combination of these, or even some complex meta-model). Systematic and consistent application of static analysis techniques can significantly improve the quality of a software product (by finding weak points, discovering bad design, bad maintainability, etc.). Static analysis is usually done by specialized tools. However, in practice these tools suffer from several weaknesses (e.g. limitations regarding the supported languages) [6]. Furthermore, it has been shown that different tools give different results for the same metrics applied to the same source code [4], [5].

Contemporary software projects sometimes last for decades, and over those decades they become complex and heterogeneous with respect to technologies and languages. A characteristic example are software products whose business logic is developed in some dynamic multi-paradigm language, in which the functional paradigm is always very popular, while the business components are often hidden behind modern user interfaces developed in languages designed for that purpose. Even though the functional paradigm is not well supported by static analysis tools, there is practical interest in improvements in this area, and some language-specific tools are already in a mature phase of development [2].

These conditions bring us to the very difficult task of reconciling opposing objectives: heterogeneous projects are to be consistently supported by static analysis. This support has to involve multiple tools because of the limitations of the available ones, but we cannot rely on the consistency of analysis results among tools. A solution is to achieve consistency of static analysis by involving only one universal tool that supports all languages, technologies and platforms. The SSQSA platform [6] is well on the way to meeting these goals.

2. SSQSA AND ECST
The Set of Software Quality Static Analyzers (SSQSA) is a platform for building and integrating a set of software tools for static analysis. The starting aim of the framework is consistent software quality analysis for projects developed in multiple languages, paradigms, and technologies. The essential characteristics of the SSQSA platform are:

(1) Extendibility by new analyses. All implementations of analysis algorithms are independent of the input programming language, and each of the integrated analyzers can be uniformly applied to software systems written in different programming languages. Furthermore, after the integration of a new analysis, it is applicable to all languages supported by SSQSA.

(2) Adaptability to a new language. Support for a new language can be integrated, and after adding a new language, the whole set of implemented analyses is immediately applicable to it. Introducing support for a new input language into the SSQSA framework is a straightforward semi-automated procedure [6]: we need an appropriate (formal) specification of the programming language, after which we only follow the steps of the established procedure.

These characteristics of the SSQSA platform are based on the universality of the enriched Concrete Syntax Tree (eCST) [6]. The eCST is based on the concept of syntax trees. It contains the full source code without abstractions, enriched with universal nodes. Universal nodes are predefined, so-called imaginary nodes with language-independent meanings, which denote semantic concepts expressed by specific constructs of a language (e.g. LOOP STATEMENT is used to denote any loop expressed by for, while, do, repeat, etc., depending on the language).
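For instance, the following two loops from two different input languages would both be placed under the same universal node (our illustration of the example quoted above; the language-specific subtrees are omitted):

  Java:    for (int i = 0; i < 5; i++) { ... }
  Scheme:  (do ((i 0 (+ i 1))) ((= i 5)) ...)

  both are rooted in:   LOOP STATEMENT
                        └── language-specific subtree of the loop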
Currently, SSQSA supports a representative set of input languages, while support for functional languages is still weak. Namely, the integration of functional languages such as Erlang or Scala is in the testing phase, while support for a clean functional language such as Scheme has not yet been introduced. In this paper we describe the motivation and an approach for the integration of a functional language. The focus of the paper is on the mapping of functional constructs to eCST universal nodes. We map some of the most characteristic functional constructs written in Scheme to illustrate the approach.

3. MAPPING APPROACH
As mentioned earlier, the translation of the programming language Scheme into the eCST is done by adding universal nodes into the syntax tree generated from the language specification. Here we describe how the particular syntax elements are marked with the corresponding universal nodes. We focus on some constructs characteristic for functional languages to demonstrate the approach, while the remaining constructs are only mentioned.

Our approach is based on previous experience. When considering a specific construct in Scheme, the concrete mapping method consists of the following steps: (1) determine the construct to be mapped (e.g. a code fragment); (2) determine its semantics (e.g. the definition of a function); (3) determine all factors participating in it (e.g. argument declarations, body, or statements); (4) compare the role of all factors with other supported languages in order to find an equivalence; (5) define a mapping which is consistent with the supported languages.

4. CASE STUDY: SCHEME
The programming language Scheme is a functional and dynamically typed programming language. It is based on a mathematical concept called lambda calculus, introduced by Alonzo Church [1]. Although the lambda calculus is a very powerful concept which can be used to write any program, it is not the most practical approach; therefore, Scheme brings some minor modifications of it. Unlike basic lambda calculus, a Scheme lambda expression can bind several variables at once. Scheme also contains constants, numbers, data structures, various programming language constructs, an assignment operation, an environment of defined names, libraries for input/output operations, etc. The language thus adds to basic lambda calculus all the features that a practical programming language needs. As a result, Scheme is a simple but very expressive programming language, which finds its place in education as well as in the software industry.

There are two assumptions to make before mapping Scheme to eCST. (1) A Scheme symbol can be redefined in any way, without any restriction. For instance, the expression if can easily be redefined by the function (define (if x y) (+ x y)), where if becomes a function that sums two numbers and returns the result. We assume that redefinitions of important syntax constructs are not performed. (2) Scheme supports macros. It is supposed that all Scheme code used as input to SSQSA is Scheme code with already expanded macros; therefore, macro-free Scheme code is expected, and macros are not considered in this paper.

In the following subsections we pass through the characteristic constructs of the Scheme language, level by level.
4.1 High-level entities
The largest entity that has to be marked is a compilation unit. It is marked using the universal node COMPILATION UNIT, which is always the root node of a single source unit that is compiled or interpreted separately. To draw a parallel to other languages, a compilation unit is a single compilable unit (e.g. a class or a module), usually determined by an input file. A Scheme compilation unit consists of an expression sequence where, in most Scheme interpreter implementations, the last expression is evaluated when the unit is loaded.

A Scheme entity can be a Scheme program or a library. Scheme libraries can import and export functions. The universal node PACKAGE DECL, which is used for marking program packages, must be a child of the COMPILATION UNIT. Even though Scheme does not have packages in the real sense, each compilation unit is marked with this node. Names that are imported from libraries are marked with the universal node IMPORT DECL, whose direct child must be NAME, which marks identifiers.

At the next level of the hierarchy in a Scheme entity we can find variable and function definitions. Another important construct in any program is the block.

4.2 Block
Scheme defines sequences, special expressions that are used for grouping other expressions. These sequences are defined using the keyword begin. The last expression in the body of a begin block returns its value as the value of the block. Sequences are nothing more than scopes without locally declared variables. They can be compared to a block between BEGIN and END in Modula-2 or between { and } in Java. Sequences, starting with the expression begin, are placed in the sub-tree of the universal node BLOCK SCOPE.

Let expressions in Scheme represent expressions with a scope and locally defined variables. There are four different let expressions: let, let*, letrec, and named let. An example of a basic let expression would be:

  (let ((x 10) (y 20)) (+ x y))

This is a block with two declared variables and one operation on them, i.e. a statement. The variables x and y are bound to the numbers 10 and 20 respectively, and the whole let expression returns the sum of these two values. When translating Scheme into eCST, the expressions let, let*, and letrec are treated equally and are marked by the universal node BLOCK SCOPE. The named let is treated as a function, because it can recursively call itself (Section 4.4.3).
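Using the node names introduced in this and the following subsections, the let expression above could, for example, be represented by a subtree of roughly the following shape (our sketch; the exact layout is fixed by the SSQSA specification, which this paper does not reproduce):

  BLOCK SCOPE
  ├── VAR DECL ── TYPE (empty), NAME (x), VALUE (10)
  ├── VAR DECL ── TYPE (empty), NAME (y), VALUE (20)
  └── FUNCTION CALL ── NAME (+), ARGUMENT LIST ── ARGUMENT (x), ARGUMENT (y)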
4.3 Variables
In Scheme, variables are declared and defined using a define or let expression. The following examples of using define and let to define a variable are equivalent, while let is usually used only inside a function body:

  (define x 10)   or   (let ((x 10)) ...)

These constructs correspond to a variable declaration with initialization in any other language, e.g. int i = 10 in Java. In both cases, a variable declaration is marked with the universal node VAR DECL. VAR DECL has the universal node TYPE as a direct child. In Scheme, the type of a newly declared variable is determined implicitly, so the TYPE sub-tree stays empty until the types are determined; this is a task for the eCST Manipulator [6], and consistent post-processing of dynamic types is planned for future work (Section 6). The initialization is observed as an assignment statement, inside which the variable name is marked with the node NAME and the value with the node VALUE: x is the name of the variable and 10 is the value that the variable x is bound to.

4.4 Functions
Scheme functions are defined using define and let expressions as well. There are several approaches to defining functions. In all cases a function is marked using the universal node FUNCTION DECL, the list of parameters is marked using FORMAL PARAM LIST, and each parameter is marked by PARAMETER DECL. The node NAME marks the function name. Similarly to variables, parameters have their name and type. Inside a function body we can find different expressions (i.e. statements).

4.4.1 Define
The first approach to declaring a function is the one mostly used in practice. For example:

  (define (sum x y) (+ x y))

This is equivalent to the definition of, for example, a procedure in Modula-2 or a method in Java. The function declaration is marked by the universal nodes FUNCTION DECL, FORMAL PARAM LIST, and PARAMETER DECL, as described. The node NAME marks the function name, which is sum in this particular case, while TYPE remains empty. The parameters also have their names and types: the names are x and y, while the types are temporarily empty.

4.4.2 Define lambda
The second approach to function definition uses the keyword lambda. A lambda function is treated as an anonymous function bound to a variable. An analogy for such variables, whose type is an anonymous function, are procedural types in the programming language Modula-2. An example of a function defined by this approach is:

  (define sum (lambda (x y) (+ x y)))

This can be observed as a variable whose type is the lambda function. Therefore, the root node is VAR DECL, with two children nodes: NAME (sum) and TYPE with the whole lambda function in its subtree. The lambda function is again marked by FUNCTION DECL, FORMAL PARAM LIST, and PARAMETER DECL, as described, while the node NAME of the FUNCTION DECL remains empty in this case.

4.4.3 Let
A special case of the let block is the named let. It is used to express tail recursion. It can be observed as a function that can be called only from its own body; therefore it is a function with certain restrictions at the syntactical level. However, once this function is defined according to the language rules, it has all the characteristics of a recursive function. For example:

  (define (factorial x)
    (let loop ((x x) (acc 1))
      (if (zero? x) acc
          (loop (sub1 x) (* x acc)))))

It is obvious that this is equivalent to a recursive function definition in any other language; the main difference is that other languages usually do not require an explicit syntax construct for a recursive function. The named let is marked by the universal nodes used for other function definitions, where the name of the let block is the name of the function.
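Putting the pieces together, the mapping of the sum example from Section 4.4.1 can be sketched as follows (again our illustration, not a literal tree dump produced by the tool; the "body" label is ours):

  FUNCTION DECL
  ├── NAME (sum)
  ├── TYPE (empty)
  ├── FORMAL PARAM LIST
  │     ├── PARAMETER DECL ── NAME (x), TYPE (empty)
  │     └── PARAMETER DECL ── NAME (y), TYPE (empty)
  └── body: FUNCTION CALL ── NAME (+), ARGUMENT LIST ── ARGUMENT (x), ARGUMENT (y)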
4.5 Statements
Blocks and function bodies are built from statements. Statements in Scheme vary from simple expressions to complex ones such as branch statements, loop statements and continuation statements.

4.5.1 Function calls
Scheme comes with two possible ways in which functions can be called. For example:

  (sum a b c)   or   (apply sum a b '(1 2 3))

The first way is the one mostly used; the second is an explicit call of a function using the command apply. The main difference is in the way they are executed, while the meaning is the same. Both types of function calls are marked by the universal node FUNCTION CALL, whose direct children are NAME and ARGUMENT LIST. The node ARGUMENT LIST is used to mark the list of actual parameters, and ARGUMENT marks each argument in the list.

4.5.2 Branch statements
In Scheme there are many conditional expressions: if, not, and, or, cond, when, unless, and case. The if expression is equivalent to the conditional expression in Java-like languages; for example, the following expressions are equivalent:

  (if (< x y) #t #f)   and   condition ? consequent : alternative

A conditional expression is marked using the universal node BRANCH STATEMENT; the condition is marked using CONDITION, while the consequent and the alternative are marked using BRANCH as direct children of the BRANCH STATEMENT. The conditional expressions not, and and or are marked by the universal node LOGICAL OPERATOR.

Comparison of Agile Methods: Scrum, Kanban, and Scrumban

Lucija Brezočnik, Črtomir Majer
Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
lucija.brezocnik@um.si, crtomir.majer1@um.si

ABSTRACT
In software development, companies are forced to change the way they manage a project's development because of ever-shorter cycles and continually changing requirements. These changes are frequently introduced through agile methods, whose popularity has sharply increased in recent decades. But a major question remains: "Which agile method is optimal for our company?" To answer this question, we compared the three most prevalent among them: Scrum, Kanban and Scrumban.

Categories and Subject Descriptors
D.2.9 [Software Engineering] – Management
D.2.10 [Software Engineering] – Design

General Terms
Management, Design, Theory.

Keywords
agile software development, agile methods, Scrum, Kanban, Scrumban
1. INTRODUCTION
Software companies switch to agile development mostly out of the desire to accelerate product delivery, enhance the ability to manage fast-changing priorities, increase productivity, and improve software quality [12]. Interestingly, the cost of the project and of software maintenance has no significant impact on making the transition [5, 12]. From this we can conclude that the biggest problems of the traditional approach are the length of the software development period and a decreased ability to manage changing priorities – precisely what matters most to customers [3].

In this paper, we focus on three agile methods: Scrum, Kanban, and Scrumban. Research [5] has shown that about half of businesses still use the waterfall model, while the other half uses agile and iterative approaches. Companies using agile methods, according to data from the tenth annual VersionOne survey [12], most often opt for Scrum and Scrum + XP (70%), Scrumban (7%) and Kanban (5%). From our selection of agile methods we removed Extreme Programming (XP), because its principles are often used in combination with other methods (Scrum, Kanban).

2. AGILE METHODS
The main point of agile methods is the constant embrace of change, which is in contrast with traditional methods. Changes are a natural part of development projects and as such should be adequately addressed [8].

2.1 Scrum
Scrum [8, 9, 10] is an agile framework that comprises principles and practices helping teams deliver new products as soon as possible, with continual improvements and rapid adaptation to changes. Scrum has three roles: the Product Owner (the voice of the customer, responsible for the ROI, not to be mistaken for the product manager), the Scrum Master (who observes the team, ensures that there are no violations of the Scrum rules, and removes any impediments the team may have), and the Team (a cross-functional team responsible for delivering shippable increments of the product at the end of each Sprint).

The Sprint is a fixed-length iteration and represents the basic unit of development. Before each Sprint, the Sprint Planning event takes place, in which the Sprint Backlog is defined. All Sprints end with a Sprint Review and a Sprint Retrospective. In the Sprint Review, the Team and the Product Owner are involved and seek opportunities for improvement. The Sprint Retrospective is convened by the Scrum Master and tries to optimize the development process itself.

2.2 Kanban
Kanban is a process management method developed at Toyota that builds on the experience of other agile methods. Its main objective is the elimination of delays and waste, which has a positive effect on workflow optimization. It is based on the Just-In-Time technique for task scheduling, which requires the precise definition and implementation of a task as late as possible in the workflow, to get rid of unnecessary re-planning [4, 6]. The basic guidelines of the Kanban method are:

• Visualize the workflow. This is typically done with the Kanban board, which clearly defines all the required steps (board columns) of the development process. Tasks are prioritised and put into the board column that best defines their current state. Tasks are moved between states until they reach the Done state – the goal is to finish tasks that are already in the flow as soon as possible instead of starting new ones [6].

• Limit Work in Progress (WIP). Each step in the process must have a WIP limit, optimized according to its complexity, the number of workers and other parameters. The WIP limit forces us to focus on one task at a time instead of doing multiple things concurrently; it follows the "achieve more by doing less" principle, which has repeatedly been proven true [4, 6].

• The "pull" principle. When moving tasks between stages, we must obey the pull rule, which states that a new task can enter a certain stage only if that stage's WIP limit has not been reached. This helps with the early identification of delays and impediments in the workflow, thus encouraging teamwork. (A minimal sketch of this rule follows the list.)

• Minimalize, measure and improve. Kanban maintains the existing teams, processes, roles and responsibilities – it introduces minimal changes for its adoption. It establishes some control over the process flow, but keeps the existing approaches that work well in place. Kanban encourages the usage of agile metrics to measure performance, monitor progress and improve workflow efficiency [4, 6].
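As referenced in the pull-principle guideline above, the check behind the pull rule is compact enough to state directly; the sketch below is our own, purely illustrative model of a board tool (all names and types are ours, not from any of the cited works):

  #include <algorithm>
  #include <map>
  #include <string>
  #include <vector>

  // Minimal board model: each stage (board column) has a WIP limit
  // and the list of tasks currently in it.
  struct Stage {
      int wip_limit;
      std::vector<std::string> tasks;
  };

  // The "pull" principle: a task may enter a stage only if that stage's
  // Work-in-Progress limit has not been reached yet.
  bool pull(std::map<std::string, Stage>& board, const std::string& from,
            const std::string& to, const std::string& task) {
      Stage& dst = board[to];
      if ((int)dst.tasks.size() >= dst.wip_limit)
          return false;  // limit reached: an early signal of a bottleneck
      auto& src = board[from].tasks;
      auto it = std::find(src.begin(), src.end(), task);
      if (it == src.end()) return false;  // task is not in the source stage
      src.erase(it);
      dst.tasks.push_back(task);
      return true;
  }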
2.3 Scrumban
Scrumban is a composite of Scrum and Kanban: it contains the basic properties of Scrum and the flexibility of Kanban. Long-term development goals in Scrumban are defined via bucket size planning. Each bucket contains a development plan that needs to be realised within a given time, for example three months for the nearest bucket. This bucket holds fine-grained definitions of tasks, while buckets that represent long-term plans, for example a one-year bucket, hold only a draft – those buckets are deficient [7]. This is due to the Just-In-Time principle taken from Kanban, which urges us to make fine-grained plans as late as possible. Just like Kanban, Scrumban limits the Work-in-Progress and enforces the "pull" principle for moving tasks between stages [1]. Scrumban does not require any new roles (unlike Scrum); however, it encourages short daily meetings and kaizen events meant for the resolution of everyday impediments [2, 11]. Scrumban stipulates that iterations should not be longer than two weeks but, unlike Scrum, it allows long-running tasks which can extend across several iterations. This can lead to an incomplete product at the end of an iteration, which is why Scrumban introduces a Feature Freeze (FF): when the team approaches the end of the current iteration, it stops working on new features and instead focuses on finishing those already in process. Features that are still incomplete need to be disabled or removed from the final product, so that an incremental release can be made [2, 7, 11].

3. COMPARISON OF AGILE METHODS
Hereafter, we compare the methods from 12 main perspectives.

3.1 Board
The board is used in virtually all methods, but it differs in terms of how it is used. The Scrum board is reset with each Sprint, which means that all tasks are put back into the ToDo column. When using Kanban, resets do not occur, because there are no iterations – new tasks arrive in a constant flow. The Scrumban board typically looks like a Kanban board, but some resets occur when finishing the current bucket and moving on to the next one.

Figure 1: Board comparison in Scrum and Kanban

3.2 Artifacts
Scrum requires a clearly defined product backlog, a sprint backlog and a burndown chart, thus requiring more effort from the team to keep the artefacts up to date, compared to Scrumban and Kanban. While Kanban does not demand any specific artefact, Scrumban requires an iteration backlog and bucket plans.

3.3 Iterations
Scrum defines iterations (called Sprints) as part of the Scrum lifecycle; they can last from one to three weeks. At the end of every Sprint, we expect a fully functional product with new features or other upgrades that are accepted by the product owner. Scrumban also has iterations, which are not strictly defined in terms of tasks and length; however, their duration should not exceed two weeks, since shorter iterations allow a more rapid adaptation to change. Kanban does not define iterations, as new tasks are defined on demand, as late as possible.

3.4 Tasks
The time span of each task in Scrum is limited to the duration of the Sprint. In any case, we try to break long tasks into smaller ones, so that no task is longer than one day. Kanban and Scrumban do not limit the time span of tasks; even though Scrumban has iterations, it allows long-running tasks (Figure 2).

Figure 2: Tasks in iterations in Scrum (left) and in Kanban (right)
3.5 Priority
With Scrum, task prioritisation is done when planning the Sprint, while with Kanban it is done on a daily basis, with just-in-time planning and the pull principle: whenever a new task is pulled into the workflow, it must have the highest priority for the team. Scrumban first prioritises work with bucket size planning, after which tasks are defined and prioritised for each iteration and, lastly, on a daily basis, as is the case with Kanban.

3.6 Work estimation
Scrum prescribes task estimation before each Sprint, while Kanban and Scrumban do not require estimation. Some teams prefer to define tasks in such a manner that all tasks have similar complexity, thus requiring approximately the same time for completion.

3.7 Team
Scrum teams must be cross-functional, which means they are able to provide a product increment entirely on their own (from planning to deployment). Kanban and Scrumban allow both cross-functional and specialised teams, depending on the product type and on what works best for a given scenario.

3.8 Roles
Scrum prescribes the following roles: the product owner, the development team and the Scrum master. The product owner is responsible for the ToDo (the product backlog), and the Scrum Master is responsible for the daily meetings and for solving the non-technical problems of the team. Kanban and Scrumban do not define any special roles, so the tasks for maintaining the agile method are divided among the team members.

3.9 Changes in work plan
Scrum does not allow any changes in the work plan while a Sprint is running – that is why detailed plans and estimations are made before the Sprint and no changes are made afterwards (Figure 3, above). Scrumban and Kanban provide no rules that forbid changes in the work plan at any given time (Figure 3, below): tasks in the ToDo state can easily be replaced with new ones, and tasks that are already in process can be taken back to ToDo so that more important tasks can be pulled in.

Figure 3: Changes in work plan in Scrum (above) and in Kanban (below)

3.10 Bug fixing
There are two types of software faults: those that appear at development time (often called defects), and those that appear after the software is released and running in a real-time environment, called bugs. Kanban and Scrumban allow unplanned bug-fixing right away – if fixing a bug has a higher priority than the current tasks in ToDo, this task is put on the board. With Scrum, bug-fixing is, by the book, planned for the next Sprint – it would be unreasonable to change the current Sprint plan because of all the preparations and estimations done before the Sprint. In reality, we know that critical bugs must be fixed as soon as possible, so Scrum teams take different approaches to tackling this problem: some teams define one day of the week (or part of a day) as a "bug fixing day", while other teams reduce the number of story points for the Sprint, so that some time is left for unexpected things like fixing bugs.

3.11 Stress
Research shows that stress is highly correlated with the amount of work a team member is responsible for. The ideal workload would be evenly distributed at each person's optimal level: a person must not feel too much of a burden on their shoulders, which leads to exhaustion, nor too free, which leads to poor progress (Figure 4). Team members must see constant improvement of the product, which keeps them motivated and dedicated. With Kanban, we can achieve a mostly evenly distributed workload, because there are no iterations and tasks are continuously added to the workflow. Sprints in Scrum are time-limited (typically from 2 to 4 weeks), so there is often more work done at the end of a Sprint than at its beginning (Figure 5). Scrumban is somewhere in the middle of those two, because it allows long-running tasks, so team members are not so stressed if some tasks are not completed. For highly motivated and self-initiative teams, Kanban can be a good fit; teams that do not have such properties need time limits within which some progress is expected, so Scrum and Scrumban provide a better match.

Figure 4: Graph of stress levels, depending on the work done
Figure 5: Stress level through iterations

3.12 Activities to maintain the agile method
Scrum activities to keep the method alive consist of an up-to-date backlog, a Sprint backlog, daily meetings, a board and a retrospective. Kanban requires the visualisation of the workflow (typically a board) and demands respect of the Work-in-Progress limits for each stage of the process. Scrumban extends the Kanban activities and adds bucket size planning, daily events (standups) and iteration planning.
4. CONCLUSION
In this paper, we presented the most widespread agile methods: Scrum, Kanban and Scrumban. Each method has its own advantages and disadvantages, but it is necessary to bear in mind that none of them will benefit a business if not used in the right way. It is therefore important to choose the agile method that best meets the requirements and wishes of the company.

Scrum certainly works best in mature companies with experienced teams that have been working on the product or project for more than one year. For companies with continuous production that need a rapid response to changes, and for product teams working on support and maintenance of the product, we recommend Kanban. Scrumban is best for young, small companies, since it combines the flexibility of Kanban with the basic characteristics of Scrum.

Agile methods definitely include a strong component of flexibility. Teams can, regardless of the method chosen, adapt it in a way that serves their purpose – i.e. effective work organisation and the development of quality products.

5. REFERENCES
[1] Baleviciute, G. 2014. Whitepaper – Scrum vs Kanban vs. Scrumban. Retrieved September 10, 2016, from http://goo.gl/dkrbGE.
[2] Bieliūnas, E. 2014. Scrum-ban for Project Management. Retrieved September 10, 2016, from http://goo.gl/JgfaaA.
[3] Bittner, K., Lo Giudice, D., DeMartine, A., Mines, C., Hammond, J. S., Turrisi, T., and Izzi, M. 2016. Forrester Research – Boost Application Delivery Speed And Quality With Agile DevOps Practices.
[4] Brechner, E. 2015. Agile Project Management with Kanban. Microsoft Press.
[5] Gartner. 2015. Holz, B., presentation "Agile in the Enterprise".
[6] Klipp, P. 2014. Getting Started with Kanban. Amazon Digital Services LLC.
[7] Misevičiūtė, D. 2014. Scrumban: on demand vs. long-term planning. Retrieved September 10, 2016, from http://www.eylean.com/blog/2014/11/scrumban-on-demand-vs-long-term-planning/.
[8] Pichler, R. 2010. Agile Product Management with Scrum: Creating Products That Customers Love. Addison Wesley.
[9] Swisher, W. P. 2014. Implementing Scrumban. Retrieved September 10, 2016, from https://switchingtoscrum.files.wordpress.com/2013/12/implementing-scrumban_v1-32.pdf.
[10] VersionOne. 2016. VersionOne 10th Annual State of Agile Report.
[11] Sutherland, J., and Schwaber, K. 2013. The Scrum Guide. Retrieved September 10, 2016, from http://www.scrumguides.org/docs/scrumguide/v1/scrum-guide-us.pdf.
[12] Sutherland, J. 2010. Scrum Handbook. The Scrum Training Institute.

4.5.3 Loop statement
Scheme has a loop expression do, which can be compared to the for statement in the programming language Java:

  (do ((v (make-vector 5))
       (i 0 (+ i 1)))
      ((= i 5) v)
    (vector-set! v i i))

Loop statements are marked using the universal node LOOP STATEMENT.
However, the characteristic approach to dealing with repetition in functional languages is recursion. In eCST, recursive functions are marked as regular functions (Section 4.4.3), while semantic transformations are planned for future work.

4.5.4 Continuations
First-class continuations are constructs that represent a program state which can be saved as the value of a variable, to be used at a later point in the program. The programming language Scheme implements first-class continuations with the operator call-with-current-continuation. When translating Scheme into eCST, a continuation call is marked using JUMP STATEMENT, since continuations change the control flow of the program. The operator call-with-current-continuation itself is marked using OPERATOR.

5. RELATED WORK
Before the definition of this method, some languages had already been mapped to eCST and integrated into SSQSA [6]. These were mainly imperative languages, and the mapping among their constructs was more straightforward. Erlang, as a functional language, was integrated up to the prototype level [7], while some issues remained unsolved.

The authors of [3] tried to cross the gap between imperative and functional programming by refactoring. They were motivated by the integration of the functional paradigm into the Java programming language, and the goal was to provide Java developers with refactoring techniques that would lead them to functional code. Basically, this is a kind of mapping between two paradigms and can be useful in our research for a comparison of approaches. However, they provide only two refactoring methods, focused on two new Java features, while other constructs are not covered.

6. CONCLUSIONS AND FUTURE WORK
In this paper we describe a method for mapping constructs that belong to a new paradigm to the eCST in the SSQSA platform. We illustrate it by introducing the functional paradigm and Scheme as a clean functional language. The paper thus provides a double contribution: (1) established rules for mapping a functional language to eCST, to be followed when integrating any language that includes the functional paradigm, and (2) a method to be applied when introducing support for any new paradigm. The method recommends first choosing a language which is a clean representative of the paradigm to be integrated. Afterwards, we pass through all paradigm-specific constructs, analyse them, compare them with similar constructs from other, already supported paradigms, determine equivalent ones, and specify the concrete mapping. Finally, the mapping defined on a clean language is to be applied whenever we need a mapping of this paradigm to eCST. If we find some new construct that belongs to an already supported paradigm, we can apply the same procedure to keep the mapping consistent.

Furthermore, this method provides SSQSA with consistency among languages and paradigms. Namely, when integrating a multi-paradigm language, we determine the paradigms included in that language, recognise which construct belongs to which paradigm, and map each paradigm separately according to the defined method. This applies to each new language and each new paradigm.

There are still open questions to be addressed in future work, related to more general issues. One of them is how to map implicitly defined types in dynamically typed languages. The next question relates to the similarities and differences between iteration and recursion; this topic especially arises in control-flow analysis, where the two kinds of repetition should be analysed consistently. Nevertheless, these problems are not related only to functional languages, as all aspects of these issues are mapped to eCST consistently among the integrated languages. They are therefore not the subject of this paper, but of future improvements of the SSQSA platform. The future work directly related to the integration of Scheme and the functional paradigm into SSQSA is testing the analysers on new datasets that will contain code written in functional languages.

7. REFERENCES
[1] H. P. Barendregt and E. Barendsen. Introduction to lambda calculus. Nieuw Archief voor Wiskunde, 4(2):337–372, 1984.
[2] I. Bozó, D. Horpácsi, Z. Horváth, R. Kitlei, J. Kőszegi, M. Tejfel, and M. Tóth. RefactorErl – Source Code Analysis and Refactoring in Erlang. In Proc. of the 12th Symposium on Programming Languages and Software Tools, pages 138–148, Tallinn, Estonia, October 2011.
[3] A. Gyori, L. Franklin, D. Dig, and J. Lahoda. Crossing the gap from imperative to functional programming through refactoring. In Proc. of the 2013 9th Joint Meeting on Foundations of Software Engineering, pages 543–553. ACM, 2013.
[4] R. Lincke, J. Lundberg, and W. Löwe. Comparing software metrics tools. In Proc. of the International Symposium on Software Testing and Analysis, ISSTA '08, pages 131–142, Seattle, WA, USA, 2008. ACM, New York, NY, USA.
[5] J. Novak and G. Rakić. Comparison of software metrics tools for .NET. In Proc. of the 13th International Multiconference Information Society (IS'10), pages 231–234, Ljubljana, Slovenia, 2010.
[6] G. Rakić. Extendable and adaptable framework for input language independent static analysis, 2015.
[7] M. Tóth, A. Páter-Részeg, and G. Rakić. Introducing support for Erlang into the SSQSA framework. In Proc. of the International Conference on Numerical Analysis and Applied Mathematics 2014 (ICNAAM-2014), volume 1648, page 310012. AIP Publishing, 2015.
Introduction to Case Management Model and Notation

Mateja Kocbek, Gregor Polančič
Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
mateja.kocbek@um.si, gregor.polancic@um.si

ABSTRACT
A case is presented as a proceeding that involves actions taken regarding a subject in a particular situation to achieve a desired outcome. Cases are used in many areas of human operations. The most common example of a case comes from medicine, where every patient represents a case of their own. Every case requires its own operations and functions, whereas the humans involved can sometimes use their knowledge from previous cases. This article presents a new standard, called CMMN (Case Management Model and Notation), which has recently been published by the OMG and covers the whole process of case management. The presentation of the CMMN standard includes the abstract and concrete syntax as well as the semantics and the diagram interchange specifications.

Categories and Subject Descriptors
I.6.5 [Simulation and modelling]: Model Development – Modelling methodologies.

General Terms
Management, Documentation, Performance, Standardization, Design, Languages, Theory.

Keywords
CMMN, Case Management Model and Notation, Case Management, BPMN.

1. INTRODUCTION
In everyday life, many different cases can be found. A case is a very common term and can represent a variety of different things or concepts. Its common definition is "a particular situation or example of something" [11], whereas in the CMMN specification [5] a case is presented as "a proceeding that involves actions taken regarding a subject in a particular situation to achieve a desired outcome".

An illustrative example of the listed definitions of a case can be found in medicine, where a case involves the care of a patient, together with his/her medical history as well as the current situation. Other examples of cases are a law case, a social security case, an employment case, etc. A project-related case definition states that "a case is a project, transaction, service or response that has different states (for example: opened, doing, closed) over a period of time to achieve resolution of a problem, claim, request, proposal, development or other complex activity" [13].

A case always contains some kind of subject, which may be a person, a legal action, a business transaction, or some other focal point around which actions are taken to achieve an objective [5]. Besides, resolving a specific case usually requires a lot of information [5], whereas new cases, with no previous experience of the involved individuals, can be resolved intuitively [5].

As mentioned above, resolving a case includes information, actions, human resources, knowledge, etc., which can be united in case management. Case management is usually driven by a team of case/knowledge workers, who make decisions or perform certain tasks [5].

One of the most important characteristics of case management is planning. Every case requires a high degree of flexibility, which is essential for the success of human activities. Flexibility is needed in the selection of tasks for a case, in the run-time ordering of the sequence in which the tasks are executed, and in ad-hoc collaboration with other knowledge workers on the tasks [5]. Case or knowledge workers are those who have to determine which tasks are applicable, or which follow-up tasks need to be performed [5]. Decisions may be triggered by events or new facts that continuously emerge during the course of the case, e.g. the receipt of a new document, the completion of certain tasks, or the achievement of certain milestones [5].

In 2014, the Case Management Model and Notation (CMMN) was introduced by the OMG (Object Management Group) as a standard for case management [4]. This article focuses on CMMN and has the following structure: Chapter 2 gives an overview of CMMN; in Chapter 3 the actual use of the standard is presented; we conclude the article with a discussion and conclusion.
2. RATIONALE FOR INTRODUCING CMMN
CMMN is, in general, a graphical representation for expressing a case [10]. It provides an efficient notation for capturing less repeatable, dynamic, information-rich contexts. CMMN was introduced to document the ad-hoc scenarios faced by knowledge workers, in which they need to respond to a continuous flow of business events, data and documents. The CMMN specification defines abstract elements, a notation, execution semantics and exchange formats [5]. A consortium of 11 companies contributed to the development of CMMN, which is maintained by the OMG. Version 1.0 of CMMN was released in May 2014 [5]; the current version is 1.1 – Beta [4].

2.1 CMMN versus BPMN
The focal rationale for the introduction of CMMN was the need for more flexibility for knowledge workers when modelling business processes. Flexibility is needed because some tasks can be done independently of time, and the sequence of tasks is not important; workers can thus decide which work to do and which order is best in a particular case. This is the main difference compared to the well-accepted business process standard, BPMN. Within BPMN models, an exact order of activities is defined (i.e. a structured process), e.g. activity A has to finish before activity B starts. However, an exact order is not always the best way to solve specific instances or cases. A good example is a health case, where knowledge workers (i.e. medical staff, administration, etc.) do not know precisely in which direction the specific case will evolve. Another illustrative example is exception handling, where flexibility is also welcome. But it is also reasonable to stress that, to some level, processes have to be defined: for example, a nurse has to know exactly which steps need to be taken when a patient comes to a hospital.

Above we discussed the differences between CMMN and BPMN. BPMN is a well-known, widely used and accepted standard, but CMMN can fill out its existing weaknesses. Currently, CMMN and BPMN are used separately [12].

2.2 CMMN structure
Beside a modelling notation, CMMN defines a meta-model, an XML-based model for interchange (XMI) and an XML Schema for exchanging Case models among different environments and tools [5].

The meta-model can be used by case management definition tools to define the functions and features that a business analyst might use when defining a case model for a particular type of case. The notation is intended to express the model graphically [5].

The specification enables the portability of case models, so that users can take a model defined in one CMMN implementation and use it in another one. The CMMN XMI and/or XML Schema are intended for importing and exporting case models among different CMMN implementers [5].

A case model is intended to be used by a run-time case management product to guide and assist a knowledge worker in the handling of a particular instance of a case, for example a particular invoice discrepancy. The meta-model and notation are used to express a case model in a common notation for a particular type of case, and the resulting model can subsequently be instantiated for the handling of a particular instance of a case [5].
2.3 CMMN Notation
The outermost element that defines a case is the Case Plan Model (Figure 1). The various elements of a Case Plan Model are depicted within the boundary of the Case Plan Model shape. The Case Plan Model comprises all elements that represent the initial plan of the case, and all elements that support the further evolution of the plan through run-time planning by case workers.

Figure 1: Case Plan Model

All information, or references to information, that is required as context for managing a Case is defined by exactly one Case File. A Case File is meant as a logical model; it does not imply any assumptions about the physical storage of information. A Case File contains Case File Items (Figure 2), which can be anything from a stored folder or document to an entire folder hierarchy referring to or containing other Case File Items.

Figure 2: Elements

Case management planning is typically concerned with the determination of which tasks are applicable, or which follow-up tasks are required. Case workers execute the plan, particularly by performing tasks as planned and by adding Discretionary Tasks (Figure 3) to the plan of a case instance. In CMMN, planning is a run-time effort: users (i.e. case workers) are said to "plan" (at run-time) when they select Discretionary Items from a Planning Table and move them into the plan of the case (instance). A Planning Table defines the scope of planning and can be collapsed (discretionary elements are not visible) or expanded (discretionary elements are visible).

CMMN defines the following Plan Model Elements: Stage – considered as an episode of a Case (shown in Figure 3); Task – an atomic unit of work during a case (also shown in Figure 3); Event Listener – something that happens during the course of a case (shown in Figure 2); and Milestone – an achievable target defined to enable the evaluation of the progress of the case (also shown in Figure 2).

Figure 3: Elements Stage, Task and Discretionary Tasks

In CMMN, an event is something that "happens" during the course of a case. Events may trigger the enabling, activation and termination of Stages and Tasks, or the achievement of Milestones. Standard events are the lifecycle transitions of Case File Items and the lifecycle transitions of Stages, Tasks and Milestones. CMMN also provides Event Listeners, which are used to influence the proceeding of the Case directly, instead of indirectly via information in the Case File. There are two special Event Listeners: the Timer Event Listener, which is used to catch predefined elapses of time, and the User Event Listener, which enables direct interaction of a user with the case.
CMMN also defines a variety of Tasks (Figure 4): the Human Task – a non-blocking task that does not wait for the work to complete but completes immediately upon instantiation; the Decision Task – a blocking task that waits until the work associated with the Task is completed; the Process Task – which can be used in the case to initiate a business process; and the Case Task – which can be used to initiate another case.

Figure 4: Tasks

A Sentry "watches out" for important situations to occur which influence the further proceedings in a case. A Sentry is a combination of an Event and/or a Condition. A Sentry can be used as an entry criterion or as an exit criterion and may consist of two parts: an On-Part, which specifies the event that serves as the trigger, and an If-Part, which specifies a condition, as an Expression that evaluates over the Case File [1, 5, 7] (Figure 5).

Figure 5: Task with Sentries
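Conceptually, a Sentry is therefore satisfied only when its trigger has occurred and its condition over the Case File holds. The following minimal sketch (our own conceptual model, not part of the CMMN specification or its XML schema) captures that combination:

  #include <functional>

  struct CaseFile { /* the case data that the If-Part expression reads */ };

  // A Sentry combines an On-Part (a triggering event) with an optional
  // If-Part (a condition expression evaluated over the Case File).
  struct Sentry {
      bool on_part_fired = false;                    // the trigger event occurred
      std::function<bool(const CaseFile&)> if_part;  // empty means: no condition

      bool satisfied(const CaseFile& cf) const {
          return on_part_fired && (!if_part || if_part(cf));
      }
  };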
Besides, various Decorators can be added to CMMN shapes. Table 1 presents the applicability of the Decorators (Planning Table, Entry Criterion, Exit Criterion, Auto Complete, Manual Activation, Required, Repetition) to the CMMN shapes (Case Plan Model, Stage, Task, Milestone, Event Listener, Case File Item, Plan Fragment). The symbol "+" means that a certain shape accepts the associated Decorator [5].

Table 1: Decorators Applicability Summary Table [5]

                  Planning  Entry      Exit       Auto      Manual      Required  Repetition
                  Table     Criterion  Criterion  Complete  Activation
 Case Plan Model     +                    +          +
 Stage               +          +         +          +          +           +          +
 Task                +*         +         +                     +           +          +
 Milestone                      +                                           +          +
 Event Listener
 Case File Item
 Plan Fragment

 *Human Task only.

3. CURRENT CMMN ACCEPTANCE

The use of the CMMN standard is not yet widespread. CMMN was designed for planning activities that do not require an exact order: every group of tasks has to be performed, but the time and sequence are not important. In the following paragraphs, some aspects of the use of the CMMN standard are discussed.

Table 2 shows the operating models used in companies. The Coordination, Diversification, Unification and Replication models each have their own degree of Process Integration and Process Standardization [1, 6]. The table shows that CMMN suits a low degree of Process Standardization (Coordination and Diversification), while BPMN suits a high degree (Unification and Replication).

Table 2: Operating models

                                 Process Standardization
                                 Low                High
 Process Integration   High      Coordination       Unification
                       Low       Diversification    Replication
                                 (CMMN)             (BPMN)

A case can be handled in an ad-hoc manner, which is to some extent equivalent to ad-hoc processes in BPMN, because there is no specific order or sequence for the completion of the tasks. It is also permitted to perform tasks at any frequency [3]. Usually, all ad-hoc activities are conducted by human resources, who determine the sequence, time and frequency of the performance of each activity in an ad-hoc process [3]. CMMN was primarily designed for business analysts, who are the anticipated users of Case management tools, for capturing and formalizing repeatable patterns of common Tasks, Event Listeners, and Milestones into a Case model [5].

Given that CMMN was introduced in 2014 and that version 1.1 is still in its beta phase, it is understandable that only a small number of tools support the CMMN standard. At the time of our survey, we detected only two adequate tools. The first tool for modelling with the CMMN standard is Camunda [2], an open-source platform for Business Process Management. It is suitable for development and provides business-IT alignment based on BPMN for structured workflows, CMMN for less structured cases and DMN for business rules [2]. The other tool for the CMMN standard is the CMMN Modeler by Trisotech [9], which is a commercial tool.
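To give an impression of what run-time case handling looks like on such a platform, the sketch below drives a deployed case model through Camunda BPM's public Java API (CaseService). It is a sketch under stated assumptions: the case definition key writeDocument and the plan item id PI_WriteText are hypothetical, and a corresponding .cmmn file is assumed to be deployed to a configured engine.

import org.camunda.bpm.engine.CaseService;
import org.camunda.bpm.engine.ProcessEngine;
import org.camunda.bpm.engine.ProcessEngines;
import org.camunda.bpm.engine.runtime.CaseExecution;
import org.camunda.bpm.engine.runtime.CaseInstance;

public class WriteDocumentCase {
    public static void main(String[] args) {
        // Obtain the default engine (assumes a configured Camunda BPM
        // installation with a CMMN model deployed under the hypothetical
        // case definition key "writeDocument").
        ProcessEngine engine = ProcessEngines.getDefaultProcessEngine();
        CaseService caseService = engine.getCaseService();

        // Instantiate the case model; this creates the case plan.
        CaseInstance caseInstance = caseService.createCaseInstanceByKey("writeDocument");

        // Look up the execution of one plan item (hypothetical id "PI_WriteText"),
        // which is assumed to be enabled, i.e. waiting for manual activation.
        CaseExecution writeText = caseService.createCaseExecutionQuery()
                .caseInstanceId(caseInstance.getId())
                .activityId("PI_WriteText")
                .singleResult();

        // Start it manually (run-time planning by the case worker) and,
        // once the work is done, complete it.
        caseService.manuallyStartCaseExecution(writeText.getId());
        caseService.completeCaseExecution(writeText.getId());
    }
}

Listing 3: Driving a CMMN case instance through Camunda's Java API (illustrative sketch)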
3.1 Illustrative Example

In this section, a simple example is presented, in which we collected a few common elements of CMMN that were introduced in the previous chapter. The example briefly defines the process of writing a document, with its basic components.

Figure 6: Phases of a Case

A case may be in one of two phases: design-time and run-time (Figure 6). During the design-time phase, business analysts engage in modelling, which includes defining (1) tasks that are always part of pre-defined segments in a case model, and (2) "discretionary" tasks that are available to the case worker, to be applied in addition, at his or her discretion. In the run-time phase, case workers execute the plan, particularly by (1) performing tasks as planned and (2) adding discretionary tasks to the case plan instance at run-time [3].

As we already mentioned, a very important part of case management is the reference to data about the subject of the case. The collection of data about the case is often described as a Case File. Case workers use both structured and unstructured data in decision-making [3]. Cases are directed not just by explicit knowledge about the particular Case and its context represented in the Case File, but also by explicit knowledge encoded as rules by business analysts, by the tacit knowledge of human participants, and by tacit knowledge from the organization or community in which the participants are members [3].

Figure 7: Model of CMMN [8]

Figure 7 represents a CMMN model that encompasses the whole process of writing a document. First, the model contains two tasks: "Find research topic" and "Create template & graphics". Initially, either of these tasks can be performed. The next, more extensive element is a Stage named "Prepare draft", which contains four tasks. The task "Organize references" is a Task with an entry criterion (see the symbol in Figure 5): the Tasks related to this Sentry must be performed earlier. The next task, "Write Text", is special because it carries an exclamation mark at the bottom of its shape, which means that the performance of this task is required. The same symbol (exclamation mark) is positioned at the level of the Stage "Prepare draft". The task "Prepare table of content" is a Human Task, marked with a small human symbol in the upper left corner of the shape. The last task in this Stage is "Implement template & graphics"; it also has a Criterion, and it is a Discretionary Task, which is symbolized with a dotted line. Later on, we can see the task "Seek comments" as well as the Stage "Review draft" with its two constituent tasks. The speciality of this part is an exit Criterion (see the symbol in Figure 5). Both Stages, "Prepare draft" and "Review draft", are further connected to Milestone elements with entry Criteria. Two additional elements are also used, namely an Event Listener (Timer) and a Case File Item: the first defines the deadline for completing the document, and the second contains the actual document. The last important concept we need to highlight is the Case Model. It is symbolized with a folder and covers the whole described process (also shown in Figure 1). The case model "Write document" includes three exit Criteria.

4. DISCUSSION

In this article, we presented a novel standard for Case Management, CMMN, which also includes a notation for modelling business processes and graphically expressing a Case. CMMN has some similarities with the well-known and accepted BPMN standard. There are some similar elements, like Tasks, Events, Sub-processes, etc., but there is also a very important difference between CMMN and BPMN. BPMN requires accurate knowledge of the business process that is to be modelled; there is practically no space for flexible execution of business processes. In contrast, CMMN offers flexibility, which is very welcome (or even required) in many business cases. As we already mentioned, CMMN is in its beginnings, but it has great potential, at least to be used in combination with BPMN. Our intention for future research is to perform a survey to determine the actual acceptance and potential use of CMMN.

5. REFERENCES

[1] Gagne, D. Case Management Model and Notation (CMMN): An Introduction. 2016. https://prezi.com/yu3lbxamg09v/case-management-model-and-notation-cmmn-an-introduction/.

[2] Camunda Services GmbH. Camunda Tool. 2016. https://camunda.org.

[3] Hinkelmann, K. Case Management Model and Notation - CMMN. 2014. http://knut.hinkelmann.ch/lectures/bpm2013-14/06_CMMN.pdf.

[4] OMG. OMG CMMN. 2014. http://www.omg.org/spec/CMMN/.

[5] OMG (Object Management Group). Case Management Model and Notation, Version 1.0. May 2014, 82 pages.

[6] Ross, J.W., Weill, P., and Robertson, D.C. Enterprise Architecture as Strategy: Creating a Foundation for Business Execution. 2006.

[7] Rücker, B. Camunda BPM 7.2: CMMN Case Management (English). 2015.

[8] Winterberg, T. Oracle - CMMN. https://blogs.oracle.com/soacommunity/entry/case_management_model_and_notation.

[9] Trisotech. Trisotech - CMMN Modeler. http://www.trisotech.com/cmmn-modeler.

[10] Wikipedia. CMMN. https://en.wikipedia.org/wiki/CMMN.

[11] Cambridge Dictionary. Case. https://dictionary.cambridge.org/dictionary/english/case.

[12] Silver, B. BPMN and CMMN Compared. 2014. http://brsilver.com/bpmn-cmmn-compared/.

[13] AIIM. What is Case Management? 2016. http://www.aiim.org/What-is-Case-Management.

Indeks avtorjev / Author index

Akbulut Akhan .......................... 19
Brezočnik Lucija ....................... 31
Budimac Zoran .......................... 27
Çatal Çağatay .......................... 19
Drevenšek Aleks ........................ 23
Heričko Matija ......................... 11
Hölbl Marko ........................ 11, 23
Karakatič Sašo ......................... 19
Kocbek Mateja .......................... 35
Kolek Jozef ............................ 27
Krishnamurthy Prashant ................. 11
Majer Črtomir .......................... 31
Palanisamy Balaji ...................... 11
Pavlinek Miha .......................... 19
Podgorelec Vili ........................ 19
Polančič Gregor ........................ 35
Rakić Gordana .......................... 27
Sagadin Klemen ......................... 15
Šumak Boštjan .......................... 15
Verber Domen ............................ 7
Welzer Tatjana ......................... 11
Zadorozhny Vladimir I. ................. 11

Konferenca / Conference: Sodelovanje, programska oprema in storitve v informacijski družbi / Collaboration, Software and Services in Information Society
Uredil / Edited by: Marjan Heričko