Zbornik 19. mednarodne multikonference INFORMACIJSKA DRUŽBA – IS 2016, Zvezek C
Proceedings of the 19th International Multiconference INFORMATION SOCIETY – IS 2016, Volume C

Sodelovanje, programska oprema in storitve v informacijski družbi
Collaboration, Software and Services in Information Society

Uredil / Edited by Marjan Heričko

http://is.ijs.si
10. oktober 2016 / 10 October 2016, Ljubljana, Slovenia

Urednik: Marjan Heričko, University of Maribor, Faculty of Electrical Engineering and Computer Science
Založnik: Institut »Jožef Stefan«, Ljubljana
Priprava zbornika: Mitja Lasič, Vesna Lasič, Lana Zemljak
Oblikovanje naslovnice: Vesna Lasič
Dostop do e-publikacije: http://library.ijs.si/Stacks/Proceedings/InformationSociety
Ljubljana, oktober 2016

CIP - Kataložni zapis o publikaciji
Narodna in univerzitetna knjižnica, Ljubljana
004.77(082)(0.034.2)
MEDNARODNA multikonferenca Informacijska družba (19 ; 2016 ; Ljubljana)
Sodelovanje, programska oprema in storitve v informacijski družbi [Elektronski vir] : zbornik 19. mednarodne multikonference Informacijska družba - IS 2016, 10. oktober 2016, [Ljubljana, Slovenija] : zvezek C = Collaboration, software and services in information society : proceedings of the 19th International Multiconference Information Society - IS 2016, 10 October 2016, Ljubljana, Slovenia : volume C / uredil, edited by Marjan Heričko. - El. zbornik. - Ljubljana : Institut Jožef Stefan, 2016
ISBN 978-961-264-099-6 (pdf)
1. Gl. stv. nasl. 2. Vzp. stv. nasl. 3. Dodat. nasl. 4. Heričko, Marjan
287010304

PREDGOVOR MULTIKONFERENCI INFORMACIJSKA DRUŽBA 2016

Multikonferenca Informacijska družba (http://is.ijs.si) je z devetnajsto zaporedno prireditvijo osrednji srednjeevropski dogodek na področju informacijske družbe, računalništva in informatike. Letošnja prireditev je ponovno na več lokacijah, osrednji dogodki pa so na Institutu »Jožef Stefan«.

Informacijska družba, znanje in umetna inteligenca so spet na razpotju tako same zase kot glede vpliva na človeški razvoj. Se bo eksponentna rast elektronike po Moorovem zakonu nadaljevala ali stagnirala? Bo umetna inteligenca nadaljevala svoj neverjetni razvoj in premagovala ljudi na čedalje več področjih in s tem omogočila razcvet civilizacije, ali pa bo eksponentna rast prebivalstva zlasti v Afriki povzročila zadušitev rasti? Čedalje več pokazateljev kaže v oba ekstrema – da prehajamo v naslednje civilizacijsko obdobje, hkrati pa so planetarni konflikti sodobne družbe čedalje težje obvladljivi.

Letos smo v multikonferenco povezali dvanajst odličnih neodvisnih konferenc. Predstavljenih bo okoli 200 predstavitev, povzetkov in referatov v okviru samostojnih konferenc in delavnic. Prireditev bodo spremljale okrogle mize in razprave ter posebni dogodki, kot je svečana podelitev nagrad. Izbrani prispevki bodo izšli tudi v posebni številki revije Informatica, ki se ponaša z 39-letno tradicijo odlične znanstvene revije. Naslednje leto bo torej konferenca praznovala 20 let in revija 40 let, kar je za področje informacijske družbe častitljiv dosežek.
Multikonferenco Informacijska družba 2016 sestavljajo naslednje samostojne konference:
• 25-letnica prve internetne povezave v Sloveniji
• Slovenska konferenca o umetni inteligenci
• Kognitivna znanost
• Izkopavanje znanja in podatkovna skladišča
• Sodelovanje, programska oprema in storitve v informacijski družbi
• Vzgoja in izobraževanje v informacijski družbi
• Delavnica »EM-zdravje«
• Delavnica »E-heritage«
• Tretja študentska računalniška konferenca
• Računalništvo in informatika: včeraj za jutri
• Interakcija človek-računalnik v informacijski družbi
• Uporabno teoretično računalništvo (MATCOS 2016)

Soorganizatorji in podporniki konference so različne raziskovalne institucije in združenja, med njimi tudi ACM Slovenija, SLAIS, DKZ in druga slovenska nacionalna akademija, Inženirska akademija Slovenije (IAS). V imenu organizatorjev konference se zahvaljujemo združenjem in inštitucijam, še posebej pa udeležencem za njihove dragocene prispevke in priložnost, da z nami delijo svoje izkušnje o informacijski družbi. Zahvaljujemo se tudi recenzentom za njihovo pomoč pri recenziranju.

V 2016 bomo četrtič podelili nagrado za življenjske dosežke v čast Donalda Michija in Alana Turinga. Nagrado Michie-Turing za izjemen življenjski prispevek k razvoju in promociji informacijske družbe bo prejel prof. dr. Tomaž Pisanski. Priznanje za dosežek leta bo pripadlo prof. dr. Blažu Zupanu. Že šestič podeljujemo nagradi »informacijska limona« in »informacijska jagoda« za najbolj (ne)uspešne poteze v zvezi z informacijsko družbo. Limono je dobilo ponovno padanje Slovenije na lestvicah informacijske družbe, jagodo pa informacijska podpora Pediatrične klinike. Čestitke nagrajencem!

Bojan Orel, predsednik programskega odbora
Matjaž Gams, predsednik organizacijskega odbora

FOREWORD – INFORMATION SOCIETY 2016

In its 19th year, the Information Society Multiconference (http://is.ijs.si) remains one of the leading conferences in Central Europe devoted to the information society, computer science and informatics. In 2016 it is organized at various locations, with the main events at the Jožef Stefan Institute.

The pace of progress of the information society, knowledge and artificial intelligence is speeding up, but it seems we are again at a turning point. Will the progress of electronics continue according to Moore's law or will it start stagnating? Will AI continue to outperform humans at more and more activities and in this way enable the predicted unseen human progress, or will the growth of the human population, in particular in Africa, cause global decline? Both extremes seem more and more likely – fantastic human progress, and planetary decline caused by humans destroying our environment and each other.

The Multiconference is running in parallel sessions with 200 presentations of scientific papers at twelve conferences, round tables, workshops and award ceremonies. Selected papers will be published in the Informatica journal, which has 39 years of tradition of excellent research publication. Next year, the conference will celebrate 20 years and the journal 40 years – a remarkable achievement.
The Information Society 2016 Multiconference consists of the following conferences:
• 25th Anniversary of the First Internet Connection in Slovenia
• Slovenian Conference on Artificial Intelligence
• Cognitive Science
• Data Mining and Data Warehouses
• Collaboration, Software and Services in Information Society
• Education in Information Society
• Workshop Electronic and Mobile Health
• Workshop »E-heritage«
• 3rd Student Computer Science Research Conference
• Computer Science and Informatics: Yesterday for Tomorrow
• Human-Computer Interaction in Information Society
• Middle-European Conference on Applied Theoretical Computer Science (MATCOS 2016)

The Multiconference is co-organized and supported by several major research institutions and societies, among them ACM Slovenia, i.e. the Slovenian chapter of the ACM, SLAIS, DKZ and the second national engineering academy, the Slovenian Engineering Academy. In the name of the conference organizers we thank all the societies and institutions, and particularly all the participants for their valuable contribution and their interest in this event, and the reviewers for their thorough reviews.

For the fourth year, the award for life-long outstanding contributions will be delivered in memory of Donald Michie and Alan Turing. The Michie-Turing award will be given to Prof. Tomaž Pisanski for his life-long outstanding contribution to the development and promotion of the information society in our country. In addition, an award for current achievements will be given to Prof. Blaž Zupan. The information lemon goes to another fall in the Slovenian international ratings on the information society, while the information strawberry is awarded for the information system at the Pediatric Clinic. Congratulations!

Bojan Orel, Programme Committee Chair
Matjaž Gams, Organizing Committee Chair

KONFERENČNI ODBORI / CONFERENCE COMMITTEES

International Programme Committee:
Vladimir Bajic, South Africa; Heiner Benking, Germany; Se Woo Cheon, South Korea; Howie Firth, UK; Olga Fomichova, Russia; Vladimir Fomichov, Russia; Vesna Hljuz Dobric, Croatia; Alfred Inselberg, Israel; Jay Liebowitz, USA; Huan Liu, Singapore; Henz Martin, Germany; Marcin Paprzycki, USA; Karl Pribram, USA; Claude Sammut, Australia; Jiri Wiedermann, Czech Republic; Xindong Wu, USA; Yiming Ye, USA; Ning Zhong, USA; Wray Buntine, Australia; Bezalel Gavish, USA; Gal A. Kaminka, Israel; Mike Bain, Australia; Michela Milano, Italy; Derong Liu, Chicago, USA; Toby Walsh, Australia

Organizing Committee:
Matjaž Gams, chair; Mitja Luštrek; Lana Zemljak; Vesna Koricki; Mitja Lasič; Robert Blatnik; Aleš Tavčar; Blaž Mahnič; Jure Šorn; Mario Konecki
Programme Committee:
Bojan Orel, chair; Nikolaj Zimic, co-chair; Franc Solina, co-chair; Viljan Mahnič, co-chair; Cene Bavec, co-chair; Tomaž Kalin, co-chair; Jozsef Györkös, co-chair; Tadej Bajd; Jaroslav Berce; Mojca Bernik; Marko Bohanec; Ivan Bratko; Andrej Brodnik; Dušan Caf; Saša Divjak; Tomaž Erjavec; Bogdan Filipič; Andrej Gams; Matjaž Gams; Marko Grobelnik; Nikola Guid; Marjan Heričko; Borka Jerman Blažič Džonova; Gorazd Kandus; Urban Kordeš; Marjan Krisper; Andrej Kuščer; Jadran Lenarčič; Borut Likar; Janez Malačič; Olga Markič; Dunja Mladenič; Franc Novak; Vladislav Rajkovič; Grega Repovš; Ivan Rozman; Niko Schlamberger; Stanko Strmčnik; Jurij Šilc; Jurij Tasič; Denis Trček; Andrej Ule; Tanja Urbančič; Boštjan Vilfan; Baldomir Zajc; Blaž Zupan; Boris Žemva; Leon Žlajpah

KAZALO / TABLE OF CONTENTS

Sodelovanje, programska oprema in storitve v informacijski družbi / Collaboration, Software and Services in Information Society ... 1
PREDGOVOR / FOREWORD ... 3
PROGRAMSKI ODBORI / PROGRAMME COMMITTEES ... 5
Information Privacy and Information Technology Outsourcing / Verber Domen ... 7
A Survey on Geolocation Data Anonymization / Heričko Matija, Palanisamy Balaji, Welzer Tatjana, Hölbl Marko, Krishnamurthy Prashant, Zadorozhny Vladimir I. ... 11
Analysis of Techniques for Managing Data on Mobile Devices / Sagadin Klemen, Šumak Boštjan ... 15
Can We Predict Software Vulnerability with Deep Neural Networks? / Çatal Çağatay, Akbulut Akhan, Karakatič Sašo, Pavlinek Miha, Podgorelec Vili ... 19
Exhaustive Key Search of DES Using Cloud Computing / Drevenšek Aleks, Hölbl Marko ... 23
From a New Paradigm to Consistent Representation / Rakić Gordana, Kolek Jozef, Budimac Zoran ... 27
Comparison of Agile Methods: Scrum, Kanban and Scrumban / Brezočnik Lucija, Majer Črtomir ... 31
Introduction to Case Management Model and Notation / Kocbek Mateja, Polančič Gregor ... 35
Indeks avtorjev / Author index ... 39

Sodelovanje, programska oprema in storitve v informacijski družbi / Collaboration, Software and Services in Information Society
Uredil / Edited by Marjan Heričko
10. oktober 2016 / 10 October 2016, Ljubljana, Slovenia

PREDGOVOR

Konferenco "Sodelovanje, programska oprema in storitve v informacijski družbi" organiziramo v sklopu multikonference Informacijska družba že šestnajstič. Kot običajno, tudi letošnji prispevki naslavljajo aktualne teme in izzive, povezane z razvojem sodobnih programskih in informacijskih rešitev ter storitev.
Sprejem in uspešna uporaba na informacijskih tehnologijah temelječih storitev je v veliki meri odvisna od njihove kakovosti, kar vključuje tudi skrb za zaščito zasebnosti in zaupnosti osebnih podatkov, ki se uporabljajo pri zagotavljanju uporabnikom prilagojenih storitev. Agilni pristopi in uporabniško naravnan razvoj dodatno prispevata k boljši uporabniški izkušnji.

Prispevki, zbrani v tem zborniku, omogočajo vpogled v izzive in rešitve na področjih, kot so:
- varovanje zasebnosti pri zunanjem izvajanju v informatiki;
- metode in tehnike anonimizacije geolokacijskih podatkov;
- hranjenje in obdelava podatkov na mobilnih napravah;
- kulturni, sociološki in formalni izzivi pri integraciji podatkovnih virov;
- analiza in napovedovanje ranljivosti v programski opremi;
- kriptografski algoritmi in računalništvo v oblaku;
- statična analiza kakovosti kode na osnovi konsistentne predstavitve programskih sistemov;
- izbira primernih agilnih pristopov in metod;
- nestrukturirano modeliranje primerov in notacija CMMN.

Upamo, da boste v zborniku prispevkov, ki povezujejo teoretična in praktična znanja, našli koristne informacije za svoje nadaljnje delo tako pri temeljnem kot aplikativnem raziskovanju.

FOREWORD

This year, the Conference "Collaboration, Software and Services in Information Society" is being organised for the sixteenth time as a part of the "Information Society" multi-conference. As in previous years, the papers in this year's proceedings address current challenges and best practices related to the development of advanced software and information solutions.

The acceptance and success of advanced ICT-based services depend heavily on their quality, including their ability to protect the privacy and confidentiality of personal data that are used to provide better services to end-users. User-centric and agile development approaches can also contribute significantly to an improved user experience, whereas efficient quality assurance should not be limited to specific programming paradigms and platforms.

Papers in these proceedings provide insight into and/or propose solutions to challenges related to:
- Information privacy in IT/IS outsourcing;
- Methods and techniques for geolocation data anonymization;
- Data storage and processing on mobile devices;
- Cultural, social and legal issues in data integration;
- Software vulnerability prediction;
- Cryptography issues caused by cloud computing;
- Consistent representation of software systems for static software quality analysis;
- Selection of suitable agile method(s);
- Case management modelling and notation.

We hope that these proceedings will be beneficial for your reference and that the information in this volume will be useful for further advancements in both research and industry.

Prof. Dr. Marjan Heričko
CSS 2016 – Collaboration, Software and Services in Information Society Conference Chair

PROGRAMSKI ODBOR / PROGRAMME COMMITTEE

Dr. Marjan Heričko, University of Maribor, Faculty of Electrical Engineering and Computer Science
Dr. Ivan Rozman, University of Maribor, Faculty of Electrical Engineering and Computer Science
Dr. Lorna Uden, Staffordshire University, Faculty of Computing, Engineering and Technology
Dr. Gabriele Gianini, University of Milano, Faculty of Mathematical, Physical and Natural Sciences
Dr. Hannu Jaakkola, Tampere University of Technology, Information Technology (Pori)
Dr. Mirjana Ivanović, University of Novi Sad, Faculty of Science, Department of Mathematics and Informatics
Dr. Zoltán Porkoláb, Eötvös Loránd University, Faculty of Informatics
Dr. Aleš Živkovič, Innopolis University, Faculty of Computer Science
Dr. Boštjan Šumak, University of Maribor, Faculty of Electrical Engineering and Computer Science
Dr. Gregor Polančič, University of Maribor, Faculty of Electrical Engineering and Computer Science
Dr. Luka Pavlič, University of Maribor, Faculty of Electrical Engineering and Computer Science

Information Privacy and Information Technology Outsourcing

Domen Verber
Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
+386 2 220 7434, domen.verber@um.si

ABSTRACT
In this paper we discuss the issue of information privacy in the regime of Information Technology Outsourcing (ITO). Nowadays, privacy in general, and information privacy in particular, is a very important and much debated issue. With ITO, companies contract out IT infrastructure and/or IT related services, such as programming, to other companies. By doing this, the responsibility for information privacy is shared by several parties. The owner of the data must protect the privacy of their customers even in the case of ITO. However, this may be in contradiction with the need for efficient utilization of data and may hinder proper software development, testing and maintenance. The paper contains a short introduction to information privacy and presents some real-world case studies related to this topic.

Categories and Subject Descriptors
K.4.1 [Computers and Society]: Public Policy Issues – Privacy; D.2.9 [Software]: Management – Programming teams

General Terms
Management, Performance, Security, Human Factors, Legal Aspects

Keywords
Information privacy, data protection, anonymization, information technology outsourcing.

1. INTRODUCTION
The rapid growth of the Internet and the ever increasing number of mobile phones and smart devices, coupled with new business practices, have raised far-reaching questions about the future of privacy. Computers and applications track us in almost everything we do. Data are collected when we click on some link in the web browser. With the support of our loyalty cards, the grocery store collects information about what we are buying. Data are stored about us in medical records, financial records, school records, etc. In most cases, those data can be beneficial to us: modern personal recommender systems can be very efficient and helpful, data records can speed up the utilization of services, doctors can devise better diagnostics, etc. However, some of those data are intimate to us and we do not want to reveal them to unauthorized companies and persons.

The paper discusses the issue of information privacy in the modern IT landscape [1]. Most companies today employ some sort of outsourcing (and offshoring) to reduce costs. With Information Technology Outsourcing, companies contract out their IT infrastructure and/or IT related services, such as programming, to other companies. As a consequence, at least some of the data stored in the datacentres of the primary company must be shared with the other IT providers. This raises the question of responsibilities for information protection and increases the complexity of assuring it significantly.

The term information privacy is strongly related to data protection. The latter represents a much wider concept and is discussed only briefly. In the first part of the paper, a basic introduction to information privacy is given. In the second part, some challenges are presented for assuring information privacy in the context of ITO, software development and maintenance. In the third part, two case studies related to the practical implementation of information privacy are demonstrated.

2. INFORMATION PRIVACY
2.1 Introduction to information privacy
Information privacy (or data privacy) considers the relationship between the collection and dissemination of data. Information privacy is a part of a broader term, data protection, which is the process of safeguarding important information from corruption, unwanted exploitation and/or loss. In most cases, information privacy is related to personally identifiable information in combination with other attributes, such as financial records, medical history, religion and beliefs, shopping habits, web surfing behavior, etc. There are other sensitive data, related to the business processes of companies, that must also be protected: trade contracts, financial transactions and similar records made with other companies and persons.
Information privacy involves data storage and data processing technologies, and the public, legal and political issues surrounding them. Privacy concerns extend through the entire life-span of information: how it is collected, stored, used and destroyed, either in digital or some other form.

Most countries have derived strict laws that protect personal privacy [3]. The EU Data Protection Regulation [4] promotes two main principles of data privacy: privacy by design and privacy by default. Privacy by design means that each new service or product that makes use of personal data must take the protection of such data into consideration; IT developers must take privacy into account during the whole life cycle of the system or process development. Privacy by default means that the strictest privacy settings apply automatically once a customer acquires a new product or service. No manual change to the privacy settings should be required on the part of the user. There is also a temporal element to this principle, as personal information must, by default, only be kept for the amount of time necessary to provide the product or service. Slovenia adopted most of the EU Regulations and has very progressive policies regarding information privacy.

2.2 Information privacy and IT solutions
The main challenge of data privacy is how to maximize the utilization of data while protecting personally identifiable information. For example, the end user may wish to have access to the list of customers with their Personal Identification Numbers, which is very convenient for the unique identification of a person. However, Personal Identification Numbers are considered private information because they can reveal a person's birthdate and gender.

To maintain information privacy, we first need to assure data security. All potential measures to protect privacy are useless if the data can be accessed by unauthorized parties. We need to consider all software, hardware and human resources to address this issue and implement the proper actions. The human resources are the most difficult to consider: a frustrated data administrator may expose the data to the public or even to some criminal group. We must also contemplate the employees and the end-users, who may, unintentionally or intentionally, expose private information for no justifiable reason. It is the responsibility of the IT company to minimize such risks.

Nowadays, it is taken for granted that data can be accessed everywhere: from the web, from mobile devices and remotely from personal computers. This presents an additional challenge, as we must contemplate different scenarios to maintain information privacy and data security. Devices can be lost or stolen, communication channels can be eavesdropped, a badly implemented web application can be hacked, etc. All sensitive data (not only the private ones) should be stored, and transferred over communication channels, in encrypted form. By this, the data is protected if it is stolen or accessed unwarrantedly by administrative personnel.
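To make the last point concrete, the following is a minimal sketch of encrypting a sensitive attribute before it is stored or transmitted, using the standard Java cryptography API with AES in GCM mode. The class, its method and the sample plaintext are our illustration, not code from the paper, and a real deployment would still have to solve key storage and distribution, which the sketch deliberately leaves out.

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.security.SecureRandom;

/* Minimal sketch: encrypting a sensitive attribute before storage or transfer. */
public class FieldEncryption {
    public static byte[] encrypt(SecretKey key, byte[] plaintext) throws Exception {
        byte[] iv = new byte[12];                     // 96-bit nonce, unique per record
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ct = cipher.doFinal(plaintext);        // ciphertext incl. 128-bit auth tag
        byte[] out = new byte[iv.length + ct.length];
        System.arraycopy(iv, 0, out, 0, iv.length);   // store the nonce with the ciphertext
        System.arraycopy(ct, 0, out, iv.length, ct.length);
        return out;
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);                                 // AES-256 key
        SecretKey key = kg.generateKey();
        byte[] stored = encrypt(key, "personal id: 0101985500123".getBytes());
        System.out.println("stored " + stored.length + " bytes of ciphertext");
    }
}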
This However, the Personal Identification Numbers are considered as would prevent any unnecessary and unwarranted access of the data. private information because they can reveal his or her birthdate and To enter a reason every time the data is accessed is not always gender. practical and would slow-down the business process scientifically To maintain information privacy, first, we need to assure the data in most cases. Again, this can be avoided with proper authentication security. All potential measures to protect the privacy are useless if and authorization. the data can be accessed by unauthorized parties. We need to Data export and external access to the data with sensitive consider all software, hardware and human resources to address this information should be made only with trusted parties and with clear issue and implement the proper actions. The human resources are and justifiable intention. We cannot track what happens to the data the most difficult to consider. A frustrated Data Administrator may outside of our system. The printing of sensitive data should be expose the data to the public or even to some criminal group. We forbidden or limited to obligatory documents. All such operations must also contemplate the employees and the end-users, who may, must be audited properly. unintentionally or intentionally, expose the private information for Both employees and the end-user should be educated about no justifiable reason. It is the responsibility of the IT company to information privacy and the data security. Most confidentiality minimize such risks. breaches are made unintentionally by the users who were not aware Nowadays, it is taken for granted that the data can be accessed of the Regulations and the significance of information privacy. everywhere: From the web, from mobile devices and remotely from personal computers. This presents an additional challenge. We 3. INFORMATION PRIVACY WITHIN must contemplate different scenarios to maintain information SOFTWARE DEVELOPMENT AND privacy and data security. The devices can be lost or stolen, the MAINTENANCE IN ITO communication channels can be eavesdropped, a badly implemented web application can be hacked, etc. All sensitive data 3.1 Information Technology Outsourcing (not only the private ones) should be stored and transferred on the In general, outsourcing involves the contracting out of a business communication channels in encrypted form. By this, the data is process and/or the assets to another party or company. One kind of protected if it is stolen or accessed unwarrantedly by the business outsourcing is Information Technology Outsourcing administrative personnel. (ITO), which is a company's outsourcing of its IT infrastructure The laws and regulations related to information privacy and data and/or IT related services, such as programing, to other companies. protection are changing constantly. Therefore, the IT solution With the expansion of Cloud Computing in recent years it has providers must reassess the compliance with information privacy become more and more popular for the companies to transfer their and other security regulations continually. This may be difficult for IT infrastructure to the Cloud. This would reduce the costs of applications that have already been in use for a long time and hardware and administrative personnel. Nowadays, the Cloud cannot be replaced or adapted easily. suppliers provide strong data protection. 
3. INFORMATION PRIVACY WITHIN SOFTWARE DEVELOPMENT AND MAINTENANCE IN ITO
3.1 Information Technology Outsourcing
In general, outsourcing involves the contracting out of a business process and/or assets to another party or company. One kind of business outsourcing is Information Technology Outsourcing (ITO), which is a company's outsourcing of its IT infrastructure and/or IT related services, such as programming, to other companies.

With the expansion of Cloud Computing in recent years, it has become more and more popular for companies to transfer their IT infrastructure to the Cloud, reducing the costs of hardware and administrative personnel. Nowadays, the Cloud suppliers provide strong data protection. However, several controversial cases, in which other parties and even governments have had access to the data, have slowed down the migration. Most medium and large sized companies today still try to maintain their own datacentres.

The trend of ITO is also observed with in-house software development. A lot of companies have reduced or even eliminated their IT departments and contracted them out to third parties.

There exist several models of ITO. In some cases, the company has no IT department at all; the IT solution provider serves as the sole service provider and maintains the software and the hardware, performs all the backups, trains the end-users, etc. In most cases, the companies maintain a small IT department which is responsible for the smooth running of software and hardware in-house, while the IT solution provider, in addition to maintaining the software, is responsible for all off-premise assets. Some companies have large IT departments which maintain the hardware, and may run their own applications in parallel with several outsourced solution providers.

3.2 Data sharing and information privacy
Most business processes today rely on the acquisition and processing of data. In the most common scenario, some sort of relational database is used. When software development is outsourced, some of this data must be available to the IT solution providers. This may be in conflict with the requirements for data protection and information privacy. Data protection can be sustained with well-established techniques (e.g. firewalls, VPN communication channels, etc.), which is in the main interest of both the owner of the data and the IT company. On the other hand, it is much more difficult to maintain information privacy. For proper software development, the IT company possesses an elevated privilege level to access the data directly, and some security measures can be avoided easily with customized versions of the applications. The owner of the data has no guarantee that the information will not be used improperly or even sold off.
Data anonymization and data obfuscation can be used to tackle this problem [5]. It is common practice that a copy of the database is used for development, testing and training, in which all sensitive data are replaced with some arbitrary content. For better data protection, the copying of the database and the anonymization are performed by the owner; the outsourced solution providers have no direct access to the originals. However, there are some drawbacks to this scheme. Firstly, some faults in the application can be related to the content of the original data and may not be duplicated easily in the test environment. Secondly, in comparison to copying the database as a whole, data anonymization can be a challenging and time-consuming task, especially for large data sets; the data size required for some tables can be doubled if the database system integrates some sort of change tracing. Thirdly, it is almost impossible to assure complete anonymization: from the secondary attributes, and with some social engineering, it is possible to reconstruct the identity of a subject.

4. CASE STUDIES
In this section we present real-world examples of IT solutions where information privacy plays a significant role. For more than 25 years we were employed as an external solution provider to small and medium sized companies in this part of Europe. In most cases we were the sole solution provider for almost all software solutions, with full administrative access to the data, and we have also envisioned the role of information privacy and implemented all the measures ourselves. At first, these were "common-sense" rules; later on, we tried to comply with all the suggestions and requirements of the personal data protection legislation of Slovenia and the EU. At first, this was also true for the two examples presented here. However, several years ago, our clients became more aware of the issues of information privacy and we upgraded our cooperation to a higher level.

4.1 Academic Information System
The Academic Information System is responsible for the smooth implementation of academic activity in the university. It allows the academic community, university staff and the public to access a wide range of information. Among other things, it keeps all the records of the students and their marks. Some personal attributes and the marks are considered personal data and should be protected from unwarranted disclosure.

Each student has access to his or her own data and may change some of the secondary personal attributes and some preferences. The teaching staff (subject leaders and their assistants) have direct access to the application forms of the exams for which they are responsible. For security reasons, their data are recorded separately and transferred into the student records at the end of the exam; only the subject leader can do that. The university staff have full access and can modify all personal attributes for the students of the related Faculty, and have read-only access to some of the attributes of the other students.

At first, the unique personal number was one of the primary keys that identified a student as a person. It was (and still is) one of the attributes shown in the list user interfaces. There was an idea to hide this attribute entirely; however, this would slow down some processes. Recently, the unique personal number of the students was replaced with a synthetic student identification number, and the identification procedure was automated with RFID cards.

Once entered, the private attributes of a student can be changed only explicitly, and the reason must be presented. All such changes are audited. In addition, all print-outs and exports where personal data of the students are presented are also recorded. The users have access to the audit trails; however, only the administrators can retrieve the details (e.g., to see which attributes were changed).

The development and testing of the applications are performed on a testing database with some of the personal data obfuscated.

[Figure 1: Obfuscated list of students in the test database.]

The same database is used for the training and the testing performed by the end-users. The primary intention of the obfuscation is to prevent unintentional exposure of personal data, not to isolate the IT provider from the client; if necessary, the tests can still be performed on the primary database.
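A sketch of the kind of obfuscation used for such a test copy is shown below. It is our illustration rather than the system's actual code: sensitive values are replaced with arbitrary but format-preserving content, so the test database keeps the shape of the real one. The sample name pools and method names are invented.

import java.util.Random;

/* Format-preserving obfuscation of personal attributes in a test database copy. */
public class Obfuscator {
    private final Random rnd = new Random();
    private static final String[] FIRST = {"Ana", "Jan", "Eva", "Tim", "Maja", "Luka"};
    private static final String[] LAST  = {"Novak", "Horvat", "Kovac", "Zupan", "Krajnc"};

    /* Replace a real name with a randomly generated one. */
    public String obfuscateName() {
        return FIRST[rnd.nextInt(FIRST.length)] + " " + LAST[rnd.nextInt(LAST.length)];
    }

    /* Replace every digit with a random digit, keeping length and punctuation,
       e.g. for a personal identification number shown in list user interfaces. */
    public String obfuscateDigits(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            sb.append(Character.isDigit(c) ? (char) ('0' + rnd.nextInt(10)) : c);
        }
        return sb.toString();
    }
}

Note that, as the paper itself observes, such random replacement can hide data-dependent faults, which is why tests may occasionally still have to run against the primary database.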
4.2 Financial information subsystem of a bank
The financial information subsystem is used for tracking financial and other related transactions of a company. It provides bookkeeping, reporting, data analysis and other information on the financial records of the clients of the company. A bank, as any other company, is also obliged to keep these records; furthermore, there are some unique functionalities specific to banks. Here, the private information is not the personal attributes of persons but the identities of the financial clients.

For obvious reasons, the banks have very strict data protection measures. As in the example above, all the development and testing are done on a separate database with some obfuscated attributes. Despite being a testing database, outside access to it is protected heavily with time-changing encryption keys and firewalls. We are also isolated from the primary database and have direct access to it only in some exceptional circumstances. Because of the specifics of the database management system, it is not very easy to prepare a copy: there are a lot of database triggers that must be switched off during the copying of the data. Because all this is time consuming, the secondary database is updated only occasionally. The primary database also has direct access to some other information subsystems of the bank, which are not available in the test database. Instead, the client provided us with a working development environment inside the company that can be used on the production data if needed, under the supervision of their IT staff.

5. CONCLUSION
Privacy is a much debated issue today. IT solutions are obliged to maintain information privacy for the data they process. In the case of IT Outsourcing, the complexity of achieving this is much higher. The requirements imposed by information privacy sometimes contradict the requirements for effective data processing and good user experience. While strict information privacy is possible, in real-life scenarios it is sometimes better to make some compromises. Furthermore, strict information privacy cannot be achieved with IT means alone; probably the biggest role lies with the persons involved with the solutions.

6. REFERENCES
[1] Solove, D.J., Schwartz, P.M. 2011. Privacy, Information, and Technology. Aspen Publishers.
[2] Cullen, S., Lacity, M., Willcocks, L.P. 2014. Outsourcing: All You Need To Know. White Plume Publishing.
[3] Solove, D.J. 2014. Information Privacy Law. Wolters Kluwer Law & Business.
[4] EU data protection regulation. 2016. http://www.eudataprotectionregulation.com/. Visited on: 2016/09/18.
[5] Ferrer, J.D., Sanchez, D., Comas, J.S. 2016. Database Anonymization: Privacy Models, Data Utility, and Microaggregation-based Inter-model Connections. Morgan & Claypool Publishers.
A Survey on Geolocation Data Anonymization

Matija Heričko, Tatjana Welzer, Marko Hölbl
Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
matija.hericko@student.um.si, tatjana.welzer@um.si, marko.holbl@um.si

Balaji Palanisamy, Prashant Krishnamurthy, Vladimir I. Zadorozhny
University of Pittsburgh, School of Information Sciences, Pittsburgh, USA
bpalan@pitt.edu, prashant@sis.pitt.edu, vladimir@sis.pitt.edu

ABSTRACT
The advancements in positioning technologies and mobile devices have made it possible for location-based services to become very popular, since they provide contextualized information for users depending on their position. Despite the big number of users that use these services, many are wary of their risks and have concerns about their privacy. Data anonymization plays an important part in location-based services. Since the services do not have strict regulations, it is up to the data anonymization methods and techniques to protect the users' privacy. In this paper, we present a survey of data anonymization in the context of geolocation and location-based services. We provide an overview of recent work in the research field, summarise the methods, architectures and configurations used in the research, and present some open problems, challenges and directions for further research.

Categories and Subject Descriptors
H.4 [Information Systems Applications]: Miscellaneous; D.2.8 [Software Engineering]: Metrics – complexity measures, performance measures

General Terms
Theory

Keywords
data anonymization, data generalization, location-based services, geolocation data

1. INTRODUCTION
The Internet of Things (IoT) paradigm, where everything and everyone is connected, enables us to witness significant advances in wireless network communication and positioning technologies, such as Wi-Fi, NFC, RFID, 3G/4G networks, Bluetooth, etc. Additionally, the new paradigm facilitates devices that support network communication and geo-positioning [1].

These advancements, together with the growth of the network infrastructure, provide an excellent platform for applications which make use of the devices' geolocating ability. We are, therefore, witnessing an increase in location-based services, which use geo-spatial location information to deliver on-line location-enhanced information [2].

Location-based services require users to submit their geolocation along with their query, so that the service can contextualize the response based on the user's location. Examples of some frequently used location-based services are navigation, point-of-interest applications ("where is the nearest ATM?"), traffic alerts, weather information, location-based games, etc. [3, 4].

However, the convenience of these services is accompanied by some security concerns, because of the sensitive nature of the users' location information. If a user wishes to use location-based services, he/she must send his/her location along with his/her request (or query) to an untrusted third-party server, so his/her privacy can be intruded upon easily. If the server has malicious intent, it can easily use the location information for its own malicious actions, or the data can be forwarded or sold to some other third party. Users should be aware of the risks that accompany location-based services and should take steps to protect their privacy [1, 5, 6].

In this paper we conduct a survey of data anonymization, which is one of the ways to protect user privacy. We survey the field of data anonymization in the context of geolocation data; more specifically, we look into geolocation data that are used by location-based services. With this review of data anonymization we wish to determine which different methods are available to achieve the required data anonymity level.
We also briefly review the different metrics for measuring the achieved level of anonymity and examine the environment in which each is used.

The rest of the paper is structured as follows: In Section 2 we overview and discuss related work, in Section 3 we present current work, as well as methods for achieving user privacy, in Section 4 we discuss some open issues and future research directions for data anonymization in location-based services, and in Section 5 we conclude the paper.

2. RELATED WORK
Data anonymization plays a big role in preserving the privacy of users and is, therefore, often an important security requirement in many different technological areas. Due to the importance of data anonymization, many researchers tackle the problem in different application areas. The difference between the areas lies in the techniques used to achieve data anonymity and the environment in which they are used [7, 8].

In [7], Parmar and Jinwala surveyed the area of wireless sensor networks and observed the approaches to data aggregation. The objectives of data aggregation in wireless sensor networks are end-to-end privacy preservation and aggregation at intermediate nodes. The technique most used in wireless sensor networks is privacy homomorphism and its variants, which assure privacy and help with data aggregation, but affect integrity and freshness negatively. They concluded that data aggregation could possibly be used in cloud computing and that there is a need for more protocols that provide integrity and freshness.
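To give a flavour of what privacy homomorphism means here, the following toy sketch (ours, not from the surveyed work) uses an additive concealment in the style of Castelluccia, Mykletun and Tsudik: each sensor adds a secret key to its reading modulo a public modulus, intermediate nodes sum ciphertexts without learning any individual reading, and the sink, which knows the sum of the keys, recovers the aggregate.

import java.math.BigInteger;
import java.security.SecureRandom;

/* Toy additively homomorphic concealment: c_i = (m_i + k_i) mod M.
   Intermediate nodes aggregate ciphertexts only; the sink removes the key sum. */
public class ConcealedAggregation {
    static final BigInteger M = BigInteger.valueOf(1_000_003); // public modulus > sum of readings

    public static void main(String[] args) {
        SecureRandom rnd = new SecureRandom();
        long[] readings = {17, 42, 5};                 // plaintext sensor values
        BigInteger aggregate = BigInteger.ZERO, keySum = BigInteger.ZERO;

        for (long reading : readings) {
            BigInteger key = new BigInteger(M.bitLength() - 1, rnd); // per-sensor secret < M
            BigInteger c = BigInteger.valueOf(reading).add(key).mod(M);
            aggregate = aggregate.add(c).mod(M);       // aggregation on ciphertexts only
            keySum = keySum.add(key).mod(M);
        }
        // The sink knows the key sum and recovers the sum of readings: 64
        System.out.println(aggregate.subtract(keySum).mod(M));
    }
}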
Dhand and Tyagi in [9] further reviewed the techniques for achieving data aggregation in wireless sensor networks. They identified several cluster-based approaches which minimise communication requirements and, at the same time, maximise network lifetime. They divided the protocols into homogeneous and heterogeneous, and each of those groups further into single-hop and multi-hop protocols. The authors concluded that data aggregation extends the network resources, since it lowers the amount of data that needs to be transmitted.

Data anonymization is also an important topic in the field of big data. A survey on big data privacy was done by Vennila and Priyadarshini in [8], where big data sets are sent to a cloud. They observed that traditional privacy models and data anonymization approaches are not applicable to big data sets.

In [10] the authors surveyed the field of location-based wireless services and classified the services based on various attributes. They analysed the usage trends of the services, the technologies, protocols and standards used, and the architecture, and mapped the requirements to the technical aspects with the purpose of increasing awareness.

3. DATA ANONYMIZATION METHODS AND TECHNIQUES
Data anonymization in location-based services is used to protect user privacy. User location information is anonymized in such a way that a service cannot infer the user's identity, interests, or any other specific information; rather, the data is generalised to the point that it can describe a multitude of different users. On the other hand, the data should still be specific enough to allow the user to enjoy the benefits and convenience of services contextualized to his/her location [1, 6].

With the wide spread of location-based services that are used daily by many users, data privacy became a big concern, as the services come with many hidden risks and threats to user privacy. Threats to privacy arise from a multitude of actions, such as the collection of personal information, unauthorised use of personal information, improper access to personal information, bad storage of personal information, and other actions similar to or derived from these [11, 12].

Privacy is the user's right to have control over how information about him/her is collected, maintained, used, disclosed or shared [13]. We can classify location privacy into microscopic and macroscopic levels [14], where microscopic refers to a single user query and macroscopic to a whole journey with multiple queries. [15] further divides the macroscopic level into journey-level and long-term location privacy. Techniques for achieving location privacy can be divided into three major groups: anonymity-based schemes, obfuscation-based schemes and false location or dummy generation-based schemes [15, 16]. The difference between the schemes is that the anonymity-based and obfuscation-based ones can only provide location privacy at the microscopic level, while the dummy generation-based ones can provide it at both the microscopic and the macroscopic levels. We can also classify the techniques into two categories based on the involved actors [15]: anonymization server-based (or centralised) schemes and mobile device-based (or decentralised) schemes [1]. As the names imply, the server-based schemes use a trusted server for the anonymization of the data, while the mobile device-based schemes do not use a server to achieve anonymization, but rely on the sharing of information between users [1].

Centralised schemes make use of a trusted anonymizing server to anonymize the query of the user. The server first removes sensitive information about the user (such as the name, age, etc.) and then anonymizes the location information by cloaking, using dummy locations or confusing the path. The biggest disadvantage of a centralised scheme is that the data is gathered in a single location and, if the server is compromised, all the data that it holds is compromised as well [15, 17, 18, 19, 20]. Decentralised schemes do not use a trusted server to anonymize the queries; instead they use other methods. The most prevalent method uses peer-to-peer communication, in which the user's device searches for neighbouring devices and uses their locations to anonymize its own location information in the query. The biggest disadvantage of these schemes is that a user has to rely on neighbouring devices and, if there are not enough devices nearby, the location information cannot be anonymized. Another drawback is that the computational overhead may be too much for some smaller devices [1, 6].

Anonymity-based techniques try to preserve the user's privacy by making his/her query anonymous with the use of different methods. The most popular and important method among these techniques is cloaking, which is divided further into spatial and spatio-temporal cloaking. Both of these methods make use of a metric called k-anonymity, which was first proposed by Sweeney [21]. K-anonymity means that a user cannot be distinguished from (k-1) other users whose data is also in the same data set.
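A minimal sketch of spatial cloaking with k-anonymity follows. It is our illustration, not an algorithm from the cited papers: the reported area around the querying user is widened until at least k users fall inside it, treating coordinates as plain degrees and ignoring road networks, timing, and the attacks discussed in the literature.

import java.util.List;

/* Spatial cloaking sketch: widen the reported area until it covers >= k users. */
public class SpatialCloak {
    public record Point(double lat, double lon) {}
    public record Box(double minLat, double minLon, double maxLat, double maxLon) {
        boolean contains(Point p) {
            return p.lat() >= minLat && p.lat() <= maxLat
                && p.lon() >= minLon && p.lon() <= maxLon;
        }
    }

    /* Grows a square box around the querying user until it contains at least k
       users (the querying user included). Returns null if the area limit is hit
       first, mirroring how decentralised schemes fail when too few neighbours
       are nearby. */
    public static Box cloak(Point user, List<Point> others, int k,
                            double step, double maxHalf) {
        for (double half = step; half <= maxHalf; half += step) {
            Box box = new Box(user.lat() - half, user.lon() - half,
                              user.lat() + half, user.lon() + half);
            long inside = 1 + others.stream().filter(box::contains).count();
            if (inside >= k) {
                return box; // the service only ever sees this box, not the exact point
            }
        }
        return null;
    }
}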
Two other metrics that have gained traction recently are the entropy-based metric and the l-diversity metric. The basis for the entropy-based metric is information theory, where entropy is a measure of uncertainty or unpredictability; the entropy-based metric thus measures with what level of certainty the real location can be identified among a group of locations. The l-diversity metric is based on graph theory: it examines the l-neighbourhood graphs and the connections between neighbours to try to determine the user [22].
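The entropy-based metric can be stated in a few lines. In this sketch of ours, probs[i] is the probability an observer assigns to candidate location i being the real one; the anonymity set is as good as its entropy, which peaks at log2(n) bits when all n candidates are equally plausible.

/* Entropy-based anonymity metric: observer's uncertainty over candidate locations. */
public final class EntropyMetric {
    /* probs[i] = observer's probability that candidate i is the real location. */
    public static double entropyBits(double[] probs) {
        double h = 0.0;
        for (double p : probs) {
            if (p > 0) {
                h -= p * Math.log(p) / Math.log(2); // Shannon entropy in bits
            }
        }
        return h; // maximal (log2 n) when all n candidates are equally likely
    }

    public static void main(String[] args) {
        // Four dummies plus the real location, all equally plausible: ~2.32 bits
        System.out.println(entropyBits(new double[] {0.2, 0.2, 0.2, 0.2, 0.2}));
        // A poor dummy set that an observer can largely rule out: ~1.12 bits
        System.out.println(entropyBits(new double[] {0.8, 0.05, 0.05, 0.05, 0.05}));
    }
}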
Anonymity-based techniques are the most researched area, and there are many different variants. Some researchers focus on the cloaking of mobile users, where the issues are the continuous queries of users and their movements [4, 5, 23, 24, 25, 26, 27, 28, 29]. Others research centralised schemes with a focus on microscopic or snapshot queries, where every query stands alone [15, 17, 18, 19, 20, 30, 31]. Less researched are some combinations, such as the hybrid approach to cloaking, which uses both a centralised and a decentralised scheme [6], or a scheme that uses middleware [32] to provide privacy preservation [33, 34].

Obfuscation-based techniques try to prevent services from identifying the user, whether by adding some noise to his/her location information or by shifting the original location. The idea behind location obfuscation is that the real location is transformed into another space in which the spatial relationships are maintained, so that location-based queries can still be answered. Obfuscation is not as frequent as the other two techniques, perhaps because it is similar to dummy generation; maybe these two methods will be known under one name in the future. Nevertheless, it is a very active research area [35, 36, 37, 38, 39].

And, while obfuscation-based techniques do their best to conceal the user's real location information, false location-based techniques do not try to conceal the location information, but rather hide the user's location information in plain sight. False location-based techniques protect user privacy either by reporting false locations to the location-based services, or by generating some dummy locations which are added to the real location and packaged into a query, so that the service does not know which location is the real one. In these false location-based methods there is a choice of using a random [40] or a carefully planned generator [41, 42], where the generator uses other principles and techniques to generate the dummy data, such as the soft computing techniques in [15].
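A random dummy generator of the kind cited above [40] can be sketched as follows. The sketch is ours and deliberately naive, since carefully planned generators [41, 42] choose dummies that are plausible with respect to geography and user behaviour rather than uniformly at random.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/* Dummy generation sketch: hide the real position among k-1 plausible fakes. */
public class DummyGenerator {
    private final Random rnd = new Random();

    /* Returns k points (the real one included, at a random index) within
       `radius` degrees of the real position. */
    public List<double[]> query(double lat, double lon, int k, double radius) {
        List<double[]> points = new ArrayList<>();
        for (int i = 0; i < k - 1; i++) {
            points.add(new double[] {
                lat + (rnd.nextDouble() * 2 - 1) * radius,
                lon + (rnd.nextDouble() * 2 - 1) * radius });
        }
        points.add(rnd.nextInt(points.size() + 1), new double[] {lat, lon});
        return points; // the service cannot tell which of the k points is real
    }
}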
location based services: a hybrid approach,” Another open problem that is interesting, but has not seen much GeoInformatica, vol. 13, no. 2, pp. 159–182, Jun. 2009. research, is the problem of setting the desired level of privacy [7] K. Parmar and D. C. Jinwala, “Concealed data aggregation protection dynamically. This problem is interesting because it in wireless sensor networks: A comprehensive survey,” would give the users the power to choose what level of security Comput. Netw. , vol. 103, pp. 207–227, Jul. 2016. they want for themselves. So far, there has been little done in the [8] S. Vennila and J. Priyadarshini, “Scalable Privacy way of allowing users to choose their desired level of privacy. Preservation in Big Data a Survey,” Procedia Comput. Sci. , Often, users have to accept the implicit demands or terms of use of vol. 50, pp. 369–373, 2015. location-based services. On the other hand, if users take advantage of one of the methods discussed in this paper, they also simply have [9] G. Dhand and S. S. Tyagi, “Data Aggregation Techniques in to accept the level of privacy protection that method is designed WSN: Survey,” Procedia Comput. Sci. , vol. 92, pp. 378– with, which leaves users with two absolute options, either ‘full’ 384, 2016. exposure or ‘full’ protection. So we expect some research to go in [10] D. Mohapatra and S. S.B, “Survey of location based wireless that direction in the future. services,” 2005, pp. 358–362. [11] R. P. Minch, “Privacy issues in location-aware mobile devices,” 2004, p. 10 pp. 13 [12] J. V. Chen, W. Ross, and S. F. Huang, “Privacy, trust, and IEEE Trans. Parallel Distrib. Syst. , vol. 23, no. 10, pp. justice considerations for location‐based mobile 1805–1818, Oct. 2012. telecommunication services,” info, vol. 10, no. 4, pp. 30–45, [28] M. Y. Mun, D. H. Kim, K. Shilton, D. Estrin, M. Hansen, Jun. 2008. and R. Govindan, “PDVLoc: A Personal Data Vault for [13] S. Saravanan and B. Sadhu Ramakrishnan, “Preserving Controlled Location Data Sharing,” ACM Trans. Sens. privacy in the context of location based services through Netw. , vol. 10, no. 4, pp. 1–29, Jun. 2014. location hider in mobile-tourism,” Inf. Technol. Tour. , vol. [29] X. Pan, X. Meng, and J. Xu, “Distortion-based anonymity 16, no. 2, pp. 229–248, Jun. 2016. for continuous queries in location-based mobile services,” [14] R. Shokri, J. Freudiger, and J. Hubaux, “A Unified 2009, p. 256. Framework for Location Privacy,” 2010. [30] F.-Y. Leu, “A novel network mobility handoff scheme using [15] F. Tang, J. Li, I. You, and M. Guo, “Long-term location SIP and SCTP for multimedia applications,” J. Netw. privacy protection for location-based services in mobile Comput. Appl. , vol. 32, no. 5, pp. 1073–1091, Sep. 2009. cloud computing,” Soft Comput. , vol. 20, no. 5, pp. 1735– [31] A. Samanta, F. Zhou, and R. Sundaram, “SamaritanCloud: 1747, May 2016. Secure infrastructure for scalable location-based services,” [16] H. Lu, C. S. Jensen, and M. L. Yiu, “PAD: privacy-area Comput. Commun. , vol. 56, pp. 1–13, Feb. 2015. aware, dummy-based location privacy in mobile services,” [32] G. Myles, A. Friday, and N. Davies, “Preserving privacy in 2008, p. 16. environments with location-based applications,” IEEE [17] Baik Hoh and M. Gruteser, “Protecting Location Privacy Pervasive Comput. , vol. 2, no. 1, pp. 56–64, Jan. 2003. Through Path Confusion,” 2005, pp. 194–205. [33] J. Meyerowitz and R. Roy Choudhury, “Hiding stars with [18] R. Cheng, Y. Zhang, E. Bertino, and S. 
Prabhakar, fireworks: location privacy through camouflage,” 2009, p. “Preserving User Location Privacy in Mobile Data 345. Management Infrastructures,” in Privacy Enhancing [34] B. Niu, Xiaoyan Zhu, Xiaosan Lei, Weidong Zhang, and Hui Technologies, vol. 4258, G. Danezis and P. Golle, Eds. Li, “EPS: Encounter-Based Privacy-Preserving Scheme for Berlin, Heidelberg: Springer Berlin Heidelberg, 2006, pp. Location-Based Services,” 2013, pp. 2139–2144. 393–412. [35] C. Ardagna, M. Cremonini, S. De Capitani di Vimercati, and [19] B. Gedik and Ling Liu, “Location Privacy in Mobile P. Samarati, “An Obfuscation-Based Approach for Systems: A Personalized Anonymization Model,” 2005, pp. Protecting Location Privacy,” IEEE Trans. Dependable 620–629. Secure Comput. , vol. 8, no. 1, pp. 13–27, Jan. 2011. [20] M. Gruteser and D. Grunwald, “Anonymous Usage of [36] C. A. Ardagna, M. Cremonini, E. Damiani, S. De Capitani Location-Based Services Through Spatial and Temporal di Vimercati, and P. Samarati, “Location Privacy Protection Cloaking,” 2003, pp. 31–42. Through Obfuscation-Based Techniques,” in Data and [21] L. Sweeney, “k-ANONYMITY: A MODEL FOR Applications Security XXI, vol. 4602, S. Barker and G.-J. PROTECTING PRIVACY,” Int. J. Uncertain. Fuzziness Ahn, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, Knowl.-Based Syst. , vol. 10, no. 05, pp. 557–570, Oct. 2002. 2007, pp. 47–60. [22] B. Zhou and J. Pei, “The k-anonymity and l-diversity [37] G. Ghinita, P. Kalnis, A. Khoshgozaran, C. Shahabi, and K.- approaches for privacy preservation in social networks L. Tan, “Private queries in location based services: against neighborhood attacks,” Knowl. Inf. Syst. , vol. 28, no. anonymizers are not necessary,” 2008, p. 121. 1, pp. 47–77, Jul. 2011. [38] A. Khoshgozaran and C. Shahabi, “Blind Evaluation of [23] I. Bilogrevic, M. Jadliwala, V. Joneja, K. Kalkan, J.-P. Nearest Neighbor Queries Using Space Transformation to Hubaux, and I. Aad, “Privacy-Preserving Optimal Meeting Preserve Location Privacy,” in Advances in Spatial and Location Determination on Mobile Devices,” IEEE Trans. Temporal Databases, vol. 4605, D. Papadias, D. Zhang, and Inf. Forensics Secur. , vol. 9, no. 7, pp. 1141–1156, Jul. G. Kollios, Eds. Berlin, Heidelberg: Springer Berlin 2014. Heidelberg, 2007, pp. 239–257. [24] C.-Y. Chow, M. F. Mokbel, J. Bao, and X. Liu, “Query- [39] M. L. Yiu, G. Ghinita, C. S. Jensen, and P. Kalnis, aware location anonymization for road networks,” “Outsourcing Search Services on Private Spatial Data,” GeoInformatica, vol. 15, no. 3, pp. 571–607, Jul. 2011. 2009, pp. 1140–1143. [25] C.-Y. Chow, M. F. Mokbel, and X. Liu, “A peer-to-peer [40] H. Kido, Y. Yanagisawa, and T. Satoh, “An anonymous spatial cloaking algorithm for anonymous location-based communication technique using dummies for location- service,” 2006, p. 171. based services,” 2005, pp. 88–97. [26] H. Lee, B.-S. Oh, H. Kim, and J. Chang, “Grid-based [41] B. Niu, Q. Li, X. Zhu, G. Cao, and H. Li, “Achieving k- cloaking area creation scheme supporting continuous anonymity in privacy-aware location-based services,” 2014, location-based services,” 2012, p. 537. pp. 754–762. [27] M. M. E. A. Mahmoud and X. Shen, “A Cloud-Based [42] B. Niu, Q. Li, X. Zhu, G. Cao, and H. Li, “Enhancing Scheme for Protecting Source-Location Privacy against privacy through caching in location-based services,” 2015, Hotspot-Locating Attack in Wireless Sensor Networks,” pp. 1017–1025. 
Analysis of techniques for managing data on mobile devices

Klemen Sagadin
Faculty of Electrical Engineering and Computer Science, University of Maribor
Maribor, Slovenia
+386 51 242 244
klemen.sagadin@gmail.com

Boštjan Šumak
Faculty of Electrical Engineering and Computer Science, University of Maribor
Maribor, Slovenia
+386 2 220 7378
bostjan.sumak@um.si

ABSTRACT
In this paper, we conducted a study of techniques for managing data on mobile devices and defined the needs for local data storage and data processing. Based on the results of a preliminary analysis, a set of data management techniques was chosen for a detailed analysis and comparison in terms of usability, performance and complexity in the software development process. The set of analysed data management techniques included the relational database SQLite, the object database Realm and the object-relational mapper OrmLite. The results of this study showed that there are significant differences among the chosen techniques in their usability for the developer, their performance and the complexity of developing software solutions with them.

Categories and Subject Descriptors
D.2.8 [Software Engineering]: Metrics – complexity measures and performance measures
H.2.4 [Database Management]: Systems – Multimedia databases, Object-oriented databases, Query processing, Relational databases, Transaction processing

General Terms
Algorithms, Measurement, Performance, Design, Reliability, Experimentation, Languages, Theory

Keywords
Techniques for processing and storing data on mobile devices, performance analysis, complexity of development using data storage, functionality of data storage techniques, object database Realm, relational database SQLite, mapper OrmLite.
1. INTRODUCTION
Globally, the number of mobile device users exceeded the number of desktop computer users in 2014. In 2016, the number of mobile device users increased to 1,900 million, and more than 50% of users spent their time on mobile devices when searching for and using digital media. On average, users of Android and iOS mobile devices spend 32% of their time playing games and 68% on applications that need Internet access [1].

Because an Internet connection is not always available, it is of great importance for devices to work efficiently in off-line mode. For that to be possible, devices need to process data obtained from a network, save it locally and present it to the user even when the mobile device has no Internet connection. Because of the increasing use of mobile applications that depend on information gathered from the Internet, applications need to be developed in such a way that they are able to work in off-line mode [2].

For developers of mobile applications that run on systems with limited resources, there are various mechanisms for data management. There are techniques for storing data that are specific to a mobile operating system and the corresponding programming environment, and there are techniques that are supported in multiple programming environments and mobile operating systems. The large number of mechanisms and techniques for data storage makes choosing the most suitable mechanism for the development of a specific mobile application a challenging task. In this paper, we have conducted an analysis and comparison of mechanisms for managing and storing data on mobile devices, with emphasis on newer tools and concepts that allow more contemporary approaches. In this study, we have taken into account the importance of presenting information in programming solutions in the form of a domain object-oriented programming model, while considering the fact that, for efficient data storage, such objects often need to be converted into a form suitable for permanent storage.

2. TECHNIQUES FOR MANAGING DATA ON MOBILE DEVICES
In this section, the requirements for processing and storing data on mobile devices are presented, together with the groups of techniques for data storage.

2.1 Requirements for Processing and Storing Data on Mobile Devices
Applications on mobile devices need information which, in contemporary mobile solutions, can be obtained from various data sources in order to operate and ensure a good user experience. We identified two domains where the use of techniques for managing and storing data is of crucial importance.

2.1.1 Work in off-line mode
In contemporary IT solutions, embedded and other mobile devices are becoming more connected to the World Wide Web and can access remote data, which are necessary to ensure a good user experience. Despite better connectivity, and regardless of the mobile device's location, uninterrupted Internet access is not always possible. This is the reason for an increasing demand for the undisturbed functioning of mobile solutions in off-line mode, meaning that the mobile solution can work without an Internet connection. This applies particularly to mobile solutions which depend on data obtained from remote sources. In order to ensure the functioning of a mobile solution in off-line mode, a suitable data storage technique must be used on the mobile device for the local managing and storing of information [3][4].

2.1.2 Big data on mobile devices
With the arrival of more advanced smart mobile devices, which use sensors for capturing information from the environment, the amount of data is increasing constantly. Consequently, there is a growing need for managing large amounts of data and transmitting the analysed results to the user. With the use of sensors, mobile devices can generate large amounts of data. Furthermore, advancements in multimedia technology (improved cameras, sound recorders etc.) have enabled the capture of increased amounts of multimedia information (pictures, sounds and video). This kind of data needs to be processed and stored properly for it to be available for further use. Therefore, contemporary data storage mechanisms must be used to support such data [5][6].

2.2 Mobile Data Storage Techniques
Based on an overview of the possibilities for storing data on the Android, iOS and Windows Phone mobile operating systems, we can divide data storage techniques into three groups:
- Key-value data storage,
- File data storage, and
- Local database storage.

Key-value data storage is a database management system which offers a set of basic functions for the manipulation of unstructured data objects, where each value has its own unique identifier [7][8]. File data storage means saving files of a specific data format in the mobile device's file system, where information is presented in the form of files [9]. Local database storage is used for saving structured and unstructured information. For storing data on mobile devices we used local databases, which are mostly independent libraries without a server component, without administration needs and with smaller demands on system resources [10][11].

3. ANALYSIS OF MOBILE DATA STORAGE TECHNIQUES
In this study we focused on techniques for storing complex data structures and, based on a systematic literature review, we chose the most researched and used local database for mobile devices, which is SQLite. Because SQLite is a relational database, data need to be mapped from the domain programming model to a relational model. Based on the results of a preliminary survey, we chose the techniques Realm and OrmLite, which automate this process, and compared them with the relational database SQLite. Realm is an object database that enables direct storing of the domain model to the database. OrmLite is an ORM (Object Relational Mapper), which maps objects to the relational database. We analysed in detail the influence of the chosen storage techniques on performance, usability for the developer, and the complexity of development with each technique. Figure 1 shows the conceptual research model.

Figure 1. Conceptual research model

3.1 Usability of Techniques from the Aspect of the Developer
We defined the functionalities important for a developer of programming solutions and their influence on the final usability of the individual techniques. For each technique we observed: (1) its tool support for managing the database, (2) the possibilities of automatic mapping of domain objects to the database, (3) the support for different types of relations between saved data, (4) the support for managing various data types, (5) the support for advanced data queries, (6) the support for multithreaded operation, (7) the support for saving data to physical locations in memory, (8) the support for transactions and ACID properties, (9) the support for migrating data upon data scheme changes, and (10) the support for built-in data encryption.

We came to the conclusion that the programming interface of OrmLite has no support for database management tools. Most functionalities for the automatic mapping of domain objects to the database are supported by the Realm database and the OrmLite programming interface, and are not supported by the SQLite database. Support for data management functionalities and for relations between data is best in the Realm and SQLite databases. All data storage techniques enable advanced data querying. The number of supported functionalities for multithreaded operation is biggest in the Realm database, and all data storage techniques have the same support for saving data to physical locations in memory. The Realm database has the most supported functionalities for transactions, ACID features and migrating data upon data scheme changes. Database encryption is supported in the Realm and SQLite databases and is not supported in the OrmLite mapper. Most of the defined functionalities are supported by the Realm database, due to its object orientation and good support for multithreaded operation on mobile devices. The SQLite database supports the fewest of the defined functionalities because of the lack of support for the automatic mapping of objects to the database, which the OrmLite technique tries to substitute.

Based on the analysis results, a chi-square statistical test was conducted at a significance level of 1%; we accepted the alternative hypothesis stating that there are significant differences in the number of supported functionalities between the Realm and SQLite techniques and between the Realm and OrmLite techniques. We were not able to reject the null hypothesis that there are no significant differences between SQLite and OrmLite; therefore, we cannot accept its alternative hypothesis.
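For illustration, the sketch below shows the kind of computation behind such a test on a 2x2 table of supported vs. unsupported functionalities for two techniques. The counts are hypothetical, since the paper reports only the test outcomes, and 6.635 is the standard chi-square critical value for one degree of freedom at the 1% level.

```java
/**
 * Minimal sketch of a chi-square test of independence on hypothetical counts
 * of supported vs. unsupported functionalities for two storage techniques.
 */
public class ChiSquareSketch {

    /** Pearson's chi-square statistic for a 2x2 contingency table. */
    static double chiSquare(long[][] o) {
        long total = o[0][0] + o[0][1] + o[1][0] + o[1][1];
        double chi2 = 0;
        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 2; j++) {
                double rowSum = o[i][0] + o[i][1];
                double colSum = o[0][j] + o[1][j];
                double expected = rowSum * colSum / total;
                chi2 += (o[i][j] - expected) * (o[i][j] - expected) / expected;
            }
        }
        return chi2;
    }

    public static void main(String[] args) {
        // Rows: technique A, technique B; columns: supported, unsupported.
        long[][] counts = { {9, 1}, {4, 6} };   // hypothetical numbers
        double chi2 = chiSquare(counts);
        double critical = 6.635;                // df = 1, 1% significance level
        System.out.printf("chi2 = %.3f, significant at 1%%: %b%n",
                          chi2, chi2 > critical);
    }
}
```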
3.2 Complexity Analysis of the Development
We researched the complexity of development with the use of the data storage techniques through an experiment in which we developed three functionally equivalent software solutions, each using one of the analysed data storage techniques. The software solutions are based on a domain object-oriented model, whereby the data from the entity classes must be converted into a form suitable for the local database. We defined 7 groups of software solution functionalities, which include database configuration (F0), defining the database scheme (F1), creating new data inserts (F2), updating data values (F3), deleting already existing data (F4), selecting stored data based on different criteria and aggregate functions (F5), and executing asynchronous transactions (F6).

Regarding the development of the software solutions, we measured the time needed for the development of the individual functionalities. Figure 2 presents the average measured times of the performed experiments. For the development of the software solution using the Realm database, we needed less than half the time needed when using the SQLite database; the OrmLite solution took the longest, because of its more demanding configuration and the effort needed to establish a working setup. For software solution development with the SQLite database we needed more time for the implementation of the individual operations on the data: because we had to implement proper methods for mapping data from the object model to the entity-relational model ourselves, each operation on the data took longer to implement.

Figure 2. Time needed for development with the use of data storage techniques
We analysed the developed software solutions with tools for the calculation of software code metrics. Following Gerlec and Heričko (2010), who evaluated refactoring with a quality index, we chose the set of metrics WMC, DIT, CBO and LCOM, based on which we calculated an index of software code quality. The chosen metrics are non-complementary and non-correlated with each other. The results of the code analysis are shown in Table 1 [12].

Table 1. Results of software code metrics and quality index
Software solution | WMC | DIT | CBO | LCOM | Qi
Realm | 1 | 4 | 3.8 | 4 | 3.5
SQLite | 1 | 2 | 3.75 | 3.38 | 3
OrmLite | 3 | 2 | 5.67 | 3.22 | 3.25

Due to better results in the DIT metric, which assesses the depth of the class inheritance hierarchy, the Realm database reached a higher index because of its decomposition and inheritance hierarchy tree when using the database software constructs. The OrmLite mapper achieved a higher result than the Realm database in the WMC metric, which assesses the sum of the complexity of class methods, because of a higher number of classes with smaller numbers of methods. The results of the OrmLite mapper in the DIT and CBO metrics are less successful, because classes which use its libraries are not well decomposed and are more tightly coupled with each other; consequently, they constitute a more complex software solution. The software solution using the database constructs of SQLite reached lower results due to bad results in the WMC and DIT metrics: there are multiple classes that help with easier writing of SQL commands and with creating the database, and these classes are therefore bigger and poorly structured. All three software solutions achieved comparable results in the LCOM metric, because it concerns smaller software solutions that we divided well, based on the dependencies between individual class attributes.

Based on the time needed for software solution development, the use of the OrmLite mapper was the most complex. However, we must take into consideration the less complex final implemented software code in comparison with the SQLite database. The fastest development and the highest quality software code were achieved with the Realm database, which is why we consider it the least complex of the compared techniques from the developer's point of view.

Based on the obtained average times needed for solution development, we ran Levene's test of homogeneity of variances and, based on the results, concluded that the homogeneity of variances was violated. Therefore, we decided to use Welch's statistical test for hypothesis testing at a significance level of 5%. A significant difference was established between the average times measured for the Realm and SQLite techniques and for the Realm and OrmLite techniques. We were not able to reject the null hypothesis which states that there are no significant differences between the SQLite and OrmLite techniques; therefore, we cannot confirm that significant differences exist in the measured times between the SQLite and OrmLite techniques.

Based on the calculated quality indexes, we can confirm the hypothesis that there are differences in the code produced with each of the data storage techniques.

3.3 Performance Analysis
To analyse the performance of the data storage techniques, we developed a mobile application which conducts transactions on data storage operations and with which we can measure the time needed for each individual transaction's execution. The mobile application, developed for analysing the complexity of development and explained in the previous chapter, tests the existing software code. During the analysis of each technique, we varied the amount of processed data in the individual transactions, with the purpose of understanding the impact of the amount of processed data on the performance of the individual techniques. We performed multiple tests for the performance analysis and divided them into several groups, based on the chosen data storage technique, the tested functionality and the group of processed data. We tested the performance of inserting entities and relations, updating data, deleting data, obtaining data based on data relations, obtaining data based on arithmetic operators and calculating values with aggregate functions.

Based on a systematic overview of the literature, we chose a metric for measuring performance: the time needed for the execution of individual transactions. Times were measured in nanoseconds with the built-in methods of the Java programming language. We implemented each tested operation in a DAO class, where all data storage techniques use the same software interface; therefore, we can ensure the equivalence and comparability of the performed tests. The tests were run on an LG G3 mobile device with the Android 6.0 operating system installed. With each data storage technique we conducted 8 groups of tests, and for each test we increased the amount of stored data on a logarithmic scale with base 10, ranging from 1 to 10,000.
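The following simplified sketch illustrates the measurement setup described above: every technique is wrapped behind the same DAO interface and a transaction is timed with System.nanoTime(). The interface, the class names and the in-memory stand-in are ours, not the authors' actual Android implementation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified sketch of the measurement harness: every storage technique is
 * wrapped in the same DAO interface, so the timed code path is identical.
 */
public class TimingHarness {

    interface PersonDao {                 // hypothetical common interface
        void insertAll(List<String> names);
    }

    /** Stand-in for a Realm-, SQLite- or OrmLite-backed DAO. */
    static class InMemoryDao implements PersonDao {
        private final List<String> store = new ArrayList<>();
        public void insertAll(List<String> names) { store.addAll(names); }
    }

    /** Times one transaction of the given capacity, in nanoseconds. */
    static long timeInsert(PersonDao dao, int capacity) {
        List<String> batch = new ArrayList<>();
        for (int i = 0; i < capacity; i++) batch.add("person-" + i);
        long start = System.nanoTime();   // measurement starts just before the call
        dao.insertAll(batch);
        return System.nanoTime() - start; // ... and stops right after it
    }

    public static void main(String[] args) {
        PersonDao dao = new InMemoryDao();
        // Capacities grow on a base-10 logarithmic scale, as in the experiment.
        for (int n = 1; n <= 10_000; n *= 10) {
            System.out.printf("insert %5d rows: %d ns%n", n, timeInsert(dao, n));
        }
    }
}
```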
We came to the conclusion that the data storage technique influences the performance of executing individual transactions. We performed different types of tests, which showed that the Realm database was the most powerful; this was confirmed by statistical tests proving that there are significant differences in operational capabilities in comparison with the SQLite database and the OrmLite mapper. The OrmLite and SQLite techniques achieved comparable results, confirming that it is not possible to prove significant differences between them. In certain tests with smaller amounts of data, the techniques reached extremely comparable results, although the difference in operational capability increased as the amount of data grew.

Based on the results of a one-way ANOVA statistical test at a significance level of 5%, we confirmed that there are significant differences in performance between the OrmLite and Realm techniques and between the Realm and SQLite techniques. We cannot reject the null hypothesis for the remaining pair; therefore, we cannot confirm that there are significant differences in performance between the OrmLite and SQLite techniques.

4. CONCLUSION AND FUTURE WORK
We analysed the area of data storage techniques on mobile devices and came to the conclusion that data storage techniques can be divided into three groups, based on their characteristics. Based on preliminary research, we chose the SQLite, OrmLite and Realm techniques and compared them in terms of their usability, complexity of development and operational performance. The results provided proof that the data storage technique has an impact on the analysed concepts. Based on the performed comparative analysis and experiments, we found that, for the development of mobile solutions, the use of the Realm data storage technique is more efficient than the SQLite and OrmLite data storage techniques, because the Realm technique supports most of the analysed functionalities. Consequently, Realm's execution is more efficient, its implemented software code is less complex, and less time is needed for development. We were not able to prove significant differences between the SQLite and OrmLite techniques in operational capabilities and in the times needed for development. However, we did confirm that the OrmLite mapper, in comparison with the SQLite database, supports more functionalities and that its implemented software solutions are less complex. We confirmed that picking the right data storage technique has an impact on the efficiency of software solution development. Techniques which enable automated mapping from the domain model to the data storage have proven to be more effective, and the Realm object database even more capable.

In future work we will expand the existing research with an analysis of the techniques' consumption of energy and other resources on mobile devices. This concept could not be analysed in detail due to the limitations of this research; however, it does have an impact on the experience of the final user and the mobile solution developer.

5. REFERENCES
[1] Bosomworth, D. 2016. Mobile marketing statistics 2016. http://www.smartinsights.com/mobile-marketing/mobile-marketing-analytics/mobile-marketing-statistics/.
[2] Whitney, L. 2012. Offline Capabilities: Native Mobile Apps vs. Mobile Web Apps. http://www.sitepoint.com/offline-capabilities-native-mobile-apps-vs-mobile-web-apps/.
[3] Elgan, M. 2014. The hottest trend in mobile: going offline! Computerworld. http://www.computerworld.com/article/2489829/mobile-wireless/the-hottest-trend-in-mobile--going-offline-.html.
[4] Mahemoff, M. 2013. 'Offline': What does it mean and why should I care? http://www.html5rocks.com/en/tutorials/offline/whats-offline/.
[5] Liebowitz, J. 2016. Big Data and Business Analytics. CRC Press.
[6] Walls, T. A. and Schafer, J. L. Models for Intensive Longitudinal Data.
[7] Basescu, C., Cachin, C., Eyal, I., Haas, R., Sorniotti, A., Vukolic, M. and Zachevsky, I. 2012. Robust data sharing with key-value stores. IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012), 1–12.
[8] Aerospike Inc. What is a Key-Value Store? http://www.aerospike.com/what-is-a-key-value-store/.
[9] Sadaqat, J., Maozhen, L., Ghaidaa, A. and Hamed, A. 2010. File annotation and sharing on low-end mobile devices. Seventh International Conference on Fuzzy Systems and Knowledge Discovery, 6, 2973–2977.
[10] Roukounaki, K. 2014. Five popular databases for mobile. Developer Economics. http://www.developereconomics.com/five-popular-databases-for-mobile/.
[11] Ouarnoughi, H., Boukhobza, J., Olivier, P., Plassart, L. and Bellatreche, L. 2013. Performance analysis and modeling of SQLite embedded databases on flash file systems. Des. Autom. Embed. Syst., 17, 3–4, 507–542.
[12] Gerlec, C. and Heričko, M. 2010. Evaluating refactoring with a quality index. World Acad. Sci. Eng. Technol., 63, 3, 76–80.
Can we predict software vulnerability with deep neural network?

Cagatay Catal, Akhan Akbulut
Department of Computer Engineering, Istanbul Kültür University
Istanbul, Turkey
c.catal@iku.edu.tr, a.akbulut@iku.edu.tr

Sašo Karakatič, Miha Pavlinek, Vili Podgorelec
Faculty of Electrical Engineering and Computer Science, University of Maribor
Maribor, Slovenia
saso.karakatic@um.si, miha.pavlinek@um.si, vili.podgorelec@um.si

ABSTRACT
In this paper, we present an alternative approach to software vulnerability prediction with modern machine learning methods, namely deep learning methods. Deep learning methods are techniques where features (in our case software metrics) are processed and sent through multiple layers in which transformations and computations are done in sequence to form a prediction model. Deep learning methods have not been used for software vulnerability prediction so far and could provide a new and potentially competitive alternative to the existing techniques. In the paper we give an overview of existing solutions on the subject and compare them to the proposed system with deep learning. The deep learning techniques are presented in detail and a proposition for the prediction system is made.

Categories and Subject Descriptors
H.4 [Information Systems Applications]: Miscellaneous; D.2.5 [Testing and Debugging]: Testing tools, Code inspections; D.2.8 [Software Engineering]: Metrics – complexity measures, performance measures

General Terms
Algorithms, Measurement, Reliability, Theory

Keywords
Software vulnerability prediction, machine learning, deep learning

1. INTRODUCTION
Software security vulnerabilities are still very common, and new alerts and reports from several agencies are published every day. One such incident was published on May 13, 2015, when the US Food and Drug Administration (FDA) reported an alert about computerized infusion pumps which can be programmed remotely, so that malicious Internet users can modify the dosage of therapeutic drugs. The FDA suggested several actions for the hospitals which are using these systems to secure them. As we see in this recent incident, software security vulnerabilities are quite dangerous for software-intensive systems.

If vulnerable components of software systems can be detected prior to the deployment of the software, verification resources can be assigned effectively. This research area is known as software security vulnerability prediction, and researchers have developed several prediction models so far. Although some researchers showed the benefit of those models, we need much better models in terms of prediction accuracy, precision, and recall. Some companies do not adopt these models yet due to their inefficient prediction performance [1].

Software developers apply static code analysis tools [2] and code reviews [3] to avoid security vulnerabilities. For large-scale software systems and systems of systems, it is not practical to review all the code against possible threats.
Therefore, a good vulnerability prediction model is indispensable.

Although there are many research papers on this topic, companies like Microsoft still do not adopt Vulnerability Prediction Models (VPM) [4]. The reason is related to the low prediction performance on the source code level in terms of the recall and precision evaluation parameters. In that study, Morrison et al. (2015) reported that state-of-the-art models do not provide accurate predictions and that security-specific metrics could be utilized in later studies to achieve acceptable performance.

According to our literature survey, we did not encounter any research paper which applied deep learning. Deep learning is being used by many high-tech companies such as Facebook, Microsoft, and Google to solve challenging problems such as facial recognition, real-time translation, and speech recognition, respectively. We aim to apply advanced machine learning techniques such as deep learning to predict vulnerable components accurately.

In this paper we make an initial review of the field and propose a new outlook on the problem. In the next chapter, the review of the related work is presented. Then, we follow up with the presentation of the modern machine learning technique of deep learning, or deep neural networks. We continue with the proposed novel approach to software vulnerability prediction with deep neural networks.

2. RELATED WORK
Shin and Williams (2008) [5], [6] reported that complexity metrics correlate with security vulnerabilities. They worked on the Mozilla JavaScript Engine. Shin et al. (2011) applied the logistic regression technique and analyzed the relationship of developer activity, complexity, and code churn with software security vulnerabilities [7]. Chowdhury and Zulkernine (2011) used decision trees to predict vulnerabilities by using complexity, cohesion, and coupling metrics [8]; the mean accuracy was 72.85% on Mozilla Firefox data.
Zimmermann et al. (2010) reported that traditional metrics such as complexity, code churn, and organizational measures have a weak correlation with vulnerabilities for Windows Vista [9]. Although complexity, cohesion, and coupling metrics have been studied in detail in previous studies, security-specific metrics should be determined and applied in the models. Shin and Williams (2013) investigated whether fault prediction models can be used for vulnerability prediction or not [10]. They built both fault prediction and vulnerability prediction models and concluded that fault prediction models provide similar results as vulnerability prediction models, but both of them must be improved to reduce the number of false positives.

Recent studies on vulnerability prediction have started to focus on machine learning techniques. Scandariato et al. (2014) presented a model based on machine learning to predict vulnerabilities [11]. Terms in the source code are taken into account and their associated frequencies are noted. Twenty Android applications were used for the validation of the prediction approach. During the experiments, they analyzed the performance of the Naive Bayes and Random Forest algorithms on this problem. They reported that Random Forest provides better performance than the Naive Bayes algorithm. Walden et al. (2014) prepared a vulnerability dataset which has 223 vulnerabilities [12]. They used the Drupal, Moodle, and PHPMyAdmin projects to analyze vulnerabilities. As the machine learning algorithm, they applied the Random Forests algorithm and reported that models using text mining are better than models using metrics in terms of the recall parameter. They used 3-fold cross-validation, and the experiments were performed 10 times. Mokhov et al. (2015) showed that a machine learning approach is effective for detecting vulnerabilities and implemented a tool called MARFCAT for fast code analysis [13]. The tool works on the source code level, binary level, and bytecode level. Shar et al. (2015) applied static and dynamic code attributes to detect vulnerabilities in web applications [14]. They used not only supervised machine learners but also semi-supervised algorithms to analyze the prediction performance. They reported that semi-supervised learning is preferable when vulnerability data is limited. Last (2016) described research on the development of Vulnerability Discovery Models to forecast zero-day vulnerabilities [15]. He stated that the research created two approaches based on machine learning and one approach based on a regression technique. Grieco et al. (2016) implemented a tool called VDiscover which applies machine learning techniques for the prediction of vulnerabilities in test cases [16]. Experimental results showed that the proposed approach effectively predicts the programs which contain dangerous memory corruptions. Medeiros et al. (2015) used taint analysis in conjunction with data mining [17]. Candidate vulnerabilities are detected with taint analysis, and false positives are identified by using a data mining technique. In addition to the detection of vulnerabilities, automatic corrections are performed by adding fixes to the source code automatically. The approach has been validated on a large set of PHP applications and compared to well-known PHP tools for static code analysis. The performance was 5% better than PhpMinerII and 45% better than Pixy in terms of accuracy and precision.

3. DEEP LEARNING
Deep learning is a term that combines techniques of machine learning that result in complex models, where each model is composed of multiple processing layers. For the sake of simplicity and understandability, we will focus our research on deep neural networks, which are a subset of deep learning methods. Deep learning approaches have dramatically improved the state-of-the-art results in several fields which have traditionally been dominated by ensemble machine learning techniques or other approaches. These fields, with their state-of-the-art solutions, are shown in Table 1.

Table 1. Applied deep learning on different problems
Problem | References
Image recognition | [18]–[21]
Speech recognition | [22]–[24]
Prediction of drug molecule activity | [25]
Analyzing particle accelerator data | [26], [27]
Natural language processing and understanding | [28]
Language translation | [29], [30]

Deep neural networks are composed of several layers, where the first layers have the goal of representation learning. With representation learning we can feed in raw data, and the method discovers a proper problem representation on its own. Each layer in the network represents a non-linear module that transforms and represents the data in a different way [31].

Figure 1. Example of deep neural network with two hidden layers

Deep neural networks always start with the first layer of inputs, the raw data. For image data, the input layer can be the intensity levels of each pixel on each of the color channels. The following layer can transform the raw data in such a way that only edges at different angles and orientations are highlighted. The next layer can detect round shapes, corners or other intensity transitions in the image. The following layers usually combine the outputs of the edge, corner and roundness detection layers and detect motifs and shapes composed of edges, corners and other elements. The next layer can combine the output of the motif and shape detection layers into even higher-level figures, where familiar shapes start to form: rectangles, triangles, circles and other shapes or parts of these shapes. The output of each layer is then fed into further layers, and the process can be repeated as long as necessary, with every next layer moving from abstractions towards real shapes and figures [32].
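To make the layered processing concrete, the following plain-Java sketch runs a forward pass through a small network with two hidden layers, taking a vector of software metrics as input and producing a vulnerability probability. The weights are random placeholders rather than trained values, and the example is purely illustrative of the mechanics described above.

```java
import java.util.Random;

/**
 * Minimal sketch of a forward pass through a deep neural network with two
 * hidden layers. The input vector stands for software metrics of a component
 * and the single output for the predicted probability that it is vulnerable.
 * Weights are random placeholders; a real model would learn them with
 * backpropagation.
 */
public class ForwardPassSketch {
    static final Random RND = new Random(1);

    /** One fully connected layer: out = activation(W * in + b). */
    static double[] layer(double[][] w, double[] b, double[] in, boolean sigmoid) {
        double[] out = new double[b.length];
        for (int i = 0; i < out.length; i++) {
            double sum = b[i];
            for (int j = 0; j < in.length; j++) sum += w[i][j] * in[j];
            out[i] = sigmoid ? 1.0 / (1.0 + Math.exp(-sum))  // output layer
                             : Math.max(0, sum);             // ReLU hidden layer
        }
        return out;
    }

    static double[][] randomMatrix(int rows, int cols) {
        double[][] m = new double[rows][cols];
        for (double[] row : m)
            for (int j = 0; j < cols; j++) row[j] = RND.nextGaussian() * 0.1;
        return m;
    }

    public static void main(String[] args) {
        double[] metrics = {12, 3, 0.4, 57};          // hypothetical metric values
        double[] h1 = layer(randomMatrix(8, 4), new double[8], metrics, false);
        double[] h2 = layer(randomMatrix(8, 8), new double[8], h1, false);
        double[] out = layer(randomMatrix(1, 8), new double[1], h2, true);
        System.out.printf("predicted vulnerability probability: %.3f%n", out[0]);
    }
}
```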
The main thing to note here is that these layers are not designed by hand, but are usually learned through the process of backpropagation on the whole neural network, through all of the layers. Instead of the backpropagation process, some heuristic approaches can be used, such as genetic algorithms and simulated annealing, but this is out of the scope of this paper.

As with other machine learning techniques, we can use deep neural networks for different types of problems, and different kinds of deep neural networks are used for different kinds of problems. The following list is divided by machine learning problem, with the specific neural network designs used for each problem:
- Supervised deep learning: deep convolutional networks, recurrent neural networks
- Unsupervised deep learning: autoencoders, restricted Boltzmann machines, deep belief networks
- Semi-supervised deep learning: ladder networks
4. DNNs FOR VULNERABILITY PREDICTION SYSTEM
We propose a system that utilizes the deep neural network machine learning technique for the prediction of software vulnerabilities. This can be done in a number of ways. If there is previous vulnerability data, supervised learning models can be applied. If there is no previous data, unsupervised deep learning algorithms can be used. If there is very limited vulnerability data, semi-supervised deep learning techniques can be investigated. We will analyze all three of these problems in the project in order to solve them efficiently.

There are a number of implementations of deep neural networks that can be used in the proposed system. The following is a list of libraries, packages and software that are mainly used in industry and in research:
- TensorFlow [33] – an open source library developed by Google and written in Python and C++, which can be used with other Python and C++ software through the provided API.
- Theano [34] – an open source Python library for DNNs developed by the University of Montreal.
- Torch [35] – an open source machine learning library written in C, maintained by Facebook and Google engineers and used by the Google DeepMind and Facebook AI research teams.
- Deeplearning4j – an open source C and C++ implementation of deep neural networks developed by Skymind that provides a Java API.
- Caffe [36] – implemented in C++ and Python; provides APIs for C++, Python and Matlab.
- Keras [37] – a Python library which utilizes TensorFlow or Theano and provides an easy-to-use API.

One or multiple libraries can be used in the vulnerability prediction software; it depends on the programming language used. All of the above deep neural network libraries contain basic convolutional and recurrent layers. The performance of a specific type of neural network will have to be determined by experiment; a configuration sketch follows below.
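As a rough indication of how such a supervised model could be put together in Java, the sketch below uses the builder-style API of Deeplearning4j, the library from the list above that targets Java. This is our assumption about a plausible setup, not the authors' implementation, and exact class and builder names may differ between library versions.

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

/**
 * Hypothetical Deeplearning4j configuration for a supervised vulnerability
 * predictor: software metrics in, probability of being vulnerable out.
 */
public class Dl4jSketch {
    public static void main(String[] args) {
        int numMetrics = 20;   // hypothetical number of software metrics
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .list()
            .layer(new DenseLayer.Builder().nIn(numMetrics).nOut(64)
                    .activation(Activation.RELU).build())
            .layer(new DenseLayer.Builder().nIn(64).nOut(32)
                    .activation(Activation.RELU).build())
            // Binary output: vulnerable vs. not vulnerable.
            .layer(new OutputLayer.Builder(LossFunctions.LossFunction.XENT)
                    .nIn(32).nOut(1).activation(Activation.SIGMOID).build())
            .build();
        MultiLayerNetwork net = new MultiLayerNetwork(conf);
        net.init();
        // net.fit(features, labels) would then train on labelled components.
        System.out.println(net.summary());
    }
}
```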
5. CONCLUSION
During our literature review we recognized the lack of usage of the modern machine learning technique of deep neural networks for software vulnerability prediction. Deep neural networks represent the state of the art on multiple optimization, prediction and pattern recognition problems, so there is a surprising lack of their application to software engineering topics.

Our paper serves to persuade researchers that this problem is worth tackling while the topic still remains under-researched. Multiple deep neural network types could be used for this kind of problem, but the performance of each on vulnerability prediction is yet to be determined.

6. REFERENCES
[1] C. F. Kemerer and M. C. Paulk, "The impact of design and code reviews on software quality: An empirical study based on PSP data," IEEE Trans. Softw. Eng., vol. 35, no. 4, pp. 534–550, 2009.
[2] A. G. Bardas and others, "Static Code Analysis," J. Inf. Syst. Oper. Manag., vol. 4, no. 2, pp. 99–107, 2010.
[3] M. V. Mäntylä and C. Lassenius, "What types of defects are really discovered in code reviews?," IEEE Trans. Softw. Eng., vol. 35, no. 3, pp. 430–448, 2009.
[4] P. Morrison, K. Herzig, B. Murphy, and L. Williams, "Challenges with applying vulnerability prediction models," in Proceedings of the 2015 Symposium and Bootcamp on the Science of Security, 2015, p. 4.
[5] Y. Shin and L. Williams, "An empirical model to predict security vulnerabilities using code complexity metrics," in Proceedings of the Second ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, 2008, pp. 315–317.
[6] Y. Shin and L. Williams, "Is complexity really the enemy of software security?," in Proceedings of the 4th ACM Workshop on Quality of Protection, 2008, pp. 47–50.
[7] Y. Shin, A. Meneely, L. Williams, and J. A. Osborne, "Evaluating complexity, code churn, and developer activity metrics as indicators of software vulnerabilities," IEEE Trans. Softw. Eng., vol. 37, no. 6, pp. 772–787, 2011.
[8] I. Chowdhury and M. Zulkernine, "Using complexity, coupling, and cohesion metrics as early indicators of vulnerabilities," J. Syst. Archit., vol. 57, no. 3, pp. 294–313, 2011.
[9] T. Zimmermann, N. Nagappan, and L. Williams, "Searching for a needle in a haystack: Predicting security vulnerabilities for Windows Vista," in 2010 Third International Conference on Software Testing, Verification and Validation, 2010, pp. 421–428.
[10] Y. Shin and L. Williams, "Can traditional fault prediction models be used for vulnerability prediction?," Empir. Softw. Eng., vol. 18, no. 1, pp. 25–59, 2013.
[11] R. Scandariato, J. Walden, A. Hovsepyan, and W. Joosen, "Predicting vulnerable software components via text mining," IEEE Trans. Softw. Eng., vol. 40, no. 10, pp. 993–1006, 2014.
[12] J. Walden, J. Stuckman, and R. Scandariato, "Predicting vulnerable components: Software metrics vs text mining," in 2014 IEEE 25th International Symposium on Software Reliability Engineering, 2014, pp. 23–33.
[13] S. A. Mokhov, J. Paquet, and M. Debbabi, "MARFCAT: Fast code analysis for defects and vulnerabilities," in Software Analytics (SWAN), 2015 IEEE 1st International Workshop on, 2015, pp. 35–38.
[14] L. K. Shar, L. C. Briand, and H. B. K. Tan, "Web application vulnerability prediction using hybrid program analysis and machine learning," IEEE Trans. Dependable Secur. Comput., vol. 12, no. 6, pp. 688–707, 2015.
[15] D. Last, "Forecasting Zero-Day Vulnerabilities," in Proceedings of the 11th Annual Cyber and Information Security Research Conference, 2016, p. 13.
[16] G. Grieco, G. L. Grinblat, L. Uzal, S. Rawat, J. Feist, and L. Mounier, "Toward Large-Scale Vulnerability Discovery using Machine Learning," in Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy, 2016, pp. 85–96.
[17] I. Medeiros, N. Neves, and M. Correia, "Detecting and removing web application vulnerabilities with static analysis and data mining," IEEE Trans. Reliab., vol. 65, no. 1, pp. 54–69, 2016.
[18] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[19] C. Farabet, C. Couprie, L. Najman, and Y. LeCun, "Learning hierarchical features for scene labeling," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1915–1929, 2013.
[20] J. J. Tompson, A. Jain, Y. LeCun, and C. Bregler, "Joint training of a convolutional network and a graphical model for human pose estimation," in Advances in Neural Information Processing Systems, 2014, pp. 1799–1807.
[21] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.
[22] T. Mikolov, A. Deoras, D. Povey, L. Burget, and J. Černocký, "Strategies for training large scale neural network language models," in Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on, 2011, pp. 196–201.
[23] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and others, "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82–97, 2012.
[24] T. N. Sainath, A. Mohamed, B. Kingsbury, and B. Ramabhadran, "Deep convolutional neural networks for LVCSR," in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 8614–8618.
[25] J. Ma, R. P. Sheridan, A. Liaw, G. E. Dahl, and V. Svetnik, "Deep neural nets as a method for quantitative structure-activity relationships," J. Chem. Inf. Model., vol. 55, no. 2, pp. 263–274, 2015.
[26] T. Ciodaro, D. Deva, J. M. De Seixas, and D. Damazio, "Online particle detection with neural networks based on topological calorimetry information," in Journal of Physics: Conference Series, 2012, vol. 368, no. 1, p. 12030.
[27] C. Adam-Bourdarios, G. Cowan, C. Germain, I. Guyon, B. Kegl, and D. Rousseau, "Learning to discover: the Higgs boson machine learning challenge," http://higgsml.lal.in2p3.fr/documentation, 2014.
[28] A. Bordes, S. Chopra, and J. Weston, "Question answering with subgraph embeddings," arXiv preprint arXiv:1406.3676, 2014.
[29] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," in Advances in Neural Information Processing Systems, 2014, pp. 3104–3112.
[30] S. Jean, K. Cho, R. Memisevic, and Y. Bengio, "On Using Very Large Target Vocabulary for Neural Machine Translation," 2015.
[31] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[32] J. Schmidhuber, "Deep learning in neural networks: An overview," Neural Networks, vol. 61, pp. 85–117, 2015.
[33] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, and others, "TensorFlow: Large-scale machine learning on heterogeneous distributed systems," arXiv preprint arXiv:1603.04467, 2016.
[34] J. Bergstra, F. Bastien, O. Breuleux, P. Lamblin, R. Pascanu, O. Delalleau, G. Desjardins, D. Warde-Farley, I. Goodfellow, A. Bergeron, and others, "Theano: Deep learning on GPUs with Python," in NIPS 2011, BigLearning Workshop, Granada, Spain, 2011.
[35] N. Léonard, S. Waghmare, and Y. Wang, "RNN: Recurrent library for Torch," arXiv preprint arXiv:1511.07889, 2015.
[36] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," in Proceedings of the 22nd ACM International Conference on Multimedia, 2014, pp. 675–678.
[37] F. Chollet, "Keras: Deep learning library for Theano and TensorFlow," 2015.

Exhaustive key search of DES using cloud computing

Aleks Drevenšek
Faculty of Electrical Engineering and Computer Science, University of Maribor
Maribor, Slovenia
aleks.drevensek@gmail.com

Marko Hölbl
Faculty of Electrical Engineering and Computer Science, University of Maribor
Maribor, Slovenia
marko.holbl@um.si

ABSTRACT
In this paper we present the time complexity of an exhaustive key search for the DES algorithm using modern cloud computing. We demonstrate that it is possible to perform a brute force attack on a known encryption algorithm in practice using commercially available cloud computing services. We also discuss previous attempts at exhaustive key searches, and explain the methods and preparations for the experiment. The time complexity is still very high; the time needed for finding a key can be improved using cloud computing, but not with the available free resources.
Categories and Subject Descriptors
E.3 [Data Encryption]: Data Encryption Standard (DES)

General Terms
Algorithms, Measurements, Performance, Experimentation, Security

Keywords
Cloud computing, exhaustive key search, DES, Microsoft Azure

1. INTRODUCTION
One of the goals of modern cryptography is the assurance of confidentiality, which is achieved through the use of encryption. Encryption algorithms, referred to as ciphers, are classified into two types: asymmetrical and symmetrical. Symmetrical ciphers use one key for both encryption and decryption [1]; this paper focuses on these types of ciphers. Additionally, symmetric ciphers are classified into block and stream ciphers [2].

An exhaustive key search, or brute force attack, on a modern symmetric cipher is a method of trying every single key in the known key space to identify the key used to encrypt a selected plain text [3]. Modern cloud services are a perfect option for executing such heavy tasks [4].

The purpose of our research was to demonstrate that it is possible to find an encryption key using cloud computing in reasonable time, which could open new questions about the security of modern algorithms. In this paper we will answer two questions: Is it possible to successfully execute an exhaustive key search for an algorithm using cloud computing? How long would such an exhaustive key search take?

The block cipher chosen for the experiment was DES [5]. It is a symmetrical block cipher. Due to its short key length of only 56 bits, it is prone to brute force attacks [6].

2. PREVIOUS DES CHALLENGES
In the past, several competitions were carried out by RSA Security with the intention of finding a key for DES. The data provided to the competitors was: a known algorithm, part of a plain text and the full cipher text. The competitors were provided with 192 bits of the plain text, the method of converting plain text to a hexadecimal value and separating it into blocks of 64 bits, and the method of padding to the full block, since DES is a block cipher and operates with 64-bit blocks of data. The encryption mode was CBC [5].

2.1 Competition DES-I
The first DES competition was held in 1997 and was won by a group of three called DESCHALL. They tackled the problem by building a distributed network of applications for executing the exhaustive key search [7]. They used a client-server architecture, where the server determined which key space was to be searched next and which keys had already been checked [7]. Their clients were physical computers owned by volunteers. The fastest computer they used was a PowerPC 604e with a processor speed of 250 MHz and a search speed of 1.5 million keys per second [7]. Improvements to the search algorithm were made while the search was being performed: near the end of the competition, a team developed a new technique called bit slicing that allowed it to search 32 or 64 keys simultaneously, depending on the CPU architecture – a 32-bit CPU was able to calculate 32 keys simultaneously and a 64-bit CPU 64 keys. With this improvement the fastest speed, measured on a 167 MHz UltraSPARC computer, was 2.4 million keys per second [7].

The first competition was completed successfully in 96 days, with 51.8% of the key space searched. They recorded more than 78,000 unique IP addresses on the server and had around 14 thousand concurrently searching computers at the peak.
2.2 Competitions DES-II-1 and DES-II-2
The second DES competition was held at the beginning of 1998. The winning organisation was distributed.net, which used a similar infrastructure to DESCHALL. The key was found after 39 days. The highest search speed in this competition was 32,430 million keys per second, and 90% of the whole key space had to be searched. The organisation distributed.net estimated that their computing power was equivalent to 22 thousand computers with an Intel Pentium II at 333 MHz, which was about double the power of the best DES-I competitors' resources [8].

The third competition was announced in the same year. It was won by the EFF [9], with a dedicated supercomputer named Deep Crack created specifically for this purpose. This supercomputer was using an advanced hardware implementation of DES, which was faster than the equivalent software implementations. The average speed was 88,804 million keys per second and the total time of the search was 56.05 hours. The share of the key space that needed to be searched to find the key was around 24.8% [10].

2.3 Competition DES-III
The last competition was held in January 1999. In this competition the highest prize money was awarded if the search was completed within 24 hours; if the search took more than 56 hours, no reward would be given [11].

The winner of the competition was a team consisting of distributed.net [12] and the EFF [12]. The search was finished in 22 hours and 15 minutes. The average search speed was 199,000 million keys per second, which is more than double the speed of Deep Crack (88,804 million keys per second). They also needed to check only around 22.2% of the key space, which was the lowest share of keys searched in all the competitions [13].

3. CLOUD COMPUTING
Cloud computing simplifies access to ready-to-use computer resources. Its main feature is the availability of computing power, which is necessary for an exhaustive key search [14].

We identified the resources necessary to execute an exhaustive key search, focusing on providers that offer a cloud computing service. The first resource that was considered was computing power: the number of CPU cores and the amount of available memory. Computing power is defined by the type of virtual machine. The second resource was storage, for storing the search application and the results. In contrast to CPU resources, we did not need a huge amount of storage [15].

While most cloud services offered a similar type and amount of resources, only Microsoft Azure offered a dedicated service for high performance computing, which is referred to as Azure Batch. With that in mind, we decided to use the Microsoft Azure cloud service. The Batch service is designed to execute computations that require up to 10 thousand processor cores [4].

The computers used in the Azure Batch service are of the same type as Microsoft Azure virtual machines. They are divided into three groups: A, D and D version 2, with Dv2 being the fastest regarding CPU resources. Virtual machines of type A use the Intel Xeon E5-2670 processor with a speed of 2.6 GHz, while type Dv2 machines use Intel Xeon E5-2673 v3 CPUs with a speed of 2.4 GHz that can be boosted up to 3.2 GHz. The instances that we used in our experiment are shown in Table 1 [16].

Table 1: Microsoft Azure types of virtual machines
Instance | Number of cores | Memory (GB)
A1 | 1 | 1.75
A2 | 2 | 3.5
A3 | 4 | 7
A4 | 8 | 14
A5 | 2 | 14
A6 | 4 | 28
D1v2 | 1 | 3.5
D2v2 | 2 | 7
D11v2 | 2 | 14

4. USING CLOUD COMPUTING TO PERFORM AN EXHAUSTIVE KEY SEARCH OF DES
Our experiment was conducted in an on-line environment, in a specific research context.

4.1 Experimental variables
The independent variables were tied to the chosen cloud service: they come bundled in packages called virtual machines, so we could not change them separately. Our independent variables were the CPU speed, the number of CPU cores and the memory size. The CPU speed was a continuous variable with a value range of 0 GHz to 3.3 GHz. The number of cores was a discrete variable with possible values of 1, 2, 4, 8, 16 or 20. The memory size was a continuous variable with a range from 0.75 GB to 140 GB.

We defined two dependent variables: the search speed and the time required to find the key. The first was defined by dividing the number of all searched keys by the required search time. The second variable was continuous as well and was calculated from the search speed and the number of all keys.

4.2 Experimental plan
First we needed to prepare the environment. This step included the log-in procedure with a valid Azure Batch account; for our experiment we used a trial account. Then we created a pool of virtual machines with up to 20 cores. Finally, we uploaded the program for the exhaustive key search.

The following step was performing the exhaustive key search. We used a speed evaluation mode to be able to measure the speed and calculate the results. In this mode the search program checked 2^30 keys and then finished (exited).

We measured the time for each instance of a virtual machine separately. The time needed to prepare the instances, transfer files and perform other auxiliary tasks was ignored; for the measured time we considered only the time needed to execute the key search.

We repeated all steps for each different type of virtual machine, iterating through all the available types. The procedure of the experiments is shown in Figure 1. The same procedure could be conducted with different variables, and different results would be expected.

Figure 1: Representation of the experiment process
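A back-of-the-envelope sketch of the speed-evaluation computation described above: the program checks a fixed batch of 2^30 keys, and the first dependent variable is obtained by dividing the batch size by the elapsed time. The elapsed duration below is a placeholder chosen so that the result roughly matches the D2v2 speed reported later in Table 2.

```java
/**
 * Sketch of the speed-evaluation computation: the search program checks a
 * fixed batch of 2^30 keys and the search speed is the batch size divided
 * by the elapsed time (the elapsed seconds below are a placeholder value).
 */
public class SpeedEvaluationSketch {
    public static void main(String[] args) {
        long batch = 1L << 30;            // keys checked in one evaluation run
        double elapsedSeconds = 29.32;    // hypothetical measured duration
        double keysPerSecond = batch / elapsedSeconds;
        System.out.printf("search speed: %.0f keys/s%n", keysPerSecond);
    }
}
```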
5. KEY SEARCH IMPLEMENTATION
The task of an exhaustive key search is highly time and resource consuming. Keeping that in mind, we were forced to use performance improvements. Our software was written in the C++ programming language, and we used updated versions of the tools that were used in past competitions.

The software employs the method of bit slicing and is intended to be used on 64-bit processors for the best performance. Since we wanted to execute the search on multiple keys simultaneously in one search cycle, we had to transform the input data. We had to convert the starting data into hexadecimal values. The next operation was to transform the data into a bit-slicing-compatible format, where each bit that was marked as 1 was transformed into the highest possible value of the data type unsigned __int64.

After the data was prepared, we executed the search using the method keySearch(). The task of this method was to prepare the candidate keys and the variables for multithreaded mode, if that would be optimal, and to execute another method, deseval(), for the current set of keys. If a key was returned, we had found the correct key; otherwise deseval() was repeated with a new set of keys. Before starting the search, the deseval() method was run once with the first set of keys; the purpose of this was to load it into memory and save time during the actual search. The measuring process started just before the first execution of deseval() and stopped after the last execution finished.

The main method, deseval(), uses modified S-boxes for deciphering with multiple keys at once. The method runs 14 rounds of the algorithm before it is possible to check whether all keys are incorrect; only in 1.6% of all executions did the method continue and proceed to do multiple checks over the last 2 rounds. Compared to the normal DES deciphering procedure, we were thus able to check the correctness of the keys after only 14 rounds instead of the usual 16. Another improvement that was included was the possibility of checking 64 keys at once instead of just one.
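The following toy sketch, in Java rather than the authors' C++, and with an 8-bit key and a trivial boolean function instead of the DES round function, illustrates the bit-slicing idea itself: the 64 candidate keys are transposed so that bit j of every key lives in a single 64-bit word, after which one sequence of bitwise operations evaluates the function for all 64 keys at once.

```java
/**
 * Toy illustration of bit slicing: 64 candidate keys are transposed so that
 * bit j of every key lives in one 64-bit word, and a boolean function of the
 * key bits is then evaluated for all 64 keys with a single sequence of
 * bitwise operations. The real search applies the same idea to the DES
 * round function; the tiny function f below is only a stand-in.
 */
public class BitSliceSketch {
    static final int KEY_BITS = 8;          // toy key length (DES uses 56)

    /** slices[j] holds bit j of all 64 keys: key k contributes to bit k. */
    static long[] slice(int[] keys) {
        long[] slices = new long[KEY_BITS];
        for (int k = 0; k < 64; k++)
            for (int j = 0; j < KEY_BITS; j++)
                if (((keys[k] >> j) & 1) == 1) slices[j] |= 1L << k;
        return slices;
    }

    /** Toy boolean function of a single key: f = (b0 & b1) ^ b2. */
    static int f(int key) {
        return ((key & (key >> 1)) ^ (key >> 2)) & 1;
    }

    /** Same function evaluated for all 64 keys at once on the sliced form. */
    static long fSliced(long[] s) {
        return (s[0] & s[1]) ^ s[2];
    }

    public static void main(String[] args) {
        int[] keys = new int[64];
        for (int k = 0; k < 64; k++) keys[k] = (k * 37 + 5) % (1 << KEY_BITS);

        long sliced = fSliced(slice(keys));   // one pass over 64 keys
        for (int k = 0; k < 64; k++) {        // cross-check key by key
            int expected = f(keys[k]);
            int got = (int) ((sliced >> k) & 1);
            if (expected != got) throw new AssertionError("mismatch at " + k);
        }
        System.out.println("bit-sliced result matches per-key evaluation");
    }
}
```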
After the data was prepared, we executed the search using the method keySearch(). The task of this method was to prepare the candidate keys and the variables for multithreaded mode, if that would be optimal, and to execute another method, deseval(), for the current set of keys. If a key was returned, the correct key had been found; otherwise deseval() was repeated with a new set of keys. Before starting the search, the deseval() method was loaded first, using the first set of keys; the purpose of this was to load it into memory and save time during the actual search. The measuring process started just before the first execution of deseval() and stopped after the last execution finished.

The main method, deseval(), uses modified S-boxes for deciphering with multiple keys at once. The method runs 14 rounds of the algorithm before it is possible to check whether all keys are incorrect; in 1.6% of all executions, the method continued and proceeded to do multiple checks over the last 2 rounds. Compared to the normal DES deciphering procedure, we were thus able to check the correctness of a key after only 14 rounds instead of the 16 that the process would normally take. Another improvement was the possibility to check 64 keys at once instead of just one.
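The control flow described above can be summarised in the following runnable sketch. Only the names keySearch() and deseval() and their roles are taken from the paper; the bodies below are simplified stand-ins (the real deseval() evaluates 14 DES rounds over bit-sliced S-boxes):

  #include <chrono>
  #include <cstdint>
  #include <iostream>
  #include <vector>

  static const uint64_t SECRET_KEY = 123456789;  // hypothetical target key

  // Stand-in for deseval(): checks a set of 64 candidate keys at once and
  // returns the matching key, or 0 if no key in the set is correct.
  uint64_t deseval(const std::vector<uint64_t>& keys) {
      for (uint64_t k : keys)
          if (k == SECRET_KEY) return k;
      return 0;
  }

  // Prepares the next set of 64 candidate keys.
  std::vector<uint64_t> next_candidates() {
      static uint64_t next = 0;
      std::vector<uint64_t> keys;
      for (int i = 0; i < 64; ++i) keys.push_back(next++);
      return keys;
  }

  uint64_t keySearch() {
      deseval(next_candidates());  // warm-up: load the method into memory

      // Measurement starts just before the first timed execution of
      // deseval() and stops after the last execution has finished.
      auto start = std::chrono::steady_clock::now();
      uint64_t found = 0;
      while (found == 0)                      // repeat with a new set of keys
          found = deseval(next_candidates());
      auto stop = std::chrono::steady_clock::now();

      std::chrono::duration<double> t = stop - start;
      std::cout << "found " << found << " in " << t.count() << " s\n";
      return found;
  }

  int main() { keySearch(); }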
6. RESULTS
During the execution of the experiment we noticed that not all Microsoft Azure virtual machines were available. This may be due to the fact that Microsoft Azure Batch is a new service and may still have some imperfections. We were able to perform searches on the following 9 instances: A1–A6, D1v2, D2v2 and D11v2. For each instance we normalized the speed to one core.

Table 2: Search speed of available instances

  Instance   Cores   Speed (keys/s)   Speed per core (keys/s)
  D2v2       2       36,616,485       18,308,242
  D1v2       1       17,724,361       17,724,361
  D11v2      2       33,172,943       16,586,471
  A1         1       14,573,834       14,573,834
  A5         2       27,506,451       13,753,225
  A3         4       52,123,389       13,030,847
  A2         2       22,510,310       11,255,155
  A6         4       41,688,997       10,422,249
  A4         8       82,671,837       10,333,979

The instances of type D version 2 were faster than all the instances of type A, which was expected, since they use newer CPUs. We also observed that instances with fewer cores were mostly faster per core than those with more cores.

The fastest instance was D2v2, with 2 cores of an Intel Xeon E5-2673 v3 at 2.4 GHz and 7 GB of memory; its search speed per core was about 18.3 million keys per second. The second fastest instance was D1v2, which performed about 500 thousand keys per second slower. The third instance type, D11v2, was slower than the first by over 1.7 million keys per second. Since all three use the same hardware, we assume the cause could lie in the overhead of virtualization.

We also compared instance type A with type Dv2. The average speed of type A was around 12,228,215 keys per second, while the average speed of type Dv2 was 17,539,682 keys per second, which means that type Dv2 instances were, on average, faster by 43.4%. The fastest instance, D2v2, was 25.6% faster than the fastest type A instance, A1.

6.1 Time Complexity
We calculated the time required to successfully finish an exhaustive key search for a randomly generated DES key using the fastest instances. According to the rules of the previous DES competitions, we generated a key randomly and used it to encrypt an arbitrary plain text. To perform an exhaustive key search for this key successfully, 34.26% of all keys would have to be searched. Based on this, we calculated the times required by the search.

Table 3: Time required to find a random key with D2v2

  Number of cores   Total search speed (keys/s)   Required time
  20                366,164,856                   26.01 months
  400               7,323,297,120                 39 days
  6,700             122,665,226,760               56 hours
  10,000            183,082,428,000               37.45 hours
  17,000            311,240,127,600               22 hours

The results for the fastest instance, D2 version 2, indicate that with the limited number of cores available in the trial version of Azure the search would take more than 26 months. To achieve the winning time of the DES-II challenge, 39 days, we would need 400 cores; to lower the time to the 56 hours of the next DES competition, 6,700 cores would be needed. With the maximal number of cores allowed by Microsoft, 10,000, the search would take 37.45 hours. To get faster results than those of the competitions, we would have to use more cores than the cloud allows, namely 17,000.
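The entries of Table 3 can be reproduced with a short computation (our sketch; the per-core speed and the 34.26% share are the figures reported above, and the times are printed in hours – e.g. 20 cores give roughly 18,700 hours, i.e. about 26 months):

  #include <cstdio>
  #include <initializer_list>

  int main() {
      const double per_core  = 18308242.0;           // D2v2 keys/s per core
      const double key_space = 72057594037927936.0;  // 2^56 DES keys
      const double share     = 0.3426;               // expected share to search

      for (double cores : {20.0, 400.0, 6700.0, 10000.0, 17000.0}) {
          double speed   = per_core * cores;          // total keys per second
          double seconds = share * key_space / speed;
          std::printf("%7.0f cores: %10.2f hours\n", cores, seconds / 3600);
      }
  }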
6.2 Worst case scenario
The worst case for an exhaustive key search occurs when the randomly generated key is the last key in the set of keys to be searched – the entire key space then has to be searched. We recalculated our results to fit this scenario; the instance type used in this calculation was D2v2.

Table 4: Time required to find the last key with D2v2

  Number of cores   Total search speed (keys/s)   Required time
  20                366,164,856                   75.9 months
  1,168             21,384,027,590                39 days
  10,000            183,082,428,000               4.5 days
  19,523            357,431,824,184               56 hours
  49,695            909,828,125,946               22 hours

Using instance D2v2 and searching through every key, the search with 20 cores would take almost 76 months. To beat the best competition time of 39 days, we would need 1,168 cores. To beat the 56 hours of the winner of the second competition, 19,523 cores would be required, which already exceeds the maximum number of cores allowed by the Microsoft Azure cloud service. Using the maximum number of cores, 10,000, we would need 4.5 days. To find a key faster than in all previous competitions, we would have to use almost 50,000 cores.

6.3 Virtualization overhead
Since modern cloud computing is powered by virtualization technology, we also investigated this aspect. While virtualization may have numerous advantages, it also has drawbacks; one of them is performance loss. To measure how much performance is lost through virtualization, we ran our search algorithm on an ordinary computer (Intel i5-3570K CPU, 3.4 GHz). The personal computer was able to search 30 million keys per second per single core, which is around 64% faster than the cloud instance D2v2. If we subtract the part of the difference attributable to the CPU clock speed (3.4 GHz versus 2.4 GHz, roughly 42%), we can assume that the loss in performance due to virtualization is around 22%.

7. CONCLUSION
In this paper we presented the use of the Microsoft Azure cloud services with the new Azure Batch service for high-performance computing, namely for an exhaustive key search of the DES algorithm. We used a brute-force attack approach and estimated the computing power needed for a successful attack.

We used different instance types of the Microsoft Azure cloud platform (A1, A2, A3, A4, A5, A6, D1v2, D2v2 and D11v2). According to our measurements, the fastest instance type was D version 2. The maximal number of virtual machine cores that could be run per account was 10,000. Since cloud computing is based on virtualization, there is some loss of performance: we calculated it to be around 22%, while the documentation of the cloud provider estimates the loss at around 15–20% [17].

It can be concluded that an exhaustive key search can be performed successfully; the required time depends on the number of activated cores. Even in the worst-case scenario, using the maximum of 10,000 cores, it is possible to find the key in 4.5 days. If we wanted to improve this time, we would need more cores, which would require multiple accounts.

Another way of speeding up the process would be to optimize the software used for searching: its S-boxes are outdated, and updating them to the newest versions could lower the time complexity.

Finally, it has to be noted that these results are based on the Microsoft Azure cloud and could differ if another cloud provider were used. In the future we could investigate the search speed for other encryption algorithms, especially those which are still considered secure.

8. REFERENCES
[1] G. J. Simmons, 'Symmetric and asymmetric encryption', ACM Comput. Surv., vol. 11, no. 4, pp. 305–330, 1979.
[2] A. J. Menezes, P. C. Van Oorschot, and S. A. Vanstone, Handbook of Applied Cryptography. Boca Raton: CRC Press, 1997.
[3] F. Rubin, 'Foiling an Exhaustive Key-Search Attack', Cryptologia, vol. 11, no. 2, pp. 102–107, Apr. 1987.
[4] 'Azure Batch feature overview | Microsoft Azure'. [Online]. Available: https://azure.microsoft.com/en-us/documentation/articles/batch-api-basics/. [Accessed: 17-May-2016].
[5] 'RSA Laboratories – Contest Rules'. [Online]. Available: http://www.emc.com/emc-plus/rsa-labs/historical/contest-rules.htm. [Accessed: 12-Apr-2016].
[6] FIPS PUB, Data Encryption Standard (DES), 1999.
[7] M. Curtin and J. Dolske, 'A Brute Force Search of DES Keyspace'.
[8] D. McNett, '[RC5] [ADMIN] The secret message is...', 24-Feb-1998.
[9] Electronic Frontier Foundation, Ed., Cracking DES: Secrets of Encryption Research, Wiretap Politics & Chip Design, 1st ed. San Francisco, CA: Electronic Frontier Foundation, 1998.
[10] 'EFF DES Cracker Press Release, July 17, 1998'. [Online]. Available: https://w2.eff.org/Privacy/Crypto/Crypto_misc/DESCracker/HTML/19980716_eff_descracker_pressrel.html. [Accessed: 12-Apr-2016].
[11] 'distributed.net: Project DES'. [Online]. Available: http://www.distributed.net/DES. [Accessed: 12-Apr-2016].
[12] 'RSA Laboratories – DES Challenge III'. [Online]. Available: http://www.emc.com/emc-plus/rsa-labs/historical/des-challenge-iii.htm. [Accessed: 12-Apr-2016].
[13] 'Brute force attacks on cryptographic keys'. [Online]. Available: http://www.cl.cam.ac.uk/~rnc1/brute.html. [Accessed: 12-Apr-2016].
[14] S. Srinivasan, Cloud Computing Basics. Springer, 2014.
[15] 'Azure infrastructure services implementation guidelines'. [Online]. Available: https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-infrastructure-service-guidelines/. [Accessed: 13-Apr-2016].
[16] 'Pricing – Virtual Machines (VMs) | Microsoft Azure'. [Online]. Available: https://azure.microsoft.com/en-us/pricing/details/virtual-machines/. [Accessed: 13-Apr-2016].
[17] 'Optimizing Performance on Hyper-V'. [Online]. Available: https://msdn.microsoft.com/en-us/library/cc768529(v=bts.10).aspx. [Accessed: 23-May-2016].

From a New Paradigm to Consistent Representation

Gordana Rakić, Jozef Kolek, Zoran Budimac
University of Novi Sad, Trg D. Obradovića 4, 21000 Novi Sad, Serbia
goca@dmi.uns.ac.rs, jkolek@gmail.com, zjb@dmi.uns.ac.rs

ABSTRACT
In this paper, a method for mapping between language constructs that belong to different programming paradigms is provided. The method is based on a universal source code representation used by the Set of Software Quality Static Analyzers (SSQSA) platform and is motivated by the need to consistently support different paradigms by static analysis. The method is illustrated by an example of the integration of support for the functional paradigm.

Categories and Subject Descriptors
D.3.3 [Programming Languages]: Language Constructs and Features
General Terms
Languages

Keywords
SSQSA, eCST, Functional Languages, Scheme

1. INTRODUCTION
Static analysis of computer programs is analysis that is performed without actually executing the programs. It is mostly performed on the source code of a program or on some intermediate representation of it (e.g. an intermediate code, a tree, a graph, a combination of these, or even some complex meta-model). Systematic and consistent application of static analysis techniques can significantly improve the quality of a software product (by finding weak points, discovering bad design, bad maintainability, etc.). Static analysis is usually done by specialized tools. However, in practice these tools suffer from several weaknesses (e.g. limitations regarding the supported languages) [6]. Furthermore, it has been shown that different tools give different results for the same metrics applied to the same source code [4], [5].

Contemporary software projects sometimes last for decades, and over those decades they become complex and heterogeneous with respect to technologies and languages. A characteristic example are software products whose business logic is developed in some dynamic multi-paradigm language, in which the functional paradigm is always very popular, while the business components are often hidden behind modern user interfaces developed in languages designed for that purpose. Even though the functional paradigm is not well supported by static analysis tools, there is practical interest in improvements in this area, and some language-specific tools are already in a mature phase of development [2].

These conditions bring us to the very difficult task of reconciling opposing objectives: heterogeneous projects are to be consistently supported by static analysis. This support has to involve multiple tools because of the limitations of the available ones, but we cannot rely on the consistency of analysis results among tools. A solution is to achieve consistency of static analysis by involving only one universal tool that supports all languages, technologies and platforms. The SSQSA platform [6] is well on the way to meeting these goals.

2. SSQSA AND ECST
The Set of Software Quality Static Analyzers (SSQSA) is a platform for building and integrating a set of software tools for static analysis. The starting aim of the framework is consistent software quality analysis for projects developed in multiple languages, paradigms, and technologies. The essential characteristics of the SSQSA platform are:

(1) Extendibility by new analyses. All implementations of analysis algorithms are independent of the input programming language, and each of the integrated analyzers can be uniformly applied to software systems written in different programming languages. Furthermore, after the integration of a new analysis, it is applicable to all languages supported by SSQSA.

(2) Adaptability to a new language. Support for a new language can be integrated, and after adding a new language, the whole set of implemented analyses is immediately applicable to it. Introducing support for a new input language into the SSQSA framework is a straightforward semi-automated procedure [6]: we need an appropriate (formal) specification of the programming language, after which we only follow the steps of the established procedure.

These characteristics of the SSQSA platform are based on the universality of the enriched Concrete Syntax Tree (eCST) [6]. The eCST is based on the concept of syntax trees. It contains the full source code without abstractions, enriched with universal nodes. Universal nodes are predefined, so-called imaginary nodes with language-independent meanings, which denote semantic concepts expressed by specific constructs of a language (e.g. LOOP STATEMENT is used to denote any loop expressed by for, while, do, repeat, etc., depending on the language).
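For instance, the following two loops from two different input languages would both be placed under the same universal node (our illustration of the example quoted above; the language-specific subtrees are omitted):

  Java:    for (int i = 0; i < 5; i++) { ... }
  Scheme:  (do ((i 0 (+ i 1))) ((= i 5)) ...)

  both are rooted in:   LOOP STATEMENT
                        └── language-specific subtree of the loop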
Currently, SSQSA supports a representative set of input languages, while support for functional languages is still weak. Namely, the integration of functional languages such as Erlang or Scala is in the testing phase, while support for a clean functional language such as Scheme has not yet been introduced. In this paper we describe the motivation and an approach for the integration of a functional language. The focus of the paper is on the mapping of functional constructs to eCST universal nodes. We map some of the most characteristic functional constructs written in Scheme to illustrate the approach.

3. MAPPING APPROACH
As mentioned earlier, the translation of the programming language Scheme into the eCST is done by adding universal nodes into the syntax tree generated from the language specification. Here we describe how the particular syntax elements are marked with the corresponding universal nodes. We focus on some constructs characteristic for functional languages to demonstrate the approach, while the remaining constructs are only mentioned.

Our approach is based on previous experience. When considering a specific construct in Scheme, the concrete mapping method consists of the following steps: (1) determine the construct to be mapped (e.g. a code fragment); (2) determine its semantics (e.g. the definition of a function); (3) determine all factors participating in it (e.g. argument declarations, body, or statements); (4) compare the role of all factors with other supported languages in order to find an equivalence; (5) define a mapping which is consistent with the supported languages.

4. CASE STUDY: SCHEME
The programming language Scheme is a functional and dynamically typed programming language. It is based on a mathematical concept called lambda calculus, introduced by Alonzo Church [1]. Although the lambda calculus is a very powerful concept which can be used to write any program, it is not the most practical approach; therefore, Scheme brings some minor modifications of it. Unlike basic lambda calculus, a Scheme lambda expression can bind several variables at once. Scheme also contains constants, numbers, data structures, various programming language constructs, an assignment operation, an environment of defined names, libraries for input/output operations, etc. The language thus adds to basic lambda calculus all the features that a practical programming language needs. As a result, Scheme is a simple but very expressive programming language, which finds its place in education as well as in the software industry.

There are two assumptions to make before mapping Scheme to eCST. (1) A Scheme symbol can be redefined in any way, without any restriction. For instance, the expression if can easily be redefined by the function (define (if x y) (+ x y)), where if becomes a function that sums two numbers and returns the result. We assume that redefinitions of important syntax constructs are not performed. (2) Scheme supports macros. It is supposed that all Scheme code used as input to SSQSA is Scheme code with already expanded macros; therefore, macro-free Scheme code is expected, and macros are not considered in this paper.

In the following subsections we pass through the characteristic constructs of the Scheme language, level by level.
4.1 High-level entities
The largest entity that has to be marked is a compilation unit. It is marked using the universal node COMPILATION UNIT, which is always the root node of a single source unit that is compiled or interpreted separately. To draw a parallel to other languages, a compilation unit is a single compilable unit (e.g. a class or a module), usually determined by an input file. A Scheme compilation unit consists of an expression sequence where, in most Scheme interpreter implementations, the last expression is evaluated when the unit is loaded.

A Scheme entity can be a Scheme program or a library. Scheme libraries can import and export functions. The universal node PACKAGE DECL, which is used for marking program packages, must be a child of the COMPILATION UNIT. Even though Scheme does not have packages in the real sense, each compilation unit is marked with this node. Names that are imported from libraries are marked with the universal node IMPORT DECL, whose direct child must be NAME, which marks identifiers.

At the next level of the hierarchy in a Scheme entity we can find variable and function definitions. Another important construct in any program is the block.

4.2 Block
Scheme defines sequences, special expressions that are used for grouping other expressions. These sequences are defined using the keyword begin. The last expression in the body of a begin block returns its value as the value of the block. Sequences are nothing more than scopes without locally declared variables. They can be compared to a block between BEGIN and END in Modula-2 or between { and } in Java. Sequences, starting with the expression begin, are placed in the sub-tree of the universal node BLOCK SCOPE.

Let expressions in Scheme represent expressions with a scope and locally defined variables. There are four different let expressions: let, let*, letrec, and named let. An example of a basic let expression would be:

  (let ((x 10) (y 20)) (+ x y))

This is a block with two declared variables and one operation on them, i.e. a statement. The variables x and y are bound to the numbers 10 and 20 respectively, and the whole let expression returns the sum of these two values. When translating Scheme into eCST, the expressions let, let*, and letrec are treated equally and are marked by the universal node BLOCK SCOPE. The named let is treated as a function, because it can recursively call itself (Section 4.4.3).
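Using the node names introduced in this and the following subsections, the let expression above could, for example, be represented by a subtree of roughly the following shape (our sketch; the exact layout is fixed by the SSQSA specification, which this paper does not reproduce):

  BLOCK SCOPE
  ├── VAR DECL ── TYPE (empty), NAME (x), VALUE (10)
  ├── VAR DECL ── TYPE (empty), NAME (y), VALUE (20)
  └── FUNCTION CALL ── NAME (+), ARGUMENT LIST ── ARGUMENT (x), ARGUMENT (y)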
4.3 Variables
In Scheme, variables are declared and defined using a define or let expression. The following examples of using define and let to define a variable are equivalent, while let is usually used only inside a function body:

  (define x 10)   or   (let ((x 10)) ...)

These constructs correspond to a variable declaration with initialization in any other language, e.g. int i = 10 in Java. In both cases, a variable declaration is marked with the universal node VAR DECL. VAR DECL has the universal node TYPE as a direct child. In Scheme, the type of a newly declared variable is determined implicitly, so the TYPE sub-tree stays empty until the types are determined; this is a task for the eCST Manipulator [6], and consistent post-processing of dynamic types is planned for future work (Section 6). The initialization is observed as an assignment statement, inside which the variable name is marked with the node NAME and the value with the node VALUE: x is the name of the variable and 10 is the value that the variable x is bound to.

4.4 Functions
Scheme functions are defined using define and let expressions as well. There are several approaches to defining functions. In all cases a function is marked using the universal node FUNCTION DECL, the list of parameters is marked using FORMAL PARAM LIST, and each parameter is marked by PARAMETER DECL. The node NAME marks the function name. Similarly to variables, parameters have their name and type. Inside a function body we can find different expressions (i.e. statements).

4.4.1 Define
The first approach to declaring a function is the one mostly used in practice. For example:

  (define (sum x y) (+ x y))

This is equivalent to the definition of, for example, a procedure in Modula-2 or a method in Java. The function declaration is marked by the universal nodes FUNCTION DECL, FORMAL PARAM LIST, and PARAMETER DECL, as described. The node NAME marks the function name, which is sum in this particular case, while TYPE remains empty. The parameters also have their names and types: the names are x and y, while the types are temporarily empty.

4.4.2 Define lambda
The second approach to function definition uses the keyword lambda. A lambda function is treated as an anonymous function bound to a variable. An analogy for such variables, whose type is an anonymous function, are procedural types in the programming language Modula-2. An example of a function defined by this approach is:

  (define sum (lambda (x y) (+ x y)))

This can be observed as a variable whose type is the lambda function. Therefore, the root node is VAR DECL, with two children nodes: NAME (sum) and TYPE with the whole lambda function in its subtree. The lambda function is again marked by FUNCTION DECL, FORMAL PARAM LIST, and PARAMETER DECL, as described, while the node NAME of the FUNCTION DECL remains empty in this case.

4.4.3 Let
A special case of the let block is the named let. It is used to express tail recursion. It can be observed as a function that can be called only from its own body; therefore it is a function with certain restrictions at the syntactical level. However, once this function is defined according to the language rules, it has all the characteristics of a recursive function. For example:

  (define (factorial x)
    (let loop ((x x) (acc 1))
      (if (zero? x) acc
          (loop (sub1 x) (* x acc)))))

It is obvious that this is equivalent to a recursive function definition in any other language; the main difference is that other languages usually do not require an explicit syntax construct for a recursive function. The named let is marked by the universal nodes used for other function definitions, where the name of the let block is the name of the function.
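Putting the pieces together, the mapping of the sum example from Section 4.4.1 can be sketched as follows (again our illustration, not a literal tree dump produced by the tool; the "body" label is ours):

  FUNCTION DECL
  ├── NAME (sum)
  ├── TYPE (empty)
  ├── FORMAL PARAM LIST
  │     ├── PARAMETER DECL ── NAME (x), TYPE (empty)
  │     └── PARAMETER DECL ── NAME (y), TYPE (empty)
  └── body: FUNCTION CALL ── NAME (+), ARGUMENT LIST ── ARGUMENT (x), ARGUMENT (y)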
4.5 Statements
Blocks and function bodies are built from statements. Statements in Scheme vary from simple expressions to complex ones such as branch statements, loop statements and continuation statements.

4.5.1 Function calls
Scheme comes with two possible ways in which functions can be called. For example:

  (sum a b c)   or   (apply sum a b '(1 2 3))

The first way is the one mostly used; the second is an explicit call of a function using the command apply. The main difference is in the way they are executed, while the meaning is the same. Both types of function calls are marked by the universal node FUNCTION CALL, whose direct children are NAME and ARGUMENT LIST. The node ARGUMENT LIST is used to mark the list of actual parameters, and ARGUMENT marks each argument in the list.

4.5.2 Branch statements
In Scheme there are many conditional expressions: if, not, and, or, cond, when, unless, and case. The if expression is equivalent to the conditional expression in Java-like languages; for example, the following expressions are equivalent:

  (if (< x y) #t #f)   and   condition ? consequent : alternative

A conditional expression is marked using the universal node BRANCH STATEMENT; the condition is marked using CONDITION, while the consequent and the alternative are marked using BRANCH as direct children of the BRANCH STATEMENT. The conditional expressions not, and and or are marked by the universal node LOGICAL OPERATOR.

Comparison of Agile Methods: Scrum, Kanban, and Scrumban

Lucija Brezočnik, Črtomir Majer
Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
lucija.brezocnik@um.si, crtomir.majer1@um.si

ABSTRACT
In software development, companies are forced to change the way they manage a project's development because of ever-shorter cycles and continually changing requirements. These changes are frequently introduced through agile methods, whose popularity has sharply increased in recent decades. But a major question remains: "Which agile method is optimal for our company?" To answer this question, we compared the three most prevalent among them: Scrum, Kanban and Scrumban.

Categories and Subject Descriptors
D.2.9 [Software Engineering] – Management
D.2.10 [Software Engineering] – Design

General Terms
Management, Design, Theory.

Keywords
agile software development, agile methods, Scrum, Kanban, Scrumban
1. INTRODUCTION
Software companies switch to agile development mostly out of the desire to accelerate product delivery, enhance the ability to manage fast-changing priorities, increase productivity, and improve software quality [12]. Interestingly, the cost of the project and of software maintenance has no significant impact on making the transition [5, 12]. From this we can conclude that the biggest problems of the traditional approach are the length of the software development period and a decreased ability to manage changing priorities – precisely what matters most to customers [3].

In this paper, we focus on three agile methods: Scrum, Kanban, and Scrumban. Research [5] has shown that about half of businesses still use the waterfall model, while the other half uses agile and iterative approaches. Companies using agile methods, according to data from the tenth annual VersionOne survey [12], most often opt for Scrum and Scrum + XP (70%), Scrumban (7%) and Kanban (5%). From our selection of agile methods we removed Extreme Programming (XP), because its principles are often used in combination with other methods (Scrum, Kanban).

2. AGILE METHODS
The main point of agile methods is the constant embrace of change, which is in contrast with traditional methods. Changes are a natural part of development projects and as such should be adequately addressed [8].

2.1 Scrum
Scrum [8, 9, 10] is an agile framework that comprises principles and practices helping teams deliver new products as soon as possible, with continual improvements and rapid adaptation to changes. Scrum has three roles: the Product Owner (the voice of the customer, responsible for the ROI, not to be mistaken for the product manager), the Scrum Master (who observes the team, ensures that there are no violations of the Scrum rules, and removes any impediments the team may have), and the Team (a cross-functional team responsible for delivering shippable increments of the product at the end of each Sprint).

The Sprint is a fixed-length iteration and represents the basic unit of development. Before each Sprint, the Sprint Planning event takes place, in which the Sprint Backlog is defined. All Sprints end with a Sprint Review and a Sprint Retrospective. In the Sprint Review, the Team and the Product Owner are involved and seek opportunities for improvement. The Sprint Retrospective is convened by the Scrum Master and tries to optimize the development process itself.

2.2 Kanban
Kanban is a process management method developed at Toyota that builds on the experience of other agile methods. Its main objective is the elimination of delays and waste, which has a positive effect on workflow optimization. It is based on the Just-In-Time technique for task scheduling, which requires the precise definition and implementation of a task as late as possible in the workflow, to get rid of unnecessary re-planning [4, 6]. The basic guidelines of the Kanban method are:

• Visualize the workflow. This is typically done with the Kanban board, which clearly defines all the required steps (board columns) of the development process. Tasks are prioritised and put into the board column that best defines their current state. Tasks are moved between states until they reach the Done state – the goal is to finish tasks that are already in the flow as soon as possible instead of starting new ones [6].

• Limit Work in Progress (WIP). Each step in the process must have a WIP limit, optimized according to its complexity, the number of workers and other parameters. The WIP limit forces us to focus on one task at a time instead of doing multiple things concurrently; it follows the "achieve more by doing less" principle, which has repeatedly been proven true [4, 6].

• The "pull" principle. When moving tasks between stages, we must obey the pull rule, which states that a new task can enter a certain stage only if that stage's WIP limit has not been reached. This helps with the early identification of delays and impediments in the workflow, thus encouraging teamwork. (A minimal sketch of this rule follows the list.)

• Minimalize, measure and improve. Kanban maintains the existing teams, processes, roles and responsibilities – it introduces minimal changes for its adoption. It establishes some control over the process flow, but keeps the existing approaches that work well in place. Kanban encourages the usage of agile metrics to measure performance, monitor progress and improve workflow efficiency [4, 6].
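As referenced in the pull-principle guideline above, the check behind the pull rule is compact enough to state directly; the sketch below is our own, purely illustrative model of a board tool (all names and types are ours, not from any of the cited works):

  #include <algorithm>
  #include <map>
  #include <string>
  #include <vector>

  // Minimal board model: each stage (board column) has a WIP limit
  // and the list of tasks currently in it.
  struct Stage {
      int wip_limit;
      std::vector<std::string> tasks;
  };

  // The "pull" principle: a task may enter a stage only if that stage's
  // Work-in-Progress limit has not been reached yet.
  bool pull(std::map<std::string, Stage>& board, const std::string& from,
            const std::string& to, const std::string& task) {
      Stage& dst = board[to];
      if ((int)dst.tasks.size() >= dst.wip_limit)
          return false;  // limit reached: an early signal of a bottleneck
      auto& src = board[from].tasks;
      auto it = std::find(src.begin(), src.end(), task);
      if (it == src.end()) return false;  // task is not in the source stage
      src.erase(it);
      dst.tasks.push_back(task);
      return true;
  }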
2.3 Scrumban
Scrumban is a composite of Scrum and Kanban: it contains the basic properties of Scrum and the flexibility of Kanban. Long-term development goals in Scrumban are defined via bucket size planning. Each bucket contains a development plan that needs to be realised within a given time, for example three months for the nearest bucket. This bucket holds fine-grained definitions of tasks, while buckets that represent long-term plans, for example a one-year bucket, hold only a draft – those buckets are deficient [7]. This is due to the Just-In-Time principle taken from Kanban, which urges us to make fine-grained plans as late as possible. Just like Kanban, Scrumban limits the Work-in-Progress and enforces the "pull" principle for moving tasks between stages [1]. Scrumban does not require any new roles (unlike Scrum); however, it encourages short daily meetings and kaizen events meant for the resolution of everyday impediments [2, 11]. Scrumban stipulates that iterations should not be longer than two weeks but, unlike Scrum, it allows long-running tasks which can extend across several iterations. This can lead to an incomplete product at the end of an iteration, which is why Scrumban introduces a Feature Freeze (FF): when the team approaches the end of the current iteration, it stops working on new features and instead focuses on finishing those already in process. Features that are still incomplete need to be disabled or removed from the final product, so that an incremental release can be made [2, 7, 11].

3. COMPARISON OF AGILE METHODS
Hereafter, we compare the methods from 12 main perspectives.

3.1 Board
The board is used in virtually all methods, but it differs in terms of how it is used. The Scrum board is reset with each Sprint, which means that all tasks are put back into the ToDo column. When using Kanban, resets do not occur, because there are no iterations – new tasks arrive in a constant flow. The Scrumban board typically looks like a Kanban board, but some resets occur when finishing the current bucket and moving on to the next one.

Figure 1: Board comparison in Scrum and Kanban

3.2 Artifacts
Scrum requires a clearly defined product backlog, a sprint backlog and a burndown chart, thus requiring more effort from the team to keep the artefacts up to date, compared to Scrumban and Kanban. While Kanban does not demand any specific artefact, Scrumban requires an iteration backlog and bucket plans.

3.3 Iterations
Scrum defines iterations (called Sprints) as part of the Scrum lifecycle; they can last from one to three weeks. At the end of every Sprint, we expect a fully functional product with new features or other upgrades that are accepted by the product owner. Scrumban also has iterations, which are not strictly defined in terms of tasks and length; however, their duration should not exceed two weeks, since shorter iterations allow a more rapid adaptation to change. Kanban does not define iterations, as new tasks are defined on demand, as late as possible.

3.4 Tasks
The time span of each task in Scrum is limited to the duration of the Sprint. In any case, we try to break long tasks into smaller ones, so that no task is longer than one day. Kanban and Scrumban do not limit the time span of tasks; even though Scrumban has iterations, it allows long-running tasks (Figure 2).

Figure 2: Tasks in iterations in Scrum (left) and in Kanban (right)
3.5 Priority
With Scrum, task prioritisation is done when planning the Sprint, while with Kanban it is done on a daily basis, with just-in-time planning and the pull principle: whenever a new task is pulled into the workflow, it must have the highest priority for the team. Scrumban first prioritises work with bucket size planning, after which tasks are defined and prioritised for each iteration and, lastly, on a daily basis, as is the case with Kanban.

3.6 Work estimation
Scrum prescribes task estimation before each Sprint, while Kanban and Scrumban do not require estimation. Some teams prefer to define tasks in such a manner that all tasks have similar complexity, thus requiring approximately the same time for completion.

3.7 Team
Scrum teams must be cross-functional, which means they are able to provide a product increment entirely on their own (from planning to deployment). Kanban and Scrumban allow both cross-functional and specialised teams, depending on the product type and on what works best for a given scenario.

3.8 Roles
Scrum prescribes the following roles: the product owner, the development team and the Scrum master. The product owner is responsible for the ToDo (the product backlog), and the Scrum Master is responsible for the daily meetings and for solving the non-technical problems of the team. Kanban and Scrumban do not define any special roles, so the tasks for maintaining the agile method are divided among the team members.

3.9 Changes in work plan
Scrum does not allow any changes in the work plan while a Sprint is running – that is why detailed plans and estimations are made before the Sprint and no changes are made afterwards (Figure 3, above). Scrumban and Kanban provide no rules that forbid changes in the work plan at any given time (Figure 3, below): tasks in the ToDo state can easily be replaced with new ones, and tasks that are already in process can be taken back to ToDo so that more important tasks can be pulled in.

Figure 3: Changes in work plan in Scrum (above) and in Kanban (below)

3.10 Bug fixing
There are two types of software faults: those that appear at development time (often called defects), and those that appear after the software is released and running in a real-time environment, called bugs. Kanban and Scrumban allow unplanned bug-fixing right away – if fixing a bug has a higher priority than the current tasks in ToDo, this task is put on the board. With Scrum, bug-fixing is, by the book, planned for the next Sprint – it would be unreasonable to change the current Sprint plan because of all the preparations and estimations done before the Sprint. In reality, we know that critical bugs must be fixed as soon as possible, so Scrum teams take different approaches to tackling this problem: some teams define one day of the week (or part of a day) as a "bug fixing day", while other teams reduce the number of story points for the Sprint, so that some time is left for unexpected things like fixing bugs.

3.11 Stress
Research shows that stress is highly correlated with the amount of work a team member is responsible for. The ideal workload would be evenly distributed at each person's optimal level: a person must not feel too much of a burden on their shoulders, which leads to exhaustion, nor too free, which leads to poor progress (Figure 4). Team members must see constant improvement of the product, which keeps them motivated and dedicated. With Kanban, we can achieve a mostly evenly distributed workload, because there are no iterations and tasks are continuously added to the workflow. Sprints in Scrum are time-limited (typically from 2 to 4 weeks), so there is often more work done at the end of a Sprint than at its beginning (Figure 5). Scrumban is somewhere in the middle of those two, because it allows long-running tasks, so team members are not so stressed if some tasks are not completed. For highly motivated and self-initiative teams, Kanban can be a good fit; teams that do not have such properties need time limits within which some progress is expected, so Scrum and Scrumban provide a better match.

Figure 4: Graph of stress levels, depending on the work done
Figure 5: Stress level through iterations

3.12 Activities to maintain the agile method
Scrum activities to keep the method alive consist of an up-to-date backlog, a Sprint backlog, daily meetings, a board and a retrospective. Kanban requires the visualisation of the workflow (typically a board) and demands respect of the Work-in-Progress limits for each stage of the process. Scrumban extends the Kanban activities and adds bucket size planning, daily events (standups) and iteration planning.
4. CONCLUSION
In this paper, we presented the most widespread agile methods: Scrum, Kanban and Scrumban. Each method has its own advantages and disadvantages, but it is necessary to bear in mind that none of them will benefit a business if not used in the right way. It is therefore important to choose the agile method that best meets the requirements and wishes of the company.

Scrum certainly works best in mature companies with experienced teams that have been working on the product or project for more than one year. For companies with continuous production that need a rapid response to changes, and for product teams working on support and maintenance of the product, we recommend Kanban. Scrumban is best for young, small companies, since it combines the flexibility of Kanban with the basic characteristics of Scrum.

Agile methods definitely include a strong component of flexibility. Teams can, regardless of the method chosen, adapt it in a way that serves their purpose – i.e. effective work organisation and the development of quality products.

5. REFERENCES
[1] Baleviciute, G. 2014. Whitepaper – Scrum vs Kanban vs. Scrumban. Retrieved September 10, 2016, from http://goo.gl/dkrbGE.
[2] Bieliūnas, E. 2014. Scrum-ban for Project Management. Retrieved September 10, 2016, from http://goo.gl/JgfaaA.
[3] Bittner, K., Lo Giudice, D., DeMartine, A., Mines, C., Hammond, J. S., Turrisi, T., and Izzi, M. 2016. Forrester Research – Boost Application Delivery Speed And Quality With Agile DevOps Practices.
[4] Brechner, E. 2015. Agile Project Management with Kanban. Microsoft Press.
[5] Gartner. 2015. Holz, B., presentation "Agile in the Enterprise".
[6] Klipp, P. 2014. Getting Started with Kanban. Amazon Digital Services LLC.
[7] Misevičiūtė, D. 2014. Scrumban: on demand vs. long-term planning. Retrieved September 10, 2016, from http://www.eylean.com/blog/2014/11/scrumban-on-demand-vs-long-term-planning/.
[8] Pichler, R. 2010. Agile Product Management with Scrum: Creating Products That Customers Love. Addison Wesley.
[9] Swisher, W. P. 2014. Implementing Scrumban. Retrieved September 10, 2016, from https://switchingtoscrum.files.wordpress.com/2013/12/implementing-scrumban_v1-32.pdf.
[10] VersionOne. 2016. VersionOne 10th Annual State of Agile Report.
[11] Sutherland, J., and Schwaber, K. 2013. The Scrum Guide. Retrieved September 10, 2016, from http://www.scrumguides.org/docs/scrumguide/v1/scrum-guide-us.pdf.
[12] Sutherland, J. 2010. Scrum Handbook. The Scrum Training Institute.

4.5.3 Loop statement
Scheme has a loop expression do, which can be compared to the for statement in the programming language Java:

  (do ((v (make-vector 5))
       (i 0 (+ i 1)))
      ((= i 5) v)
    (vector-set! v i i))

Loop statements are marked using the universal node LOOP STATEMENT.
However, the characteristic approach to dealing with repetition in functional languages is recursion. In eCST, recursive functions are marked as regular functions (Section 4.4.3), while semantic transformations are planned for future work.

4.5.4 Continuations
First-class continuations are constructs that represent a program state which can be saved as the value of a variable, to be used at a later point in the program. The programming language Scheme implements first-class continuations with the operator call-with-current-continuation. When translating Scheme into eCST, a continuation call is marked using JUMP STATEMENT, since continuations change the control flow of the program. The operator call-with-current-continuation itself is marked using OPERATOR.

5. RELATED WORK
Before the definition of this method, some languages had already been mapped to eCST and integrated into SSQSA [6]. These were mainly imperative languages, and the mapping among their constructs was more straightforward. Erlang, as a functional language, was integrated up to the prototype level [7], while some issues remained unsolved.

The authors of [3] tried to cross the gap between imperative and functional programming by refactoring. They were motivated by the integration of the functional paradigm into the Java programming language, and the goal was to provide Java developers with refactoring techniques that would lead them to functional code. Basically, this is a kind of mapping between two paradigms and can be useful in our research for a comparison of approaches. However, they provide only two refactoring methods, focused on two new Java features, while other constructs are not covered.

6. CONCLUSIONS AND FUTURE WORK
In this paper we describe a method for mapping constructs that belong to a new paradigm to the eCST in the SSQSA platform. We illustrate it by introducing the functional paradigm and Scheme as a clean functional language. The paper thus provides a double contribution: (1) established rules for mapping a functional language to eCST, to be followed when integrating any language that includes the functional paradigm, and (2) a method to be applied when introducing support for any new paradigm. The method recommends first choosing a language which is a clean representative of the paradigm to be integrated. Afterwards, we pass through all paradigm-specific constructs, analyse them, compare them with similar constructs from other, already supported paradigms, determine equivalent ones, and specify the concrete mapping. Finally, the mapping defined on a clean language is to be applied whenever we need a mapping of this paradigm to eCST. If we find some new construct that belongs to an already supported paradigm, we can apply the same procedure to keep the mapping consistent.

Furthermore, this method provides SSQSA with consistency among languages and paradigms. Namely, when integrating a multi-paradigm language, we determine the paradigms included in that language, recognise which construct belongs to which paradigm, and map each paradigm separately according to the defined method. This applies to each new language and each new paradigm.

There are still open questions to be addressed in future work, related to more general issues. One of them is how to map implicitly defined types in dynamically typed languages. The next question relates to the similarities and differences between iteration and recursion; this topic especially arises in control-flow analysis, where the two kinds of repetition should be analysed consistently. Nevertheless, these problems are not related only to functional languages, as all aspects of these issues are mapped to eCST consistently among the integrated languages. They are therefore not the subject of this paper, but of future improvements of the SSQSA platform. The future work directly related to the integration of Scheme and the functional paradigm into SSQSA is testing the analysers on new datasets that will contain code written in functional languages.

7. REFERENCES
[1] H. P. Barendregt and E. Barendsen. Introduction to lambda calculus. Nieuw Archief voor Wiskunde, 4(2):337–372, 1984.
[2] I. Bozó, D. Horpácsi, Z. Horváth, R. Kitlei, J. Kőszegi, M. Tejfel, and M. Tóth. RefactorErl – Source Code Analysis and Refactoring in Erlang. In Proc. of the 12th Symposium on Programming Languages and Software Tools, pages 138–148, Tallinn, Estonia, October 2011.
[3] A. Gyori, L. Franklin, D. Dig, and J. Lahoda. Crossing the gap from imperative to functional programming through refactoring. In Proc. of the 2013 9th Joint Meeting on Foundations of Software Engineering, pages 543–553. ACM, 2013.
[4] R. Lincke, J. Lundberg, and W. Löwe. Comparing software metrics tools. In Proc. of the International Symposium on Software Testing and Analysis, ISSTA '08, pages 131–142, Seattle, WA, USA, 2008. ACM, New York, NY, USA.
[5] J. Novak and G. Rakić. Comparison of software metrics tools for .NET. In Proc. of the 13th International Multiconference Information Society (IS'10), pages 231–234, Ljubljana, Slovenia, 2010.
[6] G. Rakić. Extendable and adaptable framework for input language independent static analysis, 2015.
[7] M. Tóth, A. Páter-Részeg, and G. Rakić. Introducing support for Erlang into the SSQSA framework. In Proc. of the International Conference on Numerical Analysis and Applied Mathematics 2014 (ICNAAM-2014), volume 1648, page 310012. AIP Publishing, 2015.
Introduction to Case Management Model and Notation

Mateja Kocbek, Gregor Polančič
Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
mateja.kocbek@um.si, gregor.polancic@um.si

ABSTRACT
A case is presented as a proceeding that involves actions taken regarding a subject in a particular situation to achieve a desired outcome. Cases are used in many areas of human operations. The most common example of a case comes from medicine, where every patient represents a case of their own. Every case requires its own operations and functions, whereas the humans involved can sometimes use their knowledge from previous cases. This article presents a new standard, called CMMN (Case Management Model and Notation), which has recently been published by the OMG and covers the whole process of case management. The presentation of the CMMN standard includes the abstract and concrete syntax as well as the semantics and the diagram interchange specifications.

Categories and Subject Descriptors
I.6.5 [Simulation and modelling]: Model Development – Modelling methodologies.

General Terms
Management, Documentation, Performance, Standardization, Design, Languages, Theory.

Keywords
CMMN, Case Management Model and Notation, Case Management, BPMN.

1. INTRODUCTION
In everyday life, many different cases can be found. A case is a very common term and can represent a variety of different things or concepts. Its common definition is "a particular situation or example of something" [11], whereas in the CMMN specification [5] a case is presented as "a proceeding that involves actions taken regarding a subject in a particular situation to achieve a desired outcome".

An illustrative example of the listed definitions of a case can be found in medicine, where a case involves the care of a patient, together with his/her medical history as well as the current situation. Other examples of cases are a law case, a social security case, an employment case, etc. A project-related case definition states that "a case is a project, transaction, service or response that has different states (for example: opened, doing, closed) over a period of time to achieve resolution of a problem, claim, request, proposal, development or other complex activity" [13].

A case always contains some kind of subject, which may be a person, a legal action, a business transaction, or some other focal point around which actions are taken to achieve an objective [5]. Besides, resolving a specific case usually requires a lot of information [5], whereas new cases, with no previous experience of the involved individuals, can be resolved intuitively [5].

As mentioned above, resolving a case includes information, actions, human resources, knowledge, etc., which can be united in case management. Case management is usually driven by a team of case/knowledge workers, who make decisions or perform certain tasks [5].

One of the most important characteristics of case management is planning. Every case requires a high degree of flexibility, which is essential for the success of human activities. Flexibility is needed in the selection of tasks for a case, in the run-time ordering of the sequence in which the tasks are executed, and in ad-hoc collaboration with other knowledge workers on the tasks [5]. Case or knowledge workers are those who have to determine which tasks are applicable, or which follow-up tasks need to be performed [5]. Decisions may be triggered by events or new facts that continuously emerge during the course of the case, e.g. the receipt of a new document, the completion of certain tasks, or the achievement of certain milestones [5].

In 2014, the Case Management Model and Notation (CMMN) was introduced by the OMG (Object Management Group) as a standard for case management [4]. This article focuses on CMMN and has the following structure: Chapter 2 gives an overview of CMMN; in Chapter 3 the actual use of the standard is presented; we conclude the article with a discussion and conclusion.
2. RATIONALE FOR INTRODUCING CMMN
CMMN is, in general, a graphical representation for expressing a case [10]. It provides an efficient notation for capturing less repeatable, dynamic, information-rich contexts. CMMN was introduced to document the ad-hoc scenarios faced by knowledge workers, in which they need to respond to a continuous flow of business events, data and documents. The CMMN specification defines abstract elements, a notation, execution semantics and exchange formats [5]. A consortium of 11 companies contributed to the development of CMMN, which is maintained by the OMG. Version 1.0 of CMMN was released in May 2014 [5]; the current version is 1.1 – Beta [4].

2.1 CMMN versus BPMN
The focal rationale for the introduction of CMMN was the need for more flexibility for knowledge workers when modelling business processes. Flexibility is needed because some tasks can be done independently of time, and the sequence of tasks is not important; workers can thus decide which work to do and which order is best in a particular case. This is the main difference compared to the well-accepted business process standard, BPMN. Within BPMN models, an exact order of activities is defined (i.e. a structured process), e.g. activity A has to finish before activity B starts. However, an exact order is not always the best way to solve specific instances or cases. A good example is a health case, where knowledge workers (i.e. medical staff, administration, etc.) do not know precisely in which direction the specific case will evolve. Another illustrative example is exception handling, where flexibility is also welcome. But it is also reasonable to stress that, to some level, processes have to be defined: for example, a nurse has to know exactly which steps need to be taken when a patient comes to a hospital.

Above we discussed the differences between CMMN and BPMN. BPMN is a well-known, widely used and accepted standard, but CMMN can fill out its existing weaknesses. Currently, CMMN and BPMN are used separately [12].

2.2 CMMN structure
Beside a modelling notation, CMMN defines a meta-model, an XML-based model for interchange (XMI) and an XML Schema for exchanging Case models among different environments and tools [5].

The meta-model can be used by case management definition tools to define the functions and features that a business analyst might use when defining a case model for a particular type of case. The notation is intended to express the model graphically [5].

The specification enables the portability of case models, so that users can take a model defined in one CMMN implementation and use it in another one. The CMMN XMI and/or XML Schema are intended for importing and exporting case models among different CMMN implementers [5].

A case model is intended to be used by a run-time case management product to guide and assist a knowledge worker in the handling of a particular instance of a case, for example a particular invoice discrepancy. The meta-model and notation are used to express a case model in a common notation for a particular type of case, and the resulting model can subsequently be instantiated for the handling of a particular instance of a case [5].
2.3 CMMN Notation
The outermost element that defines a case is the Case Plan Model (Figure 1). The various elements of a Case Plan Model are depicted within the boundary of the Case Plan Model shape. The Case Plan Model comprises all elements that represent the initial plan of the case, and all elements that support the further evolution of the plan through run-time planning by case workers.

Figure 1: Case Plan Model

All information, or references to information, that is required as context for managing a Case is defined by exactly one Case File. A Case File is meant as a logical model; it does not imply any assumptions about the physical storage of information. A Case File contains Case File Items (Figure 2), which can be anything from a stored folder or document to an entire folder hierarchy referring to or containing other Case File Items.

Figure 2: Elements

Case management planning is typically concerned with the determination of which tasks are applicable, or which follow-up tasks are required. Case workers execute the plan, particularly by performing tasks as planned and by adding Discretionary Tasks (Figure 3) to the plan of a case instance. In CMMN, planning is a run-time effort: users (i.e. case workers) are said to "plan" (at run-time) when they select Discretionary Items from a Planning Table and move them into the plan of the case (instance). A Planning Table defines the scope of planning and can be collapsed (discretionary elements are not visible) or expanded (discretionary elements are visible).

CMMN defines the following Plan Model Elements: Stage – considered as an episode of a Case (shown in Figure 3); Task – an atomic unit of work during a case (also shown in Figure 3); Event Listener – something that happens during the course of a case (shown in Figure 2); and Milestone – an achievable target defined to enable the evaluation of the progress of the case (also shown in Figure 2).

Figure 3: Elements Stage, Task and Discretionary Tasks

In CMMN, an event is something that "happens" during the course of a case. Events may trigger the enabling, activation and termination of Stages and Tasks, or the achievement of Milestones. Standard events are the lifecycle transitions of Case File Items and the lifecycle transitions of Stages, Tasks and Milestones. CMMN also provides Event Listeners, which are used to influence the proceeding of the Case directly, instead of indirectly via information in the Case File. There are two special Event Listeners: the Timer Event Listener, which is used to catch predefined elapses of time, and the User Event Listener, which enables direct interaction of a user with the case.
CMMN also defines a variety of Tasks (Figure 4): the Human Task – a non-blocking task that does not wait for the work to complete but completes immediately upon instantiation; the Decision Task – a blocking task that waits until the work associated with the Task is completed; the Process Task – which can be used in the case to initiate a business process; and the Case Task – which can be used to initiate another case.

Figure 4: Tasks

A Sentry "watches out" for important situations to occur which influence the further proceedings in a case. A Sentry is a combination of an Event and/or a Condition. A Sentry can be used as an entry criterion or as an exit criterion and may consist of two parts: an On-Part, which specifies the event that serves as the trigger, and an If-Part, which specifies a condition, as an Expression that evaluates over the Case File [1, 5, 7] (Figure 5).

Figure 5: Task with Sentries
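Conceptually, a Sentry is therefore satisfied only when its trigger has occurred and its condition over the Case File holds. The following minimal sketch (our own conceptual model, not part of the CMMN specification or its XML schema) captures that combination:

  #include <functional>

  struct CaseFile { /* the case data that the If-Part expression reads */ };

  // A Sentry combines an On-Part (a triggering event) with an optional
  // If-Part (a condition expression evaluated over the Case File).
  struct Sentry {
      bool on_part_fired = false;                    // the trigger event occurred
      std::function<bool(const CaseFile&)> if_part;  // empty means: no condition

      bool satisfied(const CaseFile& cf) const {
          return on_part_fired && (!if_part || if_part(cf));
      }
  };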
Besides, various Decorators can be added to CMMN shapes. Table 1 presents the applicability of the Decorators (Planning Table, Entry Criterion, Exit Criterion, Auto Complete, Manual Activation, Required, Repetition) to the CMMN shapes (Case Plan Model, Stage, Task, Milestone, Event Listener, Case File Item, Plan Fragment). The symbol "+" means that a certain shape accepts the associated Decorator [5].

Table 1: Decorators Applicability Summary Table [5]

                  Planning  Entry      Exit       Auto      Manual      Required  Repetition
                  Table     Criterion  Criterion  Complete  Activation
 Case Plan Model     +                    +          +
 Stage               +          +         +          +          +           +          +
 Task                +*         +         +                     +           +          +
 Milestone                      +                                           +          +
 Event Listener
 Case File Item
 Plan Fragment

 *Human Task only.

3. CURRENT CMMN ACCEPTANCE

The use of the CMMN standard is not yet widespread. CMMN was designed for planning activities that do not require an exact order: every group of tasks has to be performed, but the time and sequence are not important. In the following paragraphs, some aspects of the use of the CMMN standard are discussed.

Table 2 shows the operating models used in companies. The Coordination, Diversification, Unification and Replication models each have their own degree of Process Integration and Process Standardization [1, 6]. The table shows that CMMN suits a low degree of Process Standardization (Coordination and Diversification), while BPMN suits a high degree (Unification and Replication).

Table 2: Operating models

                                 Process Standardization
                                 Low                High
 Process Integration   High      Coordination       Unification
                       Low       Diversification    Replication
                                 (CMMN)             (BPMN)

A case can be handled in an ad-hoc manner, which is to some extent equivalent to ad-hoc processes in BPMN, because there is no specific order or sequence for the completion of the tasks. It is also permitted to perform tasks at any frequency [3]. Usually, all ad-hoc activities are conducted by human resources, who determine the sequence, time and frequency of the performance of each activity in an ad-hoc process [3]. CMMN was primarily designed for business analysts, who are the anticipated users of Case management tools, for capturing and formalizing repeatable patterns of common Tasks, Event Listeners, and Milestones into a Case model [5].

Given that CMMN was introduced in 2014 and that version 1.1 is still in its beta phase, it is understandable that only a small number of tools support the CMMN standard. At the time of our survey, we detected only two adequate tools. The first tool for modelling with the CMMN standard is Camunda [2], an open-source platform for Business Process Management. It is suitable for development and provides business-IT alignment based on BPMN for structured workflows, CMMN for less structured cases and DMN for business rules [2]. The other tool for the CMMN standard is the CMMN Modeler by Trisotech [9], which is a commercial tool.
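To give an impression of what run-time case handling looks like on such a platform, the sketch below drives a deployed case model through Camunda BPM's public Java API (CaseService). It is a sketch under stated assumptions: the case definition key writeDocument and the plan item id PI_WriteText are hypothetical, and a corresponding .cmmn file is assumed to be deployed to a configured engine.

import org.camunda.bpm.engine.CaseService;
import org.camunda.bpm.engine.ProcessEngine;
import org.camunda.bpm.engine.ProcessEngines;
import org.camunda.bpm.engine.runtime.CaseExecution;
import org.camunda.bpm.engine.runtime.CaseInstance;

public class WriteDocumentCase {
    public static void main(String[] args) {
        // Obtain the default engine (assumes a configured Camunda BPM
        // installation with a CMMN model deployed under the hypothetical
        // case definition key "writeDocument").
        ProcessEngine engine = ProcessEngines.getDefaultProcessEngine();
        CaseService caseService = engine.getCaseService();

        // Instantiate the case model; this creates the case plan.
        CaseInstance caseInstance = caseService.createCaseInstanceByKey("writeDocument");

        // Look up the execution of one plan item (hypothetical id "PI_WriteText"),
        // which is assumed to be enabled, i.e. waiting for manual activation.
        CaseExecution writeText = caseService.createCaseExecutionQuery()
                .caseInstanceId(caseInstance.getId())
                .activityId("PI_WriteText")
                .singleResult();

        // Start it manually (run-time planning by the case worker) and,
        // once the work is done, complete it.
        caseService.manuallyStartCaseExecution(writeText.getId());
        caseService.completeCaseExecution(writeText.getId());
    }
}

Listing 3: Driving a CMMN case instance through Camunda's Java API (illustrative sketch)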
3.1 Illustrative Example

In this section, a simple example is presented, in which we collected a few common elements of CMMN that were introduced in the previous chapter. The example briefly defines the process of writing a document, with its basic components.

Figure 6: Phases of a Case

A case may be in one of two phases: design-time and run-time (Figure 6). During the design-time phase, business analysts engage in modelling, which includes defining (1) tasks that are always part of pre-defined segments in a case model, and (2) "discretionary" tasks that are available to the case worker, to be applied in addition, at his or her discretion. In the run-time phase, case workers execute the plan, particularly by (1) performing tasks as planned and (2) adding discretionary tasks to the case plan instance at run-time [3].

As we already mentioned, a very important part of case management is the reference to data about the subject of the case. The collection of data about the case is often described as a Case File. Case workers use both structured and unstructured data in decision-making [3]. Cases are directed not just by explicit knowledge about the particular Case and its context represented in the Case File, but also by explicit knowledge encoded as rules by business analysts, by the tacit knowledge of human participants, and by tacit knowledge from the organization or community in which the participants are members [3].

Figure 7: Model of CMMN [8]

Figure 7 represents a CMMN model that encompasses the whole process of writing a document. First, the model contains two tasks: "Find research topic" and "Create template & graphics". Initially, either of these tasks can be performed. The next, more extensive element is a Stage named "Prepare draft", which contains four tasks. The task "Organize references" is a Task with an entry criterion (see the symbol in Figure 5): the Tasks related to this Sentry must be performed earlier. The next task, "Write Text", is special because it carries an exclamation mark at the bottom of its shape, which means that the performance of this task is required. The same symbol (exclamation mark) is positioned at the level of the Stage "Prepare draft". The task "Prepare table of content" is a Human Task, marked with a small human symbol in the upper left corner of the shape. The last task in this Stage is "Implement template & graphics"; it also has a Criterion, and it is a Discretionary Task, which is symbolized with a dotted line. Later on, we can see the task "Seek comments" as well as the Stage "Review draft" with its two constituent tasks. The speciality of this part is an exit Criterion (see the symbol in Figure 5). Both Stages, "Prepare draft" and "Review draft", are further connected to Milestone elements with entry Criteria. Two additional elements are also used, namely an Event Listener (Timer) and a Case File Item: the first defines the deadline for completing the document, and the second contains the actual document. The last important concept we need to highlight is the Case Model. It is symbolized with a folder and covers the whole described process (also shown in Figure 1). The case model "Write document" includes three exit Criteria.

4. DISCUSSION

In this article, we presented a novel standard for Case Management, CMMN, which also includes a notation for modelling business processes and graphically expressing a Case. CMMN has some similarities with the well-known and accepted BPMN standard. There are some similar elements, like Tasks, Events, Sub-processes, etc., but there is also a very important difference between CMMN and BPMN. BPMN requires accurate knowledge of the business process that is to be modelled; there is practically no space for flexible execution of business processes. In contrast, CMMN offers flexibility, which is very welcome (or even required) in many business cases. As we already mentioned, CMMN is in its beginnings, but it has great potential, at least to be used in combination with BPMN. Our intention for future research is to perform a survey to determine the actual acceptance and potential use of CMMN.

5. REFERENCES

[1] Gagne, D. Case Management Model and Notation (CMMN): An Introduction. 2016. https://prezi.com/yu3lbxamg09v/case-management-model-and-notation-cmmn-an-introduction/.

[2] Camunda Services GmbH. Camunda Tool. 2016. https://camunda.org.

[3] Hinkelmann, K. Case Management Model and Notation - CMMN. 2014. http://knut.hinkelmann.ch/lectures/bpm2013-14/06_CMMN.pdf.

[4] OMG. OMG CMMN. 2014. http://www.omg.org/spec/CMMN/.

[5] OMG (Object Management Group). Case Management Model and Notation, Version 1.0. May 2014, 82 pages.

[6] Ross, J.W., Weill, P., and Robertson, D.C. Enterprise Architecture as Strategy: Creating a Foundation for Business Execution. 2006.

[7] Rücker, B. Camunda BPM 7.2: CMMN Case Management (English). 2015.

[8] Winterberg, T. Oracle - CMMN. https://blogs.oracle.com/soacommunity/entry/case_management_model_and_notation.

[9] Trisotech. Trisotech - CMMN Modeler. http://www.trisotech.com/cmmn-modeler.

[10] Wikipedia. CMMN. https://en.wikipedia.org/wiki/CMMN.

[11] Cambridge Dictionary. Case. https://dictionary.cambridge.org/dictionary/english/case.

[12] Silver, B. BPMN and CMMN Compared. 2014. http://brsilver.com/bpmn-cmmn-compared/.

[13] AIIM. What is Case Management? 2016. http://www.aiim.org/What-is-Case-Management.

Indeks avtorjev / Author index

Akbulut Akhan .......................... 19
Brezočnik Lucija ....................... 31
Budimac Zoran .......................... 27
Çatal Çağatay .......................... 19
Drevenšek Aleks ........................ 23
Heričko Matija ......................... 11
Hölbl Marko ........................ 11, 23
Karakatič Sašo ......................... 19
Kocbek Mateja .......................... 35
Kolek Jozef ............................ 27
Krishnamurthy Prashant ................. 11
Majer Črtomir .......................... 31
Palanisamy Balaji ...................... 11
Pavlinek Miha .......................... 19
Podgorelec Vili ........................ 19
Polančič Gregor ........................ 35
Rakić Gordana .......................... 27
Sagadin Klemen ......................... 15
Šumak Boštjan .......................... 15
Verber Domen ............................ 7
Welzer Tatjana ......................... 11
Zadorozhny Vladimir I. ................. 11

Konferenca / Conference: Sodelovanje, programska oprema in storitve v informacijski družbi / Collaboration, Software and Services in Information Society
Uredil / Edited by: Marjan Heričko