Zbornik 20. mednarodne multikonference INFORMACIJSKA DRUŽBA - IS 2017, Zvezek G
Proceedings of the 20th International Multiconference INFORMATION SOCIETY - IS 2017, Volume G

Sodelovanje, programska oprema in storitve v informacijski družbi
Collaboration, Software and Services in Information Society

Uredil / Edited by Marjan Heričko
http://is.ijs.si
9.-13. oktober 2017 / 9-13 October 2017, Ljubljana, Slovenia

Editor: Marjan Heričko, University of Maribor, Faculty of Electrical Engineering and Computer Science
Publisher: Institut »Jožef Stefan«, Ljubljana
Proceedings preparation: Mitja Lasič, Vesna Lasič, Lana Zemljak
Cover design: Vesna Lasič
Access to the e-publication: http://library.ijs.si/Stacks/Proceedings/InformationSociety

Ljubljana, October 2017

The cataloguing-in-publication (CIP) record was prepared by the National and University Library in Ljubljana.
COBISS.SI-ID=292477440
ISBN 978-961-264-118-4 (pdf)

FOREWORD TO THE INFORMATION SOCIETY 2017 MULTICONFERENCE

With its twentieth consecutive edition, the Information Society multiconference (http://is.ijs.si) is the central Central European event in the fields of the information society, computer science and informatics. This year's event again takes place at several locations, with the main events at the Jožef Stefan Institute.

The information society, knowledge and artificial intelligence are once more at a crossroads, both in themselves and in their influence on human development. Will the exponential growth of electronics according to Moore's law continue, or will it stagnate? Will artificial intelligence continue its remarkable progress, outperforming humans in ever more areas and thereby enabling civilization to flourish, or will the exponential growth of the population, particularly in Africa, choke that progress? More and more indicators point to both extremes - we are passing into the next civilizational era, while at the same time the planetary conflicts of modern society are becoming ever harder to manage.

This year, the multiconference brings together twelve excellent independent conferences. Around 200 presentations, abstracts and papers will be given within the individual conferences and workshops, accompanied by round tables, discussions and special events such as the award ceremony. Selected papers will also appear in a special issue of the journal Informatica, which boasts a 40-year tradition as an excellent scientific journal. Remarkable anniversaries!

The Information Society 2017 multiconference consists of the following independent conferences:
- Slovenian Conference on Artificial Intelligence
- Facing Demographic Challenges
- Cognitive Science
- Collaboration, Software and Services in the Information Society
- Data Mining and Data Warehouses
- Education in the Information Society
- 4th Student Computer Science Research Conference
- Workshop »Electronic and Mobile Health«
- 5th International Conference on Cognitonics
- International Technology Transfer Conference - ITTC
- Workshop »AS-IT-IC«
- Robotics

The co-organizers and supporters of the conference are various research institutions and associations, among them ACM Slovenia, SLAIS, DKZ and the second Slovenian national academy, the Slovenian Academy of Engineering (IAS).
On behalf of the conference organizers, we thank the associations and institutions, and especially the participants, for their valuable contributions and for the opportunity to share with us their experience of the information society. We also thank the reviewers for their help with the reviews. In 2017, the award for lifetime achievements in honour of Donald Michie and Alan Turing is conferred for the fifth time. The Michie-Turing Award for an exceptional lifetime contribution to the development and promotion of the information society goes to Prof. Dr. Marjan Krisper, and the award for the achievement of the year to Prof. Dr. Andrej Brodnik. For the sixth time we are awarding the "information lemon" and the "information strawberry" for the least and the most successful moves related to the information society. The lemon went to the decline of Slovenian funding for academic science, which by this criterion now places us third worst in Europe; the strawberry went to the "e-prescription". Congratulations to the award winners!

Bojan Orel, Programme Committee Chair
Matjaž Gams, Organizing Committee Chair

FOREWORD - INFORMATION SOCIETY 2017

In its 20th year, the Information Society multiconference (http://is.ijs.si) remains one of the leading conferences in Central Europe devoted to the information society, computer science and informatics. In 2017 it is organized at various locations, with the main events at the Jožef Stefan Institute.

The pace of progress of the information society, knowledge and artificial intelligence is speeding up, and it seems we are again at a turning point. Will the progress of electronics continue according to Moore's law, or will it start stagnating? Will AI continue to outperform humans at more and more activities, enabling unprecedented human progress, or will the growth of the human population, in particular in Africa, cause global decline? Both extremes seem more and more likely - fantastic human progress, and planetary decline caused by humans destroying our environment and each other.

The multiconference runs in parallel sessions with 200 presentations of scientific papers at twelve conferences, round tables, workshops and award ceremonies. Selected papers will be published in the Informatica journal, which has a 40-year tradition of excellent research publication. These are remarkable achievements.

The Information Society 2017 multiconference consists of the following conferences:
- Slovenian Conference on Artificial Intelligence
- Facing Demographic Challenges
- Cognitive Science
- Collaboration, Software and Services in Information Society
- Data Mining and Data Warehouses
- Education in Information Society
- 4th Student Computer Science Research Conference
- Workshop Electronic and Mobile Health
- 5th International Conference on Cognitonics
- International Conference of Transfer of Technologies - ITTC
- Workshop »AS-IT-IC«
- Robotics

The multiconference is co-organized and supported by several major research institutions and societies, among them ACM Slovenia (the Slovenian chapter of the ACM), SLAIS, DKZ and the second national engineering academy, the Slovenian Engineering Academy. In the name of the conference organizers, we thank all the societies and institutions, and particularly all the participants, for their valuable contributions and their interest in this event, and the reviewers for their thorough reviews.

For the fifth year, the award for life-long outstanding contributions is presented in memory of Donald Michie and Alan Turing. The Michie-Turing award is given to Prof. Marjan Krisper for his life-long outstanding contribution to the development and promotion of the information society in our country.
In addition, an award for current achievements is given to Prof. Andrej Brodnik. The information lemon goes to the national funding of academic science, which has degraded Slovenia to the third worst position in Europe. The information strawberry is awarded to the medical e-prescription project. Congratulations!

Bojan Orel, Programme Committee Chair
Matjaž Gams, Organizing Committee Chair

KONFERENČNI ODBORI / CONFERENCE COMMITTEES

International Programme Committee: Vladimir Bajic (South Africa), Heiner Benking (Germany), Se Woo Cheon (South Korea), Howie Firth (UK), Olga Fomichova (Russia), Vladimir Fomichov (Russia), Vesna Hljuz Dobric (Croatia), Alfred Inselberg (Israel), Jay Liebowitz (USA), Huan Liu (Singapore), Henz Martin (Germany), Marcin Paprzycki (USA), Karl Pribram (USA), Claude Sammut (Australia), Jiri Wiedermann (Czech Republic), Xindong Wu (USA), Yiming Ye (USA), Ning Zhong (USA), Wray Buntine (Australia), Bezalel Gavish (USA), Gal A. Kaminka (Israel), Mike Bain (Australia), Michela Milano (Italy), Derong Liu (Chicago, USA), Toby Walsh (Australia)

Organizing Committee: Matjaž Gams (chair), Mitja Luštrek, Lana Zemljak, Vesna Koricki, Mitja Lasič, Robert Blatnik, Aleš Tavčar, Blaž Mahnič, Jure Šorn, Mario Konecki

Programme Committee: Bojan Orel (chair), Franc Solina (co-chair), Viljan Mahnič (co-chair), Cene Bavec (co-chair), Tomaž Kalin (co-chair), Jozsef Györkös (co-chair), Tadej Bajd, Jaroslav Berce, Mojca Bernik, Marko Bohanec, Ivan Bratko, Andrej Brodnik, Dušan Caf, Saša Divjak, Tomaž Erjavec, Bogdan Filipič, Andrej Gams, Matjaž Gams, Marko Grobelnik, Nikola Guid, Marjan Heričko, Borka Jerman Blažič Džonova, Gorazd Kandus, Urban Kordeš, Marjan Krisper, Andrej Kuščer, Jadran Lenarčič, Borut Likar, Mitja Luštrek, Janez Malačič, Olga Markič, Dunja Mladenič, Franc Novak, Vladislav Rajkovič, Grega Repovš, Ivan Rozman, Niko Schlamberger, Stanko Strmčnik, Jurij Šilc, Jurij Tasič, Denis Trček, Andrej Ule, Tanja Urbančič, Boštjan Vilfan, Baldomir Zajc, Blaž Zupan, Boris Žemva, Leon Žlajpah

Invited lecture

AN UPDATE FROM THE AI & MUSIC FRONT

Gerhard Widmer
Institute for Computational Perception, Johannes Kepler University Linz (JKU), and Austrian Research Institute for Artificial Intelligence (OFAI), Vienna

Abstract

Much of current research in Artificial Intelligence and Music, and particularly in the field of Music Information Retrieval (MIR), focuses on algorithms that interpret musical signals and recognize musically relevant objects and patterns at various levels - from notes to beats and rhythm, to melodic and harmonic patterns and higher-level segment structure - with the goal of supporting novel applications in the digital music world. This presentation will give the audience a glimpse of what musically "intelligent" systems can currently do with music, and what this is good for. However, we will also find that while some of these capabilities are quite impressive, they are still far from (and do not require) a deeper "understanding" of music. An ongoing project will be presented that aims to take AI & music research a bit closer to the "essence" of music, going beyond surface features and focusing on the expressive aspects of music and how these are communicated in music. This raises a number of new research challenges for the field of AI and Music (discussed in much more detail in [Widmer, 2016]).
As a first step, we will look at recent work on computational models of expressive music performance, and will show some examples of the state of the art (including the result of a recent musical 'Turing test').

References

Widmer, G. (2016). Getting Closer to the Essence of Music: The Con Espressione Manifesto. ACM Transactions on Intelligent Systems and Technology 8(2), Article 19.

KAZALO / TABLE OF CONTENTS

Sodelovanje, programska oprema in storitve v informacijski družbi / Collaboration, Software and Services in Information Society
PREDGOVOR / FOREWORD
PROGRAMSKI ODBORI / PROGRAMME COMMITTEES
Crop Yield Prediction in the Cloud: Machine Learning Approach / Catal Çağatay, Muratli Can
Using Cognitive Software to Evaluate Natural Language Classifiers / Torres Camilo, Tabares S. Marta, Montoya Edwin, Kamišalić Aida
An Analysis of BPMN-based Approaches for Process Landscape Design / Polančič Gregor, Huber Jernej, Tabares S. Marta
Approach to an alternative value chain modeling / Pavlinek Miha, Heričko Marjan, Pušnik Maja
Using Property Graph Model for Semantic Web Services Discovery / Šestak Martina
Statecharts representation of program execution flow / Sukur Nataša, Rakić Gordana, Budimac Zoran
Code smell detection: A tool comparison / Beranič Tina, Rednjak Zlatko, Heričko Marjan
A Qualitative and Quantitative Comparison of PHP and Node.js for Web Development / Heričko Tjaša
Skills, Competences and Platforms for a Data Scientist / Podgorelec Vili, Karakatič Sašo
Towards a Classification of Educational Tools / Košič Kristjan, Rajšp Alen, Huber Jernej
Indeks avtorjev / Author index

Zbornik 20. mednarodne multikonference INFORMACIJSKA DRUŽBA - IS 2017, Zvezek G
Proceedings of the 20th International Multiconference INFORMATION SOCIETY - IS 2017, Volume G

Sodelovanje, programska oprema in storitve v informacijski družbi
Collaboration, Software and Services in Information Society

Uredil / Edited by Marjan Heričko
http://is.ijs.si
9. oktober 2017 / 9 October 2017, Ljubljana, Slovenia

FOREWORD

We are organizing the conference "Collaboration, Software and Services in the Information Society" as part of the Information Society multiconference for the seventeenth time.
As usual, this year's papers address current topics and challenges related to the development of modern software and information solutions and services, as well as to collaboration in general. For several decades, informatics and information technologies have been the driving force of innovation in all areas of business and in the activities of individuals. Open standards, interoperability and the ever higher responsiveness of IT professionals are leading to the development of intelligent digital service platforms, innovative business models and new ecosystems in which not only partners but also competitors connect and collaborate. The involvement of the end users of our services and solutions is also growing in scale and importance. Advanced information technologies and modern approaches to development, deployment and management enable a higher degree of automation and the integration of previously separate worlds, since they establish a closed loop and ensure continuous improvements based on the active collaboration and feedback of all the actors involved. Through all of this, quality assurance remains one of the most important aspects of developing and deploying IT-based services.

The papers collected in these proceedings provide insight into, and solutions for, challenges in areas such as:
- modeling the value chains of service ecosystems;
- designing process landscapes;
- detecting inappropriate design decisions;
- identifying deficient software components;
- discovering semantic web services;
- evaluating advanced web technologies;
- classifying learning-stack tools;
- identifying the knowledge and competences of a data scientist;
- training and evaluating natural language classifiers;
- applying machine learning algorithms in practice.

We hope that in these proceedings, which connect theoretical and practical knowledge, you will again find useful information for your further work in both basic and applied research.

Marjan Heričko

FOREWORD

This year, the conference "Collaboration, Software and Services in Information Society" is being organised for the seventeenth time as a part of the "Information Society" multiconference. As in previous years, the papers in this year's proceedings address current challenges and best practices related to the development of advanced software and information solutions, as well as collaboration in general.

Information technologies and the field of informatics have been the driving force of innovation in business, as well as in the everyday activities of individuals, for several decades. Open standards, interoperability and the increasing responsiveness of IS/IT experts are leading the way to the development of intelligent digital service platforms, innovative business models and new ecosystems where not only partners, but also competitors, are connecting and working together. The involvement and engagement of end users is a necessity. On the other hand, quality assurance remains a vital part of software and ICT-based service development and deployment.

The papers in these proceedings provide a better insight into, and/or propose solutions to, challenges related to:
- modelling the value chains of large ecosystems;
- designing process landscapes;
- detecting bad design decisions and code smells;
- discovering semantic web services;
- evaluating advanced Web technologies;
- classifying learning-stack tools;
- identifying the skills and competencies of data scientists;
- training and evaluating natural language classifiers;
- applying machine learning algorithms in practice.
We hope that these proceedings will be beneficial for your reference and that the information in this volume will be useful for further advancements in both research and industry.

Marjan Heričko

PROGRAMSKI ODBOR / PROGRAMME COMMITTEE

Marjan Heričko, Lorna Uden, Gabriele Gianini, Hannu Jaakkola, Mirjana Ivanović, Zoltán Porkoláb, Vili Podgorelec, Maja Pušnik, Muhamed Turkanović, Boštjan Šumak, Gregor Polančič, Luka Pavlič

Crop Yield Prediction in the Cloud: Machine Learning Approach

Cagatay Catal, Department of Computer Engineering, Istanbul Kültür University, Istanbul, Turkey, c.catal@iku.edu.tr
Can Muratli, Department of Computer Engineering, Istanbul Kültür University, Istanbul, Turkey, o.muratli@iku.edu.tr

ABSTRACT

Crop yield prediction provides critical information for decision makers and directly affects agricultural policies and trade. Emerging technologies such as the Internet of Things (IoT), big data analytics, cloud computing and machine learning have enabled researchers to design and implement high-performance yield prediction models. In this work, we aimed at investigating several machine learning-based regression techniques, such as Boosted Decision Tree Regression and Neural Network Regression, for this challenging problem, and at implementing a wheat yield prediction web service hosted on the Azure cloud computing platform. Case studies were performed on data obtained for the south-east region of Turkey and for four states in the United States. Experimental results demonstrated that while the neural network regression technique provides the best performance for large-scale crop yield prediction datasets, the linear regression technique is more appropriate for small-scale datasets.

Categories and Subject Descriptors
I.2.6 [Computing Methodologies]: Learning

General Terms
Algorithms, Measurement, Performance, Experimentation.

Keywords
Crop Yield Prediction; Internet of Things; Sensors; Machine Learning; Regression Techniques; Cloud Computing

1. INTRODUCTION

It is reported that 795 million people in the world are undernourished, which means that one in nine people today lives without sufficient food [1]. While the current world population is around 7.5 billion people, it is estimated that it will reach 9.7 billion, which is 30% higher than the current population [2]. To supply adequate food to this huge population, global food production must improve dramatically. It is estimated that while one farmer now feeds 155 people in the world, by 2050 one farmer will need to feed 250 people, which is 61% higher than the current situation [3]. The United Nations aims at ending hunger by 2030 and ensuring access to safe and adequate food for all people in the world [4].

Crop yield prediction before harvest can help to manage agricultural trade policies [5], provide critical data for economic and political stakeholders, and support the evaluation of climate change impact [6]. Therefore, researchers are still actively involved in the development of crop yield prediction models at national, sub-national and international levels [6]. Since traditional survey methods are time-consuming and error-prone, accurate prediction approaches are currently being developed by different research groups. In addition to the survey method, there are different approaches such as statistical methods, crop simulation models and remote sensing-based techniques.

In this study, our main objective is to design and implement a wheat yield prediction system based on the data obtained from sensors positioned in stations in the south-east region of Turkey. To retrieve and process this data, we collaborated with the TARBIL Agro-Informatics Research Centre at Istanbul Technical University, which operates a terrestrial network called the Agricultural Monitoring and Information System (AgriMONIS) with 441 active RoboStations. In addition to the analysis performed on this data, we also developed machine learning-based models for wheat data obtained from four states in the United States. The machine learning-based models were designed and evaluated on the Azure Machine Learning Studio platform. The best algorithm in terms of the coefficient of determination parameter was transformed into a web service and deployed on the Azure cloud computing platform. A client-side web application was implemented using ASP.NET technology to handle the requests of end users, and farmers are informed about the yield prediction results via this web application.
2. RELATED WORK

There are several studies on crop yield prediction, but we did not encounter an end-to-end crop yield prediction system which uses Azure Machine Learning Studio, the Azure cloud platform and web services technology. Most of the studies in the literature only report experimental results, but do not provide any practical information on building a crop yield prediction system for real-world scenarios. Also, there is a very limited number of studies which applied data from the south-east region of Turkey. Çakır et al. [5] built an Artificial Neural Network to estimate wheat yield in the south-east region of Turkey and utilized meteorological data such as temperature and rainfall records. They used data for the years 2011 and 2012 for training the model, and applied the data for the year 2013 to test the prediction model. They reported that the results are better than those of the regression method when a Multi-Layer Perceptron (MLP) is applied; the optimal number of neurons was reported as 15. Chen and Jing [7] compared two adaptive multivariate analysis methods based on Landsat-8 images to forecast wheat yield and reported that Artificial Neural Networks (ANN) provide better results than the Partial Least Squares Regression (PLSR) technique in terms of the coefficient of determination and root mean squared error (RMSE) parameters. Gouache et al. [6] developed wheat yield prediction models in France for 23 departments using yield statistics from 1986 to 2010. They started with 250 variables and reached 5-7 variables using forward stepwise regression methods to design their prediction models; acceptable models were obtained for 20 departments. Stas et al. [8] compared Boosted Regression Trees (BRT) and Support Vector Machines (SVM) for the prediction of wheat yields and reported that BRT provides better performance than SVM. Our paper is different from these studies in that we decided to build a cloud-based prediction system and use state-of-the-art regression algorithms in the Azure machine learning platform.

3. METHODOLOGY

While there are many tools available, we preferred Azure Machine Learning Studio due to its cloud computing capabilities and its easy-to-use nature. The collaboration with TARBIL, a science center focused on agriculture that has over 400 stations equipped with various sensors monitoring every phenological state of a field, enabled us to get precise datasets for our experiments. In addition to those datasets, we also came across a set of datasets focused on wheat yield in the USA [9], which created an opportunity for another case study to evaluate our models. During our experiments, every regression model available in Azure Machine Learning Studio was tested; however, due to some constraints created by the datasets available to us, we narrowed our options down to the four regression models explained briefly below (and sketched in code after the list):

1. Linear Regression: Despite being the most simplistic method among regression models, linear regression is frequently used in many case studies, since it simply attempts to create a linear relationship between one or more features and a numeric outcome to be predicted.
2. Bayesian Linear Regression: It is like the linear regression approach; however, it uses Bayesian inference to update the probability distribution of the model.
3. Boosted Decision Tree Regression: Using an efficient implementation of the MART gradient boosting algorithm, Boosted Decision Tree Regression builds each regression tree in a step-by-step fashion, eliminating weaker prediction models.
4. Neural Network Regression: While neural networks are widely used for deep learning and for modelling sophisticated problems, they can also be adapted to regression problems where more traditional regression models fall short.
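Azure Machine Learning Studio exposes these four models as drag-and-drop modules rather than code. As a rough illustration only - not the paper's actual set-up - the same four model families could be instantiated in scikit-learn as follows; the estimator choices and hyperparameters are our assumptions, not the Azure modules' internals.

```python
# A minimal sketch, assuming scikit-learn stands in for the four
# Azure ML Studio regression modules; hyperparameters are illustrative.
from sklearn.linear_model import LinearRegression, BayesianRidge
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor

models = {
    # Plain least-squares fit of a linear relationship
    "Linear Regression": LinearRegression(),
    # Bayesian inference over the weights (prior updated to a posterior)
    "Bayesian Linear Regression": BayesianRidge(),
    # Gradient-boosted regression trees, grown stage by stage (MART-style)
    "Boosted Decision Tree Regression": GradientBoostingRegressor(n_estimators=100),
    # A small feed-forward network adapted to regression
    "Neural Network Regression": MLPRegressor(hidden_layer_sizes=(100,), max_iter=1000),
}
```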
We had two different datasets: one from the south-east region of Turkey, the other from four states of the United States of America. Having two sets of data led us to approach this problem in two case studies.

In case study one, we had both phenological data and crop yield information from nine different stations equipped with sensors for the years between 2013 and 2016, which enabled us to use the data from 2013 to 2015 for training and to predict the 2016 yield results from the given features (Figure 1).

Figure 1. All regression models tested for the train and score model, South-East Region of Turkey.

After combining the two datasets into one, both the test and train datasets were also run with ten-fold cross-validation settings, as seen in Figure 2.

Figure 2. All regression models tested for the cross-validation model, South-East Region of Turkey.

The cross-validation evaluation helped us to compare our findings with the second case study. In the second case study, we had more than 300,000 records in the dataset. However, since we had only two years of data, we did not perform a test which uses an external test dataset; we therefore ran only a cross-validation experiment for this large dataset.

Figure 3. All regression models tested for the cross-validation model.

For the datasets from Turkey, the yield information was in kilograms, while in the USA dataset it was percentage-based information.
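To make the two evaluation set-ups concrete, the sketch below reproduces them outside Azure ML Studio: a year-based hold-out (train on 2013-2015, score on 2016) for case study one, and a ten-fold cross-validation for the combined and US datasets. The DataFrame `df` and the column names `year` and `yield` are assumptions for illustration.

```python
# A minimal sketch, assuming the records live in a pandas DataFrame
# with a 'year' column, feature columns and a 'yield' target column.
import pandas as pd
from sklearn.model_selection import cross_val_score

def train_score_split(df, features):
    """Hold-out protocol of case study one: fit on 2013-2015, score on 2016."""
    train = df[df["year"].between(2013, 2015)]
    test = df[df["year"] == 2016]
    return (train[features], train["yield"]), (test[features], test["yield"])

def ten_fold_r2(model, X, y):
    """Ten-fold cross-validated coefficient of determination (R^2)."""
    return cross_val_score(model, X, y, cv=10, scoring="r2").mean()
```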
4. EXPERIMENTAL RESULTS

As mentioned earlier, we applied four different regression models to our datasets. We applied the 10-fold cross-validation approach to all the case studies and calculated the coefficient of determination parameter with the help of Azure Machine Learning Studio. The coefficient of determination is a value between 0 and 1 which determines how close the prediction is to reality. While experimenting, we have seen that both the features and the amount of data affect the results. As seen in Table 1, in our train/score model the most simplistic approach, Linear Regression, scored the best results; Neural Network Regression failed because of the insufficient number of records. During the 10-fold cross-validation experiments, after adding the 2016 data to the training dataset, which consisted of the data from 2013-2015, we observed significant changes in the results, especially for Bayesian Linear Regression. As the number of records in the dataset rose, Boosted Decision Tree Regression had more information to train itself with, yielding better results.

In the second case study, we had only two years of data, but with a great number of records for the machine learning algorithms to learn from. As shown in Table 2, all the ten-fold cross-validation results increased, with the Neural Network Regression model, unlike in the first case study, giving a satisfying result. Lacking the past two years' data there, we chose to base our web service on the first case study's dataset and developed it further from that point. The web application we developed uses a basic input-output style interface to interact with the user and predicts the crop yield for a given input. The input consists of the following features: region and provenance of the field, current temperature, yearly maximum and minimum temperature, total precipitation, growing degree day, temperature difference parameter, Photo Thermal Unit, Helio Thermal Unit and the evapotranspiration parameter.

Table 1. South East Region of Turkey Wheat Yield ML Results

(a) Train/score model
Regression Type | Relative Absolute Error | Relative Squared Error | Coefficient of Determination
Linear Regression | 0.484972 | 0.300939 | 0.699061
Bayesian Linear Regression | 0.643469 | 0.396072 | 0.603928
Boosted Decision Tree Regression | 0.669278 | 0.646339 | 0.353661
Neural Network Regression | 0.969594 | 1.4013830 | -0.4013830

(b) Ten-fold cross-validation
Regression Type | Relative Absolute Error | Relative Squared Error | Coefficient of Determination
Linear Regression | 0.572134 | 0.391993 | 0.608007
Bayesian Linear Regression | 0.325579 | 0.157755 | 0.842245
Boosted Decision Tree Regression | 0.512205 | 0.329844 | 0.670156
Neural Network Regression | 1.7276340 | 3.7723160 | -2.7723160

Table 2. United States of America Wheat Yield ML Results (ten-fold cross-validation)
Regression Type | Relative Absolute Error | Relative Squared Error | Coefficient of Determination
Linear Regression | 0.376717 | 0.173927 | 0.826073
Bayesian Linear Regression | 0.389548 | 0.180413 | 0.819587
Boosted Decision Tree Regression | 0.105499 | 0.01352 | 0.98648
Neural Network Regression | 0.000736 | 0.000001 | 0.999999
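Note that in every row of Tables 1 and 2 the coefficient of determination equals one minus the relative squared error. A minimal restatement of the three reported metrics (our own sketch, not code from the paper):

```python
# Relative absolute error, relative squared error and R^2, where the
# baseline model always predicts the mean of the true values.
import numpy as np

def regression_metrics(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    baseline = np.full_like(y_true, y_true.mean())
    rae = np.abs(y_true - y_pred).sum() / np.abs(y_true - baseline).sum()
    rse = ((y_true - y_pred) ** 2).sum() / ((y_true - baseline) ** 2).sum()
    return {
        "relative_absolute_error": rae,
        "relative_squared_error": rse,
        "coefficient_of_determination": 1.0 - rse,  # R^2 = 1 - RSE
    }
```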
5. CONCLUSION AND FUTURE WORK

The objective of crop yield prediction studies is to forecast the crop yield as early as possible during the growing season, since weather and climate affect agricultural production dramatically. In this study, we developed an end-to-end wheat yield prediction system using machine learning algorithms. Case studies were performed on datasets retrieved from the south-east region of Turkey and four states in the United States. The Linear Regression algorithm provided the best performance for the south-east region in terms of the coefficient of determination parameter when an external test set was used, while the Neural Network Regression algorithm was the best option for the US dataset when cross-validation analysis was applied. As part of future work, the web application can be replaced with a mobile application, and new experiments can be performed when more regions are added to the datasets. Deep learning algorithms might be considered when the dataset becomes very large.
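The serving path of the system - an Azure ML Studio web service behind an ASP.NET client - can be illustrated with a request-response call of the kind such services exposed at the time; the endpoint URL, API key, column names and values below are placeholders, not the paper's actual service definition.

```python
# Hypothetical client call to an Azure ML Studio request-response
# endpoint; URL, key and the input schema are illustrative only.
import json
import urllib.request

ENDPOINT = "https://<region>.services.azureml.net/workspaces/<ws>/services/<id>/execute?api-version=2.0"
API_KEY = "<api-key>"

payload = {
    "Inputs": {
        "input1": {
            "ColumnNames": ["region", "current_temp", "max_temp", "min_temp",
                            "precipitation", "growing_degree_day", "ptu", "htu",
                            "evapotranspiration"],
            "Values": [["<region-name>", 24.1, 41.0, -3.5, 480.0, 1900.0,
                        610.0, 870.0, 420.0]],
        }
    },
    "GlobalParameters": {},
}

request = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json",
             "Authorization": "Bearer " + API_KEY},
)
with urllib.request.urlopen(request) as response:
    print(json.loads(response.read()))  # contains the predicted wheat yield
```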
6. ACKNOWLEDGMENTS

Data for this project was provided by the TARBIL Agro-Informatics Research Centre at İstanbul Technical University. The authors would like to thank the technical and management staff of this research centre, who helped us to prepare the dataset.

REFERENCES

[1] J. You, X. Li, M. Low, D. Lobell, and S. Ermon, "Deep gaussian process for crop yield prediction based on remote sensing data", In AAAI, pp. 4559-4566, 2017.
[2] K. B. Newbold, Population Growth. The International Encyclopedia of Geography, 2017.
[3] Cloud Technology Partners, https://www.cloudtp.com/doppler/feeding-10-billion-people/ (2017) (accessed June 17, 2017).
[4] United Nations, Zero hunger: why it matters? Sustainable development goal, http://www.un.org/sustainabledevelopment/wp-content/uploads/2016/08/2_Why-it-Matters_ZeroHunger_2p.pdf (2015) (accessed June 17, 2017).
[5] Y. Çakır, M. Kırcı, and E. O. Güneş, "Yield prediction of wheat in south-east region of Turkey by using artificial neural networks", In Agro-geoinformatics (Agro-geoinformatics 2014), pp. 1-4, 2014.
[6] D. Gouache, A. S. Bouchon, E. Jouanneau, and X. Le Bris, "Agrometeorological analysis and prediction of wheat yield at the departmental level in France", Agricultural and Forest Meteorology, Vol. 209, pp. 1-10, 2015.
[7] P. Chen and Q. Jing, "A comparison of two adaptive multivariate analysis methods (PLSR and ANN) for winter wheat yield forecasting using Landsat-8 OLI images", Advances in Space Research, Vol. 59, Issue 4, pp. 987-995, 2017.
[8] M. Stas, J. Van Orshoven, Q. Dong, S. Heremans, and B. Zhang, "A comparison of machine learning algorithms for regional wheat yield prediction using NDVI time series of SPOT-VGT", In Agro-Geoinformatics (Agro-Geoinformatics), pp. 1-5, 2016.
[9] USA Wheat Data, https://github.com/prateek47/Wheat_Prediction

Using Cognitive Software to Evaluate Natural Language Classifiers - A Use Case

Camilo Torres, Department of Informatics and Systems, Universidad EAFIT, Medellín, Colombia, ctorres9@eafit.edu.co
Marta S. Tabares, Department of Informatics and Systems, Universidad EAFIT, Medellín, Colombia, mtabares@eafit.edu.co
Edwin Montoya, Department of Informatics and Systems, Universidad EAFIT, Medellín, Colombia, emontoya@eafit.edu.co
Aida Kamišalić, Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia, aida.kamisalic@um.si

ABSTRACT

Current techniques for natural language processing can be used to identify valuable information such as sentiments or patterns, recognized and adjusted for different topics. To apply these techniques, it is necessary to know how to use and tune prediction models, which requires time, experience and the implementation of different tests to ensure the correct behavior of the models. The aim of this paper is to identify the features with which classifier instances should be trained and evaluated using optimized software, specifically IBM Bluemix and its module named Natural Language Classifier. The created classifier was trained with real tweets to classify texts into three categories: positive, neutral and negative. Afterwards, the classifier was validated with a set of already classified texts. The obtained results indicate how the number of training examples impacts the behavior of the classifier, and that the highest accuracy was achieved for the positive and negative categories.

Categories and Subject Descriptors
H.4 [Information Systems Applications]: Miscellaneous; I.2.7 [Artificial Intelligence]: Natural Language Processing - Text analysis

General Terms
Algorithms, Measurement, Experimentation

Keywords
Natural Language, Classifiers, Machine Learning, Bluemix, Watson

1. INTRODUCTION

The daily use of social networks currently results in large-scale growth of the data and information generated in the world. There is an expected expansion of 40% per year and an estimated size 50 times larger by 2020 [1]. The generated data are mostly texts created by users in social networks, such as comments or tweets, that in some cases are freely accessible, while in others they can be accessed by purchasing on-demand packages. This type of information allows companies to carry out market analyses or search for communities of potential clients [2].

The available literature presents several research projects about the algorithms and techniques used for natural language processing [3]. Their results indicate that the time required to implement such techniques and algorithms depends on the users' previous mathematical knowledge and on the tuning of the mathematical functions used in the process. Therefore, despite the different existing solutions for text analysis, the implementation of such algorithms may be slow because of different influencing factors, such as the tuning required for each process and the tests with different parameters.

To address this problem, we evaluate the proficiency of a tool to analyze and classify texts generated on social networks. The text classifications are labeled with the basic polarity used for sentiment analysis, i.e. positive, negative and neutral labels [4]. The databases for the testing and training sets were obtained from the public Twitter API and the Spanish Society for Natural Language Processing (SEPLN).

Among the existing technologies for natural language processing, there are platforms such as IBM Watson, Microsoft LUIS, API.ai, WIT.ai, etc. We decided to use IBM Bluemix, which includes a large variety of Watson services, whose wide catalog of options can be used for intelligent chats and for text classification and understanding, and which is backed by the demonstrated results and tests of Watson, such as the Jeopardy game, where Watson showed its proficiency in giving right answers using natural language processing [5, 6]. We have used the Natural Language Classifier component of the Watson suite, which uses convolutional neural networks to perform the cognitive processing of language [7].

This paper is organized as follows. We present the context and the research questions in Section 2. Section 3 summarizes the related work on tweets' polarity classification. The methodology used to address the problem is explained in Section 4. Furthermore, we introduce a detailed explanation of the developed proposal in Section 5. The results of the performed work are shown in Section 6. Finally, Section 7 brings the conclusions of the paper.
2. CONTEXT AND RESEARCH QUESTIONS

During classifier training, problems such as over-fitting or poor estimation of the models for the training sets might occur [8, 9]. These factors should be taken into account in order to avoid bad predictions. Furthermore, adjusting and finding the correct parameters for the adequate behavior of the algorithm is a task that demands time and effort. Technologies such as IBM Bluemix already provide a set of services for natural language processing; these can be solutions that save the time consumed by the tuning of classifiers.

The identified problem and the context of using classifiers through natural language processing for sentiment analysis (polarity) in text, particularly tweets, result in the definition of the following research questions:

- Which are the characteristics under which a classifier should be trained using technologies for natural language processing and sentiment analysis?

- How effective are classifiers trained using frameworks and automated tools for natural language processing and sentiment analysis?

3. RELATED WORK

One of the most treated problems found in the studied literature is data pre-processing before the data is used to train classifiers. Elements such as sarcasm, expressions and abbreviations, among others, can generate erroneous predictions. Khan et al. [10, 11] focused on developing classifiers with good pre-processing before training. At the same time, they propose hybrid models based on the classification of emoticons, bags of words, etc. For pre-processing, they proposed procedures such as searching dictionaries to check the existence of terms, replacing abbreviations, completing incomplete words and performing spelling checks.

Mertiya et al. [12] proposed the usage of bayesian classifiers to obtain the polarity of a database of tweets, which results in classifications with several false positives; these are then submitted to an analysis of adjectives in order to be polarized correctly. The problem with this type of classifier is that the short texts of tweets have a characteristic named sparsity, meaning that the data is not very significant; therefore, the classifiers may produce errors or bad predictions.

To avoid the problems that occur when this type of short text is classified, He et al. [13] proposed a different approach using a clustering algorithm called k-means in order to discover related topics, based on the premise that the texts will be more informative if they are grouped into similar topics. The obtained clusters are then used to train a bayesian classifier.

Almeida et al. [14] performed an evaluation of supervised algorithms for mining opinions on Twitter and emphasized the actions of cleaning and pre-processing the information before it is submitted to a classifier. Furthermore, they proposed a process to make classifications similar to the process carried out in this study, based not only on polarity and sentiment analysis, but also on the objective classification of the opinions expressed in the texts.

Finally, several of the papers found in the literature identified different problems related to various types of classifiers and, accordingly, there are models that attempt to solve these issues by combining different types of classifiers. Lima et al. [4] and Brahimi et al. [15] proposed hybrid solutions to improve classification results using bayesian classifiers, support vector machines, decision trees and k-nearest neighbors. These types of solutions make it possible to increase the accuracy of classifications and to evaluate which learning methods perform best.

The different algorithms and techniques found in this related work are processes that require time in each of the different phases: pre-processing, extraction, development or algorithm testing. In this work, we use already tested algorithms in order to speed up sentiment analysis and polarity detection in tweets. We used IBM Bluemix, specifically Watson and its Natural Language Classifier module.

4. METHODOLOGY

We propose an approach for the tool evaluation through the method developed by Wieringa et al. [16]. We try to solve a problem through an engineering cycle, which is carried out by the treatment or planning of solutions, and is validated with questions and answers that we posed before and after the treatment. The expected effects are stated and, finally, the process is concluded using the results obtained in the treatment. The treatment of this work is described based on the process for supervised learning introduced by Kotsiantis [17], where the emphasis is on the pre-processing of the data. In order to validate the results, a database with already classified tweet texts was obtained through the Spanish Society for Natural Language Processing (SEPLN).

5. USING NATURAL LANGUAGE PROCESSING

We based our proposal on the supervised learning process presented by Kotsiantis [17], where we start with the identification of the required data. Figure 1 shows the steps of this process, which are modified for the use case described in this study. First, we obtained the training sets from our own tool, and then we pre-processed the texts for their correct interpretation in Watson. Furthermore, we present the contribution for the use of Bluemix.

Figure 1: Supervised learning process exposed by Kotsiantis [17] and complemented in this paper.

5.1 Data identification

The used data are the different texts from the tweets database. We used Cloudant, a managed NoSQL JSON database service, to perform querying easily through an HTTP API. We imported the data in order to create the training sets.
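As an illustration of this step, a Cloudant database can be read with a single HTTP request in the CouchDB style shown below; the account, database name, credentials and document schema are placeholders, not the project's actual instance.

```python
# Hypothetical query against the tweets database over Cloudant's
# CouchDB-compatible HTTP API; all identifiers are placeholders.
import base64
import json
import urllib.request

BASE = "https://<account>.cloudant.com/tweets"            # assumed DB name
AUTH = base64.b64encode(b"<user>:<password>").decode()    # placeholder credentials

req = urllib.request.Request(
    BASE + "/_all_docs?include_docs=true&limit=5",
    headers={"Authorization": "Basic " + AUTH},
)
with urllib.request.urlopen(req) as resp:
    for row in json.loads(resp.read())["rows"]:
        print(row["doc"].get("text"))  # tweet text field; schema is assumed
```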
5.2 Definition of the training sets

To facilitate the selection of the tweets, we developed a web application in Node.js which queries the database and selects a tweet randomly. The selected tweet must be assigned to one of the defined classes (positive, negative or neutral). If it is not possible to label it with one of these polarities, labeling can be omitted or N/A (not applicable) selected. When each class contains approximately 600 tweets, the tweets are exported in CSV format using a Node.js script which queries the instances through the Cloudant HTTP API.

5.3 Data pre-processing

The texts used for training should go through a cleanup process where special characters like quotes and break lines are replaced, as indicated in the Bluemix documentation. For example, each text must be enclosed in quotation marks; if this character appears inside the text, it must be doubled, i.e. " is replaced with "" to distinguish it from the quotes that enclose the training text. We perform this process when the CSV file is created. Table 1 shows a file example with two columns: the first one holds the training texts and the second the class to which each sentence corresponds.

Table 1: File structure and example (text | class)
"Let's leave the skin to create a job and our economy grows again. #MensajeGriñan" | positive
"2012 will be a year of titles. Play in a team. Win as a team. Who's with me? #makeitcount http://t.co/Ue7Kh2De" | positive
"#FF @BRmodainfantil moms with children, do not miss it, the best online shop for children's fashion!" | positive
"Impressed by the violence of the media in Morocco. Pushing to photograph Rajoy in Rabat" | negative
"The one who does not want to follow me does not follow me, but the masochist must stop complaining and enjoy" | negative
"These are hard times for everyone! The worst thing will be the staff adjustments, which will not be delayed..." | negative
"You can also follow it in the channel 24 hours of RTVE" | neutral
"A few hours remain to close the last draw of the year. There is still time to sign up" | neutral
"In the Vatican City" | neutral
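A minimal sketch of this cleanup and export step, assuming Python in place of the project's Node.js script: the csv module applies the quote-doubling rule described above automatically, and break lines are stripped before writing.

```python
# Sketch of the cleanup + CSV export: remove break lines, then let the
# csv writer enclose each field in quotes and double embedded quotes.
import csv

examples = [
    ("Impressed by the violence of the media in Morocco. "
     "Pushing to photograph Rajoy in Rabat", "negative"),
    ('He said "hello"\nand left', "neutral"),  # embedded quotes and a break line
]

with open("training.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, quoting=csv.QUOTE_ALL)  # every field gets quoted
    for text, label in examples:
        cleaned = " ".join(text.split())      # collapse break lines / whitespace
        writer.writerow([cleaned, label])     # " inside text becomes "" on disk
```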
The selected tweet must be After we retrained the classifier, we used a subset from the assigned to one of the defined classes (positive, negative or test database to perform the validation and test the accuracy neutral). If it is not possible to label it with one of these po- for the classifier with the respective training sets. We took larities, labeling can be omitted or N/A (not applicable) se- 300 texts for each class, i.e. in total there were 900 tweets, lected. When each class contains approximately 600 tweets, to test how much the classifiers instances approached their they are exported in CSV format using a script in Node.js predictions regarding their polarity from the SEPLN. We which queries the instances through the Cloudant HTTP use these 900 texts in both classifiers instances to compare API. the results. 5.3 Data pre-processing 6. RESULTS Following the proposed engineering cycle, and based on the The texts used as training should go through a cleanup pro- results from the performed tests, some answers can be de- cess where special characters like quotes and break lines are rived for the raised research questions. Regarding the ef- replaced as indicated in the Bluemix documentation. For fectiveness of the classifiers, the results obtained were ac- example, each text must be enclosed in quotation marks, if ceptable in the positive and negative classes for the second this character is repeated, it must be added twice, i.e. re- training set. The neutral texts class presents results varying placing ” so as ”” to distinguish it from the one that encloses in both tests, which leaves evidence of the subjectivity that the training text. We perform this process when the CSV file this type of sentences present. It is important to note that is created. Table 1 shows a file example with two columns, the texts have not been filtered by any process to remove the first one has the training texts and the second the class stop words, URLs, hashtags, and other types of words that to which each sentence corresponds. could affect the classifiers’ prediction. Tables 2 and 3 show the results for the first and the second training sets, respec- 5.4 Training the classifier tively and table 4 shows the accuracy for each training set. We used the HTTP API of Bluemix, which, through a web service, creates the classifier from the training file separated We observe that the classifier with the second training set by commas. At the beginning the classifier is in a training presents better results than the classifier with the first train- state. The time the classifier needs to be prepared for the ing set. The second training set was created with 15,000 consultation varies, depending on the size of the training set. records, which is the maximum number of records supported 13 Services by Big Data Analytics. Mobile Networks and Table 2: Results of the test set for the first training Applications, pages 1–8, dec 2016. set Right pre- Right predictions’ [3] Fabrizio Sebastiani. Machine learning in automated Total dictions rate text categorization. ACM Computing Surveys, Positive 300 105 35% 34(1):1–47, mar 2002. Negative 300 158 52.6% [4] Ana Carolina E S Lima, Leandro Nunes De Castro, Neutral 300 191 63.6% and Juan M. Corchado. A polarity analysis framework for Twitter messages. Applied Mathematics and Computation, 270:756–767, nov 2015. Table 3: Results of the test set for the second train- [5] Grady Booch. The soul of a new watson, jul 2011. ing set [6] D. A. Ferrucci. Introduction to ”This is Watson”. 
6. RESULTS

Following the proposed engineering cycle, and based on the results of the performed tests, some answers can be derived for the raised research questions. Regarding the effectiveness of the classifiers, the results obtained were acceptable in the positive and negative classes for the second training set. The neutral text class presents varying results in both tests, which is evidence of the subjectivity that this type of sentence presents. It is important to note that the texts were not filtered by any process to remove stop words, URLs, hashtags and other types of words that could affect the classifiers' predictions. Tables 2 and 3 show the results for the first and the second training sets, respectively, and Table 4 shows the accuracy for each training set.

Table 2: Results of the test set for the first training set
Class | Total | Right predictions | Right predictions' rate
Positive | 300 | 105 | 35%
Negative | 300 | 158 | 52.6%
Neutral | 300 | 191 | 63.6%

Table 3: Results of the test set for the second training set
Class | Total | Right predictions | Right predictions' rate
Positive | 300 | 235 | 78.3%
Negative | 300 | 254 | 84.7%
Neutral | 300 | 147 | 49%

Table 4: Accuracy for each training set
| Training set 1 | Training set 2
Accuracy | 50.4% | 70.7%

We observe that the classifier with the second training set presents better results than the classifier with the first training set. The second training set was created with 15,000 records, which is the maximum number of records supported by the Bluemix Natural Language Classifier module. It is probable that the large set of texts and the type of texts used by the SEPLN made the classifier with the second training set produce a better prediction, closer to the original polarity of the test database. It is also probable that the class for neutral texts is more subjective and, therefore, the reason for obtaining different results in the two tests. We conclude that the classifier obtained better results with the second training set because of the large number of examples.
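The overall accuracies in Table 4 follow directly from the per-class counts in Tables 2 and 3, since each class contributed 300 test texts. A quick arithmetic check:

```python
# Recomputing Table 4 from the per-class right-prediction counts.
right_set1 = {"positive": 105, "negative": 158, "neutral": 191}
right_set2 = {"positive": 235, "negative": 254, "neutral": 147}

for name, right in [("Training set 1", right_set1), ("Training set 2", right_set2)]:
    accuracy = sum(right.values()) / 900  # 3 classes x 300 test texts
    print(f"{name}: {accuracy:.1%}")
# Training set 1: 50.4%  (454/900)
# Training set 2: 70.7%  (636/900)
```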
7. CONCLUSION

We proposed the usage of natural language classifiers, using IBM Bluemix and its services for text analysis, in order to speed up the process of parameterization and algorithm tuning. We conclude that the classifiers created in this manner have good effectiveness, subject to the texts' cleaning process. The neutral classification is the most subjective and the most prone to bad predictions. It is important to emphasize that the cleaning process has a great influence on the classification results, in addition to the subjectivity involved in the creation of the training sets.

8. ACKNOWLEDGEMENT

We acknowledge the support of the Colombian Center of Excellence and Appropriation on Big Data and Data Analytics - Alianza CAOBA (http://alianzacaoba.co/), under which the project was developed. We sincerely thank the researchers and students who participated in the tweets' classification.

9. REFERENCES

[1] Ibrar Yaqoob, Ibrahim Abaker Targio Hashem, Abdullah Gani, Salimah Mokhtar, Ejaz Ahmed, Nor Badrul Anuar, and Athanasios V. Vasilakos. Big data: From beginning to future. International Journal of Information Management, 36(6):1231-1247, dec 2016.
[2] Francesco Piccialli and Jai E. Jung. Understanding Customer Experience Diffusion on Social Networking Services by Big Data Analytics. Mobile Networks and Applications, pages 1-8, dec 2016.
[3] Fabrizio Sebastiani. Machine learning in automated text categorization. ACM Computing Surveys, 34(1):1-47, mar 2002.
[4] Ana Carolina E. S. Lima, Leandro Nunes De Castro, and Juan M. Corchado. A polarity analysis framework for Twitter messages. Applied Mathematics and Computation, 270:756-767, nov 2015.
[5] Grady Booch. The soul of a new watson, jul 2011.
[6] D. A. Ferrucci. Introduction to "This is Watson". IBM Journal of Research and Development, 56(3.4):1:1-1:15, may 2012.
[7] Carmine DiMascio. Create a natural language classifier that identifies spam. https://www.ibm.com/developerworks/library/cc-spam-classification-service-watson-nlc-bluemix-trs/index.html, 2015.
[8] Alex A. Freitas. Understanding the crucial differences between classification and discovery of association rules. ACM SIGKDD Explorations Newsletter, 2(1):65-69, jun 2000.
[9] Douglas M. Hawkins. The Problem of Overfitting, 2004.
[10] Farhan Hassan Khan, Saba Bashir, and Usman Qamar. TOM: Twitter opinion mining framework using hybrid classification scheme. Decision Support Systems, 57(1):245-257, jan 2014.
[11] Farhan Hassan Khan, Usman Qamar, and M. Younus Javed. SentiView: A visual sentiment analysis framework. In International Conference on Information Society, i-Society 2014, pages 291-296. IEEE, nov 2015.
[12] Mohit Mertiya and Ashima Singh. Combining Naive Bayes and Adjective Analysis for Sentiment Detection on Twitter. 2016 International Conference on Inventive Computation Technologies (ICICT), pages 1-6, aug 2016.
[13] Yunchao He, Chin Sheng Yang, Liang Chih Yu, K. Robert Lai, and Weiyi Liu. Sentiment classification of short texts based on semantic clustering. In Proceedings of 2015 International Conference on Orange Technologies, ICOT 2015, pages 54-57. IEEE, dec 2016.
[14] Yudivian Almeida and Velarde Suilaan. Evaluacion de Algoritmos de Clasificacion Supervisada Para El Minado De Opinion en twitter. Investigación Operacional, 36(3):194-205, 2015.
[15] Belgacem Brahimi, Mohamed Touahria, and Abdelkamel Tari. Data and text mining techniques for classifying Arabic tweet polarity. Journal of Digital Information Management, 14(1):15-25, 2016.
[16] Roel J. Wieringa and Ayse Morali. Technical Action Research as a Validation Method in Information Systems Design Science. Design Science Research in Information Systems. Advances in Theory and Practice, 7286:220-238, 2012.
[17] S. B. Kotsiantis. Supervised machine learning: A review of classification techniques. Informatica, An International Journal of Computing and Informatics, 31:249-268, 2007.

An Analysis of BPMN-based Approaches for Process Landscape Design

Gregor Polančič, Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia, gregor.polancic@um.si
Jernej Huber, Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia, jernej.huber@um.si
Marta S. Tabares, Universidad EAFIT, Department of Informatics and Systems, Antioquia, Colombia, mtabares@eafit.edu.co

ABSTRACT

Process landscapes represent the top part of an organizational process architecture. As such, they define the scope of, and the relationships between, its processes. Process landscape diagrams simplify process-related communication by leveraging the benefits of visual notations. However, in contrast to business process diagrams, where nowadays BPMN is the prevalent notation, process landscape diagrams lack standardization. In this article, we review and analyze notations used for modeling process landscapes, as well as non-normative BPMN-based approaches applicable to their representation. Based on the analyzed approaches, we evaluate the applicability of BPMN for process landscape design.

Categories and Subject Descriptors
C.0 [Computer Systems Organization]: General - Modeling of computer architecture; D.2.9 [Software engineering]: Management - Software process models (e.g., CMM, ISO, PSP)

General Terms
Management, Documentation, Standardization, Verification

Keywords
Process landscape, process map, BPMN, analysis

1. INTRODUCTION

A common starting point for process design and all activities related to BPM is to identify and structure an organization's processes (i.e. the process identification phase) [1]. Regularly, users tend to represent the identified processes in a visual manner, in the form of a process landscape (i.e. process map) diagram.

The main purpose of process landscapes is to specify organizational processes from a bird's-eye view. With process landscapes, an organization can more easily gain an overview of its main processes and their major interdependencies. Therefore, the usage of process landscapes simplifies process-related communication and represents a starting point for detailed process discovery (i.e. AS-IS process modeling). Besides, process landscapes are a common way to represent process-based reference models for the operation (e.g. ITIL, CMMI) and the management (e.g. COBIT) of organizational IT infrastructure and services.

There are no standardized languages for creating process landscapes. Consequently, modelers most commonly define their own 'overviews of processes' by imitating existing diagrams (e.g. value chains) or proposing their own more or less intuitive representations. A common approach for BPMN experts is to represent process landscapes with a subset of BPMN elements. However, although BPMN is an ISO and de-facto standard for process modeling, landscapes are out of its scope.

Since non-normative BPMN-based process landscape diagrams appear in practice, this article reviews and analyses related approaches to identify their strengths and weaknesses. Based on the analyzed approaches, we evaluate the applicability of BPMN for such a modeling purpose.

2. PROCESS ARCHITECTURES AND LANDSCAPES

Process landscapes represent the top part of a process architecture - a conceptual model that organizes the processes of a company and makes their relationships explicit (Figure 1).

Figure 1: A conceptual representation of a process architecture (landscape level, strategic level, operational level)
A process architecture usually defines two types of relationships: horizontal and vertical. Horizontal relationships define 'output/input' relationships between processes, i.e. the outcome of a process represents an input for the next process (e.g. a 'consumer-producer' or 'order-to-cash' relationship). Vertical relationships between processes define different levels of detail of a process, i.e. a process diagram on a lower level represents a more detailed view of the same process on the level above.

The top level of a process architecture is commonly reserved for process landscape diagrams. A single process landscape diagram shows the main processes of an organization as well as the dependencies between them, as shown in Figure 2 and Figure 3. Those two figures represent two examples of process landscape diagrams with processes as 'black-boxes' and arrows representing the flow of deliverables between different processes. Rectangles represent the stakeholders external to an organization.

Figure 2: An example landscape diagram (ISO 9001)

Figure 3: An example landscape diagram [2]

A process landscape diagram serves as a framework for defining the priorities and the scope of process modeling and redesign projects. Each element of a process landscape model may point to more concrete business processes on the lower levels.

2.1 Process landscape notation

A visual notation (i.e. visual language, graphical notation, or diagramming notation) consists of a set of graphical symbols (visual vocabulary), a set of compositional rules (visual grammar) and definitions of the meaning of each symbol (visual semantics). A common denominator of process landscape diagrams (Figure 2 and Figure 3) are the following elements:

a. Business process. Although not explicitly defined, the landscape diagrams clearly highlight the concept of a business process. Visually, a business process is frequently represented with an arrow, where there are also alternative representations, e.g. a rectangle and a rectangle with rounded corners (Figure 4).

Figure 4: Business process symbols

b. Process groups / types. On a process landscape diagram, the business processes are commonly distinguished by their purpose (e.g. core processes, management processes and supportive processes), which is visualized either by (1) encircling and labelling a set of processes (Figure 5, left) or (2) specializing the process symbol for individual types of processes (Figure 5, right). Besides manipulating the shapes of symbols, the planar visual variables and symbol orientation might imply the type of a process. E.g., supportive processes are usually positioned below the core processes, with arrows pointing up, whereas management processes are positioned above them, with arrows pointing down (Figure 5).

Figure 5: Representation of a group (left) and/or type of processes (right)
visual language, graphical notation, or processes, explicit representations of processes orderings are diagramming notation) consists of a set of graphical symbols visualized with solid directed lines (Figure 8). Another (visual vocabulary), a set of compositional rules (visual grammar) drawback of implicitly ordering the processes is that a diagram and definitions of the meaning of each symbol (visual semantics). reader could misinterpret a set of non-sequentially performed A common denominator of process landscape diagrams (Figure 2 processes, put in a line, as being performed sequentially. and Figure 3) are the following elements: a. Business process. Although not explicitly defined, the Business Business landscape diagrams clearly highlight the concept of a business Process Process process. Visually, a business process is frequently represented with an arrow, where there are also alternative representations, Business Business e.g. a rectangle and a rectangle with rounded corners (Figure Process Process 4). Figure 8: Explicit representation of a sequential relationship Business Business Business between processes Process Process Process Arrows-based representation of process ordering enables more Figure 4: Business process symbols complex ordering relationships (e.g. when a process ends, two processes are initialized). Sequential relationships might be b. Process groups / types. On a process landscape diagram, the labelled, representing artefacts or data being transferred business processes are commonly distinguished by their between processes (i.e. process outputs – process inputs as purpose (e.g. core processes, management processes and presented in the Figure 3). supportive processes), which is visualized either by (1) encircling and labelling a set of processes (Figure 5, left) or e. Participant. A participant, usually visualized with a rectangle (2) specializing the process symbol for individual types of (Figure 9), presents someone who is involved (i.e. internal processes (Figure 5, right). participant) or interacts (i.e. external participant) with a Management business process. Most commonly, process landscapes process Group of business processes visualize external participants (e.g. suppliers and customers), Core Business Business Business which are related to processes, either by providing inputs or Process Process Process Process receiving outputs. This corresponds to the concept of a ‘value system’ which consists of following value chains: supplier, the Support process focal enterprise and consumer [2]. The relationships to participants are represented either implicitly (e.g. with Figure 5: Representation of a group (left) and/or type of leveraging visual planar variables) or explicitly (with solid processes (right) arrows). Besides manipulating the shapes of symbols, the planar visual variables and symbol orientation might imply the type of a 16 diagrams but merely an abstract view of BPMN collaboration r r ei e l diagrams. Business p iv p label e Process c u Conversation S Re Node Pool Pool Figure 9: Representation of (external) process participants and their (explicit) relations Pool 3. BPMN-based approaches Business Process Model and Notation (BPMN) is a well- established standard for process modeling and automation [3]. 
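The common denominator identified above can also be read as a small abstract syntax. The following Java sketch is our own, purely illustrative rendering of the five recurring concepts (a-e); all class and field names are hypothetical and are not part of BPMN or any other standard. It merely makes explicit the vocabulary against which the BPMN-based approaches are compared in the next section.

import java.util.ArrayList;
import java.util.List;

// Illustrative abstract syntax of the common process landscape concepts (a-e).
class LandscapeModel {
    enum ProcessType { CORE, MANAGEMENT, SUPPORTIVE }       // (b) specialized process symbols

    static class BusinessProcess {                          // (a) business process
        String name;
        ProcessType type;
        List<BusinessProcess> children = new ArrayList<>(); // (c) parent/child hierarchy
        BusinessProcess(String name, ProcessType type) { this.name = name; this.type = type; }
    }

    static class ProcessGroup {                             // (b) encircled and labelled set
        String label;
        List<BusinessProcess> members = new ArrayList<>();
    }

    static class Participant {                              // (e) internal or external participant
        String name;
        boolean external;
        Participant(String name, boolean external) { this.name = name; this.external = external; }
    }

    static class SequenceRelation {                         // (d) explicit, optionally labelled ordering
        BusinessProcess from, to;
        String deliverable;                                 // a process output used as the next input
        SequenceRelation(BusinessProcess from, BusinessProcess to, String deliverable) {
            this.from = from; this.to = to; this.deliverable = deliverable;
        }
    }

    List<BusinessProcess> processes = new ArrayList<>();
    List<ProcessGroup> groups = new ArrayList<>();
    List<Participant> participants = new ArrayList<>();
    List<SequenceRelation> sequences = new ArrayList<>();
}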
3. BPMN-BASED APPROACHES
Business Process Model and Notation (BPMN) is a well-established standard for process modeling and automation [3]. From the modeling aspect, it defines a vocabulary, grammar and semantics for creating different types of process diagrams, namely: process diagrams, collaboration diagrams, choreography diagrams and conversation diagrams. With regard to process diagrams, BPMN states that [4] "processes can be defined at any level from enterprise-wide Processes to Processes performed by a single person." Although this could be understood as if BPMN supported the modeling of process landscapes, landscapes are not mentioned in any version of the specification, nor recommended by researchers [5]. Nevertheless, since BPMN is widely adopted by industry, modelers frequently use BPMN for visualizing systems of black-box processes (i.e. some kind of process landscapes) by applying the approaches presented in the next sub-chapters.

3.1.1 Abstract collaboration diagrams
A common and syntactically valid BPMN representation of process landscapes is to use black-box Pools and Message flows, i.e. collaboration diagrams with hidden details (Figure 10). A BPMN Pool is a visual representation of a Participant, which may reference at most one business process. A Message flow represents the exchange of messages between two 'message aware' process elements (e.g. activities, message events and black-box Pools).

Figure 10: BPMN Pools and Message flows

The strength of such a representation of a process landscape is its compliance with the BPMN specification and its simplicity. On the other hand, there are several drawbacks. First, the visual appearance of this approach is unconventional for process landscapes (i.e. processes being represented with rectangles). Second, the relationships between processes represent information exchange, whereas process landscape diagrams most commonly visualize sequential relationships between processes and process clustering. Third, there is a lack of concepts which may be regularly used for landscape modeling, namely sequential relationships, process hierarchy and process types, whereas there is a symbol deficit in the case of representing a participant and a process (the rectangle symbol is used in both cases).

3.1.2 Conversation diagrams
Another valid way of representing process landscapes in BPMN is by using Conversation diagrams (Figure 11), which were introduced in the second major revision of BPMN. Formally, they are not a standalone type of BPMN diagram but merely an abstract view of BPMN collaboration diagrams.

Figure 11: BPMN Conversation diagram

Conversation diagrams are an effective way of representing interactions between processes; however, similar to the previous approach, they are based on a small set of elements which are inappropriate for the modeling of conventional process landscapes (i.e. conversation nodes, representing correlated messages, and pools, representing participants or processes).

3.1.3 Enterprise-wide process diagrams
As stated in the specification [4], BPMN can be used for business process modeling at any level of granularity. In accordance with this, the system of an organization's processes may be modeled as a single process, with the individual processes being modeled as activities, i.e. sub-processes (Figure 12).

Figure 12: BPMN Sub-processes representing processes

By using this approach, one is able to represent the majority of process landscape constructs, namely processes (i.e. with BPMN sub-processes), sequential interactions (i.e. BPMN sequence flows), groups or types of processes (i.e. the BPMN group element) and participants (i.e. BPMN lanes). However, there are several major drawbacks to this approach. First, such diagrams are visually inconsistent with process landscape diagrams (e.g. processes being represented with rounded rectangles and participants with horizontal lines). Second, these diagrams are inconsistent with BPMN syntax and semantics, making them invalid (e.g., BPMN Process and BPMN Sub-process are two distinct BPMN meta-model elements). Third, this approach is also impractical, since the majority of processes are discovered at a lower level of granularity (e.g. based on the services or products a business process delivers) and afterwards interrelated into a process landscape diagram.
4. DISCUSSION
Table 1 summarizes a comparison of the BPMN-based approaches for landscape design with respect to common process landscape concepts. With respect to the abstract syntax comparison, we can conclude that none of the aforementioned BPMN approaches supports all of the concepts common in process landscape modeling. Besides, the following inconsistencies exist. The first and second approaches use the same element for representing a participant and a process – the BPMN Pool (i.e. symbol overload), whereas the third approach uses the element BPMN Activity contrary to its definition (i.e. a semantic mismatch).

Table 1: Comparison of BPMN-based approaches for landscape design

Process landscape concept | Common visualization | 1 - Abstract collaboration diagrams | 2 - Conversation diagrams | 3 - Enterprise-wide process diagrams
Business process | See Figure 4 | BPMN Pool | BPMN Pool | BPMN Activity
Process group / cluster | See Figure 5, left | BPMN Group | BPMN Group | BPMN Group
Process type | See Figure 5, right | No standardized BPMN element | No standardized BPMN element | No standardized BPMN element
Hierarchical relationship between processes | See Figure 6 | No standardized BPMN element | No standardized BPMN element | Parent activity - child activity relationship
Sequential relationship between processes | See Figure 7 and Figure 8 | No standardized BPMN element | No standardized BPMN element | BPMN Sequence flow
Information flows | See Figure 8 | BPMN Message flow | BPMN Message flow, Conversation Node | Directed association
Internal and external participant | See Figure 9 | BPMN Pool | BPMN Pool | BPMN Pool

With respect to the concrete syntax comparison (i.e. notation), Table 1 demonstrates that none of the BPMN approaches results in diagrams with a graphical similarity to common landscape diagrams.

Based on the above, we can conclude that BPMN is inappropriate for modeling process landscapes. This finding is also supported by Freund and Rücker [6], who state that 'even when we've already modeled one or more process landscapes using BPMN at a customer's request, primarily with the collapsed pools and message flows described we cannot recommend doing this'. Analytically, this was confirmed by Malinova [7], who performed a semantical mapping between BPMN and 'Process maps'. Her results show that BPMN is not appropriate for process landscape design.

According to the benefits and weaknesses of existing approaches for (BPMN-based) process landscape design, the following research directions are feasible. First, a standardized language for process landscapes may be designed by considering the best practices of non-formal process landscape notations. The focal risk of this research direction is that the developed solution has to gain standardization and industry adoption. Second, the BPMN structure and notation may be extended for effective support of process landscapes. In this case, the major risk is the intervention into the structure and notation of a well-adopted and standardized language.

5. ACKNOWLEDGMENTS
The authors acknowledge the financial support from the Slovenian Research Agency (research core funding No. P2-0057).

6. REFERENCES
[1] L. Fischer, R. Shapiro, B. Silver, and Workflow Management Coalition, BPMN 2.0 Handbook Second Edition: Methods, Concepts, Case Studies and Standards in Business Process Management Notation. Lighthouse Point, Fla.: Future Strategies, 2012.
[2] M. Weske, Business Process Management: Concepts, Languages, Architectures. Berlin; New York: Springer, 2012.
[3] M. Kocbek, G. Jost, M. Hericko, and G. Polancic, "Business process model and notation: The current state of affairs," Comput. Sci. Inf. Syst., vol. 12, no. 2, pp. 509-539, 2015.
[4] OMG, "Business Process Model and Notation version 2.0," 03-Jan-2011. [Online]. Available: http://www.omg.org/spec/BPMN/2.0/. [Accessed: 15-Mar-2011].
[5] J. Freund and B. Rücker, Real-Life BPMN: Using BPMN 2.0 to Analyze, Improve, and Automate Processes in Your Company, 2nd edition. CreateSpace Independent Publishing Platform, 2014.
[6] J. Freund and B. Rücker, Real-Life BPMN: With Introductions to CMMN and DMN, 3rd edition. CreateSpace Independent Publishing Platform, 2016.
[7] M. Malinova and J. Mendling, "Why is BPMN not appropriate for Process Maps?," ICIS 2015 Proc., Dec. 2015.

Approach to an alternative value chain modeling

Miha Pavlinek, Marjan Heričko, Maja Pušnik
Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
miha.pavlinek@um.si, marjan.hericko@um.si, maja.pusnik@um.si

ABSTRACT
This paper is focused on describing an alternative approach to modeling value chains, which are an important part of presenting business activities and the value each activity delivers. They document the translation of data and services into business value, essential in times of ever-growing productivity and competition. There are several possible notations for value chain modeling, each holding specific characteristics. Since each domain has its own demands, the goal of this paper is to find the most suitable approach to modeling, based on one or more notations, addressing a representative domain within smart cities (a case study of the health domain is included). The approach to value chain modeling is supported by existing notations and documentation techniques.
Categories and Subject Descriptors
H.5.3 [Information interfaces and presentation]: Group and Organization Interfaces - collaborative computing, organizational design.

General Terms
Documentation, Design.

Keywords
Value chains, use cases, smart city.

1. INTRODUCTION
The paradigm shift in business practices is going from the "product-driven orientation" of the past to today's "customer-driven orientation", which is characterized by an increased demand for variability, product variety, amounts of customer-specific products, and shortening product life cycles [1]. Therefore, it is beneficial to the business to identify the key activities and capabilities that flow through the business and define a value chain [2]. A value chain is a high-level model intended to describe the process by which businesses receive raw materials, add value to the raw materials through various processes to create a finished product, and then sell that end product to customers. The concept was suggested by Michael Porter in 1985 [4]. The raw material and product concept can also be transferred to business services and different, intangible business goals.

For achieving business goals, companies have to cooperate with or within each other, and their value chains are connected in so-called value systems. Each value system consists of a number of value chains, each of which is associated with one enterprise. A value chain simplifies complex value systems, since it breaks down the activities a company performs and analyzes their contribution to the commercial success. In a way, it organizes the activities of an enterprise. For example, value chains are a well-known approach in business administration to organize the work that a company conducts to achieve its business goals. Value chains are often used in business modeling for different areas, e.g. medicine, lists of online services, etc. In this paper, we propose an adapted approach to model value chains in the smart city domain, where the value chain describes the transformation cycle of data into value for the benefit of citizens and the community [3].

After the Introduction section, the history of value chain notations is presented in Section 2. The proposed approach is described in Section 3, supported by an example of modeling value chains in Section 4. Lastly, conclusions and future work are presented.

2. VALUE CHAINS
The idea of value chains is to represent an organization as a system divided into subsystems with inputs, transformations and outputs. The process of turning inputs into value-added outputs usually consists of various activities, where some of them are primary, and others are supporting activities [5]. The most common notation for a value chain is by Porter, often used within EU projects like "Project BLUENE", "European Big Data Value Partnership Strategic Research and Innovation Agenda" and "European Data Portal". Porter's value chains are, basically, used to identify activities conducted by specific companies, with the purpose of providing a product or a service. They can be applied in different fields, such as the definition of B2B and B2C segments in any field. An example of a classic Porter's value chain is presented in Figure 1.

Figure 1. Classic value chain by Porter [4]

With Porter, notation support is provided to describe not only classic supply chain processes, but also services and collaborations among the companies that use them. Despite their popularity, classic value chains are useful only in a limited range of domains. The main disadvantage is their manufacturing-oriented format. Therefore, other forms of value chains appeared, especially in the field of ICT, where there are several alternatives to value chain modeling. The first who applied value chains to Information Systems were Rayport and Sviokla, within their work on virtual value chains [6]. Some other relevant value chains in the ICT domain are the following:

• Service value chain - the relative value of the activities required to establish the product or service,
• Service value network - a dynamic way of providing services based on a coordinated value chain of companies,
• Value stream mapping - a method for analyzing the current state and planning a future sequence of events that lead the product or service from the beginning to the client,
• Data value chain - the information flow is described as a series of steps needed to generate value and useful insights from data [7],
• Big data value chain - modeling the high-level activities that comprise an information system, and
• Digital value chain - a set of processes designed to transform raw data into actionable information that can drive better decisions and insights [8].

Every value chain begins with inputs. In manufacturing, these are raw materials like steel or wood, while in the field of ICT, raw data can be considered the raw material. In this step, heterogeneous data can be gathered from mobile internet devices or sensor devices, or extracted from existing sources in structured or unstructured format. Applying new technologies to existing products, practices and processes can best be described with digital value chains, the activities of which are depicted in Figure 2 and described in the continuation.

Figure 2. Digital Value Chain [8]

Raw data does not have any value until it is processed. Therefore, in the second step, the collected data is processed and, if necessary, mashed up and/or visualized. In the processing activity, data is transformed and mapped from a raw format to a determined format through actions such as parsing, joining, standardizing, augmenting, cleansing, consolidating and filtering. Processed data can be combined and exposed through web APIs, which are analogous to components in manufacturing. More details on this activity are presented within the data value chain [9]. As an output, refined data is provided, new information, or even enhanced functionality, which can be input into the next value chain or used by application developers, leaders or other end users. Application developers can take aggregated data streams and combine them in any number of ways to create information components. Leaders can use new visualizations to improve decision-making, and others can use information and services to improve their effectiveness [9]. Sharing outcomes is important for promotional purposes, to inform not just end users about them, but especially developers, who can use them as inputs to develop new, innovative solutions. The final step is, actually, repeating: with new data, technology updates and a new audience, a digital value chain is changing constantly.
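As a minimal, purely illustrative sketch of this second (processing) step, the following Java code parses, standardizes, cleanses and filters raw semicolon-separated records into a determined format; the record layout and all field names are our own assumptions, not part of the approach described here.

import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Illustrative "processing" step of a digital value chain: raw CSV-like
// records are parsed, standardized, cleansed and filtered.
public class ProcessingStep {
    record Measurement(String patientId, double value) { }   // hypothetical target format

    static List<Measurement> process(List<String> rawRecords) {
        return rawRecords.stream()
                .map(String::trim)
                .filter(r -> !r.isEmpty())                    // cleansing: drop empty lines
                .map(r -> r.split(";"))
                .filter(f -> f.length == 2)                   // cleansing: drop malformed rows
                .map(f -> new Measurement(
                        f[0].trim().toUpperCase(Locale.ROOT), // standardizing identifiers
                        Double.parseDouble(f[1])))            // parsing values
                .filter(m -> m.value() >= 0)                  // filtering implausible values
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(process(List.of("p-17; 36.6", "", "broken-row", "p-18; 37.2")));
    }
}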
3. PROPOSED APPROACH
Despite the several notations which were listed and described briefly, none of them completely fulfils the needs of a complex smart city. In this paper, a proposal of an alternative approach is given, aggregating existing practices, adjusted to the needs of the smart city domain. Usually, the aim of a value chain is to increase profits by creating value, but value chains can also be used to identify opportunities where end users benefit from the final outcome. In the ICT domain, the identification of value chains needs a detailed consideration of existing problems, obstacles, potential improvements based on ICT, and the inclusion of various stakeholders. The approach described in this paper is based on digital value chains and on documenting existing data, services and processes. It is designed to address and solve issues in several smart city fields using ICT tools and techniques. The most important activity is the documentation of key challenges, target groups and actors, existing data sources, web services and potential scenarios.

In the process of the identification and documentation of challenges, each challenge was described clearly, with all specifics and details, so that everyone could understand it. References to a service which presents a potential solution were also provided. Examples of current challenges in the context of health would be long waiting hours, unnecessary visits to the doctor and improved control of a patient's progress.

A list of target groups describes the key actors who are involved in scenarios. Some of the actors are providers, and others are end users, such as citizens. When documenting the available data sources and services, various information needs to be provided. Besides the title and description, each end user must understand the purpose and benefits of the data or service, know who the owners and potential consumers are, and has to be informed about accessibility, privacy restrictions and price. In the case of web services, the input and output parameters are important as well (see the sketch after this section).

Based on the inventory of existing data and services, potential scenarios can be defined. Each scenario is described through a flow of events with alternatives, and the entire concept is presented with use cases. Some additional information regarding initiating events, participants, included services, inputs/outputs and execution is also provided.
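Such an inventory entry can be kept in a simple, machine-readable structure. The following Java sketch is only our own illustration of the documentation fields named above; all field names are hypothetical.

import java.util.List;
import java.util.Map;

// Illustrative inventory entry for a documented data source or web service.
public record ServiceEntry(
        String title,
        String description,
        String purposeAndBenefits,
        String owner,
        List<String> potentialConsumers,
        String accessibility,                 // e.g., open data, on request
        String privacyRestrictions,
        String price,
        Map<String, String> inputParameters,  // name -> type (web services only)
        Map<String, String> outputParameters) {
}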
4. AN EXAMPLE OF MODELING A VALUE CHAIN OF A HEALTH CARE SCENARIO
In this chapter, a real example is presented of the applied approach to value chain documentation in the smart city domain. Customized value chains have already been used to define the role and impact of ICT in developing smart cities within other related works [10].

The value chains were designed in accordance with our approach within the EkoSMART program, the purpose of which is to develop a smart city ecosystem with all the supporting mechanisms necessary for the efficient, optimized and gradual integration of individual areas into a unified and coherent system of value chains [11]. One of the most important objectives of the program is to integrate solutions from different sectors into a common ecosystem. The resulting value chains, based on technologies like electronic and mobile devices, related software solutions and intelligent data processing, are enhancing the quality of current services. Moreover, the sectoral value chains will be inputs for the cross-sectoral value chains.

4.1 Smart cities and their characteristics
Cities are marked with locations that have a high level of accumulation and concentration of economic activities; they are spatially complex and connected with transport systems. The larger the city, the greater the complexity and the challenges and the risks of disturbances. The fundamental paradigm of the present world is continuous technological advancement, which, on the one hand, represents a certain proportion of new problems, but, on the other hand, technology is precisely the one thing where the key solutions to these problems can be found. Since the world cannot be "reversed", it is necessary to look for suitable solutions that would facilitate modern pressures to focus on the core of new life, which is represented largely by Information and Communication Technologies (ICT). The quality services provided by ICT can relieve people greatly, help them with time optimization and organization, and, last but not least, motivate them.

The purpose of the field is to develop approaches and prototypes which provide the basic conditions for an effective transformation of the healthcare, traffic, energy, waste and other systems, focusing on the following main fields:
• Smart economy,
• Smart people,
• Smart governance,
• Smart mobility,
• Smart environment,
• Smart living.

In the context of a smart city, a value chain is defined as connected activities within a particular sector with various stakeholders, which collaborate with the aim to provide quality services to enhance the quality of life and/or strengthen economic growth in an environmentally friendly manner. The designed value chains are intended for data owners, service providers, application developers, city leaders, citizens and others.

4.2 Designing a value chain for the Health sector
In the Health sector, value chains were identified that should be considered within the context of the introduction of smart healthcare services, like telemedicine and telecare. The main goal is the preparation of quality and comprehensive healthcare services using ICT tools and techniques, where the value chains are designed to identify and upgrade the occasionally problematic quality of today's treatment and care of these groups, primarily through the use of electronic and mobile devices and related software solutions, in particular artificial intelligence in the cloud, or locally, for example, on a mobile device or with customized sensors and carrying devices. A connection of existing solutions with new smart city solutions is planned. By documenting health processes, based on meetings with representatives from the field, the following problems were detected:
• Multiple treatments
• Distribution of services
• Inflexible working time and a poor ordering system
• The burden on healthcare personnel and the long waiting period
• Disabled patient monitoring
• Inaccessibility of data
• Deficient legislation, missing standards and protocols

In order to address these problems effectively, target groups were identified, in addition to the data and services which are needed to enhance the existing processes. All the parts were described and connected in a comprehensive diagram of use cases, which includes a list of activities of the identified participants (Figure 3). The common use case diagram includes several possible application scenarios.

Figure 3: Use case diagram for the health vertical.

Individual usage scenarios were also presented in more detail, with a detailed description, characteristics, flow of events and a separate use case diagram. The Establishing treatment scenario is explained as an alternative value chain presentation, where the activities are as follows: The patient has a problem, (1) enters the system, (2) the system assigns a medical treatment, and (3) the patient follows this treatment, trying to achieve the set criteria. The characteristics of the scenario are categorized in the following groups: Basic information, People and IT, Inputs/outputs and Implementation. Table I presents the actual characteristics of the Establishing treatment scenario.

A high-level representation of the final value chain with participants, inputs, outputs and intermediate assets can be seen in Figure 4. The value chain has four pillars: Participants, Input Data, Information/Services and Output results. The participants are the providers as well as the users of the service, followed by all the necessary data and, further, the services, designed by and for the participants, based on the accumulated data. Finally, based on all previous pillars, final services with added value for the city (or company), in the form of different outputs and results, are presented.
Figure 4. High-level representation of a value chain in the Healthcare domain (pillars: Participants, e.g. patient, social network, home care provider, general practitioner, specialist, nurse, registered nurse, health professionals, reference clinic, content editor; Input data, e.g. measurements, contact details, details about patients, examination results, treatments, therapies, interventions, instructions; Information/services, e.g. messages, reminders, measurement and notification overviews, patient/doctor and doctor/doctor communication, video conferences, anomaly detection, decision support, joint treatment, entering/viewing/editing patient information, educational content, video guides; Output/results, e.g. simpler patient monitoring, informing patients and their social network, saving time and money, relieving health professionals, effective care, greater data availability, cost reduction, interactive coherence during therapy.)

Table I: Characteristics for the scenario "Establishing treatment"

Scenario name: Establishing medical treatment
Description of the scenario: The purpose of the scenario is to describe the initialization of the treatment of a patient with a chronic disease. The scenario involves ordering a patient for a review, where the doctor gives them a treatment, and the nurse introduces the treatment information and informs the patient about the use of the assigned equipment and the implementation of the activity.
Variants: If the patient has several treatments, the doctor will obtain further findings and, on the basis of communication with other doctors, will form a joint therapy.
The trigger of the scenario: The process is initiated by a patient who comes to a check-up due to a problem.
Participants: Patient, Health personnel
Included services: A service for entering and processing data, and editing educational content
Scenario input: Patient information, Data processing
Scenario output: Program, Schedule, Therapies, Educational content
Activities: Obtaining/entering patient information, Data entry, Entering therapy, Establishing patient/doctor and doctor/doctor communication, Editing educational content

5. CONCLUSION AND FUTURE WORK
A graphical presentation of value within any company is an important part of understanding the focus of business processes, their strengths and weaknesses. Several techniques can be used to present the value flow. Here, however, a combination of notations was used for the purpose of presenting the complex smart city system of users, data, services and scenarios. A use case diagram was used to present the behavior and set of actions of several participants. Within a use case diagram, several scenarios can be derived, each scenario defined in the form of a table (the characteristics of a scenario). Lastly, a high-level representation of a value chain is presented (including the four value pillars). In future work, a refinement of the approach will be performed.

6. ACKNOWLEDGMENTS
This joint work is enabled by the program "Eko Sistem Pametnega Mesta", supported by the European Union, the European Regional Development Fund and the Ministry of Education, Science and Sport.

7. REFERENCES
[1] Martínez-Olvera, C., Davizon-Castillo, Y. A. 2015. Modeling the Supply Chain Management Creation of Value - A Literature Review of Relevant Concepts. Business, Management and Economics, "Applications of Contemporary Management Approaches in Supply Chains" (Apr 2015).
[2] Business Modelling. https://www.enterprise-architecture.org/business-architecture-tutorials/79-business-value-chain. Accessed: 2017-09-15.
[3] Smart City Value Chain. White Paper e-madina. November 2016. http://www.e-madina.org/wp-content/uploads/2016/11/White-Paper-e-Madina-3.0-Value-Chain-of-Smart-cities.pdf.
[4] Porter, M. E. 1985. Competitive Advantage: Creating and Sustaining Superior Performance. New York: Simon and Schuster. Retrieved 9 September 2013.
[5] "Decision Support Tools: Porter's Value Chain". Cambridge University: Institute for Manufacturing (IfM). Retrieved 9 September 2013.
[6] Rayport, J. F., & Sviokla, J. J. 1995. Exploiting the virtual value chain. Harvard Business Review, 73, 75-85.
[7] Curry, E. 2016. The Big Data Value Chain: Definitions, Concepts, and Theoretical Approaches. New Horizons for a Data-Driven Economy. Springer International Publishing, 2016. 29-37.
[8] Data Tip #1 - Your Digital Value Chain: 2013. http://captricity.com/blog/data-tip-1-your-digital-value-chain/. Accessed: 2017-09-15.
[9] Understanding the Data Value Chain. 2014. http://www.ibmbigdatahub.com/blog/understanding-data-value-chain. Accessed: 2017-09-15.
[10] Webb, M., Finighan, R., Buscher, V., Doody, L. and Cosgrave, E. 2011. Information marketplaces - The new economics of cities. The Climate Group, Arup, Accenture. (2011).
[11] EkoSmart - Ekosistem pametnega mesta. 2017. http://ekosmart.net/. Accessed: 2017-09-1.

Using Property Graph Model for Semantic Web Services Discovery

Martina Šestak
Faculty of Organization and Informatics, Pavlinska 2, 42000 Varaždin, Croatia
+385 42 390 847
msestak2@foi.hr

ABSTRACT
Web services have significantly contributed to the integration of different businesses. The service-oriented computing (SOC) paradigm still represents an implementation challenge for developers. Several approaches have been developed over the years for different processes related to Web services. Nowadays, traditional Web services are often supplemented with semantics to achieve higher levels of automation and interoperability. In this paper, a new approach for semantic Web services discovery based on property graphs is proposed. The proposed model proves that the semantic Web service model specified in the OWL-S language can be represented as a property graph, which can be queried to discover Web services based on query parameters.
Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - Query formulation, Retrieval models, Search process, Selection process. H.3.5 [Information Storage and Retrieval]: Online Information Services - Web-based services.

General Terms
Design, Languages, Standardization, Theory.

Keywords
Labeled property graph model, web services discovery, PGQL.

1. INTRODUCTION
Application integration is an important challenge in the modern business environment. Over the years, many concepts and solutions have been developed to address this challenge (e.g., middleware or Enterprise Application Integration solutions). The most recent solution for integrating multiple applications are Web services. Their compliance with the existing Web technologies and standards, and their platform independence, represent a significant advantage.

Web services can be defined as a "software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact with the Web service in a manner prescribed by its description using SOAP messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards." [17]. Nowadays, due to the information overload of SOAP messages, RESTful Web services are used more often. Since their focus is on resources [11], the messages exchanged between applications have a simpler format, which makes REST services a simpler alternative to SOAP-based services that is more applicable in many situations. Through both technologies, the client can access and retrieve the required data from a specific Web service by invoking the correct interface method of that service.

According to [16], each Web service should be capable of being defined, described, and discovered. The Web service description process can be divided into three layers [14]:
1. service invocation
2. service publication and discovery
3. composite web services description

Recently, the concept of SOA has been supplemented with semantic Web concepts, which resulted in the semantic Web services (SWS) technology. SWS enables Web services to be automated and carried out by intelligent software agents [4]. In SWS, additional meaning is added to the basic Web service information. Thus, the main motivation behind this technology is to increase the level of automation of information processing, and to improve the interoperability of Web services.

There are several languages developed for semantic Web services as well. OWL-S is the most popular Web service ontology used for SWS description. In OWL-S, a semantic Web service description consists of three elements [8]:
1. service profile - contains general information about the service (name, description, inputs, outputs, preconditions, results)
2. service model - contains information about how the service works (by using structures like loops, sequences, etc.)
3. service grounding - contains information about how to use the service

In this paper, the focus will be on the service model element. The process of semantic Web services discovery will be discussed by analyzing several models proposed in the literature. Based on this work, a new approach will be proposed and explained.

The rest of the paper is organized as follows: in Section 2, different approaches for the semantic Web services discovery process are discussed, and Section 3 surveys related work. In Sections 4 and 5, the labeled property graph model and the Property Graph Query Language (PGQL) are explained. In Section 6, the new approach is introduced and described. Finally, a conclusion is made to summarize the characteristics of the proposed approach and the challenges which will be further analyzed in future work.

2. SEMANTIC WEB SERVICES DISCOVERY APPROACHES
Web services discovery, in general, is "the act of locating a machine-processable description of a Web service that may have been previously unknown and that meets certain functional criteria" [15]. The goal of the process is to find an appropriate service within the Web service directory which meets some predefined criteria. It is worth mentioning that in recent years the importance of other, nonfunctional criteria (e.g., reliability, response time, availability, etc.) [12] has also been recognized, which led to the development of different Quality of Service (QoS) modeling approaches in the (semantic) Web services description, discovery and composition processes.
As already mentioned, OWL-S is one of the ontology languages which can be used in the SWS discovery process. The OWL-S service model contains Web services viewed as a collection of processes, which represent the specification of how the client interacts with the service [9]. If a process receives information and returns some new information based on its input, the information production is described only by specifying the inputs and outputs of that process. Otherwise, if the process makes more complex transformations and changes, then the production is described by the process preconditions and results [9]. A process may require some information to be executed, i.e., it can have any number of inputs, and it can also produce any number of outputs for Web service requestors. Thus, the process inputs and outputs specify the data transformation which takes place during the process execution.

A sample OWL-S service model is shown in Fig. 1. A Web service called "BorrowedBooks" is shown as a process, which returns the transaction ID, the client name, the date when a requested book was borrowed, and whether the book was returned, for a given title of the book and its author. Input information is shown as an incoming edge, and output information as an outgoing edge of the process.

Figure 1. Sample OWL-S service model

Many different approaches have been proposed in the literature for the discovery (selection) process.

3. RELATED WORK
In [2], the authors divided the different SWS discovery approaches into three categories: algebraic, deductive and hybrid approaches. The algebraic approach includes approaches based on graph theory (e.g., iMatcher, AASDU (Agent Approach for Service Discovery and Utilization), etc.), the deductive approach uses logic-based reasoning (e.g., Inputs and Outputs, Preconditions and Effects matching, etc.), while the hybrid approach combines both the algebraic and the deductive approach.

Sachan et al. [12] proposed a new modeling approach for a QoS-based semantic WS model and a formalization of several QoS attributes. The proposed QoS approach is agent-based, i.e., it introduces an additional mediator Agent, which selects the most appropriate Web service available based on different QoS parameters set by the clients (users). The authors built an ontology of selected QoS parameters, and used that ontology on the model built in the OWL-S Editor available in Protégé.

Klusch et al. [5] introduced a hybrid SWS matching approach with a mechanism called OWLS-MX, which they applied to services specified in OWL-S. The authors managed to prove that logical reasoning is not sufficient for semantic Web services discovery, so they combined logical reasoning with information retrieval (IR) similarity metrics.

Srinivasan et al. [13] presented an OWL-S/UDDI matchmaker architecture. The authors used the OWL-S Integrated Development Environment (IDE) to build and discover OWL-S based Web services. The OWL-S IDE supports various processes of the SWS lifecycle (service description, publication, discovery and execution). The Web service description can be generated based on the code or model within the OWL-S Editor. Service descriptions are stored inside the OWL-S registry. Web service discovery is performed by executing a query against the registry, using a specific Application Programming Interface (API). The registry performs a matching process and returns the OWL-S descriptions of the matched services.
4. PROPERTY GRAPH MODEL
The property graph data model is nowadays the base data model for many graph databases (e.g., Neo4j, Titan, etc.). This model is an easy-to-understand representation of the way data is stored in graph databases. Since it represents an extension to graphs in mathematics, it can be formalized in the following way [1]:

A property graph G is a tuple (V, E, ρ, λ, σ), where:
a. V is a finite set of vertices (or nodes).
b. E is a finite set of edges (or relationships), such that V and E have no elements in common.
c. ρ : E → (V × V) is a total function. Intuitively, ρ(e) = (v1, v2) indicates that e is a directed edge from node v1 to node v2 in G.
d. λ : (V ∪ E) → Lab is a total function, with Lab a set of labels. Intuitively, if v ∈ V (resp., e ∈ E) and λ(v) = l (resp., λ(e) = l), then l is the label of node v (resp., edge e) in G.
e. σ : (V ∪ E) × Prop → Val is a partial function, with Prop a finite set of properties and Val a set of values. Intuitively, if v ∈ V (resp., e ∈ E), p ∈ Prop and σ(v, p) = s (resp., σ(e, p) = s), then s is the value of property p for node v (resp., edge e) in the property graph G.

The (labeled) property graph data model consists of the following elements [6]:
• Nodes - different entities with attributes and a unique identifier
• Labels - a semantical description of the role of each entity, where a single node or relationship can have multiple labels at the same time
• Relationships - connections between nodes, where each connection has a start and an end node
• Properties - key-value pairs, which represent node and relationship attributes

A simple property graph model, shown in Fig. 2, contains 3 nodes. The node labeled "Group" has a property "Name", and the other two nodes with no labels have the properties "Name" and "Age". These two nodes are connected with a relationship labeled "knows", which has a property "Since", indicating since when the two persons have known each other.

Figure 2. Sample property graph model [3]

Property graphs represent an expressive and simple mechanism for describing the richness of data [7], where a connection between two nodes is easily represented, and both nodes and relationships can have various attributes of different complexity. Because the property graph model is easy to understand, it can also be used for modeling semantic information about Web services.
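The tuple definition above maps directly onto a small data structure. The following Java sketch is our own illustration (not tied to Neo4j, Titan or any other product): ρ is stored inside the edge objects, while λ and σ are kept in maps, and vertex and edge identifiers are assumed to be globally unique.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Minimal in-memory property graph G = (V, E, rho, lambda, sigma).
public class PropertyGraph {
    public record Edge(String id, String source, String target) { } // rho(e) = (source, target)

    final Set<String> vertices = new HashSet<>();                   // V
    final Map<String, Edge> edges = new HashMap<>();                // E, together with rho
    final Map<String, String> labels = new HashMap<>();             // lambda (id -> label)
    final Map<String, Map<String, Object>> properties = new HashMap<>(); // sigma (partial)

    void addVertex(String v, String label) {
        vertices.add(v);
        labels.put(v, label);
    }

    void addEdge(String e, String from, String to, String label) {
        edges.put(e, new Edge(e, from, to));
        labels.put(e, label);
    }

    void setProperty(String vertexOrEdgeId, String key, Object value) {
        properties.computeIfAbsent(vertexOrEdgeId, k -> new HashMap<>()).put(key, value);
    }
}

The Figure 2 example then amounts to three addVertex calls, one addEdge call with the label "knows", and setProperty calls for "Name", "Age" and "Since".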
As shown in the previous section, the OWL-S service model is also a directed graph. Thus, in the proposed approach, the described concepts of the property graph data model will be applied to the OWL-S service model. However, in order to efficiently query the property graph model, several query languages have been developed. In the following section, the characteristics of the Property Graph Query Language (PGQL) will be discussed.

5. PROPERTY GRAPH QUERY LANGUAGE (PGQL)
PGQL is a new SQL-like query language for property graphs developed by Oracle [10]. The language offers a wide collection of statements to be executed in order to query the property graph and find the required data.

PGQL is based on a graph pattern matching algorithm, i.e., when executing a PGQL query, the query engine finds all subgraphs within the original graph that match the specified query pattern.

To query a property graph, the SELECT clause is used, which specifies the data entities to be returned in the query result. In the example property graph shown in Fig. 2, to return the names of the persons who know each other, the following PGQL query would be executed:

SELECT n.name, m.name
WHERE
(n WITH type='Person')-[e:knows]->(m WITH type='Person')

The pattern (n)-[e]->(m) defined in the WHERE clause represents a topology constraint, which is a description of a connectivity relationship between vertices and edges in the pattern [10]. For n-step hops between nodes, it is possible to specify path expressions, which are then used in the WHERE clause of the query.

6. PROPERTY GRAPH-BASED APPROACH FOR SEMANTIC WEB SERVICES DISCOVERY
Since the OWL-S representation of the service model is a graph, the characteristics of property graph models can be applied to the OWL-S service model. This graph-based service model can then be queried using PGQL clauses during the semantic Web services discovery process. The proposed approach will be explained on the sample Web service model shown in Fig. 3. The Web service called "MovieService" contains the following three processes:
1. GetMovieGenres - for a given movie name and year, returns the genre name of that movie
2. GetMoviePersonnel - for a given movie name and year, returns the list of all actors' and directors' names of that movie
3. GetGenreDirectors - for a given genre name, returns the list of names of directors who produced movies in that genre

Figure 3. Sample service model of the proposed approach

The sample service model contains three processes with a different number of input and output parameters. Since this is a property graph, both nodes and relationships can be supplemented with additional labels and properties. The processes shown in this model are simple, so they are described only with their input and output parameters.

In the example, the nodes representing the processes have the label (type) "Process", which distinguishes them from the parameter nodes labeled "Parameter". A parameter node can represent both an input and an output parameter of a process (e.g., the parameter "Genre Name" is the output parameter of the "GetMovieGenres" process, but the input parameter of the "GetGenreDirectors" process).

The defined property graph service model can be queried by using PGQL. In order to discover (find) Web services which use the movie name as an input parameter, the following PGQL query would be executed:

SELECT s.name
WHERE
(p1 WITH name = 'MovieName')-[e1]->(s)
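To make the pattern matching tangible, the following self-contained Java sketch (our own illustration, independent of PGQL and of any query engine) hard-codes the input side of the Figure 3 model and answers the same question as the query above, i.e. which processes have "MovieName" as an input parameter.

import java.util.List;

// Illustrative re-implementation of the PGQL pattern
//   (p1 WITH name = 'MovieName')-[e1]->(s)
// over a hard-coded fragment of the sample service model.
public class DiscoveryExample {
    record Node(String name, String type) { }   // type is "Parameter" or "Process"
    record Rel(Node from, Node to) { }          // a directed edge

    public static void main(String[] args) {
        Node movieName = new Node("MovieName", "Parameter");
        Node year = new Node("Year", "Parameter");
        Node genreName = new Node("GenreName", "Parameter");
        Node getGenres = new Node("GetMovieGenres", "Process");
        Node getPersonnel = new Node("GetMoviePersonnel", "Process");
        Node getGenreDirectors = new Node("GetGenreDirectors", "Process");

        List<Rel> graph = List.of(
                new Rel(movieName, getGenres), new Rel(year, getGenres),
                new Rel(movieName, getPersonnel), new Rel(year, getPersonnel),
                new Rel(getGenres, genreName), new Rel(genreName, getGenreDirectors));

        // Match: a node named 'MovieName' with an outgoing edge to any node s.
        graph.stream()
                .filter(r -> r.from().name().equals("MovieName"))
                .map(r -> r.to().name())
                .forEach(System.out::println);  // prints GetMovieGenres, GetMoviePersonnel
    }
}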
parameters as node instances (classes), store them in a graph [8] Martin, D., Paolucci, M., McIlraith, S., Burnstein, M., database instance, and include them in a specific Web service class McDermott, D., McGuinness, D., Parsia, B., Payne, T.R., definition. The PGQL queries can then be executed on the database Sabou, M., Solanki, M. and others 2004. Bringing instance to find the necessary web service and other information. semantics to web services: The OWL-S approach. (2004). Therefore, the property graph model represents a new approach, [9] OWL-S: Semantic Markup for Web Services: which combined with the PGQL query language could be used for https://www.w3.org/Submission/OWL-S/. semantic Web services discovery. At this moment, the model can [10] PGQL 1.0 Specification: 2017. http://pgql- be used to represent a simplified OWL-S service model without lang.org/spec/1.0/. Accessed: 2017-09-15. including QoS parameters mentioned in the previous sections. [11] Rodriguez, A. 2008. Restful web services: The basics. Online article in IBM DeveloperWorks Technical Library. 7. CONCLUSION AND FUTURE WORK November (2008), 1–11. In this paper, the characteristics of semantic Web services have [12] Sachan, D., Dixit, S.K. and Kumar, S. 2014. QoS aware been discussed with a special focus on SWS discovery approaches. formalized model for semantic Web service selection. Based on the OWL-S ontology language and its graph International Journal of Web & Semantic Technology. 5, representation of semantic Web service model, a new approach has 4 (2014), 83. been proposed. The approach includes using property graphs to model semantic Web services, and discovering the required [13] Srinivasan, N., Paolucci, M. and Sycara, K. 2006. services by executing PGQL queries on that property graph. The Semantic web service discovery in the OWL-S IDE. System Sciences, 2006. HICSS’06. Proceedings of the 39th built property graph can be implemented in graph databases to build a graph database of existing Web services and used for SWS Annual Hawaii International Conference on (2006), 109b- composition process. Future work includes extending the model by -109b. adding Web services methods (operations), and by including and [14] Varga, L.Z. and Sztaki, Á.H. 2005. Semantic Web verifying different QoS parameters against the proposed model. Services Description Based on Web Services Description. W3C Workshop on Frameworks for Semantics in Web Services (2005). 8. REFERENCES [15] Web Services Architecture: 2004. [1] Angles, R. A Foundations of Modern Graph Query https://www.w3.org/TR/ws-arch/. Languages. [16] Web Services Architecture Requirements: 2002. [2] Bitar, I. El, Belouadha, F.-Z. and Roudies, O. 2014. https://www.w3.org/TR/2002/WD-wsa-reqs-20021011. Semantic web service discovery approaches: overview [17] Word Wide Web Consortium 2004. Web Services and limitations. arXiv preprint arXiv:1409.3021. (2014). Architecture. 26 Statecharts representation of program execution flow Nataša Sukur Gordana Rakić Zoran Budimac Faculty of Sciences Faculty of Sciences Faculty of Sciences University of Novi Sad University of Novi Sad University of Novi Sad Trg Dositeja Obradovića 3 Trg Dositeja Obradovića 3 Trg Dositeja Obradovića 3 21000 Novi Sad, Serbia 21000 Novi Sad, Serbia 21000 Novi Sad, Serbia nts@dmi.uns.ac.rs goca@dmi.uns.ac.rs zjb@dmi.uns.ac.rs ABSTRACT 1. INTRODUCTION Source code and software in general is prone to errors. 
ABSTRACT
Source code, and software in general, is prone to errors. This is due to bad design decisions, the lack of experience of developers and the constant need to change the existing software. Because the code changes rapidly and under strict time limitations, it is not always possible to preserve the quality and reliability of the source code. It is very important to detect errors in time and to fix or remove them in an automated or semi-automated manner. The automation of error detection and removal can be accomplished by various tools or platforms. The platform used for software quality analysis in this case is SSQSA (Set of Software Quality Static Analyzers), a static-analysis-oriented, language-independent platform, which is based on a universal intermediate source code representation, eCST (enriched Concrete Syntax Tree). The tree structure is useful for code representation and comprehension; however, its structure is not immediately suitable for representing program control flow. That is why control flow graphs were introduced to SSQSA, and were later represented visually in the form of higher-level automata, statecharts. Apart from introducing a formal representation of control flow graphs, statecharts introduced additional functionalities to SSQSA. Some of them are hierarchical control flow graph representation, the possibility of simulation in one or more parallel work flows (also planned to be expanded to the interprocedural level), and performing various kinds of estimation.

Categories and Subject Descriptors
D.2.4 [Software Engineering]: Software/Program Verification - Formal methods

General Terms
Measurement, Languages

Keywords
software quality, static analysis, formal methods, control flow graphs, statecharts

1. INTRODUCTION
Formal methods are getting more and more attention in the world of software. In the beginning, using formal representation was more usual for hardware than for software. Software is more complex in terms of state components, and the process of producing abstract models is more difficult [1, 3]. Formal verification of software systems is seldom automated. Usually, an abstract model is manually created in order to perform the formal verification of a large system. This requires investing a considerable amount of time and expertise. Moreover, because the model was manually made, the analysis of this model cannot be considered reliable. Systems are also usually large and change quickly. This means that continuously creating new and updating existing abstract models is very complex, as well as expensive, prone to errors and difficult to optimize. For all those reasons, an automated solution to abstract model generation is needed [2, 5]. Also, there has been a lot of foundational work on defining safe abstractions, but research on model reduction has not been explored enough [3].

There are many approaches to software analysis and software quality measurement. The algorithms that are implemented for this purpose usually operate on some internal representations of the code, such as trees, graphs or some meta models [5]. The SSQSA platform [9] uses its own structure, called the enriched Concrete Syntax Tree, and derived graph-based representations. Some of the algorithms for software quality measurement in SSQSA use this intermediate structure to perform their calculations and analysis, such as software metrics analysis, timing analysis and code clone detection.

The enriched Concrete Syntax Tree (eCST) is the key to the language independence of SSQSA. It is a syntax tree which contains all the concrete tokens from the original source code, including comments, enriched by a predefined set of universal nodes. These nodes are created in order to generalize the different structures of various programming languages whose purpose is the same. The set of universal nodes is minimal, and the nodes were carefully selected, so that they are applicable to the structure of all languages supported by SSQSA. They are inserted as imaginary nodes in this tree, together with the original source code tokens. For example, if we represented a semicolon and a comma with a universal node SEPARATOR, we would also have the data that precisely represents the SEPARATOR universal node. The whole source code is represented like this, and it is possible to completely reconstruct the original source code from the eCST.
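As a minimal illustration of this idea (our own simplification; the real eCST and its universal node set are considerably richer), a universal node can wrap the concrete tokens it generalizes, so that both the generalized role and the original text are preserved, and the source remains reconstructible:

import java.util.ArrayList;
import java.util.List;

// Simplified illustration of an eCST node: universal nodes generalize
// language-specific constructs while keeping every concrete token.
public class EcstNode {
    final String universalKind;   // e.g., "SEPARATOR"; null for a plain token node
    final String concreteToken;   // original source text, e.g., ";" or ","
    final List<EcstNode> children = new ArrayList<>();

    EcstNode(String universalKind, String concreteToken) {
        this.universalKind = universalKind;
        this.concreteToken = concreteToken;
    }

    // Reconstructing the source is a left-to-right walk over the concrete tokens.
    void reconstruct(StringBuilder out) {
        if (concreteToken != null) out.append(concreteToken);
        for (EcstNode child : children) child.reconstruct(out);
    }

    public static void main(String[] args) {
        EcstNode separator = new EcstNode("SEPARATOR", null); // imaginary (inserted) node
        separator.children.add(new EcstNode(null, ";"));      // the concrete token it wraps
        StringBuilder out = new StringBuilder();
        separator.reconstruct(out);
        System.out.println(out);                              // prints ";"
    }
}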
Sometimes it is not necessary to have a structure which contains as much data as the eCST, and it is not optimal for all kinds of analysis. That is why it was necessary to introduce Control Flow Graphs [6] to this platform. They were derived from the eCST by extracting only the nodes which were of importance for control flow analysis.

Although this was an upgrade to the SSQSA features, and some additional algorithms were implemented on this structure, there was still a problem in the case of larger pieces of code, where the resulting graphs were quite complex [1]. Representing control flow graphs in a hierarchical manner seemed like a good solution for the reduction of complexity. That is one of the reasons why statecharts were introduced to SSQSA. Apart from solving the complexity issue, they give us the possibility of simulation, parallel execution and estimation.

The rest of the paper is organized as follows: Section 2 reflects on some of the related papers and tools. Section 3 describes the introduction of Control Flow Graphs to SSQSA. Section 4 explains the meaning and purpose of statecharts in SSQSA. In Section 5, an example of a statechart is shown and described in more detail. In Section 6, future work is presented and, finally, we conclude our paper in Section 7.

2. RELATED WORK
There are many tools and papers that deal with visual representations and simulation of source code. However, most of them lack some features. Usually, they focus on some specific programming language. Some of them are able to represent the control flow or the structure of programs in a visual way, but most of them do not have the possibility of simulation and testing. If there are testable representations, then another problem emerges: the state explosion problem [1]. Also, many of them are not formal representations, and not all of them are platform independent.

There have already been some attempts at creating formal representations of source code. Papers such as [2] explain how the authors extracted finite state models from Java source code. Their approach also addresses the state explosion problem and tries to solve it. However, this solution is only applicable to Java code.

The work by [7] deals with interprocedural graphs. The intention is to check whether fixing errors in code actually eliminates them, and whether fixing errors means introducing some new ones. The whole approach is based on static analysis; it generates the graph based on all existing versions of the program and tries to discover and fix faults, and propagate that to newer versions.

A PhD thesis [10] performs static analysis of programs based on their dependence graphs. The idea is that a single statement can affect some other statements and parts of the code. This approach is formal and language independent. It focuses on sequential, imperative programs.

2.1 Related tools
Visustin v7 Flow chart generator (http://www.aivosto.com/visustin.html) is a visualization tool that can represent code in the form of a program flow diagram and an activity diagram. It has support for 43 languages, such as C, COBOL, Fortran, Java, JavaScript, Pascal, PHP and Ruby. It also has support for simulation, but it has a problem when it is presented with large pieces of code - it becomes hard to analyze, and it has no hierarchical representation.

Graphviz (Graph Visualization Software, http://www.graphviz.org/) is open source and is used for graph visualization. Graphviz has wide use in networking, bioinformatics, software engineering, database design, machine learning, etc. This tool performs graph drawing based on specifications in the DOT language (http://www.graphviz.org/doc/info/lang.html). The downside is that the graphs first have to be specified in the DOT language in order to be represented by Graphviz. It is also not able to simulate control flow behavior.

Apart from these, there are also other tools that have some similar features, such as MOOSE (http://www.themoosebook.org/book) and RefactorErl (http://plc.inf.elte.hu/erlang/). However, some of them are not oriented towards formal representation, and some cannot simulate the represented code.
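As a small illustration of that input format, the following Java sketch (our own; all node names are invented) prints a DOT specification for a tiny control flow graph, which Graphviz can then render:

// Illustrative sketch: emitting a DOT specification (the input format of
// Graphviz) for a tiny control flow graph; all node names are invented.
public class DotExport {
    public static void main(String[] args) {
        StringBuilder dot = new StringBuilder("digraph cfg {\n");
        String[][] transitions = {
                {"entry", "condition"}, {"condition", "body"},
                {"body", "condition"}, {"condition", "exit"}};
        for (String[] t : transitions) {
            dot.append("  ").append(t[0]).append(" -> ").append(t[1]).append(";\n");
        }
        dot.append("}\n");
        // Save the output as cfg.dot and render with: dot -Tpng cfg.dot -o cfg.png
        System.out.println(dot);
    }
}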
The purpose of control flow graphs is to track all possible paths of program execution, for important reasons such as detecting dead code or infinite loops. To generate them, it was necessary to extract them from the eCST: a subset of universal nodes was selected, and it was sufficient to represent the program flow accurately.

Some nodes of importance for the control flow representation are statement nodes, such as assignment statements and function calls. Apart from them, the graph also includes branch statements, branches and loop statements, as well as their corresponding condition nodes. Some pieces of information were included because of their importance for generating statecharts: statecharts are highly structured, and it is important to preserve that structure by saving information about nodes such as the compilation unit, block scope and function declaration in the control flow graph.

The control flow graph first created in SSQSA focused on a single function or procedure. The starting node was the entry point of this function/procedure. The rest of the control flow graph was created by extracting the previously defined nodes of interest from the eCST and connecting them so that they represent the control flow of the original source code. This graph is directed and contains cycles, which exist due to the nature of source code. If language-independent condition evaluation is successfully implemented (i.e., the number of repetitions of some cycles is calculated), it will be possible to perform calculations such as worst-case execution time estimation. Currently, we are working on interprocedural graphs [8], described in more detail in Section 6.

4. STATECHARTS
Statecharts are defined as a visual formalism for complex systems [4]. They were later included among UML diagrams. Another name for this diagram is Harel's automaton (after David Harel, the creator of statecharts). The main benefit of using statecharts is the ability to represent parallel states, track history within complex states and track the values of variables throughout the flow. They are highly expressive: they can show a very detailed preview of the system to be created, yet, thanks to their hierarchical nature, they can also be very compact and show the system only at higher levels of abstraction.

Statecharts model hierarchy, concurrency and communication, which makes them important for tracking complex real-time systems. Reactive systems are event-driven: they constantly have to react to various kinds of internal and external events. It used to be very difficult to represent them in a way that was realistic, but also formal and precise. Statecharts are a solution to that problem, since they make the process of specifying and designing such complex systems much easier and more natural. All possible behavior of a reactive system can be represented by a set of allowed in and out events, conditions, actions and time limitations.

The dynamic behavior of a complex system is easily represented using states and events. The system is always in at least one state, and when some event occurs, it transitions to another state under some conditions. A transition from one state to another can also happen entirely inside a complex state. Transitions can be recursive, having the same state as both origin and target. Some transitions cause a complex state to be entered or exited, and transitions can also trigger events, which affect the simulation. If standard finite automata were used for this purpose, they would be very difficult to understand because of the very large number of generated states. Statecharts use hierarchy, provide modularity and good structure, and make it easy to represent independent parallel execution [4].
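The state/event/condition mechanics can be pictured with the following toy interpreter. It is illustrative only: the states, events and guards are invented, and real statechart semantics (hierarchy, history, orthogonal regions, as in [4] or in tools such as Yakindu) go well beyond this flat core.

transitions = [
    # (source state, event, guard over variables, target state)
    ("idle",    "start", lambda v: True,        "looping"),
    ("looping", "tick",  lambda v: v["i"] > 1,  "looping"),  # recursive (self-)transition
    ("looping", "tick",  lambda v: v["i"] <= 1, "done"),
]

def step(state, event, variables):
    """Fire the first transition enabled by the event and its guard."""
    for src, ev, guard, dst in transitions:
        if src == state and ev == event and guard(variables):
            return dst
    return state  # no enabled transition: the system stays in its state

state, variables = "idle", {"i": 3}
state = step(state, "start", variables)
while state == "looping":
    variables["i"] -= 1
    state = step(state, "tick", variables)
print(state)  # done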
4.1 The importance of statecharts for SSQSA
Statecharts introduced a graphical representation of control flow graphs to SSQSA and created a more compact version of them. Also, because statecharts are a formal representation, the system is represented unambiguously, and it becomes a trivial problem to detect which program paths are possible and which are not. This component is useful for testing parts of code in the early stages of development. It has some features of a debugger: it can show whether parts of the code behave oddly and why the control flow changes unexpectedly, and it is much more visual in representing what is currently happening.

4.2 Implementing statecharts in SSQSA
Statecharts consist of two kinds of states: complex states (which can also be orthogonal, with parallel regions) and simple ones. Based on the nature of the universal nodes involved (whether they represent complex or simple program structures), they were transformed into the suitable kind of state. Universal nodes that stand for complex program constructs (i.e., those that contain other program elements) were represented as complex states. Examples are compilation units, which can consist of elements such as functions, or function declarations, which can contain statements. Some statements are also represented as complex states because they contain other statements, such as branch statements (with their branches) and loop statements. These complex states can contain other simple or complex states. Universal nodes represented as simple states can only trigger events or manipulate variable values. Statecharts also have some additional states related to entering and exiting the whole statechart or its complex parts.
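The mapping just described can be pictured as a small recursive transformation: container-like universal nodes become complex states holding substates, while everything else becomes a simple state. The kind names below are illustrative and are not SSQSA's actual universal-node vocabulary.

COMPLEX_KINDS = {"COMPILATION_UNIT", "FUNCTION_DECL",
                 "BRANCH_STATEMENT", "LOOP_STATEMENT"}

def to_state(node):
    """node = (kind, children); returns a nested statechart description."""
    kind, children = node
    if kind in COMPLEX_KINDS:
        return {"state": kind, "complex": True,
                "substates": [to_state(c) for c in children]}
    return {"state": kind, "complex": False}  # simple state

unit = ("COMPILATION_UNIT", [
    ("FUNCTION_DECL", [
        ("ASSIGN_STATEMENT", []),
        ("LOOP_STATEMENT", [("ASSIGN_STATEMENT", [])]),
    ]),
])
loop = to_state(unit)["substates"][0]["substates"][1]
print(loop["complex"], len(loop["substates"]))  # True 1

Collapsing such a complex state is what produces the compact view shown later in Figure 1, while expanding it yields the detailed view of Figure 2.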
For the purpose of testing statecharts, Yakindu SCT (https://www.itemis.com/en/yakindu/state-machine/) was used. Parts of this tool are open source. Some of the limitations are directly related to this tool, such as the lack of complex data types: for now, only integer, real, boolean, string and void are supported. These data types are enough for the proof-of-concept phase, but it will be necessary to additionally implement the missing features or replace this tool with one that can also represent nontrivial pieces of code.

5. EXAMPLE
In Figure 1, we present a statechart generated from a piece of code written in the programming language Modula-2. It represents part of an algorithm which computes the factorial of a given number. The purpose of this figure is to show how the part of the code containing a loop is represented in a simplified way. In Figure 2, we present how this loop statement looks when it is expanded and what is really happening inside this complex state.

Figure 1: Statechart based on sample code in Modula-2, which shows how the factorial of a number is calculated. The loop statement is collapsed.

Figure 2: Statechart based on sample code in Modula-2, which shows the expanded loop statement in the factorial calculation.

The same algorithm was implemented in the Java programming language. The resulting statecharts are identical to the ones in Figures 1 and 2. A more detailed preview and comparison of different algorithms can be found in [11].

6. FUTURE WORK
Although statecharts are currently focused on representing the control flow of one function or procedure, the idea is to create an interprocedural representation, first on the level of a compilation unit and then expanded to represent the complete software. This will be done using graph dependency networks. Once the control flow graphs for all procedures and functions are constructed, function call nodes will be detected and the graphs will be connected into a single graph, which represents the whole system. By implementing this, it will also be possible to improve statecharts in an interprocedural direction. So far, statecharts have been tested mostly on one object-oriented language (Java) and one procedural Pascal-like language (Modula-2); there is therefore also room for improvement in testing how statecharts are generated from source code written in other languages.

Another idea is that, if we succeed in refining statecharts to the lowest level and introduce the environment variable, we would greatly improve the simulation of source code in the evaluator. That would mean having the most realistic representation of how the whole source code, or a part of it, would execute in reality.
7. CONCLUSIONS
The eCST provides us with complete information about the source code. Therefore, possible limitations will not be related to a lack of information about the source code. The true challenge will be to represent everything that is important in a manner suited to the nature of statecharts. Our approach has proven feasible so far, but it will be put under further inspection once more complex pieces of code are introduced.

Once we have a component that can visualize and simulate the complete code under analysis and test existing systems or systems still under development, detecting obvious flaws in source code design and execution will be trivial. The tool will be able to simulate the execution of the code without the need to set up the environment and run the code. Statecharts could also be used to introduce new people to various projects: one will be able to view the system at different levels of abstraction and take a step-by-step approach to getting familiar with it. A statechart is more dynamic than a simple diagram representing the project structure. Statecharts in SSQSA mean the ability to view and simulate systems without having to worry whether parts of the system are written in different languages.

It is also important to note that statecharts were not introduced to SSQSA only to visualize and simulate the system. They are important for predicting different outcomes when some parts of the code are executed, and for evaluating qualities such as correctness and reliability [4].

8. REFERENCES
[1] E. M. Clarke, W. Klieber, M. Nováček, and P. Zuliani. Model checking and the state explosion problem. In Tools for Practical Software Verification, pages 1–30. Springer, 2012.
[2] J. C. Corbett, M. B. Dwyer, J. Hatcliff, S. Laubach, C. S. Pasareanu, H. Zheng, et al. Bandera: Extracting finite-state models from Java source code. In Software Engineering, 2000. Proceedings of the 2000 International Conference on, pages 439–448. IEEE, 2000.
[3] M. B. Dwyer, J. Hatcliff, R. Joehanes, S. Laubach, C. S. Păsăreanu, H. Zheng, and W. Visser. Tool-supported program abstraction for finite-state verification. In Proceedings of the 23rd International Conference on Software Engineering, pages 177–187. IEEE Computer Society, 2001.
[4] D. Harel. Statecharts: A visual formalism for complex systems. Science of Computer Programming, 8(3):231–274, 1987.
[5] G. J. Holzmann and M. H. Smith. Software model checking: Extracting verification models from source code. Software Testing, Verification and Reliability, 11(2):65–79, 2001.
[6] J. Laski and W. Stanley. Software Verification and Analysis: An Integrated, Hands-on Approach. Springer Science & Business Media, 2009.
[7] W. Le and S. D. Pattison. Patch verification via multiversion interprocedural control flow graphs. In Proceedings of the 36th International Conference on Software Engineering, pages 1047–1058. ACM, 2014.
[8] F. Nielson and H. R. Nielson. Interprocedural control flow analysis. In ESOP, volume 99, pages 20–39. Springer, 1999.
[9] G. Rakić. Extendable and Adaptable Framework for Input Language Independent Static Analysis. PhD thesis, Faculty of Sciences, University of Novi Sad, Novi Sad, 2015.
[10] J. A. Stafford and A. L. Wolf. A formal, language-independent, and compositional approach to interprocedural control dependence analysis. PhD thesis, University of Colorado, 2000.
[11] N. Sukur. Reprezentacija toka izvrsavanja programa dijagramom stanja, nezavisna od ulaznog jezika. Master's thesis, Faculty of Sciences, University of Novi Sad, Serbia, September 2016. In Serbian.
Code smell detection: A tool comparison

Tina Beranič, Zlatko Rednjak, Marjan Heričko
Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
tina.beranic@um.si, zlatko.rednjak@gmail.com, marjan.hericko@um.si

ABSTRACT
Technical debt can be identified with different techniques, including code smell detection. Different approaches are available to detect code smells, and some of those approaches are implemented in different tools. In this article, several tools for code smell detection were selected with the goal of comparing their outputs. The compared tools detect different code smells with varying degrees of success, and the intersection of the code smells detected by the tools is very small. Because of this, the connection between detected code smells and the value of technical debt is hard to define. The results are supported by an empirical analysis of 32 software projects, different code smells and 3 code smell detection tools.

Categories and Subject Descriptors
D.2.8 [Software Engineering]: Metrics
D.2.9 [Software Engineering]: Management—Software quality assurance (SQA)

General Terms
Measurement

Keywords
Code smell, technical debt, intersection of detected code smells

1. INTRODUCTION
Since there is a growing need for rapid changes, some decisions have to be made quickly. Those decisions, especially the inadequate ones, affect the whole software development life cycle and can have a significant impact on code quality. We have to be especially careful and pay attention to those program entities that contain irregularities or deficiencies.

Technical debt helps evaluate the resulting problems, and one of the most recognized techniques for identifying technical debt is the detection of code smells. There are different types of code smells, divided into different groups aimed at achieving a better understanding. Many tools that help detect code smells can be found in the literature, but each of them detects code smells with the help of different techniques.

The goal of our research was to look into the available code smell detection tools and, using selected ones, compare the detected code smells while also analyzing the intersection of the results. Since code smell detection is a technique for identifying technical debt, the second part of the research was aimed at finding a connection between detected code smells and the value of the calculated technical debt.

The article is organized as follows. First, the theoretical background on technical debt and code smells is presented (sections 2 and 3). Section 4 presents the case study we carried out, while sections 5 and 6 contain a discussion of the obtained results. The article is concluded with section 7.

2. TECHNICAL DEBT
Technical debt was first named in 1992 by Ward Cunningham as "not-quite-right code" [1]. Over the years, technical debt has gained more recognition and its basic meaning has been extended. Since it represents a metaphor that can be understood subjectively [2], [3], a single generally accepted definition of technical debt still cannot be found.

Nevertheless, authors describe technical debt as a set of decisions taken within a project. Those decisions usually bring short-term success, but in the future they can cause problems, which are resolved with more effort than would have been needed in the beginning [4]–[9].

Over time, different types of technical debt were introduced, for example design debt, architecture debt, documentation debt, test debt, code debt and environment debt [5]–[7], [10], [11]. Soon, based on research and needs, other types of technical debt appeared in the literature, for example build debt, defect debt, requirement debt, test automation debt, service debt, versioning debt, people debt, process debt and usability debt [5]–[7].

2.1 Technical debt identification
One of the technical debt identification methods is source code analysis, where the techniques can be divided into static and dynamic analysis [8], [12].
Each technique focuses on a specific aspect of the source code and does not by itself cover the detection of the whole variety of code smells. In the literature, the most represented techniques are the following:
• Modularity violations [12], [13].
• Design patterns and grime buildup [12], [13].
• Code smells [8], [12]–[14].
• Automatic static analysis [8], [12], [13].
• Source code comments [15].

3. CODE SMELL
One of the techniques for identifying technical debt is code smell detection. Code smells are indicators within source code that point to deeper problems in a software product [16]. A code smell means writing code in a way that violates the principles of best programming practices. The removal of code smells is usually done through source code refactoring, where the need for refactoring increases the likelihood of the existence of code smells in the software [17].

Tufano et al. [18] present an analysis of the occurrence of code smells in software products. Usually, a code smell emerges when adding new functionalities or changing existing ones. An interesting finding was also that refactoring existing source code can introduce new types of code smells.

3.1 Code smell groups and types
Many different types of code smells are defined in the literature. For a better understanding of the different types of code smells, several groups were defined, each containing different code smell types [19]: (1) Bloaters represent something in the code that is so big it cannot be effectively handled; (2) Object-Orientation Abusers contain examples where the possibilities of OO design are not fully exploited; (3) Change Preventers contain code smells that refer to code structures which considerably hinder software modification; (4) Dispensables present parts that are unnecessary and should be removed from the source code; (5) Encapsulators join code smells connected to data communication mechanisms or encapsulation; (6) Couplers express code which is tightly coupled; (7) Other code smells.

3.2 Code smell detection
Code smell detection can be done with the help of software metrics. Different authors have connected selected software metrics with the detection of specific types of code smells [20], [21]. Based on the correlation between software metrics and code smell types, tools that identify some of these types have been developed [16], [22]–[24]. When identifying code smells with a software metric, it is important to use reliable software metric threshold values: if the thresholds are not set properly, a variety of false positives can be detected [25]. Based on this, some other code smell detection techniques have been developed, for example a technique based on a combination of machine learning algorithms, which achieved more than 96% accuracy when detecting different code smells [26].
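To make the metric-threshold idea concrete, here is a minimal Python sketch. The metrics used (WMC, weighted methods per class; ATFD, access to foreign data; TCC, tight class cohesion) appear in well-known God class detection strategies in the literature, but the threshold values and the sample measurements below are illustrative only; as noted above, poorly calibrated thresholds are precisely what produces false positives.

def is_god_class(m, wmc_very_high=47, atfd_few=5, tcc_low=1 / 3):
    """Flag a class whose metrics cross all three illustrative thresholds."""
    return (m["WMC"] >= wmc_very_high   # high functional complexity
            and m["ATFD"] > atfd_few    # uses many attributes of other classes
            and m["TCC"] < tcc_low)     # low cohesion among its own methods

suspect = {"WMC": 58, "ATFD": 9, "TCC": 0.12}
print(is_god_class(suspect))  # True -- all three conditions hold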
4. CASE STUDY
As part of the case study, two research questions were formed:
• Do different open source tools for detecting code smells provide different results?
• What is the connection between the code smells detected by the selected tools and the value of technical debt calculated by the SonarQube tool?

To answer these questions, different software projects were analyzed to gather empirical data. We analyzed the projects gathered in the "Qualitas Corpus". Since we needed the projects compiled to byte code to perform the analysis, we chose the compiled version of the Qualitas Corpus [27], so that the analysis could be carried out without major changes to the source code. In the end, based on the criteria, 32 software projects were analyzed. Criteria were also set to select the appropriate code smell types and code smell detection tools to be used within the case study; these criteria were inspired by data gathered in a preliminary literature review.

4.1 Selected tools and code smells
Based on the criteria, two tools for code smell detection were selected, along with the corresponding code smell types; the information is presented in Table 1. Both selected tools are Eclipse extensions. To be able to answer the second question, we also had to prepare the SonarQube tool, which detects code smells based on predefined rules. Among the 255 rules that indicate the existence of code smells, the 12 that follow Fowler's definitions [28] were selected. Because SonarQube does not enable the classification of code smells into different groups, this step was done by hand: the rules that follow the before-mentioned definitions were combined into a profile and classified into groups.

Table 1. Selected tools and code smells
Tool       | Code smells                                                     | Tool version
JSpIRIT    | God class, Feature envy, Brain method, Brain class              | 4.3.2
JDeodorant | God class, Feature envy, Long method                            | 5.0.64
SonarQube  | God class, Feature envy, Brain method, Long method, Brain class | 5.6.4 LTS

4.2 Analysis of empirical data
In the first step, the selected projects were analyzed using the selected tools. The aim was to detect and count the occurrences of different code smells within the different projects. The distribution of activated rules in SonarQube was also examined. With this, we gained insight into the appropriateness of mapping the rules into groups in SonarQube and of the different code smell types in JSpIRIT and JDeodorant. The appropriateness of the rule mapping was additionally checked through the intersection of the code smells detected by the tools.

All gathered data present a starting point for finding correlations between code smells and technical debt and for the comparison of different code smell detection tools.
5. COMPARISON OF DETECTED CODE SMELLS
The tools used and the code smells considered within the case study are presented in Table 1. In this analysis, we combined the results for God class and Brain class (Graph 1) and for Brain method and Long method (Graph 2), while the Feature envy code smell (Graph 3) was analyzed independently.

Graph 1 – Identification of God class and Brain class code smells within tools

As can be seen in Graphs 1 and 2, SonarQube detected more God class/Brain class and Long method/Brain method code smells than the tools JSpIRIT and JDeodorant did together. However, SonarQube has a problem with detecting the Feature envy code smell, since it was not able to detect it within the software project analysis (Graph 3). For this purpose we looked again at the rules in SonarQube, but it cannot be stated that any of those rules reliably detects the Feature envy code smell; the rule that we selected is, in our opinion, the one with the highest probability of detecting the mentioned code smell. On the other hand, when detecting Feature envy (Graph 3), JSpIRIT prevails, detecting 500 more occurrences than JDeodorant did. The latter shows a tendency towards detecting God class and Brain method. Overall, the most code smells were detected by SonarQube.

Graph 2 – Identification of Long method and Brain method code smells within tools

Graph 3 – Identification of Feature envy code smells within tools

Based on the results, we identified the intersection of the code smells detected by the tools, presented in Figure 1. This was done by analyzing a project that was not selected for the main analysis but is still part of the Qualitas Corpus. When identifying the God class/Brain class code smell, the intersection between all the tools is 2.7%, and when detecting Long method/Brain method it is 6.08%. To identify the causes of the low intersection, we again looked into SonarQube and examined the rules that are activated when detecting the code smells common to the other two tools.
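Once each tool's findings are reduced to a set of flagged entities, the intersection itself is a straightforward set computation, as the sketch below shows on invented data (the real per-tool outputs are what Figure 1 summarizes).

findings = {
    "JSpIRIT":    {"ClassA", "ClassB", "ClassC"},
    "JDeodorant": {"ClassB", "ClassD"},
    "SonarQube":  {"ClassB", "ClassC", "ClassE", "ClassF"},
}

common = set.intersection(*findings.values())  # flagged by all three tools
union = set.union(*findings.values())          # flagged by at least one tool
print(f"{len(common) / len(union):.1%}")       # 16.7% -- only ClassB is shared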
6. THE CONNECTION BETWEEN CODE SMELLS AND TECHNICAL DEBT
To establish a correlation between code smells and technical debt, the data about the time contributed by each of the 12 used rules in SonarQube was acquired. The technical debt when all 255 rules were used was 5,460 days. When we activated just the 12 selected rules, the technical debt was 1,982 days, which is 27% of all technical debt. Since these 12 rules are used for code smell detection, it can be seen how much they contribute to the overall technical debt of a project. But the problem lies in the low intersection between the tools when detecting code smells. An even more detailed analysis of the rules activated within the selected code smells does not bring any clearer results.

7. CONCLUSION
The case study was carried out to compare the code smells detected by three different tools: JSpIRIT, JDeodorant and SonarQube. Since the first two define code smells which are not part of SonarQube, the rules within SonarQube were mapped to groups representing the selected code smells.

Figure 1 – Intersection between tools for code smell detection

The detected code smells were compared within the three identified categories, and the intersection of detected code smells was presented. The intersection between the used tools is very small (Figure 1). This can be attributed to the use of different detection techniques in the different tools. JDeodorant proved to be the best at detecting the Long method code smell, JSpIRIT at detecting the Feature envy code smell, and SonarQube at detecting the God class/Brain class and Long method/Brain method code smells.

The second part of the case study was aimed at finding a connection between the detected code smells and technical debt in SonarQube. The intersection of the code smells detected by the other tools and SonarQube is very small. We can also add the fact that the classified rules are not activated proportionally. Since the rules do not contribute equally to the technical debt calculation, the impact of code smell detection on technical debt cannot be defined.

There are many research opportunities in this area. The rules for code smell detection among the tools could be compared in detail for the purpose of unification. In addition, future work can be oriented towards an attempt to provide a generally accepted definition of technical debt. Finally, the selected tools, JSpIRIT and JDeodorant, could be upgraded with technical debt calculation.

8. ACKNOWLEDGMENTS
The authors acknowledge the financial support from the Slovenian Research Agency under The Young Researchers Programme (SICRIS/SRA code 35512, RO 0796, Programme P2-0057).

9. REFERENCES
[1] W. Cunningham, "The WyCash Portfolio Management System," SIGPLAN OOPS Mess., vol. 4, no. 2, pp. 29–30, Dec. 1992.
[2] P. Kruchten, R. L. Nord, and I. Ozkaya, "Technical Debt: From Metaphor to Theory and Practice," IEEE Software, vol. 29, no. 6, pp. 18–21, 2012.
[3] N. A. Ernst, S. Bellomo, I. Ozkaya, R. L. Nord, and I. Gorton, "Measure It? Manage It? Ignore It? Software Practitioners and Technical Debt," in Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, 2015, pp. 50–60.
[4] J. Yli-Huumo, A. Maglyas, and K. Smolander, "How do software development teams manage technical debt? – An empirical study," J. Syst. Softw., vol. 120, pp. 195–218, 2016.
[5] Z. Li, P. Avgeriou, and P. Liang, "A systematic mapping study on technical debt and its management," J. Syst. Softw., vol. 101, pp. 193–220, 2015.
[6] N. S. R. Alves, T. S. Mendes, M. G. de Mendonça, R. O. Spínola, F. Shull, and C. Seaman, "Identification and management of technical debt: A systematic mapping study," Inf. Softw. Technol., vol. 70, pp. 100–121, Feb. 2016.
[7] N. S. R. Alves, L. F. Ribeiro, V. Caires, T. S. Mendes, and R. O. Spínola, "Towards an Ontology of Terms on Technical Debt," 2014 Sixth International Workshop on Managing Technical Debt, pp. 1–7, 2014.
[8] N. Zazworka, R. O. Spínola, A. Vetrò, F. Shull, and C. Seaman, "A Case Study on Effectively Identifying Technical Debt," in Proceedings of the 17th International Conference on Evaluation and Assessment in Software Engineering, 2013, pp. 42–47.
[9] M. Fowler, "TechnicalDebt," 2003. [Online]. Available: https://martinfowler.com/bliki/TechnicalDebt.html. [Accessed: 14-Sep-2017].
[10] E. Tom, A. Aurum, and R. Vidgen, "An exploration of technical debt," J. Syst. Softw., vol. 86, no. 6, pp. 1498–1516, 2013.
[11] C. Fernández-Sánchez, J. Garbajosa, C. Vidal, and A. Yagüe, "An Analysis of Techniques and Methods for Technical Debt Management: A Reflection from the Architecture Perspective," 2015 IEEE/ACM 2nd International Workshop on Software Architecture and Metrics, pp. 22–28, 2015.
[12] N. Zazworka, A. Vetrò, C. Izurieta, S. Wong, Y. Cai, C. Seaman, and F. Shull, "Comparing Four Approaches for Technical Debt Identification," Softw. Qual. J., vol. 22, no. 3, pp. 403–426, Sep. 2014.
[13] C. Izurieta, A. Vetrò, N. Zazworka, Y. Cai, C. Seaman, and F. Shull, "Organizing the Technical Debt Landscape," in Proceedings of the Third International Workshop on Managing Technical Debt, 2012, pp. 23–26.
[14] N. Zazworka, M. A. Shaw, F. Shull, and C. Seaman, "Investigating the Impact of Design Debt on Software Quality," in Proceedings of the 2nd Workshop on Managing Technical Debt, 2011, pp. 17–23.
[15] E. d. S. Maldonado and E. Shihab, "Detecting and quantifying different types of self-admitted technical debt," 2015 IEEE 7th International Workshop on Managing Technical Debt (MTD), pp. 9–15, 2015.
[16] E. Fernandes, J. Oliveira, G. Vale, T. Paiva, and E. Figueiredo, "A Review-based Comparative Study of Bad Smell Detection Tools," in Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering, 2016, pp. 18:1–18:12.
[17] F. A. Fontana, M. Mangiacavalli, D. Pochiero, and M. Zanoni, "On Experimenting Refactoring Tools to Remove Code Smells," in Scientific Workshop Proceedings of the XP2015, 2015, pp. 7:1–7:8.
[18] M. Tufano, F. Palomba, G. Bavota, R. Oliveto, M. Di Penta, A. De Lucia, and D. Poshyvanyk, "When and Why Your Code Starts to Smell Bad," in Proceedings of the 37th International Conference on Software Engineering - Volume 1, 2015, pp. 403–414.
[19] M. Mantyla, J. Vanhanen, and C. Lassenius, "A taxonomy and an initial empirical study of bad smells in code," International Conference on Software Maintenance (ICSM 2003), Proceedings, pp. 381–384, 2003.
[20] F. A. Fontana, V. Ferme, A. Marino, B. Walter, and P. Martenka, "Investigating the Impact of Code Smells on System's Quality: An Empirical Study on Systems of Different Application Domains," 2013 IEEE International Conference on Software Maintenance, pp. 260–269, 2013.
[21] F. A. Fontana and S. Spinelli, "Impact of Refactoring on Quality Code Evaluation," in Proceedings of the 4th Workshop on Refactoring Tools, 2011, pp. 37–40.
[22] A. Hamid, M. Ilyas, M. Hummayun, and A. Nawaz, "A Comparative Study on Code Smell Detection Tools," Int. J. Adv. Sci. Technol., vol. 60, pp. 25–32, 2013.
[23] A. Chatzigeorgiou and A. Manakos, "Investigating the Evolution of Bad Smells in Object-Oriented Code," 2010 Seventh International Conference on the Quality of Information and Communications Technology, pp. 106–115, 2010.
[24] F. A. Fontana, P. Braione, and M. Zanoni, "Automatic detection of bad smells in code: An experimental assessment," J. Object Technol., vol. 11, no. 2, pp. 5:1–38, 2012.
[25] F. A. Fontana, V. Ferme, M. Zanoni, and A. Yamashita, "Automatic Metric Thresholds Derivation for Code Smell Detection," 2015 IEEE/ACM 6th International Workshop on Emerging Trends in Software Metrics, pp. 44–53, 2015.
[26] F. Arcelli Fontana, M. V. Mäntylä, M. Zanoni, and A. Marino, "Comparing and experimenting machine learning techniques for code smell detection," Empir. Softw. Eng., vol. 21, no. 3, pp. 1143–1191, 2016.
[27] R. Terra, L. F. Miranda, M. T. Valente, and R. S. Bigonha, "Qualitas.class Corpus: A compiled version of the Qualitas Corpus," Softw. Eng. Notes, vol. 38, pp. 1–4, 2013.
[28] M. Fowler and K. Beck, Refactoring: Improving the Design of Existing Code. Addison-Wesley, 1999.
Skills, Competences and Platforms for a Data Scientist

Vili Podgorelec, Sašo Karakatič
Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
vili.podgorelec@um.si, saso.karakatic@um.si

ABSTRACT
In this paper, we identify the core competences and skills of a Data Scientist, building on existing research about practicing Data Scientists and on existing frameworks. We complement this research with a practitioners' survey about popular Data Science platforms and with our own research on search term trends and job posting trends.

Categories and Subject Descriptors
H.4 [Information Systems Applications]: Miscellaneous
I.2.m [Artificial Intelligence]: Miscellaneous

General Terms
Data Science, Framework, Platforms.

Keywords
Data science, data scientist, skills, competences, platforms.

1. INTRODUCTION
In the last few years, Data Science has become one of the most rapidly growing interdisciplinary fields, combining different aspects of computer engineering, mathematics, and managerial skills. The employer and employee review website Glassdoor even rates Data Scientist as the number one best job in America in 2017 (regarding job satisfaction, number of job openings and median base salary) [1].

The precise skill set of a Data Scientist is not yet well defined, as it gets mixed up with other job roles, such as Data Analyst, Machine Learning Engineer, Statistician, Data Engineer, Business Analyst, Data Architect and others. The differences between these titles are not always clear, and they are used interchangeably, especially among people outside the domain of Data Science. The purpose of this paper is not to draw a clear differential line between these job roles, but to define what the core skills of a Data Scientist are. This could help employers identify whether a Data Scientist is the one they need in their organization. Also, a clear list of definitions, skill sets and the most common platforms used by Data Scientists could be used by people striving to become a Data Scientist, who can then work on each skill in the broad spectrum of competences and skills needed and expected of a Data Scientist.
2. DATA SCIENCE COMPETENCES AND SKILLS FRAMEWORKS
With the growing demand for staff with knowledge and skills of Data Science, several more or less commonly accepted frameworks for defining Data Science (alongside general computer science, ICT and similar) competences, skills and subject domain classifications have emerged. These frameworks can, with some alignment, be built upon and re-used for better acceptance by the research and industrial communities. One of the most elaborate is the EDISON Data Science Framework (EDSF), developed within the scope of the European project "Edison – building the data science profession" [2]. The EDSF provides a collection of documents that define the Data Science profession; these have been developed to guide educators and trainers, employers and managers, and Data Scientists themselves, collectively breaking down the complexity of the skills and competences needed to define Data Science as a professional practice.

The EDSF itself, however, builds on existing standard and commonly accepted frameworks, such as the Big Data Interoperability Framework, published by the NIST Big Data Working Group in September 2015 [3]. It provides various definitions, among them for Data Science, Data Scientist and Data Life Cycle, which can be used as a starting point for further analysis.

"Data Science is the extraction of actionable knowledge directly from data through a process of discovery, or hypothesis formulation and hypothesis testing. Data Science can be understood as the activities happening in the processing layer of the system architecture, against data stored in the data layer, in order to extract knowledge from the raw data.

Data Science across the entire data life cycle incorporates principles, techniques, and methods from many disciplines and domains including data cleansing, data management, analytics, visualization, engineering, and in the context of Big Data, now also includes Big Data Engineering. Data Science applications implement data transformation processes from the data life cycle in the context of Big Data Engineering." [3]

"A Data Scientist is a practitioner who has sufficient knowledge in the overlapping regimes of business needs, domain knowledge, analytical skills, and software and systems engineering to manage the end-to-end data processes in the data life cycle.

Data Scientists and Data Science teams solve complex data problems by employing deep expertise in one or more of these disciplines, in the context of business strategy, and under the guidance of domain knowledge. Personal skills in communication, presentation, and inquisitiveness are also very important given the complexity of interactions within Big Data systems." [3]

The main focus of a data scientist is thus to discover meaningful patterns in data and synthesize useful knowledge by performing all the necessary steps throughout the whole data life cycle: the collection of raw data, (pre-)processing of data and transforming it into useful information, performing data analysis via various data analytics algorithms and tools, interpreting and evaluating the discovered patterns in order to produce useful knowledge, and validating the induced knowledge models to produce value. Analytics refers to the methods, their implementations in tools, and the results of the use of the tools as interpreted by the practitioner [4]. The analytics process is the synthesis of knowledge from information.

Figure 1. Data Science definition by NIST BD-WG [3].

To cover the competences required of a Data Scientist, good knowledge of data analytics is needed (the two most important fields of analytics being statistics and machine learning), along with a good understanding of engineering (programming, software engineering and data management, in order to provide analytical applications) and a fair amount of domain expertise. Figure 1 provides a graphical presentation of this multi-factor/multi-domain Data Science definition.
2.1 General/research vs. business profile
As Data Science covers a lot of topics, many different competences and skills are required of a data scientist. Accordingly, data scientists tend to focus on some specialization within the whole Data Science scope. In general, two major profiles can be identified: a general, research-oriented profile, and a business-oriented profile (see Figure 2). For both profiles, a fair amount of analytics and engineering knowledge as well as domain expertise is required. Beyond that, the research-oriented profile concentrates primarily on the use of scientific methods: formulation of test hypotheses, experiment design, data collection and analysis, pattern discovery and explanation of the discovered knowledge. The business-oriented profile, on the other hand, focuses on business process management: monitoring the important data and designing, modelling, optimizing and executing the data-driven business processes.

Figure 2. Relations between the Data Science competence groups for (a) general or research oriented and (b) business oriented professions/profiles [4].

3. DATA SCIENTISTS' SKILL SETS
The existing standard and commonly accepted frameworks for defining Data Science competences are well aligned with several reports and scientific papers which provide research results on what skills a data scientist should have.

In [5] the authors present findings compiled from 50 different reports of research in articles, journals, and books, and conducted via experts' views using the Delphi technique, regarding the data scientist skills required by the industry. They provide a list of 41 data scientists' skills and categorize them into five major categories adapted from [6] – computer science, analytics, data management, decision management, and entrepreneurship:

• Computer Science includes programming, where R and Python are the predominant programming languages, as well as privacy, security and systems architecture.
• Analytics focuses primarily on statistics and machine learning, and includes natural language processing, probability and simulation.
• Data management covers all data handling skills and puts emphasis on databases, data modelling and visualization, data mining, business intelligence and general data processing.
• Decision management focuses on decision making, while encompassing communication and ethics.
• Finally, Entrepreneurship includes business and economics.

On the other hand, the EDSF also categorizes all the skills required of a data scientist into five major categories, namely analytics, engineering, data management, research methods and project management, and business analytics [4]:

• Analytics focuses on the use of machine learning, data mining and text mining techniques, the application of predictive and prescriptive analytics, the use of statistics, operations research, optimization and simulations, and the assessment, evaluation and validation of results.
• Engineering includes the use of ICT systems and software engineering, cloud computing and big data technologies, databases, data security, privacy and intellectual property rights protection, as well as algorithm design.
• Data management puts emphasis on specifying, developing and implementing enterprise data management and data governance strategy and infrastructure, and includes data storage systems, data modeling and design, data lifecycle support, data quality, integration, and digital libraries and open data.
• Research methods and project management encompasses the use of research methods principles in developing data-driven applications and implementing the whole cycle of data handling, the development and implementation of data collection processes, and the consistent application of a project management workflow.
• Business analytics focuses on the use of business intelligence, business process management, econometrics for data analysis and applications, user experience design, data warehouses for data integration, and data-driven marketing.
4. PRACTITIONER PLATFORM SURVEY AND TRENDS
After defining the required skills for a Data Scientist, in this section we look at the current state of those skills among active practitioners of Data Science. So far, no thorough analysis of all the skills of Data Scientists has been done, but there is a good survey about the frameworks they use in their line of work.

In August 2017 KDnuggets, one of the most popular websites about data science based on independent ranking [7], ran a poll for its readers [8]. The poll asked the following question: "Did you use R, Python (along with their packages), both, or other tools for Analytics, Data Science, Machine Learning work in 2016 and 2017?". The poll was completed by 954 people and showed the following results.

The results of the poll clearly indicate that there is a shift from the R programming language to the Python programming language in respect of Data Science, Analytics and Machine Learning (see Figure 3). The usage of the R programming language fell by 6 percentage points, while the usage of Python rose from 34% to 41% (an increase of 7 percentage points) among the readers who completed the poll. The poll also indicates that the use of both R and Python rose from 8.5% to 12% (an increase of 3.5 percentage points), which can be attributed to practitioners slowly switching from R to Python while still using R for some specific parts of their work.

Figure 3. Share of R, Python, both R and Python, or other platforms usage for Analytics, Data Science or Machine Learning for 2016 and 2017 [8].

Next, the poll results also show the transitions from one platform to another for Analytics, Data Science, and Machine Learning (see Figure 4). The chart in Figure 4 clearly shows the following. Python users are more loyal than R users, as 91% of readers stuck with Python from 2016 to 2017, while only 74% of readers stuck with R. Also, only 60% of readers who used other platforms and languages stuck with those from 2016 to 2017.

Figure 4. The transition between different programming languages for Data Science, Analytics and Machine Learning from 2016 to 2017 [8].

As the chart shows, only 5% of Python users switched to R exclusively, while 10% of R users switched to using Python exclusively. There is a clear flow of R users (15%) who switched to using both R and Python, while users of both platforms in 2016 made a major switch to using Python exclusively (38%). There was only a 4% switch by Python users to using both platforms, and only 11% of readers who used both platforms in 2016 switched to using only R. There is also a clear flow of users coming to R or Python for Analytics, Data Science and Machine Learning from other platforms: 17% to using only R, 19% to using only Python and 4% to using both R and Python.
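As a consistency check, the 2017 shares can be recomputed from the 2016 shares and the transition percentages quoted above. In the sketch below, the 2016 shares of R and of other platforms, as well as the matrix cells not quoted in the text (e.g. the share of dual users who stayed with both), are inferred so that each row sums to one; they are assumptions, not published poll values.

shares_2016 = {"Python": 0.34, "R": 0.42, "Both": 0.085, "Other": 0.155}
transition = {  # 2016 platform -> distribution over 2017 platforms
    "Python": {"Python": 0.91, "R": 0.05, "Both": 0.04, "Other": 0.00},
    "R":      {"Python": 0.10, "R": 0.74, "Both": 0.15, "Other": 0.01},
    "Both":   {"Python": 0.38, "R": 0.11, "Both": 0.49, "Other": 0.02},
    "Other":  {"Python": 0.19, "R": 0.17, "Both": 0.04, "Other": 0.60},
}

shares_2017 = {p: sum(shares_2016[q] * transition[q][p] for q in shares_2016)
               for p in shares_2016}
print({p: round(s, 2) for p, s in shares_2017.items()})
# {'Python': 0.41, 'R': 0.36, 'Both': 0.12, 'Other': 0.1}
# close to the published 2017 figures (41% Python, 12% both)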
Figure 7 shows two trends for the same steadily rising and should continue to grow if this trend continues. search terms as before (“Python Data Science” and “R Data Science”) for last five years. Even in the job posting aspect, the Python platform has a clear advantage in comparison to R. 5. CONCLUSION In this paper, we present the definition of a Data Scientist and some frameworks of its required skill set and competences. We presented existing research in the field of identifying the core skills and competences and survey the current state of needed and popular skills among practicing Data Scientists. We may conclude that a Data Scientist requires a diverse set of skills and has to adapt to new platforms as their popularities change throughout the time. It is yet to be seen how these skills and popular frameworks used in the work of a Data Scientist will change in the future, but for now we can conclude that skills of analytics, engineering, data management, research methods, project management and business Figure 5. Platform usage for Analytics, Data Science and analytics using Python and R platforms present a core of skills Machine Learning from 2014 to 2017 [8]. every Data Scientist needs. We made a quick glance at the popularity of R and Python platforms for Data Science, ourselves. Figure 6 shows the Google 6. ACKNOWLEDGMENTS Trend chart, where it shows search term popularity on the The authors acknowledge the financial support from the timeline. We compared two search terms: “Python Data Science” Slovenian Research Agency (research core funding No. P2-0057). (blue trend line), and “R Data Science” (red trend line). 7. REFERENCES [1] ––, 50 Best Jobs in America, Glassdoor [online] https://www.glassdoor.com/List/Best-Jobs-in-America- LST_KQ0,20.htm Accessed on 2017-09-07 [2] EDISON Data Science Framework (EDSF), http://edison- project.eu/edison/edison-data-science-framework-edsf Accessed on 2017-09-07 [3] NIST SP 1500-1 NIST Big Data interoperability Framework (NBDIF): Volume 1: Definitions, September 2015. Figure 6. Google Trends search term popularity for last five http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.S years for terms “Python Data Science” for blue trend line and P.1500-1.pdf Accessed on 2017-09-07 “R Data Science” for red trend line (September 9th, 2017). [4] EDISON Data Science Framework: EDSF Part 1: Data Science Competences Framework (CF-DS) Release 2, July 2017. http://edison-project.eu/data-science-competence- framework-cf-ds Accessed on 2017-09-07 [5] Abidin, W.Z., Ismail, N.A., Maarop, N., Alias, R.A.: Skills Sets Towards Becoming Effective Data Scientists, In: Proceedings of the 12th International Conference, KMO 2017, Beijing, China, August 2017, Communications in Computer and Information Science, vol. 731, Springer, 2017. [6] Stadelmann, T., Stockinger, K., Braschler, M., Cieliebak, M., Baudinot, G., Ruckstuhl, G. Applied data science in Europe – challenges for academia in keeping up with a highly demanded topic. European Computer Science Summit (2013) [7] ––, Top 75 Data Science Blogs And Websites For Data Figure 7. Job posting trends on Indeed.com for last five years Scientists. http://blog.feedspot.com/data_science_blogs/ for terms “Python Data Science” for blue trend line and “R Accessed on 2017-09-08 Data Science” for orange trend line (September 9th, 2017). [8] Piatetsky, G. Python overtakes R, becomes the leader in Data As chart shows, there was the almost even popularity of both Science, Machine Learning platforms. KDnuggets, 2017. 
Towards a Classification of Educational Tools

Kristjan Košič, Alen Rajšp, Jernej Huber
Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
kristjan.kosic@um.si, alen.rajsp@um.si, jernej.huber@um.si

ABSTRACT
As part of the Didakt.UM project, which aims at the exchange of experience and the creation of a platform that would enable the efficient search and selection of suitable ICT solutions used for educational purposes within the University of Maribor, an analysis and classification of such ICT solutions were made. Out of 82 entries, 63 tools were classified into a broad classification, which intends to cover the widest range of ICT solutions used by the students and staff of the University of Maribor.

Categories and Subject Descriptors
K.3 [Computers and education]: General; H.4 [Information systems applications]: General; K.6 [Management of computing and information systems]: General

General Terms
Management, Documentation, Performance, Economics, Human Factors, Standardization, Legal Aspects.

Keywords
Software classification, Software taxonomy, Educational software, Software usage, Learning stack
1. INTRODUCTION
The preparation of an exhaustive list of ICT solutions presents a complex challenge due to the extremely high number and variety of solutions and their respective domains of use. The need to place these solutions within the different levels of the classification makes the challenge even more complex [1].

The purpose and objective of our research was twofold. Firstly, we provided an analysis of the existing situation regarding the usage of ICT solutions used for educational purposes by the students and staff of the University of Maribor (hereinafter referred to as UM). Secondly, a classification of the aforementioned ICT solutions was prepared to lay a foundation for establishing the learning stack [2], which represents a collection of applications, cloud services, content repositories and data sources that can be accessed through a content platform. Such a platform would enable the pedagogical staff to search, comment on and rate suitable ICT tools within the repository and to exchange didactic experience and good practices within the UM environment.

The document consists of the following sections. We provide a short overview of related work covering different approaches to classifying ICT resources in section 2. Secondly, we present our classification proposal with a mind map of the classification with the first-level categories in section 3.1. We continue with a statistical analysis of the survey on the usage of ICT tools within the UM in section 3.2. Lastly, we provide a sneak peek into a project deliverable in the form of a two-level classification table, which offers a more detailed overview of the resulting classification.

2. RELATED WORK
At the highest level, technological infrastructure can be divided into hardware and software resources [3]. Hardware refers to the mechanical, visible and tangible part of information technology, while software presents a set of instructions prepared to obtain an adequate final result [4]. With the rapid development of smartphones and embedded systems, hardware-dependent software has recently been gaining unprecedented use within a wide range of domains, such as medicine, telecommunications, the automotive industry and others [5].

A global classification of educational tools was not found. Various sub-classifications, related to specific use cases or niche domains, were included in the analysis. In general, software is usually divided into application and system solutions [4]. The latter offer an infrastructure environment for running application software and include operating systems, drivers, system utilities, and servers. The category of application solutions can include software that enables information management, education, business infrastructure, simulation, media processing, software development, and solutions that contain the concepts of gamification [6], [7].

Multiple taxonomies for classifying software already exist, one of the best known being the ACM Computing Classification System, which was last revised in 2012 [8]. The main categories of the ACM taxonomy are: General and reference; Hardware; Computer systems organization; Networks; Software and its engineering; Theory of computation; Mathematics of computing; Information systems; Security and privacy; Human-centered computing; Computing methodologies; Applied computing; Social and professional topics; Proper nouns: People, technologies and companies. The purpose of the ACM taxonomy is to provide a categorization of technology-related topics. From the application domain standpoint, it provides relatively poor coverage of some application types, such as information display and consumer-oriented software [9]. On the other hand, the open-source community, with sites such as SourceForge and Google Code, provides a good overview of most types of software developed by such communities. Google's approach to defining application domains avoids a hierarchical structure and relies on tagging [9]. Additionally, many authors have developed their own more or less up-to-date taxonomies, which divide software into categories based on the purpose of use (e.g. data-dominant, systems, control-dominant and computation-dominant software, categories that are further divided into domains of use) [9] or directly by the domain of use [10].
3.2 Statistical analysis of ICT tools usage
Based on data from the survey on the usage of ICT solutions within the UM, we identified 82 entries, of which 19 were defective, with missing data regarding the type of solution, manufacturer, etc. Altogether, we classified the following 63 ICT solutions: Moodle, Geogebra, Sony Virtuoso and Soloist, Expression Studio, CyberLink PowerDirector, Articulate, iSpring, Hype, Sibelius, Adobe Photoshop, Photofiltre Studio X, Audacity, Windows Movie Maker, HandBrake, MKVToolNix, Subtitle Edit, Hot Potatoes, Google Docs, Sheets, Slides, Forms, Poll Maker, Skype, The Jupyter Notebook, matplotlib, WinMIPS64, XAMPP, Usb Web Server, UwAmp, WampServer, SonarQube, Java Web Start, ERPSim, Vox Armes, BIM server, Xerte, Oracle database server, Adonis CE, Pantheon X, Bizagi Business Process Modeler, Microsoft Visio, Microsoft Dynamics NAV, Microsoft Project, Aris Architect & Designer, Aris Express, JDeveloper, Eclipse, SQL Developer, SQL Developer Data Modeler, Greenfoot, Tableau, Orange, SAP Lumira, SAP ERP, Oracle VM VirtualBox, VMware Workstation Player, Linux Ubuntu, Kali Linux, SPSS, AnyLogic, Turning Point, Kahoot, Padlet, Anatomy 4D, Virtual Patient MedU and ThinkDesign Suite.

The column chart in Figure 2 shows the number of solutions by usage domain. It is important to stress that one tool can belong to more than one domain. The majority of tools represented the computer science and informatics domain (28), while 19 tools were general-purpose tools (such as Skype and Google Docs) that can be used within any domain.

Figure 2. Number of solutions by usage domain.

The pie chart in Figure 3 shows the ratio between open-source and proprietary solutions. Most of the documented tools (39 out of 63) were proprietary.

Figure 3. Ratio between open-source and proprietary tools.
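Because one tool can belong to several domains (and, as Figures 6 and 7 show below, to several client and solution types), the per-category counts sum to more than 63. Anyone re-running the tallies should therefore count memberships rather than tools. A minimal sketch of such counting, assuming records shaped as (name, domains) pairs; the sample rows are illustrative, not the actual survey data:

from collections import Counter

# Hypothetical survey records: (tool name, domains it belongs to).
entries = [
    ("Skype", ["general-purpose"]),
    ("Geogebra", ["mathematics"]),
    ("Eclipse", ["computer science and informatics"]),
    ("SPSS", ["computer science and informatics", "social sciences"]),
]

# Every tool contributes one count to each of its domains, so the
# column totals in a chart like Figure 2 may exceed the number of tools.
domain_counts = Counter(d for _, domains in entries for d in domains)
print(domain_counts.most_common())
# [('computer science and informatics', 2), ('general-purpose', 1), ...]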
The column chart in Figure 4 shows the number of tools that support at least one functionality from the categories describing the purpose of use. Most often, a tool was intended for modeling (18); cooperation, communication and coordination (16); multimedia management (16); software development (12); and learning content management (10).

Figure 4. Number of solutions by the purpose of usage.

The pie chart in Figure 5 shows the proportion of solutions in terms of collaboration support: 19 tools from the survey allow groups to work together, while the remaining 44 solutions do not have such support.

Figure 5. Ratio of collaboration-supported tools.

The column chart in Figure 6 shows the number of solutions by type of client: 32 tools permit online use within the browser, 14 tools can be accessed with mobile smartphones, and 51 tools are developed as desktop applications. Again, it is important to stress that each tool can have more than one type of client.

Figure 6. Number of solutions by the type of the usage.

The column chart in Figure 7 shows the number of solutions according to the types of ICT solutions proposed in our classification. Most tools fall under the following types: information management (30), software development (17) and education (16).

Figure 7. Number of solutions by the ICT type.

3.3 Proposed two-level classification
Table 1 presents a more detailed look into our classification proposal. Within this article, we limited the attribute hierarchy to two levels; the actual classification is divided into a three-level hierarchy of classification attributes and is therefore even more comprehensive.

Table 1. Classification of used solutions (to the second level)

1st level of classification | 2nd level of classification
General information | Name of the solution; Description; Manufacturer; Manufacturer's URL; License type; Price; Provider; Provider's URL; Support/service level; Minimum system requirements; General use case; UM use case; UM contact person
Faculty usage (Klasius-P) | 1 - Teacher training and education science; 2 - Humanities and arts; 3 - Social sciences, business and law; 4 - Science, mathematics and computer science; 5 - Engineering, manufacturing and construction; 6 - Agriculture, forestry, fisheries, veterinary; 7 - Health and welfare; 8 - Services
Use case | General-purpose; Specific domain
Type of ICT solutions | System software; Application software
Type of the usage | Web; Mobile; Desktop
Channel of communication | Video; Sound; Text
Type / format of the content | Video material; Graphical material; Sound; Text; Spreadsheet; Presentation; Any file
Group work support | Among the members of the organization; Among the members of the community; Among the team members
The time aspect of collaboration | Asynchronous; Synchronous
Cooperation between the roles within UM | Student; Teacher; Domain expert; Administrator
The purpose of usage | Learning content management; Knowledge testing and evaluation; Polling; Learning analytics; Cooperation, communication and coordination; Multimedia management; Statistical data analysis; Data storage; Software development; Software deployment; Enterprise resource planning; Modeling; Project management; Virtualization; Simulation

The result of our in-depth analysis was a report in which we provide the three-level classification of ICT solutions, a brief description of each classification attribute, and the actual placement of the 63 identified ICT solutions within the proposed classification.
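The two-level scheme in Table 1 can also be read as a mapping from each first-level attribute to its admissible second-level values, which makes the classification easy to check mechanically. The sketch below transcribes a fragment of Table 1 into such a mapping and adds a simple admissibility check; the dictionary contents come from the table, while the validate helper is our own illustration, not part of the project report.

# Fragment of Table 1: first-level attribute -> admissible second-level values.
CLASSIFICATION = {
    "Type of ICT solutions": ["System software", "Application software"],
    "Type of the usage": ["Web", "Mobile", "Desktop"],
    "Channel of communication": ["Video", "Sound", "Text"],
    "The time aspect of collaboration": ["Asynchronous", "Synchronous"],
}

def validate(entry):
    # Return (attribute, value) pairs that the scheme does not admit.
    return [
        (attribute, value)
        for attribute, values in entry.items()
        if attribute in CLASSIFICATION
        for value in values
        if value not in CLASSIFICATION[attribute]
    ]

# A well-formed entry yields no violations:
print(validate({"Type of the usage": ["Web", "Desktop"]}))  # -> []
# A value outside the scheme is flagged:
print(validate({"Channel of communication": ["Voice"]}))  # -> [('Channel of communication', 'Voice')]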
4. CONCLUSION
The classification of educational tools is a broad topic that still offers much room for improvement and research. In the future, we suggest further classification and categorization of tools combined with pedagogical learning approaches related to the specific needs of the instructor. Moreover, the framework could be expanded with pedagogical classifications and requirements related to regional/local pedagogical classifications (i.e. those specific to a particular country).

5. ACKNOWLEDGEMENTS
This research was carried out within the project Didakt.UM, which is financed by the Slovenian Ministry of Education, Science and Sport and the European Union through the European Social Fund.

6. REFERENCES
[1] A. Saito, K. Umemoto, and M. Ikeda, "A strategy-based ontology of knowledge management technologies," Journal of Knowledge Management, vol. 11, no. 1, pp. 97–114, Feb. 2007. DOI=https://doi.org/10.1108/13673270710728268.
[2] J.-M. Lowendahl, "Hype Cycle for Education," Gartner, 2016. [Online]. Available: https://www.gartner.com/doc/3364119/hype-cycle-education-. [Accessed: 07-Sep-2017].
[3] M. Afshari, K. A. Bakar, W. S. Luan, B. A. Samah, and F. S. Fooi, "Factors Affecting Teachers' Use of Information and Communication Technology," International Journal of Instruction, pp. 77–104, 2009.
[4] I. Masic et al., "Information Technologies (ITs) in Medical Education," Acta Informatica Medica, vol. 19, no. 3, p. 161, 2011. DOI=https://doi.org/10.5455/aim.2011.19.161-167.
[5] W. Ecker, W. Müller, and R. Dömer, "Hardware-dependent Software," in Hardware-dependent Software, Dordrecht: Springer Netherlands, 2009, pp. 1–13.
[6] A. Saito, K. Umemoto, and M. Ikeda, "A strategy-based ontology of knowledge management technologies," Journal of Knowledge Management, vol. 11, no. 6, pp. 97–114, 2007.
[7] TechTarget, "What is software?," 2017. [Online]. Available: http://searchmicroservices.techtarget.com/definition/software. [Accessed: 31-Aug-2017].
[8] ACM, "The 2012 ACM Computing Classification System," 2012. [Online]. Available: https://www.acm.org/publications/class-2012. [Accessed: 05-Sep-2017].
[9] A. Forward and T. C. Lethbridge, "A taxonomy of software types to facilitate search and evidence-based software engineering," in Proceedings of the 2008 Conference of the Center for Advanced Studies on Collaborative Research: Meeting of Minds - CASCON '08, 2008, p. 179.
[10] R. L. Glass and I. Vessey, "Contemporary application-domain taxonomies," IEEE Software, vol. 12, no. 4, pp. 63–76, Jul. 1995. DOI=https://doi.org/10.1109/52.391837.
[11] SD Times, "Web, desktop, mobile: What's the difference?," 2017. [Online]. Available: http://sdtimes.com/web-desktop-mobile-whats-the-difference/. [Accessed: 31-Aug-2017].
[12] eduCBA, "What is application software & its types," 2017. [Online]. Available: https://www.educba.com/what-is-application-software-its-types/. [Accessed: 31-Aug-2017].
[13] H. Fuks et al., "The 3C Collaboration Model," in Encyclopedia of E-Collaboration, IGI Global, pp. 637–644.
[14] TechTarget, "Synchronous vs. asynchronous communication: The differences," 2017. [Online]. Available: http://searchmicroservices.techtarget.com/tip/Synchronous-vs-asynchronous-communication-The-differences. [Accessed: 31-Aug-2017].
[15] S. R. Malikowski, M. E. Thompson, and J. G. Theis, "A Model for Research into Course Management Systems: Bridging Technology and Learning Theory," Journal of Educational Computing Research, vol. 36, no. 2, pp. 149–173, Mar. 2007. DOI=https://doi.org/10.2190/1002-1T50-27G2-H3V7.
[16] R. Scapin, "Learning Analytics in Education: Using Student's Big Data to Improve Teaching," 2015.
[17] Statistični urad Republike Slovenije, "Klasius-P," 2017. [Online]. Available: http://www.stat.si/Klasius/Default.aspx?id=5. [Accessed: 05-Sep-2017].
Indeks avtorjev / Author index

Beranič Tina, 31
Budimac Zoran, 27
Catal Çağatay, 7
Heričko Marjan, 19, 31
Heričko Tjaša, 35
Huber Jernej, 15, 43
Kamišalić Aida, 11
Karakatič Sašo, 39
Košič Kristjan, 43
Montoya Edwin, 11
Muratli Can, 7
Pavlinek Miha, 19
Podgorelec Vili, 39
Polančič Gregor, 15
Pušnik Maja, 19
Rajšp Alen, 43
Rakić Gordana, 27
Rednjak Zlatko, 31
Šestak Martina, 23
Sukur Nataša, 27
Tabares S. Marta, 11, 15
Torres Camilo, 11