Zbornik 20. mednarodne multikonference INFORMACIJSKA DRUŽBA - IS 2017, Zvezek G
Proceedings of the 20th International Multiconference INFORMATION SOCIETY - IS 2017, Volume G

Sodelovanje, programska oprema in storitve v informacijski družbi
Collaboration, Software and Services in Information Society

Uredil / Edited by Marjan Heričko
http://is.ijs.si
9.-13. oktober 2017 / 9-13 October 2017, Ljubljana, Slovenia

Editor: Marjan Heričko, University of Maribor, Faculty of Electrical Engineering and Computer Science
Publisher: Institut »Jožef Stefan«, Ljubljana
Proceedings preparation: Mitja Lasič, Vesna Lasič, Lana Zemljak
Cover design: Vesna Lasič
Access to the e-publication: http://library.ijs.si/Stacks/Proceedings/InformationSociety

Ljubljana, October 2017

The cataloguing-in-publication (CIP) record was prepared by the National and University Library in Ljubljana.
COBISS.SI-ID=292477440
ISBN 978-961-264-118-4 (pdf)

FOREWORD TO THE INFORMATION SOCIETY 2017 MULTICONFERENCE

With its twentieth consecutive edition, the Information Society multiconference (http://is.ijs.si) is the central Central European event in the fields of the information society, computer science and informatics. This year's event again takes place at several locations, with the main events at the Jožef Stefan Institute.

The information society, knowledge and artificial intelligence are once more at a crossroads, both in themselves and in their influence on human development. Will the exponential growth of electronics according to Moore's law continue, or will it stagnate? Will artificial intelligence continue its remarkable progress, outperforming humans in ever more areas and thereby enabling civilization to flourish, or will the exponential growth of the population, particularly in Africa, choke that progress? More and more indicators point to both extremes - we are passing into the next civilizational era, while at the same time the planetary conflicts of modern society are becoming ever harder to manage.

This year, the multiconference brings together twelve excellent independent conferences. Around 200 presentations, abstracts and papers will be given within the individual conferences and workshops, accompanied by round tables, discussions and special events such as the award ceremony. Selected papers will also appear in a special issue of the journal Informatica, which boasts a 40-year tradition as an excellent scientific journal. Remarkable anniversaries!

The Information Society 2017 multiconference consists of the following independent conferences:
- Slovenian Conference on Artificial Intelligence
- Facing Demographic Challenges
- Cognitive Science
- Collaboration, Software and Services in the Information Society
- Data Mining and Data Warehouses
- Education in the Information Society
- 4th Student Computer Science Research Conference
- Workshop »Electronic and Mobile Health«
- 5th International Conference on Cognitonics
- International Technology Transfer Conference - ITTC
- Workshop »AS-IT-IC«
- Robotics

The co-organizers and supporters of the conference are various research institutions and associations, among them ACM Slovenia, SLAIS, DKZ and the second Slovenian national academy, the Slovenian Academy of Engineering (IAS).
On behalf of the conference organizers, we thank the associations and institutions, and especially the participants, for their valuable contributions and for the opportunity to share with us their experience of the information society. We also thank the reviewers for their help with the reviews. In 2017, the award for lifetime achievements in honour of Donald Michie and Alan Turing is conferred for the fifth time. The Michie-Turing Award for an exceptional lifetime contribution to the development and promotion of the information society goes to Prof. Dr. Marjan Krisper, and the award for the achievement of the year to Prof. Dr. Andrej Brodnik. For the sixth time we are awarding the "information lemon" and the "information strawberry" for the least and the most successful moves related to the information society. The lemon went to the decline of Slovenian funding for academic science, which by this criterion now places us third worst in Europe; the strawberry went to the "e-prescription". Congratulations to the award winners!

Bojan Orel, Programme Committee Chair
Matjaž Gams, Organizing Committee Chair

FOREWORD - INFORMATION SOCIETY 2017

In its 20th year, the Information Society multiconference (http://is.ijs.si) remains one of the leading conferences in Central Europe devoted to the information society, computer science and informatics. In 2017 it is organized at various locations, with the main events at the Jožef Stefan Institute.

The pace of progress of the information society, knowledge and artificial intelligence is speeding up, and it seems we are again at a turning point. Will the progress of electronics continue according to Moore's law, or will it start stagnating? Will AI continue to outperform humans at more and more activities, enabling unprecedented human progress, or will the growth of the human population, in particular in Africa, cause global decline? Both extremes seem more and more likely - fantastic human progress, and planetary decline caused by humans destroying our environment and each other.

The multiconference runs in parallel sessions with 200 presentations of scientific papers at twelve conferences, round tables, workshops and award ceremonies. Selected papers will be published in the Informatica journal, which has a 40-year tradition of excellent research publication. These are remarkable achievements.

The Information Society 2017 multiconference consists of the following conferences:
- Slovenian Conference on Artificial Intelligence
- Facing Demographic Challenges
- Cognitive Science
- Collaboration, Software and Services in Information Society
- Data Mining and Data Warehouses
- Education in Information Society
- 4th Student Computer Science Research Conference
- Workshop Electronic and Mobile Health
- 5th International Conference on Cognitonics
- International Conference of Transfer of Technologies - ITTC
- Workshop »AS-IT-IC«
- Robotics

The multiconference is co-organized and supported by several major research institutions and societies, among them ACM Slovenia (the Slovenian chapter of the ACM), SLAIS, DKZ and the second national engineering academy, the Slovenian Engineering Academy. In the name of the conference organizers, we thank all the societies and institutions, and particularly all the participants, for their valuable contributions and their interest in this event, and the reviewers for their thorough reviews.

For the fifth year, the award for life-long outstanding contributions is presented in memory of Donald Michie and Alan Turing. The Michie-Turing award is given to Prof. Marjan Krisper for his life-long outstanding contribution to the development and promotion of the information society in our country.
In addition, an award for current achievements is given to Prof. Andrej Brodnik. The information lemon goes to the national funding of academic science, which has degraded Slovenia to the third worst position in Europe. The information strawberry is awarded to the medical e-prescription project. Congratulations!

Bojan Orel, Programme Committee Chair
Matjaž Gams, Organizing Committee Chair

KONFERENČNI ODBORI / CONFERENCE COMMITTEES

International Programme Committee: Vladimir Bajic (South Africa), Heiner Benking (Germany), Se Woo Cheon (South Korea), Howie Firth (UK), Olga Fomichova (Russia), Vladimir Fomichov (Russia), Vesna Hljuz Dobric (Croatia), Alfred Inselberg (Israel), Jay Liebowitz (USA), Huan Liu (Singapore), Henz Martin (Germany), Marcin Paprzycki (USA), Karl Pribram (USA), Claude Sammut (Australia), Jiri Wiedermann (Czech Republic), Xindong Wu (USA), Yiming Ye (USA), Ning Zhong (USA), Wray Buntine (Australia), Bezalel Gavish (USA), Gal A. Kaminka (Israel), Mike Bain (Australia), Michela Milano (Italy), Derong Liu (Chicago, USA), Toby Walsh (Australia)

Organizing Committee: Matjaž Gams (chair), Mitja Luštrek, Lana Zemljak, Vesna Koricki, Mitja Lasič, Robert Blatnik, Aleš Tavčar, Blaž Mahnič, Jure Šorn, Mario Konecki

Programme Committee: Bojan Orel (chair), Franc Solina (co-chair), Viljan Mahnič (co-chair), Cene Bavec (co-chair), Tomaž Kalin (co-chair), Jozsef Györkös (co-chair), Tadej Bajd, Jaroslav Berce, Mojca Bernik, Marko Bohanec, Ivan Bratko, Andrej Brodnik, Dušan Caf, Saša Divjak, Tomaž Erjavec, Bogdan Filipič, Andrej Gams, Matjaž Gams, Marko Grobelnik, Nikola Guid, Marjan Heričko, Borka Jerman Blažič Džonova, Gorazd Kandus, Urban Kordeš, Marjan Krisper, Andrej Kuščer, Jadran Lenarčič, Borut Likar, Mitja Luštrek, Janez Malačič, Olga Markič, Dunja Mladenič, Franc Novak, Vladislav Rajkovič, Grega Repovš, Ivan Rozman, Niko Schlamberger, Stanko Strmčnik, Jurij Šilc, Jurij Tasič, Denis Trček, Andrej Ule, Tanja Urbančič, Boštjan Vilfan, Baldomir Zajc, Blaž Zupan, Boris Žemva, Leon Žlajpah

Invited lecture

AN UPDATE FROM THE AI & MUSIC FRONT

Gerhard Widmer
Institute for Computational Perception, Johannes Kepler University Linz (JKU), and Austrian Research Institute for Artificial Intelligence (OFAI), Vienna

Abstract

Much of current research in Artificial Intelligence and Music, and particularly in the field of Music Information Retrieval (MIR), focuses on algorithms that interpret musical signals and recognize musically relevant objects and patterns at various levels - from notes to beats and rhythm, to melodic and harmonic patterns and higher-level segment structure - with the goal of supporting novel applications in the digital music world. This presentation will give the audience a glimpse of what musically "intelligent" systems can currently do with music, and what this is good for. However, we will also find that while some of these capabilities are quite impressive, they are still far from (and do not require) a deeper "understanding" of music. An ongoing project will be presented that aims to take AI & music research a bit closer to the "essence" of music, going beyond surface features and focusing on the expressive aspects of music and how these are communicated in music. This raises a number of new research challenges for the field of AI and Music (discussed in much more detail in [Widmer, 2016]).
As a first step, we will look at recent work on computational models of expressive music performance, and will show some examples of the state of the art (including the result of a recent musical 'Turing test').

References

Widmer, G. (2016). Getting Closer to the Essence of Music: The Con Espressione Manifesto. ACM Transactions on Intelligent Systems and Technology 8(2), Article 19.

KAZALO / TABLE OF CONTENTS

Sodelovanje, programska oprema in storitve v informacijski družbi / Collaboration, Software and Services in Information Society
PREDGOVOR / FOREWORD
PROGRAMSKI ODBORI / PROGRAMME COMMITTEES
Crop Yield Prediction in the Cloud: Machine Learning Approach / Catal Çağatay, Muratli Can
Using Cognitive Software to Evaluate Natural Language Classifiers / Torres Camilo, Tabares S. Marta, Montoya Edwin, Kamišalić Aida
An Analysis of BPMN-based Approaches for Process Landscape Design / Polančič Gregor, Huber Jernej, Tabares S. Marta
Approach to an alternative value chain modeling / Pavlinek Miha, Heričko Marjan, Pušnik Maja
Using Property Graph Model for Semantic Web Services Discovery / Šestak Martina
Statecharts representation of program execution flow / Sukur Nataša, Rakić Gordana, Budimac Zoran
Code smell detection: A tool comparison / Beranič Tina, Rednjak Zlatko, Heričko Marjan
A Qualitative and Quantitative Comparison of PHP and Node.js for Web Development / Heričko Tjaša
Skills, Competences and Platforms for a Data Scientist / Podgorelec Vili, Karakatič Sašo
Towards a Classification of Educational Tools / Košič Kristjan, Rajšp Alen, Huber Jernej
Indeks avtorjev / Author index

Zbornik 20. mednarodne multikonference INFORMACIJSKA DRUŽBA - IS 2017, Zvezek G
Proceedings of the 20th International Multiconference INFORMATION SOCIETY - IS 2017, Volume G

Sodelovanje, programska oprema in storitve v informacijski družbi
Collaboration, Software and Services in Information Society

Uredil / Edited by Marjan Heričko
http://is.ijs.si
9. oktober 2017 / 9 October 2017, Ljubljana, Slovenia

FOREWORD

We are organizing the conference "Collaboration, Software and Services in the Information Society" as part of the Information Society multiconference for the seventeenth time.
As usual, this year's papers address current topics and challenges related to the development of modern software and information solutions and services, as well as to collaboration in general. For several decades, informatics and information technologies have been the driving force of innovation in all areas of business and in the activities of individuals. Open standards, interoperability and the ever higher responsiveness of IT professionals are leading to the development of intelligent digital service platforms, innovative business models and new ecosystems in which not only partners but also competitors connect and collaborate. The involvement of the end users of our services and solutions is also growing in scale and importance. Advanced information technologies and modern approaches to development, deployment and management enable a higher degree of automation and the integration of previously separate worlds, since they establish a closed loop and ensure continuous improvements based on the active collaboration and feedback of all the actors involved. Through all of this, quality assurance remains one of the most important aspects of developing and deploying IT-based services.

The papers collected in these proceedings provide insight into, and solutions for, challenges in areas such as:
- modeling the value chains of service ecosystems;
- designing process landscapes;
- detecting inappropriate design decisions;
- identifying deficient software components;
- discovering semantic web services;
- evaluating advanced web technologies;
- classifying learning-stack tools;
- identifying the knowledge and competences of a data scientist;
- training and evaluating natural language classifiers;
- applying machine learning algorithms in practice.

We hope that in these proceedings, which connect theoretical and practical knowledge, you will again find useful information for your further work in both basic and applied research.

Marjan Heričko

FOREWORD

This year, the conference "Collaboration, Software and Services in Information Society" is being organised for the seventeenth time as a part of the "Information Society" multiconference. As in previous years, the papers in this year's proceedings address current challenges and best practices related to the development of advanced software and information solutions, as well as collaboration in general.

Information technologies and the field of informatics have been the driving force of innovation in business, as well as in the everyday activities of individuals, for several decades. Open standards, interoperability and the increasing responsiveness of IS/IT experts are leading the way to the development of intelligent digital service platforms, innovative business models and new ecosystems where not only partners, but also competitors, are connecting and working together. The involvement and engagement of end users is a necessity. On the other hand, quality assurance remains a vital part of software and ICT-based service development and deployment.

The papers in these proceedings provide a better insight into, and/or propose solutions to, challenges related to:
- modelling the value chains of large ecosystems;
- designing process landscapes;
- detecting bad design decisions and code smells;
- discovering semantic web services;
- evaluating advanced Web technologies;
- classifying learning-stack tools;
- identifying the skills and competencies of data scientists;
- training and evaluating natural language classifiers;
- applying machine learning algorithms in practice.
We hope that these proceedings will be beneficial for your reference and that the information in this volume will be useful for further advancements in both research and industry.

Marjan Heričko

PROGRAMSKI ODBOR / PROGRAMME COMMITTEE

Marjan Heričko, Lorna Uden, Gabriele Gianini, Hannu Jaakkola, Mirjana Ivanović, Zoltán Porkoláb, Vili Podgorelec, Maja Pušnik, Muhamed Turkanović, Boštjan Šumak, Gregor Polančič, Luka Pavlič

Crop Yield Prediction in the Cloud: Machine Learning Approach

Cagatay Catal, Department of Computer Engineering, Istanbul Kültür University, Istanbul, Turkey, c.catal@iku.edu.tr
Can Muratli, Department of Computer Engineering, Istanbul Kültür University, Istanbul, Turkey, o.muratli@iku.edu.tr

ABSTRACT

Crop yield prediction provides critical information for decision makers and directly affects agricultural policies and trade. Emerging technologies such as the Internet of Things (IoT), big data analytics, cloud computing and machine learning have enabled researchers to design and implement high-performance yield prediction models. In this work, we aimed at investigating several machine learning-based regression techniques, such as Boosted Decision Tree Regression and Neural Network Regression, for this challenging problem, and at implementing a wheat yield prediction web service hosted on the Azure cloud computing platform. Case studies were performed on data obtained for the south-east region of Turkey and for four states in the United States. Experimental results demonstrated that while the neural network regression technique provides the best performance for large-scale crop yield prediction datasets, the linear regression technique is more appropriate for small-scale datasets.

Categories and Subject Descriptors
I.2.6 [Computing Methodologies]: Learning

General Terms
Algorithms, Measurement, Performance, Experimentation.

Keywords
Crop Yield Prediction; Internet of Things; Sensors; Machine Learning; Regression Techniques; Cloud Computing

1. INTRODUCTION

It is reported that 795 million people in the world are undernourished, which means that one in nine people today lives without sufficient food [1]. While the current world population is around 7.5 billion people, it is estimated that it will reach 9.7 billion, which is 30% higher than the current population [2]. To supply adequate food to this huge population, global food production must improve dramatically. It is estimated that while one farmer now feeds 155 people in the world, by 2050 one farmer will need to feed 250 people, which is 61% higher than the current situation [3]. The United Nations aims at ending hunger by 2030 and ensuring access to safe and adequate food for all people in the world [4].

Crop yield prediction before harvest can help to manage agricultural trade policies [5], provide critical data for economic and political stakeholders, and support the evaluation of climate change impact [6]. Therefore, researchers are still actively involved in the development of crop yield prediction models at national, sub-national and international levels [6]. Since traditional survey methods are time-consuming and error-prone, accurate prediction approaches are currently being developed by different research groups. In addition to the survey method, there are different approaches such as statistical methods, crop simulation models and remote sensing-based techniques.

In this study, our main objective is to design and implement a wheat yield prediction system based on the data obtained from sensors positioned in stations in the south-east region of Turkey. To retrieve and process this data, we collaborated with the TARBIL Agro-Informatics Research Centre at Istanbul Technical University, which operates a terrestrial network called the Agricultural Monitoring and Information System (AgriMONIS) with 441 active RoboStations. In addition to the analysis performed on this data, we also developed machine learning-based models for wheat data obtained from four states in the United States. The machine learning-based models were designed and evaluated on the Azure Machine Learning Studio platform. The best algorithm in terms of the coefficient of determination parameter was transformed into a web service and deployed on the Azure cloud computing platform. A client-side web application was implemented using ASP.NET technology to handle the requests of end users, and farmers are informed about the yield prediction results via this web application.
2. RELATED WORK

There are several studies on crop yield prediction, but we did not encounter an end-to-end crop yield prediction system which uses Azure Machine Learning Studio, the Azure cloud platform and web services technology. Most of the studies in the literature only report experimental results, but do not provide any practical information on building a crop yield prediction system for real-world scenarios. Also, there is a very limited number of studies which applied data from the south-east region of Turkey. Çakır et al. [5] built an Artificial Neural Network to estimate wheat yield in the south-east region of Turkey and utilized meteorological data such as temperature and rainfall records. They used data for the years 2011 and 2012 for training the model, and applied the data for the year 2013 to test the prediction model. They reported that the results are better than those of the regression method when a Multi-Layer Perceptron (MLP) is applied; the optimal number of neurons was reported as 15. Chen and Jing [7] compared two adaptive multivariate analysis methods based on Landsat-8 images to forecast wheat yield and reported that Artificial Neural Networks (ANN) provide better results than the Partial Least Squares Regression (PLSR) technique in terms of the coefficient of determination and root mean squared error (RMSE) parameters. Gouache et al. [6] developed wheat yield prediction models in France for 23 departments using yield statistics from 1986 to 2010. They started with 250 variables and reached 5-7 variables using forward stepwise regression methods to design their prediction models; acceptable models were obtained for 20 departments. Stas et al. [8] compared Boosted Regression Trees (BRT) and Support Vector Machines (SVM) for the prediction of wheat yields and reported that BRT provides better performance than SVM. Our paper is different from these studies in that we decided to build a cloud-based prediction system and use state-of-the-art regression algorithms in the Azure machine learning platform.

3. METHODOLOGY

While there are many tools available, we preferred Azure Machine Learning Studio due to its cloud computing capabilities and its easy-to-use nature. The collaboration with TARBIL, a science center focused on agriculture that has over 400 stations equipped with various sensors monitoring every phenological state of a field, enabled us to get precise datasets for our experiments. In addition to those datasets, we also came across a set of datasets focused on wheat yield in the USA [9], which created an opportunity for another case study to evaluate our models. During our experiments, every regression model available in Azure Machine Learning Studio was tested; however, due to some constraints created by the datasets available to us, we narrowed our options down to the four regression models explained briefly below (and sketched in code after the list):

1. Linear Regression: Despite being the most simplistic method among regression models, linear regression is frequently used in many case studies, since it simply attempts to create a linear relationship between one or more features and a numeric outcome to be predicted.
2. Bayesian Linear Regression: It is like the linear regression approach; however, it uses Bayesian inference to update the probability distribution of the model.
3. Boosted Decision Tree Regression: Using an efficient implementation of the MART gradient boosting algorithm, Boosted Decision Tree Regression builds each regression tree in a step-by-step fashion, eliminating weaker prediction models.
4. Neural Network Regression: While neural networks are widely used for deep learning and for modelling sophisticated problems, they can also be adapted to regression problems where more traditional regression models fall short.
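Azure Machine Learning Studio exposes these four models as drag-and-drop modules rather than code. As a rough illustration only - not the paper's actual set-up - the same four model families could be instantiated in scikit-learn as follows; the estimator choices and hyperparameters are our assumptions, not the Azure modules' internals.

```python
# A minimal sketch, assuming scikit-learn stands in for the four
# Azure ML Studio regression modules; hyperparameters are illustrative.
from sklearn.linear_model import LinearRegression, BayesianRidge
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor

models = {
    # Plain least-squares fit of a linear relationship
    "Linear Regression": LinearRegression(),
    # Bayesian inference over the weights (prior updated to a posterior)
    "Bayesian Linear Regression": BayesianRidge(),
    # Gradient-boosted regression trees, grown stage by stage (MART-style)
    "Boosted Decision Tree Regression": GradientBoostingRegressor(n_estimators=100),
    # A small feed-forward network adapted to regression
    "Neural Network Regression": MLPRegressor(hidden_layer_sizes=(100,), max_iter=1000),
}
```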
We had two different datasets: one from the south-east region of Turkey, the other from four states of the United States of America. Having two sets of data led us to approach this problem in two case studies.

In case study one, we had both phenological data and crop yield information from nine different stations equipped with sensors for the years between 2013 and 2016, which enabled us to use the data from 2013 to 2015 for training and to predict the 2016 yield results from the given features (Figure 1).

Figure 1. All regression models tested for the train and score model, South-East Region of Turkey.

After combining the two datasets into one, both the test and train datasets were also run with ten-fold cross-validation settings, as seen in Figure 2.

Figure 2. All regression models tested for the cross-validation model, South-East Region of Turkey.

The cross-validation evaluation helped us to compare our findings with the second case study. In the second case study, we had more than 300,000 records in the dataset. However, since we had only two years of data, we did not perform a test which uses an external test dataset; we therefore ran only a cross-validation experiment for this large dataset.

Figure 3. All regression models tested for the cross-validation model.

For the datasets from Turkey, the yield information was in kilograms, while in the USA dataset it was percentage-based information.
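To make the two evaluation set-ups concrete, the sketch below reproduces them outside Azure ML Studio: a year-based hold-out (train on 2013-2015, score on 2016) for case study one, and a ten-fold cross-validation for the combined and US datasets. The DataFrame `df` and the column names `year` and `yield` are assumptions for illustration.

```python
# A minimal sketch, assuming the records live in a pandas DataFrame
# with a 'year' column, feature columns and a 'yield' target column.
import pandas as pd
from sklearn.model_selection import cross_val_score

def train_score_split(df, features):
    """Hold-out protocol of case study one: fit on 2013-2015, score on 2016."""
    train = df[df["year"].between(2013, 2015)]
    test = df[df["year"] == 2016]
    return (train[features], train["yield"]), (test[features], test["yield"])

def ten_fold_r2(model, X, y):
    """Ten-fold cross-validated coefficient of determination (R^2)."""
    return cross_val_score(model, X, y, cv=10, scoring="r2").mean()
```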
4. EXPERIMENTAL RESULTS

As mentioned earlier, we applied four different regression models to our datasets. We applied the 10-fold cross-validation approach to all the case studies and calculated the coefficient of determination parameter with the help of Azure Machine Learning Studio. The coefficient of determination is a value between 0 and 1 which determines how close the prediction is to reality. While experimenting, we have seen that both the features and the amount of data affect the results. As seen in Table 1, in our train/score model the most simplistic approach, Linear Regression, scored the best results; Neural Network Regression failed because of the insufficient number of records. During the 10-fold cross-validation experiments, after adding the 2016 data to the training dataset, which consisted of the data from 2013-2015, we observed significant changes in the results, especially for Bayesian Linear Regression. As the number of records in the dataset rose, Boosted Decision Tree Regression had more information to train itself with, yielding better results.

In the second case study, we had only two years of data, but with a great number of records for the machine learning algorithms to learn from. As shown in Table 2, all the ten-fold cross-validation results increased, with the Neural Network Regression model, unlike in the first case study, giving a satisfying result. Lacking the past two years' data there, we chose to base our web service on the first case study's dataset and developed it further from that point. The web application we developed uses a basic input-output style interface to interact with the user and predicts the crop yield for a given input. The input consists of the following features: region and provenance of the field, current temperature, yearly maximum and minimum temperature, total precipitation, growing degree day, temperature difference parameter, Photo Thermal Unit, Helio Thermal Unit and the evapotranspiration parameter.

Table 1. South East Region of Turkey Wheat Yield ML Results

(a) Train/score model
Regression Type | Relative Absolute Error | Relative Squared Error | Coefficient of Determination
Linear Regression | 0.484972 | 0.300939 | 0.699061
Bayesian Linear Regression | 0.643469 | 0.396072 | 0.603928
Boosted Decision Tree Regression | 0.669278 | 0.646339 | 0.353661
Neural Network Regression | 0.969594 | 1.4013830 | -0.4013830

(b) Ten-fold cross-validation
Regression Type | Relative Absolute Error | Relative Squared Error | Coefficient of Determination
Linear Regression | 0.572134 | 0.391993 | 0.608007
Bayesian Linear Regression | 0.325579 | 0.157755 | 0.842245
Boosted Decision Tree Regression | 0.512205 | 0.329844 | 0.670156
Neural Network Regression | 1.7276340 | 3.7723160 | -2.7723160

Table 2. United States of America Wheat Yield ML Results (ten-fold cross-validation)
Regression Type | Relative Absolute Error | Relative Squared Error | Coefficient of Determination
Linear Regression | 0.376717 | 0.173927 | 0.826073
Bayesian Linear Regression | 0.389548 | 0.180413 | 0.819587
Boosted Decision Tree Regression | 0.105499 | 0.01352 | 0.98648
Neural Network Regression | 0.000736 | 0.000001 | 0.999999
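Note that in every row of Tables 1 and 2 the coefficient of determination equals one minus the relative squared error. A minimal restatement of the three reported metrics (our own sketch, not code from the paper):

```python
# Relative absolute error, relative squared error and R^2, where the
# baseline model always predicts the mean of the true values.
import numpy as np

def regression_metrics(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    baseline = np.full_like(y_true, y_true.mean())
    rae = np.abs(y_true - y_pred).sum() / np.abs(y_true - baseline).sum()
    rse = ((y_true - y_pred) ** 2).sum() / ((y_true - baseline) ** 2).sum()
    return {
        "relative_absolute_error": rae,
        "relative_squared_error": rse,
        "coefficient_of_determination": 1.0 - rse,  # R^2 = 1 - RSE
    }
```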
5. CONCLUSION AND FUTURE WORK

The objective of crop yield prediction studies is to forecast the crop yield as early as possible during the growing season, since weather and climate affect agricultural production dramatically. In this study, we developed an end-to-end wheat yield prediction system using machine learning algorithms. Case studies were performed on datasets retrieved from the south-east region of Turkey and four states in the United States. The Linear Regression algorithm provided the best performance for the south-east region in terms of the coefficient of determination parameter when an external test set was used, while the Neural Network Regression algorithm was the best option for the US dataset when cross-validation analysis was applied. As part of future work, the web application can be replaced with a mobile application, and new experiments can be performed when more regions are added to the datasets. Deep learning algorithms might be considered when the dataset becomes very large.
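The serving path of the system - an Azure ML Studio web service behind an ASP.NET client - can be illustrated with a request-response call of the kind such services exposed at the time; the endpoint URL, API key, column names and values below are placeholders, not the paper's actual service definition.

```python
# Hypothetical client call to an Azure ML Studio request-response
# endpoint; URL, key and the input schema are illustrative only.
import json
import urllib.request

ENDPOINT = "https://<region>.services.azureml.net/workspaces/<ws>/services/<id>/execute?api-version=2.0"
API_KEY = "<api-key>"

payload = {
    "Inputs": {
        "input1": {
            "ColumnNames": ["region", "current_temp", "max_temp", "min_temp",
                            "precipitation", "growing_degree_day", "ptu", "htu",
                            "evapotranspiration"],
            "Values": [["<region-name>", 24.1, 41.0, -3.5, 480.0, 1900.0,
                        610.0, 870.0, 420.0]],
        }
    },
    "GlobalParameters": {},
}

request = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json",
             "Authorization": "Bearer " + API_KEY},
)
with urllib.request.urlopen(request) as response:
    print(json.loads(response.read()))  # contains the predicted wheat yield
```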
6. ACKNOWLEDGMENTS

Data for this project was provided by the TARBIL Agro-Informatics Research Centre at İstanbul Technical University. The authors would like to thank the technical and management staff of this research centre, who helped us to prepare the dataset.

REFERENCES

[1] J. You, X. Li, M. Low, D. Lobell, and S. Ermon, "Deep gaussian process for crop yield prediction based on remote sensing data", In AAAI, pp. 4559-4566, 2017.
[2] K. B. Newbold, Population Growth. The International Encyclopedia of Geography, 2017.
[3] Cloud Technology Partners, https://www.cloudtp.com/doppler/feeding-10-billion-people/ (2017) (accessed June 17, 2017).
[4] United Nations, Zero hunger: why it matters? Sustainable development goal, http://www.un.org/sustainabledevelopment/wp-content/uploads/2016/08/2_Why-it-Matters_ZeroHunger_2p.pdf (2015) (accessed June 17, 2017).
[5] Y. Çakır, M. Kırcı, and E. O. Güneş, "Yield prediction of wheat in south-east region of Turkey by using artificial neural networks", In Agro-geoinformatics (Agro-geoinformatics 2014), pp. 1-4, 2014.
[6] D. Gouache, A. S. Bouchon, E. Jouanneau, and X. Le Bris, "Agrometeorological analysis and prediction of wheat yield at the departmental level in France", Agricultural and Forest Meteorology, Vol. 209, pp. 1-10, 2015.
[7] P. Chen and Q. Jing, "A comparison of two adaptive multivariate analysis methods (PLSR and ANN) for winter wheat yield forecasting using Landsat-8 OLI images", Advances in Space Research, Vol. 59, Issue 4, pp. 987-995, 2017.
[8] M. Stas, J. Van Orshoven, Q. Dong, S. Heremans, and B. Zhang, "A comparison of machine learning algorithms for regional wheat yield prediction using NDVI time series of SPOT-VGT", In Agro-Geoinformatics (Agro-Geoinformatics), pp. 1-5, 2016.
[9] USA Wheat Data, https://github.com/prateek47/Wheat_Prediction

Using Cognitive Software to Evaluate Natural Language Classifiers - A Use Case

Camilo Torres, Department of Informatics and Systems, Universidad EAFIT, Medellín, Colombia, ctorres9@eafit.edu.co
Marta S. Tabares, Department of Informatics and Systems, Universidad EAFIT, Medellín, Colombia, mtabares@eafit.edu.co
Edwin Montoya, Department of Informatics and Systems, Universidad EAFIT, Medellín, Colombia, emontoya@eafit.edu.co
Aida Kamišalić, Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia, aida.kamisalic@um.si

ABSTRACT

Current techniques for natural language processing can be used to identify valuable information such as sentiments or patterns, recognized and adjusted for different topics. To apply these techniques, it is necessary to know how to use and tune prediction models, which requires time, experience and the implementation of different tests to ensure the correct behavior of the models. The aim of this paper is to identify the features with which classifier instances should be trained and evaluated using optimized software, specifically IBM Bluemix and its module named Natural Language Classifier. The created classifier was trained with real tweets to classify texts into three categories: positive, neutral and negative. Afterwards, the classifier was validated with a set of already classified texts. The obtained results indicate how the number of training examples impacts the behavior of the classifier, and that the highest accuracy was achieved for the positive and negative categories.

Categories and Subject Descriptors
H.4 [Information Systems Applications]: Miscellaneous; I.2.7 [Artificial Intelligence]: Natural Language Processing - Text analysis

General Terms
Algorithms, Measurement, Experimentation

Keywords
Natural Language, Classifiers, Machine Learning, Bluemix, Watson

1. INTRODUCTION

The daily use of social networks currently results in large-scale growth of the data and information generated in the world. There is an expected expansion of 40% per year and an estimated size 50 times larger by 2020 [1]. The generated data are mostly texts created by users in social networks, such as comments or tweets, that in some cases are freely accessible, while in others they can be accessed by purchasing on-demand packages. This type of information allows companies to carry out market analyses or search for communities of potential clients [2].

The available literature presents several research projects about the algorithms and techniques used for natural language processing [3]. Their results indicate that the time required to implement such techniques and algorithms depends on the users' previous mathematical knowledge and on the tuning of the mathematical functions used in the process. Therefore, despite the different existing solutions for text analysis, the implementation of such algorithms may be slow because of different influencing factors, such as the tuning required for each process and the tests with different parameters.

To address this problem, we evaluate the proficiency of a tool to analyze and classify texts generated on social networks. The text classifications are labeled with the basic polarity used for sentiment analysis, i.e. positive, negative and neutral labels [4]. The databases for the testing and training sets were obtained from the public Twitter API and the Spanish Society for Natural Language Processing (SEPLN).

Among the existing technologies for natural language processing, there are platforms such as IBM Watson, Microsoft LUIS, API.ai, WIT.ai, etc. We decided to use IBM Bluemix, which includes a large variety of Watson services, whose wide catalog of options can be used for intelligent chats and for text classification and understanding, and which is backed by the demonstrated results and tests of Watson, such as the Jeopardy game, where Watson showed its proficiency in giving right answers using natural language processing [5, 6]. We have used the Natural Language Classifier component of the Watson suite, which uses convolutional neural networks to perform the cognitive processing of language [7].

This paper is organized as follows. We present the context and the research questions in Section 2. Section 3 summarizes the related work on tweets' polarity classification. The methodology used to address the problem is explained in Section 4. Furthermore, we introduce a detailed explanation of the developed proposal in Section 5. The results of the performed work are shown in Section 6. Finally, Section 7 brings the conclusions of the paper.
2. CONTEXT AND RESEARCH QUESTIONS

During classifier training, problems such as over-fitting or poor estimation of the models for the training sets might occur [8, 9]. These factors should be taken into account in order to avoid bad predictions. Furthermore, adjusting and finding the correct parameters for the adequate behavior of the algorithm is a task that demands time and effort. Technologies such as IBM Bluemix already provide a set of services for natural language processing; these can be solutions that save the time consumed by the tuning of classifiers.

The identified problem and the context of using classifiers through natural language processing for sentiment analysis (polarity) in text, particularly tweets, result in the definition of the following research questions:

- Which are the characteristics under which a classifier should be trained using technologies for natural language processing and sentiment analysis?

- How effective are classifiers trained using frameworks and automated tools for natural language processing and sentiment analysis?

3. RELATED WORK

One of the most treated problems found in the studied literature is data pre-processing before the data is used to train classifiers. Elements such as sarcasm, expressions and abbreviations, among others, can generate erroneous predictions. Khan et al. [10, 11] focused on developing classifiers with good pre-processing before training. At the same time, they propose hybrid models based on the classification of emoticons, bags of words, etc. For pre-processing, they proposed procedures such as searching dictionaries to check the existence of terms, replacing abbreviations, completing incomplete words and performing spelling checks.

Mertiya et al. [12] proposed the usage of bayesian classifiers to obtain the polarity of a database of tweets, which results in classifications with several false positives; these are then submitted to an analysis of adjectives in order to be polarized correctly. The problem with this type of classifier is that the short texts of tweets have a characteristic named sparsity, meaning that the data is not very significant; therefore, the classifiers may produce errors or bad predictions.

To avoid the problems that occur when this type of short text is classified, He et al. [13] proposed a different approach using a clustering algorithm called k-means in order to discover related topics, based on the premise that the texts will be more informative if they are grouped into similar topics. The obtained clusters are then used to train a bayesian classifier.

Almeida et al. [14] performed an evaluation of supervised algorithms for mining opinions on Twitter and emphasized the actions of cleaning and pre-processing the information before it is submitted to a classifier. Furthermore, they proposed a process to make classifications similar to the process carried out in this study, based not only on polarity and sentiment analysis, but also on the objective classification of the opinions expressed in the texts.

Finally, several of the papers found in the literature identified different problems related to various types of classifiers and, accordingly, there are models that attempt to solve these issues by combining different types of classifiers. Lima et al. [4] and Brahimi et al. [15] proposed hybrid solutions to improve classification results using bayesian classifiers, support vector machines, decision trees and k-nearest neighbors. These types of solutions make it possible to increase the accuracy of classifications and to evaluate which learning methods perform best.

The different algorithms and techniques found in this related work are processes that require time in each of the different phases: pre-processing, extraction, development or algorithm testing. In this work, we use already tested algorithms in order to speed up sentiment analysis and polarity detection in tweets. We used IBM Bluemix, specifically Watson and its Natural Language Classifier module.

4. METHODOLOGY

We propose an approach for the tool evaluation through the method developed by Wieringa et al. [16]. We try to solve a problem through an engineering cycle, which is carried out by the treatment or planning of solutions, and is validated with questions and answers that we posed before and after the treatment. The expected effects are stated and, finally, the process is concluded using the results obtained in the treatment. The treatment of this work is described based on the process for supervised learning introduced by Kotsiantis [17], where the emphasis is on the pre-processing of the data. In order to validate the results, a database with already classified tweet texts was obtained through the Spanish Society for Natural Language Processing (SEPLN).

5. USING NATURAL LANGUAGE PROCESSING

We based our proposal on the supervised learning process presented by Kotsiantis [17], where we start with the identification of the required data. Figure 1 shows the steps of this process, which are modified for the use case described in this study. First, we obtained the training sets from our own tool, and then we pre-processed the texts for their correct interpretation in Watson. Furthermore, we present the contribution for the use of Bluemix.

Figure 1: Supervised learning process exposed by Kotsiantis [17] and complemented in this paper.

5.1 Data identification

The used data are the different texts from the tweets database. We used Cloudant, a managed NoSQL JSON database service, to perform querying easily through an HTTP API. We imported the data in order to create the training sets.
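As an illustration of this step, a Cloudant database can be read with a single HTTP request in the CouchDB style shown below; the account, database name, credentials and document schema are placeholders, not the project's actual instance.

```python
# Hypothetical query against the tweets database over Cloudant's
# CouchDB-compatible HTTP API; all identifiers are placeholders.
import base64
import json
import urllib.request

BASE = "https://<account>.cloudant.com/tweets"            # assumed DB name
AUTH = base64.b64encode(b"<user>:<password>").decode()    # placeholder credentials

req = urllib.request.Request(
    BASE + "/_all_docs?include_docs=true&limit=5",
    headers={"Authorization": "Basic " + AUTH},
)
with urllib.request.urlopen(req) as resp:
    for row in json.loads(resp.read())["rows"]:
        print(row["doc"].get("text"))  # tweet text field; schema is assumed
```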
5.2 Definition of the training sets

To facilitate the selection of the tweets, we developed a web application in Node.js which queries the database and selects a tweet randomly. The selected tweet must be assigned to one of the defined classes (positive, negative or neutral). If it is not possible to label it with one of these polarities, labeling can be omitted or N/A (not applicable) selected. When each class contains approximately 600 tweets, the tweets are exported in CSV format using a Node.js script which queries the instances through the Cloudant HTTP API.

5.3 Data pre-processing

The texts used for training should go through a cleanup process where special characters like quotes and break lines are replaced, as indicated in the Bluemix documentation. For example, each text must be enclosed in quotation marks; if this character appears inside the text, it must be doubled, i.e. " is replaced with "" to distinguish it from the quotes that enclose the training text. We perform this process when the CSV file is created. Table 1 shows a file example with two columns: the first one holds the training texts and the second the class to which each sentence corresponds.

Table 1: File structure and example (text | class)
"Let's leave the skin to create a job and our economy grows again. #MensajeGriñan" | positive
"2012 will be a year of titles. Play in a team. Win as a team. Who's with me? #makeitcount http://t.co/Ue7Kh2De" | positive
"#FF @BRmodainfantil moms with children, do not miss it, the best online shop for children's fashion!" | positive
"Impressed by the violence of the media in Morocco. Pushing to photograph Rajoy in Rabat" | negative
"The one who does not want to follow me does not follow me, but the masochist must stop complaining and enjoy" | negative
"These are hard times for everyone! The worst thing will be the staff adjustments, which will not be delayed..." | negative
"You can also follow it in the channel 24 hours of RTVE" | neutral
"A few hours remain to close the last draw of the year. There is still time to sign up" | neutral
"In the Vatican City" | neutral
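A minimal sketch of this cleanup and export step, assuming Python in place of the project's Node.js script: the csv module applies the quote-doubling rule described above automatically, and break lines are stripped before writing.

```python
# Sketch of the cleanup + CSV export: remove break lines, then let the
# csv writer enclose each field in quotes and double embedded quotes.
import csv

examples = [
    ("Impressed by the violence of the media in Morocco. "
     "Pushing to photograph Rajoy in Rabat", "negative"),
    ('He said "hello"\nand left', "neutral"),  # embedded quotes and a break line
]

with open("training.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, quoting=csv.QUOTE_ALL)  # every field gets quoted
    for text, label in examples:
        cleaned = " ".join(text.split())      # collapse break lines / whitespace
        writer.writerow([cleaned, label])     # " inside text becomes "" on disk
```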
The selected tweet must be After we retrained the classifier, we used a subset from the assigned to one of the defined classes (positive, negative or test database to perform the validation and test the accuracy neutral). If it is not possible to label it with one of these po- for the classifier with the respective training sets. We took larities, labeling can be omitted or N/A (not applicable) se- 300 texts for each class, i.e. in total there were 900 tweets, lected. When each class contains approximately 600 tweets, to test how much the classifiers instances approached their they are exported in CSV format using a script in Node.js predictions regarding their polarity from the SEPLN. We which queries the instances through the Cloudant HTTP use these 900 texts in both classifiers instances to compare API. the results. 5.3 Data pre-processing 6. RESULTS Following the proposed engineering cycle, and based on the The texts used as training should go through a cleanup pro- results from the performed tests, some answers can be de- cess where special characters like quotes and break lines are rived for the raised research questions. Regarding the ef- replaced as indicated in the Bluemix documentation. For fectiveness of the classifiers, the results obtained were ac- example, each text must be enclosed in quotation marks, if ceptable in the positive and negative classes for the second this character is repeated, it must be added twice, i.e. re- training set. The neutral texts class presents results varying placing ” so as ”” to distinguish it from the one that encloses in both tests, which leaves evidence of the subjectivity that the training text. We perform this process when the CSV file this type of sentences present. It is important to note that is created. Table 1 shows a file example with two columns, the texts have not been filtered by any process to remove the first one has the training texts and the second the class stop words, URLs, hashtags, and other types of words that to which each sentence corresponds. could affect the classifiers’ prediction. Tables 2 and 3 show the results for the first and the second training sets, respec- 5.4 Training the classifier tively and table 4 shows the accuracy for each training set. We used the HTTP API of Bluemix, which, through a web service, creates the classifier from the training file separated We observe that the classifier with the second training set by commas. At the beginning the classifier is in a training presents better results than the classifier with the first train- state. The time the classifier needs to be prepared for the ing set. The second training set was created with 15,000 consultation varies, depending on the size of the training set. records, which is the maximum number of records supported 13 Services by Big Data Analytics. Mobile Networks and Table 2: Results of the test set for the first training Applications, pages 1–8, dec 2016. set Right pre- Right predictions’ [3] Fabrizio Sebastiani. Machine learning in automated Total dictions rate text categorization. ACM Computing Surveys, Positive 300 105 35% 34(1):1–47, mar 2002. Negative 300 158 52.6% [4] Ana Carolina E S Lima, Leandro Nunes De Castro, Neutral 300 191 63.6% and Juan M. Corchado. A polarity analysis framework for Twitter messages. Applied Mathematics and Computation, 270:756–767, nov 2015. Table 3: Results of the test set for the second train- [5] Grady Booch. The soul of a new watson, jul 2011. ing set [6] D. A. Ferrucci. Introduction to ”This is Watson”. 
6. RESULTS

Following the proposed engineering cycle, and based on the results of the performed tests, some answers can be derived for the raised research questions. Regarding the effectiveness of the classifiers, the results obtained were acceptable in the positive and negative classes for the second training set. The neutral text class presents varying results in both tests, which is evidence of the subjectivity that this type of sentence presents. It is important to note that the texts were not filtered by any process to remove stop words, URLs, hashtags and other types of words that could affect the classifiers' predictions. Tables 2 and 3 show the results for the first and the second training sets, respectively, and Table 4 shows the accuracy for each training set.

Table 2: Results of the test set for the first training set
Class | Total | Right predictions | Right predictions' rate
Positive | 300 | 105 | 35%
Negative | 300 | 158 | 52.6%
Neutral | 300 | 191 | 63.6%

Table 3: Results of the test set for the second training set
Class | Total | Right predictions | Right predictions' rate
Positive | 300 | 235 | 78.3%
Negative | 300 | 254 | 84.7%
Neutral | 300 | 147 | 49%

Table 4: Accuracy for each training set
| Training set 1 | Training set 2
Accuracy | 50.4% | 70.7%

We observe that the classifier with the second training set presents better results than the classifier with the first training set. The second training set was created with 15,000 records, which is the maximum number of records supported by the Bluemix Natural Language Classifier module. It is probable that the large set of texts and the type of texts used by the SEPLN made the classifier with the second training set produce a better prediction, closer to the original polarity of the test database. It is also probable that the class for neutral texts is more subjective and, therefore, the reason for obtaining different results in the two tests. We conclude that the classifier obtained better results with the second training set because of the large number of examples.
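The overall accuracies in Table 4 follow directly from the per-class counts in Tables 2 and 3, since each class contributed 300 test texts. A quick arithmetic check:

```python
# Recomputing Table 4 from the per-class right-prediction counts.
right_set1 = {"positive": 105, "negative": 158, "neutral": 191}
right_set2 = {"positive": 235, "negative": 254, "neutral": 147}

for name, right in [("Training set 1", right_set1), ("Training set 2", right_set2)]:
    accuracy = sum(right.values()) / 900  # 3 classes x 300 test texts
    print(f"{name}: {accuracy:.1%}")
# Training set 1: 50.4%  (454/900)
# Training set 2: 70.7%  (636/900)
```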
7. CONCLUSION

We proposed the usage of natural language classifiers, using IBM Bluemix and its services for text analysis, in order to speed up the process of parameterization and algorithm tuning. We conclude that the classifiers created in this manner have good effectiveness, subject to the texts' cleaning process. The neutral classification is the most subjective and the most prone to bad predictions. It is important to emphasize that the cleaning process has a great influence on the classification results, in addition to the subjectivity involved in the creation of the training sets.

8. ACKNOWLEDGEMENT

We acknowledge the support of the Colombian Center of Excellence and Appropriation on Big Data and Data Analytics - Alianza CAOBA (http://alianzacaoba.co/), under which the project was developed. We sincerely thank the researchers and students who participated in the tweets' classification.

9. REFERENCES

[1] Ibrar Yaqoob, Ibrahim Abaker Targio Hashem, Abdullah Gani, Salimah Mokhtar, Ejaz Ahmed, Nor Badrul Anuar, and Athanasios V. Vasilakos. Big data: From beginning to future. International Journal of Information Management, 36(6):1231-1247, dec 2016.
[2] Francesco Piccialli and Jai E. Jung. Understanding Customer Experience Diffusion on Social Networking Services by Big Data Analytics. Mobile Networks and Applications, pages 1-8, dec 2016.
[3] Fabrizio Sebastiani. Machine learning in automated text categorization. ACM Computing Surveys, 34(1):1-47, mar 2002.
[4] Ana Carolina E. S. Lima, Leandro Nunes De Castro, and Juan M. Corchado. A polarity analysis framework for Twitter messages. Applied Mathematics and Computation, 270:756-767, nov 2015.
[5] Grady Booch. The soul of a new watson, jul 2011.
[6] D. A. Ferrucci. Introduction to "This is Watson". IBM Journal of Research and Development, 56(3.4):1:1-1:15, may 2012.
[7] Carmine DiMascio. Create a natural language classifier that identifies spam. https://www.ibm.com/developerworks/library/cc-spam-classification-service-watson-nlc-bluemix-trs/index.html, 2015.
[8] Alex A. Freitas. Understanding the crucial differences between classification and discovery of association rules. ACM SIGKDD Explorations Newsletter, 2(1):65-69, jun 2000.
[9] Douglas M. Hawkins. The Problem of Overfitting, 2004.
[10] Farhan Hassan Khan, Saba Bashir, and Usman Qamar. TOM: Twitter opinion mining framework using hybrid classification scheme. Decision Support Systems, 57(1):245-257, jan 2014.
[11] Farhan Hassan Khan, Usman Qamar, and M. Younus Javed. SentiView: A visual sentiment analysis framework. In International Conference on Information Society, i-Society 2014, pages 291-296. IEEE, nov 2015.
[12] Mohit Mertiya and Ashima Singh. Combining Naive Bayes and Adjective Analysis for Sentiment Detection on Twitter. 2016 International Conference on Inventive Computation Technologies (ICICT), pages 1-6, aug 2016.
[13] Yunchao He, Chin Sheng Yang, Liang Chih Yu, K. Robert Lai, and Weiyi Liu. Sentiment classification of short texts based on semantic clustering. In Proceedings of 2015 International Conference on Orange Technologies, ICOT 2015, pages 54-57. IEEE, dec 2016.
[14] Yudivian Almeida and Velarde Suilaan. Evaluacion de Algoritmos de Clasificacion Supervisada Para El Minado De Opinion en twitter. Investigación Operacional, 36(3):194-205, 2015.
[15] Belgacem Brahimi, Mohamed Touahria, and Abdelkamel Tari. Data and text mining techniques for classifying Arabic tweet polarity. Journal of Digital Information Management, 14(1):15-25, 2016.
[16] Roel J. Wieringa and Ayse Morali. Technical Action Research as a Validation Method in Information Systems Design Science. Design Science Research in Information Systems. Advances in Theory and Practice, 7286:220-238, 2012.
[17] S. B. Kotsiantis. Supervised machine learning: A review of classification techniques. Informatica, An International Journal of Computing and Informatics, 31:249-268, 2007.

An Analysis of BPMN-based Approaches for Process Landscape Design

Gregor Polančič, Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia, gregor.polancic@um.si
Jernej Huber, Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia, jernej.huber@um.si
Marta S. Tabares, Universidad EAFIT, Department of Informatics and Systems, Antioquia, Colombia, mtabares@eafit.edu.co

ABSTRACT

Process landscapes represent the top part of an organizational process architecture. As such, they define the scope of, and the relationships between, its processes. Process landscape diagrams simplify process-related communication by leveraging the benefits of visual notations. However, in contrast to business process diagrams, where nowadays BPMN is the prevalent notation, process landscape diagrams lack standardization. In this article, we review and analyze notations used for modeling process landscapes, as well as non-normative BPMN-based approaches applicable to their representation. Based on the analyzed approaches, we evaluate the applicability of BPMN for process landscape design.

Categories and Subject Descriptors
C.0 [Computer Systems Organization]: General - Modeling of computer architecture; D.2.9 [Software engineering]: Management - Software process models (e.g., CMM, ISO, PSP)

General Terms
Management, Documentation, Standardization, Verification

Keywords
Process landscape, process map, BPMN, analysis

1. INTRODUCTION

A common starting point for process design and all activities related to BPM is to identify and structure an organization's processes (i.e. the process identification phase) [1]. Regularly, users tend to represent the identified processes in a visual manner, in the form of a process landscape (i.e. process map) diagram.

The main purpose of process landscapes is to specify organizational processes from a bird's-eye view. With process landscapes, an organization can more easily gain an overview of its main processes and their major interdependencies. Therefore, the usage of process landscapes simplifies process-related communication and represents a starting point for detailed process discovery (i.e. AS-IS process modeling). Besides, process landscapes are a common way to represent process-based reference models for the operation (e.g. ITIL, CMMI) and the management (e.g. COBIT) of organizational IT infrastructure and services.

There are no standardized languages for creating process landscapes. Consequently, modelers most commonly define their own 'overviews of processes' by imitating existing diagrams (e.g. value chains) or proposing their own more or less intuitive representations. A common approach for BPMN experts is to represent process landscapes with a subset of BPMN elements. However, although BPMN is an ISO and de-facto standard for process modeling, landscapes are out of its scope.

Since non-normative BPMN-based process landscape diagrams appear in practice, this article reviews and analyses related approaches to identify their strengths and weaknesses. Based on the analyzed approaches, we evaluate the applicability of BPMN for such a modeling purpose.

2. PROCESS ARCHITECTURES AND LANDSCAPES

Process landscapes represent the top part of a process architecture - a conceptual model that organizes the processes of a company and makes their relationships explicit (Figure 1).

Figure 1: A conceptual representation of a process architecture (landscape level, strategic level, operational level)
A process architecture usually defines two types of relationships: horizontal and vertical. Horizontal relationships define 'output/input' relationships between processes, i.e. the outcome of a process represents an input for the next process (e.g. a 'consumer-producer' or 'order-to-cash' relationship). Vertical relationships between processes define different levels of detail of a process, i.e. a process diagram on a lower level represents a more detailed view of the same process on the level above.

The top level of a process architecture is commonly reserved for process landscape diagrams. A single process landscape diagram shows the main processes of an organization as well as the dependencies between them, as shown in Figure 2 and Figure 3. Those two figures represent two examples of process landscape diagrams with processes as 'black-boxes' and arrows representing the flow of deliverables between different processes. Rectangles represent the stakeholders external to an organization.

Figure 2: An example landscape diagram (ISO 9001)

Figure 3: An example landscape diagram [2]

A process landscape diagram serves as a framework for defining the priorities and the scope of process modeling and redesign projects. Each element of a process landscape model may point to more concrete business processes on the lower levels.

2.1 Process landscape notation

A visual notation (i.e. visual language, graphical notation, or diagramming notation) consists of a set of graphical symbols (visual vocabulary), a set of compositional rules (visual grammar) and definitions of the meaning of each symbol (visual semantics). A common denominator of process landscape diagrams (Figure 2 and Figure 3) are the following elements:

a. Business process. Although not explicitly defined, the landscape diagrams clearly highlight the concept of a business process. Visually, a business process is frequently represented with an arrow, where there are also alternative representations, e.g. a rectangle and a rectangle with rounded corners (Figure 4).

Figure 4: Business process symbols

b. Process groups / types. On a process landscape diagram, the business processes are commonly distinguished by their purpose (e.g. core processes, management processes and supportive processes), which is visualized either by (1) encircling and labelling a set of processes (Figure 5, left) or (2) specializing the process symbol for individual types of processes (Figure 5, right). Besides manipulating the shapes of symbols, the planar visual variables and symbol orientation might imply the type of a process. E.g., supportive processes are usually positioned below the core processes, with arrows pointing up, whereas management processes are positioned above them, with arrows pointing down (Figure 5).

Figure 5: Representation of a group (left) and/or type of processes (right)
visual language, graphical notation, or processes, explicit representations of processes orderings are diagramming notation) consists of a set of graphical symbols visualized with solid directed lines (Figure 8). Another (visual vocabulary), a set of compositional rules (visual grammar) drawback of implicitly ordering the processes is that a diagram and definitions of the meaning of each symbol (visual semantics). reader could misinterpret a set of non-sequentially performed A common denominator of process landscape diagrams (Figure 2 processes, put in a line, as being performed sequentially. and Figure 3) are the following elements: a. Business process. Although not explicitly defined, the Business Business landscape diagrams clearly highlight the concept of a business Process Process process. Visually, a business process is frequently represented with an arrow, where there are also alternative representations, Business Business e.g. a rectangle and a rectangle with rounded corners (Figure Process Process 4). Figure 8: Explicit representation of a sequential relationship Business Business Business between processes Process Process Process Arrows-based representation of process ordering enables more Figure 4: Business process symbols complex ordering relationships (e.g. when a process ends, two processes are initialized). Sequential relationships might be b. Process groups / types. On a process landscape diagram, the labelled, representing artefacts or data being transferred business processes are commonly distinguished by their between processes (i.e. process outputs – process inputs as purpose (e.g. core processes, management processes and presented in the Figure 3). supportive processes), which is visualized either by (1) encircling and labelling a set of processes (Figure 5, left) or e. Participant. A participant, usually visualized with a rectangle (2) specializing the process symbol for individual types of (Figure 9), presents someone who is involved (i.e. internal processes (Figure 5, right). participant) or interacts (i.e. external participant) with a Management business process. Most commonly, process landscapes process Group of business processes visualize external participants (e.g. suppliers and customers), Core Business Business Business which are related to processes, either by providing inputs or Process Process Process Process receiving outputs. This corresponds to the concept of a ‘value system’ which consists of following value chains: supplier, the Support process focal enterprise and consumer [2]. The relationships to participants are represented either implicitly (e.g. with Figure 5: Representation of a group (left) and/or type of leveraging visual planar variables) or explicitly (with solid processes (right) arrows). Besides manipulating the shapes of symbols, the planar visual variables and symbol orientation might imply the type of a 16 diagrams but merely an abstract view of BPMN collaboration r r ei e l diagrams. Business p iv p label e Process c u Conversation S Re Node Pool Pool Figure 9: Representation of (external) process participants and their (explicit) relations Pool 3. BPMN-based approaches Business Process Model and Notation (BPMN) is a well- established standard for process modeling and automation [3]. 
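The common denominator identified above can also be read as a small abstract syntax. The following Java sketch is our own, purely illustrative rendering of the five recurring concepts (a-e); all class and field names are hypothetical and are not part of BPMN or any other standard. It merely makes explicit the vocabulary against which the BPMN-based approaches are compared in the next section.

import java.util.ArrayList;
import java.util.List;

// Illustrative abstract syntax of the common process landscape concepts (a-e).
class LandscapeModel {
    enum ProcessType { CORE, MANAGEMENT, SUPPORTIVE }       // (b) specialized process symbols

    static class BusinessProcess {                          // (a) business process
        String name;
        ProcessType type;
        List<BusinessProcess> children = new ArrayList<>(); // (c) parent/child hierarchy
        BusinessProcess(String name, ProcessType type) { this.name = name; this.type = type; }
    }

    static class ProcessGroup {                             // (b) encircled and labelled set
        String label;
        List<BusinessProcess> members = new ArrayList<>();
    }

    static class Participant {                              // (e) internal or external participant
        String name;
        boolean external;
        Participant(String name, boolean external) { this.name = name; this.external = external; }
    }

    static class SequenceRelation {                         // (d) explicit, optionally labelled ordering
        BusinessProcess from, to;
        String deliverable;                                 // a process output used as the next input
        SequenceRelation(BusinessProcess from, BusinessProcess to, String deliverable) {
            this.from = from; this.to = to; this.deliverable = deliverable;
        }
    }

    List<BusinessProcess> processes = new ArrayList<>();
    List<ProcessGroup> groups = new ArrayList<>();
    List<Participant> participants = new ArrayList<>();
    List<SequenceRelation> sequences = new ArrayList<>();
}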
3. BPMN-BASED APPROACHES
Business Process Model and Notation (BPMN) is a well-established standard for process modeling and automation [3]. From the modeling aspect, it defines a vocabulary, grammar and semantics for creating different types of process diagrams, namely: process diagrams, collaboration diagrams, choreography diagrams and conversation diagrams. With regard to process diagrams, BPMN states that [4] "processes can be defined at any level from enterprise-wide Processes to Processes performed by a single person." Although this could be understood as if BPMN supported the modeling of process landscapes, landscapes are not mentioned in any version of the specification, nor recommended by researchers [5]. Nevertheless, since BPMN is widely adopted by industry, modelers frequently use BPMN for visualizing systems of black-box processes (i.e. some kind of process landscapes) by applying the approaches presented in the next sub-chapters.

3.1.1 Abstract collaboration diagrams
A common and syntactically valid BPMN representation of process landscapes is to use black-box Pools and Message flows, i.e. collaboration diagrams with hidden details (Figure 10). A BPMN Pool is a visual representation of a Participant, which may reference at most one business process. A Message flow represents the exchange of messages between two 'message aware' process elements (e.g. activities, message events and black-box Pools).

Figure 10: BPMN Pools and Message flows

The strength of such a representation of a process landscape is its compliance with the BPMN specification and its simplicity. On the other hand, there are several drawbacks. First, the visual appearance of this approach is unconventional for process landscapes (i.e. processes being represented with rectangles). Second, the relationships between processes represent information exchange, whereas process landscape diagrams most commonly visualize sequential relationships between processes and process clustering. Third, there is a lack of concepts which may be regularly used for landscape modeling, namely sequential relationships, process hierarchy and process types, whereas there is a symbol deficit in the case of representing a participant and a process (the rectangle symbol is used in both cases).

3.1.2 Conversation diagrams
Another valid way of representing process landscapes in BPMN is by using Conversation diagrams (Figure 11), which were introduced in the second major revision of BPMN. Formally, they are not a standalone type of BPMN diagram but merely an abstract view of BPMN collaboration diagrams.

Figure 11: BPMN Conversation diagram

Conversation diagrams are an effective way of representing interactions between processes; however, similar to the previous approach, they are based on a small set of elements which are inappropriate for the modeling of conventional process landscapes (i.e. conversation nodes, representing correlated messages, and pools, representing participants or processes).

3.1.3 Enterprise-wide process diagrams
As stated in the specification [4], BPMN can be used for business process modeling at any level of granularity. In accordance with this, the system of an organization's processes may be modeled as a single process, with the individual processes being modeled as activities, i.e. sub-processes (Figure 12).

Figure 12: BPMN Sub-processes representing processes

By using this approach, one is able to represent the majority of process landscape constructs, namely processes (i.e. with BPMN sub-processes), sequential interactions (i.e. BPMN sequence flows), groups or types of processes (i.e. the BPMN group element) and participants (i.e. BPMN lanes). However, there are several major drawbacks to this approach. First, such diagrams are visually inconsistent with process landscape diagrams (e.g. processes being represented with rounded rectangles and participants with horizontal lines). Second, these diagrams are inconsistent with BPMN syntax and semantics, making them invalid (e.g., BPMN Process and BPMN Sub-process are two distinct BPMN meta-model elements). Third, this approach is also impractical, since the majority of processes are discovered at a lower level of granularity (e.g. based on the services or products a business process delivers) and afterwards interrelated into a process landscape diagram.
4. DISCUSSION
Table 1 summarizes a comparison of the BPMN-based approaches for landscape design with respect to common process landscape concepts. With respect to the abstract syntax comparison, we can conclude that none of the aforementioned BPMN approaches supports all of the concepts common in process landscape modeling. Besides, the following inconsistencies exist. The first and second approaches use the same element for representing a participant and a process – the BPMN Pool (i.e. symbol overload), whereas the third approach uses the element BPMN Activity contrary to its definition (i.e. a semantic mismatch).

Table 1: Comparison of BPMN-based approaches for landscape design

Process landscape concept | Common visualization | 1 - Abstract collaboration diagrams | 2 - Conversation diagrams | 3 - Enterprise-wide process diagrams
Business process | See Figure 4 | BPMN Pool | BPMN Pool | BPMN Activity
Process group / cluster | See Figure 5, left | BPMN Group | BPMN Group | BPMN Group
Process type | See Figure 5, right | No standardized BPMN element | No standardized BPMN element | No standardized BPMN element
Hierarchical relationship between processes | See Figure 6 | No standardized BPMN element | No standardized BPMN element | Parent activity - child activity relationship
Sequential relationship between processes | See Figure 7 and Figure 8 | No standardized BPMN element | No standardized BPMN element | BPMN Sequence flow
Information flows | See Figure 8 | BPMN Message flow | BPMN Message flow, Conversation Node | Directed association
Internal and external participant | See Figure 9 | BPMN Pool | BPMN Pool | BPMN Pool

With respect to the concrete syntax comparison (i.e. notation), Table 1 demonstrates that none of the BPMN approaches results in diagrams with a graphical similarity to common landscape diagrams.

Based on the above, we can conclude that BPMN is inappropriate for modeling process landscapes. This finding is also supported by Freund and Rücker [6], who state that 'even when we've already modeled one or more process landscapes using BPMN at a customer's request, primarily with the collapsed pools and message flows described we cannot recommend doing this'. Analytically, this was confirmed by Malinova [7], who performed a semantical mapping between BPMN and 'Process maps'. Her results show that BPMN is not appropriate for process landscape design.

According to the benefits and weaknesses of existing approaches for (BPMN-based) process landscape design, the following research directions are feasible. First, a standardized language for process landscapes may be designed by considering the best practices of non-formal process landscape notations. The focal risk of this research direction is that the developed solution has to gain standardization and industry adoption. Second, the BPMN structure and notation may be extended for effective support of process landscapes. In this case, the major risk is the intervention into the structure and notation of a well-adopted and standardized language.

5. ACKNOWLEDGMENTS
The authors acknowledge the financial support from the Slovenian Research Agency (research core funding No. P2-0057).

6. REFERENCES
[1] L. Fischer, R. Shapiro, B. Silver, and Workflow Management Coalition, BPMN 2.0 Handbook Second Edition: Methods, Concepts, Case Studies and Standards in Business Process Management Notation. Lighthouse Point, Fla.: Future Strategies, 2012.
[2] M. Weske, Business Process Management: Concepts, Languages, Architectures. Berlin; New York: Springer, 2012.
[3] M. Kocbek, G. Jost, M. Hericko, and G. Polancic, "Business process model and notation: The current state of affairs," Comput. Sci. Inf. Syst., vol. 12, no. 2, pp. 509-539, 2015.
[4] OMG, "Business Process Model and Notation version 2.0," 03-Jan-2011. [Online]. Available: http://www.omg.org/spec/BPMN/2.0/. [Accessed: 15-Mar-2011].
[5] J. Freund and B. Rücker, Real-Life BPMN: Using BPMN 2.0 to Analyze, Improve, and Automate Processes in Your Company, 2nd edition. CreateSpace Independent Publishing Platform, 2014.
[6] J. Freund and B. Rücker, Real-Life BPMN: With Introductions to CMMN and DMN, 3rd edition. CreateSpace Independent Publishing Platform, 2016.
[7] M. Malinova and J. Mendling, "Why is BPMN not appropriate for Process Maps?," ICIS 2015 Proc., Dec. 2015.

Approach to an alternative value chain modeling

Miha Pavlinek, Marjan Heričko, Maja Pušnik
Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
miha.pavlinek@um.si, marjan.hericko@um.si, maja.pusnik@um.si

ABSTRACT
This paper is focused on describing an alternative approach to modeling value chains, which are an important part of presenting business activities and the value each activity delivers. They document the translation of data and services into business value, essential in times of ever-growing productivity and competition. There are several possible notations for value chain modeling, each holding specific characteristics. Since each domain has its own demands, the goal of this paper is to find the most suitable approach to modeling, based on one or more notations, addressing a representative domain within smart cities (a case study of the health domain is included). The approach to value chain modeling is supported by existing notations and documentation techniques.
Categories and Subject Descriptors
H.5.3 [Information interfaces and presentation]: Group and Organization Interfaces - collaborative computing, organizational design.

General Terms
Documentation, Design.

Keywords
Value chains, use cases, smart city.

1. INTRODUCTION
The paradigm shift in business practices is going from the "product-driven orientation" of the past to today's "customer-driven orientation", which is characterized by an increased demand for variability, product variety, amounts of customer-specific products, and shortening product life cycles [1]. Therefore, it is beneficial to the business to identify the key activities and capabilities that flow through the business and define a value chain [2]. A value chain is a high-level model intended to describe the process by which businesses receive raw materials, add value to the raw materials through various processes to create a finished product, and then sell that end product to customers. The concept was suggested by Michael Porter in 1985 [4]. The raw material and product concept can also be transferred to business services and different, intangible business goals.

For achieving business goals, companies have to cooperate with or within each other, and their value chains are connected in so-called value systems. Each value system consists of a number of value chains, each of which is associated with one enterprise. A value chain simplifies complex value systems, since it breaks down the activities a company performs and analyzes their contribution to the commercial success. In a way, it organizes the activities of an enterprise. For example, value chains are a well-known approach in business administration to organize the work that a company conducts to achieve its business goals. Value chains are often used in business modeling for different areas, e.g. medicine, lists of online services, etc. In this paper, we propose an adapted approach to model value chains in the smart city domain, where the value chain describes the transformation cycle of data into value for the benefit of citizens and the community [3].

After the Introduction section, the history of value chain notations is presented in Section 2. The proposed approach is described in Section 3, supported by an example of modeling value chains in Section 4. Lastly, conclusions and future work are presented.

2. VALUE CHAINS
The idea of value chains is to represent an organization as a system divided into subsystems with inputs, transformations and outputs. The process of turning inputs into value-added outputs usually consists of various activities, where some of them are primary, and others are supporting activities [5]. The most common notation for a value chain is by Porter, often used within EU projects like "Project BLUENE", "European Big Data Value Partnership Strategic Research and Innovation Agenda" and "European Data Portal". Porter's value chains are, basically, used to identify activities conducted by specific companies, with the purpose of providing a product or a service. They can be applied in different fields, such as the definition of B2B and B2C segments in any field. An example of a classic Porter's value chain is presented in Figure 1.

Figure 1. Classic value chain by Porter [4]

With Porter, notation support is provided to describe not only classic supply chain processes, but also services and collaborations among the companies that use them. Despite their popularity, classic value chains are useful only in a limited range of domains. The main disadvantage is their manufacturing-oriented format. Therefore, other forms of value chains appeared, especially in the field of ICT, where there are several alternatives to value chain modeling. The first who applied value chains to Information Systems were Rayport and Sviokla, within their work on virtual value chains [6]. Some other relevant value chains in the ICT domain are the following:

• Service value chain - the relative value of the activities required to establish the product or service,
• Service value network - a dynamic way of providing services based on a coordinated value chain of companies,
• Value stream mapping - a method for analyzing the current state and planning a future sequence of events that lead the product or service from the beginning to the client,
• Data value chain - the information flow is described as a series of steps needed to generate value and useful insights from data [7],
• Big data value chain - modeling the high-level activities that comprise an information system, and
• Digital value chain - a set of processes designed to transform raw data into actionable information that can drive better decisions and insights [8].

Every value chain begins with inputs. In manufacturing, these are raw materials like steel or wood, while in the field of ICT, raw data can be considered the raw material. In this step, heterogeneous data can be gathered from mobile internet devices or sensor devices, or extracted from existing sources in structured or unstructured format. Applying new technologies to existing products, practices and processes can best be described with digital value chains, the activities of which are depicted in Figure 2 and described in the continuation.

Figure 2. Digital Value Chain [8]

Raw data does not have any value until it is processed. Therefore, in the second step, the collected data is processed and, if necessary, mashed up and/or visualized. In the processing activity, data is transformed and mapped from a raw format to a determined format through actions such as parsing, joining, standardizing, augmenting, cleansing, consolidating and filtering. Processed data can be combined and exposed through web APIs, which are analogous to components in manufacturing. More details on this activity are presented within the data value chain [9]. As an output, refined data is provided, new information, or even enhanced functionality, which can be input into the next value chain or used by application developers, leaders or other end users. Application developers can take aggregated data streams and combine them in any number of ways to create information components. Leaders can use new visualizations to improve decision-making, and others can use information and services to improve their effectiveness [9]. Sharing outcomes is important for promotional purposes, to inform not just end users about them, but especially developers, who can use them as inputs to develop new, innovative solutions. The final step is, actually, repeating: with new data, technology updates and a new audience, a digital value chain is changing constantly.
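As a minimal, purely illustrative sketch of this second (processing) step, the following Java code parses, standardizes, cleanses and filters raw semicolon-separated records into a determined format; the record layout and all field names are our own assumptions, not part of the approach described here.

import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Illustrative "processing" step of a digital value chain: raw CSV-like
// records are parsed, standardized, cleansed and filtered.
public class ProcessingStep {
    record Measurement(String patientId, double value) { }   // hypothetical target format

    static List<Measurement> process(List<String> rawRecords) {
        return rawRecords.stream()
                .map(String::trim)
                .filter(r -> !r.isEmpty())                    // cleansing: drop empty lines
                .map(r -> r.split(";"))
                .filter(f -> f.length == 2)                   // cleansing: drop malformed rows
                .map(f -> new Measurement(
                        f[0].trim().toUpperCase(Locale.ROOT), // standardizing identifiers
                        Double.parseDouble(f[1])))            // parsing values
                .filter(m -> m.value() >= 0)                  // filtering implausible values
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(process(List.of("p-17; 36.6", "", "broken-row", "p-18; 37.2")));
    }
}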
3. PROPOSED APPROACH
Despite the several notations which were listed and described briefly, none of them completely fulfils the needs of a complex smart city. In this paper, a proposal of an alternative approach is given, aggregating existing practices, adjusted to the needs of the smart city domain. Usually, the aim of a value chain is to increase profits by creating value, but value chains can also be used to identify opportunities where end users benefit from the final outcome. In the ICT domain, the identification of value chains needs a detailed consideration of existing problems, obstacles, potential improvements based on ICT, and the inclusion of various stakeholders. The approach described in this paper is based on digital value chains and on documenting existing data, services and processes. It is designed to address and solve issues in several smart city fields using ICT tools and techniques. The most important activity is the documentation of key challenges, target groups and actors, existing data sources, web services and potential scenarios.

In the process of the identification and documentation of challenges, each challenge was described clearly, with all specifics and details, so that everyone could understand it. References to a service which presents a potential solution were also provided. Examples of current challenges in the context of health would be long waiting hours, unnecessary visits to the doctor and improved control of a patient's progress.

A list of target groups describes the key actors who are involved in scenarios. Some of the actors are providers, and others are end users, such as citizens. When documenting the available data sources and services, various information needs to be provided. Besides the title and description, each end user must understand the purpose and benefits of the data or service, know who the owners and potential consumers are, and has to be informed about accessibility, privacy restrictions and price. In the case of web services, the input and output parameters are important as well (see the sketch after this section).

Based on the inventory of existing data and services, potential scenarios can be defined. Each scenario is described through a flow of events with alternatives, and the entire concept is presented with use cases. Some additional information regarding initiating events, participants, included services, inputs/outputs and execution is also provided.
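Such an inventory entry can be kept in a simple, machine-readable structure. The following Java sketch is only our own illustration of the documentation fields named above; all field names are hypothetical.

import java.util.List;
import java.util.Map;

// Illustrative inventory entry for a documented data source or web service.
public record ServiceEntry(
        String title,
        String description,
        String purposeAndBenefits,
        String owner,
        List<String> potentialConsumers,
        String accessibility,                 // e.g., open data, on request
        String privacyRestrictions,
        String price,
        Map<String, String> inputParameters,  // name -> type (web services only)
        Map<String, String> outputParameters) {
}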
4. AN EXAMPLE OF MODELING A VALUE CHAIN OF A HEALTH CARE SCENARIO
In this chapter, a real example is presented of the applied approach to value chain documentation in the smart city domain. Customized value chains have already been used to define the role and impact of ICT in developing smart cities within other related works [10].

The value chains were designed in accordance with our approach within the EkoSMART program, the purpose of which is to develop a smart city ecosystem with all the supporting mechanisms necessary for the efficient, optimized and gradual integration of individual areas into a unified and coherent system of value chains [11]. One of the most important objectives of the program is to integrate solutions from different sectors into a common ecosystem. The resulting value chains, based on technologies like electronic and mobile devices, related software solutions and intelligent data processing, are enhancing the quality of current services. Moreover, the sectoral value chains will be inputs for the cross-sectoral value chains.

4.1 Smart cities and their characteristics
Cities are marked with locations that have a high level of accumulation and concentration of economic activities; they are spatially complex and connected with transport systems. The larger the city, the greater the complexity and the challenges and the risks of disturbances. The fundamental paradigm of the present world is continuous technological advancement, which, on the one hand, represents a certain proportion of new problems, but, on the other hand, technology is precisely the one thing where the key solutions to these problems can be found. Since the world cannot be "reversed", it is necessary to look for suitable solutions that would facilitate modern pressures to focus on the core of new life, which is represented largely by Information and Communication Technologies (ICT). The quality services provided by ICT can relieve people greatly, help them with time optimization and organization, and, last but not least, motivate them.

The purpose of the field is to develop approaches and prototypes which provide the basic conditions for an effective transformation of the healthcare, traffic, energy, waste and other systems, focusing on the following main fields:
• Smart economy,
• Smart people,
• Smart governance,
• Smart mobility,
• Smart environment,
• Smart living.

In the context of a smart city, a value chain is defined as connected activities within a particular sector with various stakeholders, which collaborate with the aim to provide quality services to enhance the quality of life and/or strengthen economic growth in an environmentally friendly manner. The designed value chains are intended for data owners, service providers, application developers, city leaders, citizens and others.

4.2 Designing a value chain for the Health sector
In the Health sector, value chains were identified that should be considered within the context of the introduction of smart healthcare services, like telemedicine and telecare. The main goal is the preparation of quality and comprehensive healthcare services using ICT tools and techniques, where the value chains are designed to identify and upgrade the occasionally problematic quality of today's treatment and care of these groups, primarily through the use of electronic and mobile devices and related software solutions, in particular artificial intelligence in the cloud, or locally, for example, on a mobile device or with customized sensors and carrying devices. A connection of existing solutions with new smart city solutions is planned. By documenting health processes, based on meetings with representatives from the field, the following problems were detected:
• Multiple treatments
• Distribution of services
• Inflexible working time and a poor ordering system
• The burden on healthcare personnel and the long waiting period
• Disabled patient monitoring
• Inaccessibility of data
• Deficient legislation, missing standards and protocols

In order to address these problems effectively, target groups were identified, in addition to the data and services which are needed to enhance the existing processes. All the parts were described and connected in a comprehensive diagram of use cases, which includes a list of activities of the identified participants (Figure 3). The common use case diagram includes several possible application scenarios.

Figure 3: Use case diagram for the health vertical.

Individual usage scenarios were also presented in more detail, with a detailed description, characteristics, flow of events and a separate use case diagram. The Establishing treatment scenario is explained as an alternative value chain presentation, where the activities are as follows: The patient has a problem, (1) enters the system, (2) the system assigns a medical treatment, and (3) the patient follows this treatment, trying to achieve the set criteria. The characteristics of the scenario are categorized in the following groups: Basic information, People and IT, Inputs/outputs and Implementation. Table I presents the actual characteristics of the Establishing treatment scenario.

A high-level representation of the final value chain with participants, inputs, outputs and intermediate assets can be seen in Figure 4. The value chain has four pillars: Participants, Input Data, Information/Services and Output results. The participants are the providers as well as the users of the service, followed by all the necessary data and, further, the services, designed by and for the participants, based on the accumulated data. Finally, based on all previous pillars, final services with added value for the city (or company), in the form of different outputs and results, are presented.
Figure 4. High-level representation of a value chain in the Healthcare domain (pillars: Participants, e.g. patient, social network, home care provider, general practitioner, specialist, nurse, registered nurse, health professionals, reference clinic, content editor; Input data, e.g. measurements, contact details, details about patients, examination results, treatments, therapies, interventions, instructions; Information/services, e.g. messages, reminders, measurement and notification overviews, patient/doctor and doctor/doctor communication, video conferences, anomaly detection, decision support, joint treatment, entering/viewing/editing patient information, educational content, video guides; Output/results, e.g. simpler patient monitoring, informing patients and their social network, saving time and money, relieving health professionals, effective care, greater data availability, cost reduction, interactive coherence during therapy.)

Table I: Characteristics for the scenario "Establishing treatment"

Scenario name: Establishing medical treatment
Description of the scenario: The purpose of the scenario is to describe the initialization of the treatment of a patient with a chronic disease. The scenario involves ordering a patient for a review, where the doctor gives them a treatment, and the nurse introduces the treatment information and informs the patient about the use of the assigned equipment and the implementation of the activity.
Variants: If the patient has several treatments, the doctor will obtain further findings and, on the basis of communication with other doctors, will form a joint therapy.
The trigger of the scenario: The process is initiated by a patient who comes to a check-up due to a problem.
Participants: Patient, Health personnel
Included services: A service for entering and processing data, and editing educational content
Scenario input: Patient information, Data processing
Scenario output: Program, Schedule, Therapies, Educational content
Activities: Obtaining/entering patient information, Data entry, Entering therapy, Establishing patient/doctor and doctor/doctor communication, Editing educational content

5. CONCLUSION AND FUTURE WORK
A graphical presentation of value within any company is an important part of understanding the focus of business processes, their strengths and weaknesses. Several techniques can be used to present the value flow. Here, however, a combination of notations was used for the purpose of presenting the complex smart city system of users, data, services and scenarios. A use case diagram was used to present the behavior and set of actions of several participants. Within a use case diagram, several scenarios can be derived, each scenario defined in the form of a table (the characteristics of a scenario). Lastly, a high-level representation of a value chain is presented (including the four value pillars). In future work, a refinement of the approach will be performed.

6. ACKNOWLEDGMENTS
This joint work is enabled by the program "Eko Sistem Pametnega Mesta", supported by the European Union, the European Regional Development Fund and the Ministry of Education, Science and Sport.

7. REFERENCES
[1] Martínez-Olvera, C., Davizon-Castillo, Y. A. 2015. Modeling the Supply Chain Management Creation of Value - A Literature Review of Relevant Concepts. Business, Management and Economics, "Applications of Contemporary Management Approaches in Supply Chains" (Apr 2015).
[2] Business Modelling. https://www.enterprise-architecture.org/business-architecture-tutorials/79-business-value-chain. Accessed: 2017-09-15.
[3] Smart City Value Chain. White Paper e-madina. November 2016. http://www.e-madina.org/wp-content/uploads/2016/11/White-Paper-e-Madina-3.0-Value-Chain-of-Smart-cities.pdf.
[4] Porter, M. E. 1985. Competitive Advantage: Creating and Sustaining Superior Performance. New York: Simon and Schuster. Retrieved 9 September 2013.
[5] "Decision Support Tools: Porter's Value Chain". Cambridge University: Institute for Manufacturing (IfM). Retrieved 9 September 2013.
[6] Rayport, J. F., & Sviokla, J. J. 1995. Exploiting the virtual value chain. Harvard Business Review, 73, 75-85.
[7] Curry, E. 2016. The Big Data Value Chain: Definitions, Concepts, and Theoretical Approaches. New Horizons for a Data-Driven Economy. Springer International Publishing, 2016. 29-37.
[8] Data Tip #1 - Your Digital Value Chain: 2013. http://captricity.com/blog/data-tip-1-your-digital-value-chain/. Accessed: 2017-09-15.
[9] Understanding the Data Value Chain. 2014. http://www.ibmbigdatahub.com/blog/understanding-data-value-chain. Accessed: 2017-09-15.
[10] Webb, M., Finighan, R., Buscher, V., Doody, L. and Cosgrave, E. 2011. Information marketplaces - The new economics of cities. The Climate Group, Arup, Accenture. (2011).
[11] EkoSmart - Ekosistem pametnega mesta. 2017. http://ekosmart.net/. Accessed: 2017-09-1.

Using Property Graph Model for Semantic Web Services Discovery

Martina Šestak
Faculty of Organization and Informatics, Pavlinska 2, 42000 Varaždin, Croatia
+385 42 390 847
msestak2@foi.hr

ABSTRACT
Web services have significantly contributed to the integration of different businesses. The service-oriented computing (SOC) paradigm still represents an implementation challenge for developers. Several approaches have been developed over the years for different processes related to Web services. Nowadays, traditional Web services are often supplemented with semantics to achieve higher levels of automation and interoperability. In this paper, a new approach for semantic Web services discovery based on property graphs is proposed. The proposed model proves that the semantic Web service model specified in the OWL-S language can be represented as a property graph, which can be queried to discover Web services based on query parameters.
Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - Query formulation, Retrieval models, Search process, Selection process. H.3.5 [Information Storage and Retrieval]: Online Information Services - Web-based services.

General Terms
Design, Languages, Standardization, Theory.

Keywords
Labeled property graph model, web services discovery, PGQL.

1. INTRODUCTION
Application integration is an important challenge in the modern business environment. Over the years, many concepts and solutions have been developed to address this challenge (e.g., middleware or Enterprise Application Integration solutions). The most recent solution for integrating multiple applications are Web services. Their compliance with the existing Web technologies and standards, and their platform independence, represent a significant advantage.

Web services can be defined as a "software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact with the Web service in a manner prescribed by its description using SOAP messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards." [17]. Nowadays, due to the information overload of SOAP messages, RESTful Web services are used more often. Since their focus is on resources [11], the messages exchanged between applications have a simpler format, which makes REST services a simpler alternative to SOAP-based services that is more applicable in many situations. Through both technologies, the client can access and retrieve the required data from a specific Web service by invoking the correct interface method of that service.

According to [16], each Web service should be capable of being defined, described, and discovered. The Web service description process can be divided into three layers [14]:
1. service invocation
2. service publication and discovery
3. composite web services description

Recently, the concept of SOA has been supplemented with semantic Web concepts, which resulted in the semantic Web services (SWS) technology. SWS enables Web services to be automated and carried out by intelligent software agents [4]. In SWS, additional meaning is added to the basic Web service information. Thus, the main motivation behind this technology is to increase the level of automation of information processing, and to improve the interoperability of Web services.

There are several languages developed for semantic Web services as well. OWL-S is the most popular Web service ontology used for SWS description. In OWL-S, a semantic Web service description consists of three elements [8]:
1. service profile - contains general information about the service (name, description, inputs, outputs, preconditions, results)
2. service model - contains information about how the service works (by using structures like loops, sequences, etc.)
3. service grounding - contains information about how to use the service

In this paper, the focus will be on the service model element. The process of semantic Web services discovery will be discussed by analyzing several models proposed in the literature. Based on this work, a new approach will be proposed and explained.

The rest of the paper is organized as follows: in Section 2, different approaches for the semantic Web services discovery process are discussed, and Section 3 surveys related work. In Sections 4 and 5, the labeled property graph model and the Property Graph Query Language (PGQL) are explained. In Section 6, the new approach is introduced and described. Finally, a conclusion is made to summarize the characteristics of the proposed approach and the challenges which will be further analyzed in future work.

2. SEMANTIC WEB SERVICES DISCOVERY APPROACHES
Web services discovery, in general, is "the act of locating a machine-processable description of a Web service that may have been previously unknown and that meets certain functional criteria" [15]. The goal of the process is to find an appropriate service within the Web service directory which meets some predefined criteria. It is worth mentioning that in recent years the importance of other, nonfunctional criteria (e.g., reliability, response time, availability, etc.) [12] has also been recognized, which led to the development of different Quality of Service (QoS) modeling approaches in the (semantic) Web services description, discovery and composition processes.
As already mentioned, OWL-S is one of the ontology languages which can be used in the SWS discovery process. The OWL-S service model contains Web services viewed as a collection of processes, which represent the specification of how the client interacts with the service [9]. If a process receives information and returns some new information based on its input, the information production is described only by specifying the inputs and outputs of that process. Otherwise, if the process makes more complex transformations and changes, then the production is described by the process preconditions and results [9]. A process may require some information to be executed, i.e., it can have any number of inputs, and it can also produce any number of outputs for Web service requestors. Thus, the process inputs and outputs specify the data transformation which takes place during the process execution.

A sample OWL-S service model is shown in Fig. 1. A Web service called "BorrowedBooks" is shown as a process, which returns the transaction ID, the client name, the date when a requested book was borrowed, and whether the book was returned, for a given title of the book and its author. Input information is shown as an incoming edge, and output information as an outgoing edge of the process.

Figure 1. Sample OWL-S service model

Many different approaches have been proposed in the literature for the discovery (selection) process.

3. RELATED WORK
In [2], the authors divided the different SWS discovery approaches into three categories: algebraic, deductive and hybrid approaches. The algebraic approach includes approaches based on graph theory (e.g., iMatcher, AASDU (Agent Approach for Service Discovery and Utilization), etc.), the deductive approach uses logic-based reasoning (e.g., Inputs and Outputs, Preconditions and Effects matching, etc.), while the hybrid approach combines both the algebraic and the deductive approach.

Sachan et al. [12] proposed a new modeling approach for a QoS-based semantic WS model and a formalization of several QoS attributes. The proposed QoS approach is agent-based, i.e., it introduces an additional mediator Agent, which selects the most appropriate Web service available based on different QoS parameters set by the clients (users). The authors built an ontology of selected QoS parameters, and used that ontology on the model built in the OWL-S Editor available in Protégé.

Klusch et al. [5] introduced a hybrid SWS matching approach with a mechanism called OWLS-MX, which they applied to services specified in OWL-S. The authors managed to prove that logical reasoning is not sufficient for semantic Web services discovery, so they combined logical reasoning with information retrieval (IR) similarity metrics.

Srinivasan et al. [13] presented an OWL-S/UDDI matchmaker architecture. The authors used the OWL-S Integrated Development Environment (IDE) to build and discover OWL-S based Web services. The OWL-S IDE supports various processes of the SWS lifecycle (service description, publication, discovery and execution). The Web service description can be generated based on the code or model within the OWL-S Editor. Service descriptions are stored inside the OWL-S registry. Web service discovery is performed by executing a query against the registry, using a specific Application Programming Interface (API). The registry performs a matching process and returns the OWL-S descriptions of the matched services.
4. PROPERTY GRAPH MODEL
The property graph data model is nowadays the base data model for many graph databases (e.g., Neo4j, Titan, etc.). This model is an easy-to-understand representation of the way data is stored in graph databases. Since it represents an extension to graphs in mathematics, it can be formalized in the following way [1]:

A property graph G is a tuple (V, E, ρ, λ, σ), where:
a. V is a finite set of vertices (or nodes).
b. E is a finite set of edges (or relationships), such that V and E have no elements in common.
c. ρ : E → (V × V) is a total function. Intuitively, ρ(e) = (v1, v2) indicates that e is a directed edge from node v1 to node v2 in G.
d. λ : (V ∪ E) → Lab is a total function, with Lab a set of labels. Intuitively, if v ∈ V (resp., e ∈ E) and λ(v) = l (resp., λ(e) = l), then l is the label of node v (resp., edge e) in G.
e. σ : (V ∪ E) × Prop → Val is a partial function, with Prop a finite set of properties and Val a set of values. Intuitively, if v ∈ V (resp., e ∈ E), p ∈ Prop and σ(v, p) = s (resp., σ(e, p) = s), then s is the value of property p for node v (resp., edge e) in the property graph G.

The (labeled) property graph data model consists of the following elements [6]:
• Nodes - different entities with attributes and a unique identifier
• Labels - a semantical description of the role of each entity, where a single node or relationship can have multiple labels at the same time
• Relationships - connections between nodes, where each connection has a start and an end node
• Properties - key-value pairs, which represent node and relationship attributes

A simple property graph model, shown in Fig. 2, contains 3 nodes. The node labeled "Group" has a property "Name", and the other two nodes with no labels have the properties "Name" and "Age". These two nodes are connected with a relationship labeled "knows", which has a property "Since", indicating since when the two persons have known each other.

Figure 2. Sample property graph model [3]

Property graphs represent an expressive and simple mechanism for describing the richness of data [7], where a connection between two nodes is easily represented, and both nodes and relationships can have various attributes of different complexity. Because the property graph model is easy to understand, it can also be used for modeling semantic information about Web services.
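The tuple definition above maps directly onto a small data structure. The following Java sketch is our own illustration (not tied to Neo4j, Titan or any other product): ρ is stored inside the edge objects, while λ and σ are kept in maps, and vertex and edge identifiers are assumed to be globally unique.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Minimal in-memory property graph G = (V, E, rho, lambda, sigma).
public class PropertyGraph {
    public record Edge(String id, String source, String target) { } // rho(e) = (source, target)

    final Set<String> vertices = new HashSet<>();                   // V
    final Map<String, Edge> edges = new HashMap<>();                // E, together with rho
    final Map<String, String> labels = new HashMap<>();             // lambda (id -> label)
    final Map<String, Map<String, Object>> properties = new HashMap<>(); // sigma (partial)

    void addVertex(String v, String label) {
        vertices.add(v);
        labels.put(v, label);
    }

    void addEdge(String e, String from, String to, String label) {
        edges.put(e, new Edge(e, from, to));
        labels.put(e, label);
    }

    void setProperty(String vertexOrEdgeId, String key, Object value) {
        properties.computeIfAbsent(vertexOrEdgeId, k -> new HashMap<>()).put(key, value);
    }
}

The Figure 2 example then amounts to three addVertex calls, one addEdge call with the label "knows", and setProperty calls for "Name", "Age" and "Since".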
As shown in the previous section, the OWL-S service model is also a directed graph. Thus, in the proposed approach, the described concepts of the property graph data model will be applied to the OWL-S service model. However, in order to efficiently query the property graph model, several query languages have been developed. In the following section, the characteristics of the Property Graph Query Language (PGQL) will be discussed.

5. PROPERTY GRAPH QUERY LANGUAGE (PGQL)
PGQL is a new SQL-like query language for property graphs developed by Oracle [10]. The language offers a wide collection of statements to be executed in order to query the property graph and find the required data.

PGQL is based on a graph pattern matching algorithm, i.e., when executing a PGQL query, the query engine finds all subgraphs within the original graph that match the specified query pattern.

To query a property graph, the SELECT clause is used, which specifies the data entities to be returned in the query result. In the example property graph shown in Fig. 2, to return the names of the persons who know each other, the following PGQL query would be executed:

SELECT n.name, m.name
WHERE
(n WITH type='Person')-[e:knows]->(m WITH type='Person')

The pattern (n)-[e]->(m) defined in the WHERE clause represents a topology constraint, which is a description of a connectivity relationship between vertices and edges in the pattern [10]. For n-step hops between nodes, it is possible to specify path expressions, which are then used in the WHERE clause of the query.

6. PROPERTY GRAPH-BASED APPROACH FOR SEMANTIC WEB SERVICES DISCOVERY
Since the OWL-S representation of the service model is a graph, the characteristics of property graph models can be applied to the OWL-S service model. This graph-based service model can then be queried using PGQL clauses during the semantic Web services discovery process. The proposed approach will be explained on the sample Web service model shown in Fig. 3. The Web service called "MovieService" contains the following three processes:
1. GetMovieGenres - for a given movie name and year, returns the genre name of that movie
2. GetMoviePersonnel - for a given movie name and year, returns the list of all actors' and directors' names of that movie
3. GetGenreDirectors - for a given genre name, returns the list of names of directors who produced movies in that genre

Figure 3. Sample service model of the proposed approach

The sample service model contains three processes with a different number of input and output parameters. Since this is a property graph, both nodes and relationships can be supplemented with additional labels and properties. The processes shown in this model are simple, so they are described only with their input and output parameters.

In the example, the nodes representing the processes have the label (type) "Process", which distinguishes them from the parameter nodes labeled "Parameter". A parameter node can represent both an input and an output parameter of a process (e.g., the parameter "Genre Name" is the output parameter of the "GetMovieGenres" process, but the input parameter of the "GetGenreDirectors" process).

The defined property graph service model can be queried by using PGQL. In order to discover (find) Web services which use the movie name as an input parameter, the following PGQL query would be executed:

SELECT s.name
WHERE
(p1 WITH name = 'MovieName')-[e1]->(s)
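To make the pattern matching tangible, the following self-contained Java sketch (our own illustration, independent of PGQL and of any query engine) hard-codes the input side of the Figure 3 model and answers the same question as the query above, i.e. which processes have "MovieName" as an input parameter.

import java.util.List;

// Illustrative re-implementation of the PGQL pattern
//   (p1 WITH name = 'MovieName')-[e1]->(s)
// over a hard-coded fragment of the sample service model.
public class DiscoveryExample {
    record Node(String name, String type) { }   // type is "Parameter" or "Process"
    record Rel(Node from, Node to) { }          // a directed edge

    public static void main(String[] args) {
        Node movieName = new Node("MovieName", "Parameter");
        Node year = new Node("Year", "Parameter");
        Node genreName = new Node("GenreName", "Parameter");
        Node getGenres = new Node("GetMovieGenres", "Process");
        Node getPersonnel = new Node("GetMoviePersonnel", "Process");
        Node getGenreDirectors = new Node("GetGenreDirectors", "Process");

        List<Rel> graph = List.of(
                new Rel(movieName, getGenres), new Rel(year, getGenres),
                new Rel(movieName, getPersonnel), new Rel(year, getPersonnel),
                new Rel(getGenres, genreName), new Rel(genreName, getGenreDirectors));

        // Match: a node named 'MovieName' with an outgoing edge to any node s.
        graph.stream()
                .filter(r -> r.from().name().equals("MovieName"))
                .map(r -> r.to().name())
                .forEach(System.out::println);  // prints GetMovieGenres, GetMoviePersonnel
    }
}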
parameters as node instances (classes), store them in a graph [8] Martin, D., Paolucci, M., McIlraith, S., Burnstein, M., database instance, and include them in a specific Web service class McDermott, D., McGuinness, D., Parsia, B., Payne, T.R., definition. The PGQL queries can then be executed on the database Sabou, M., Solanki, M. and others 2004. Bringing instance to find the necessary web service and other information. semantics to web services: The OWL-S approach. (2004). Therefore, the property graph model represents a new approach, [9] OWL-S: Semantic Markup for Web Services: which combined with the PGQL query language could be used for https://www.w3.org/Submission/OWL-S/. semantic Web services discovery. At this moment, the model can [10] PGQL 1.0 Specification: 2017. http://pgql- be used to represent a simplified OWL-S service model without lang.org/spec/1.0/. Accessed: 2017-09-15. including QoS parameters mentioned in the previous sections. [11] Rodriguez, A. 2008. Restful web services: The basics. Online article in IBM DeveloperWorks Technical Library. 7. CONCLUSION AND FUTURE WORK November (2008), 1–11. In this paper, the characteristics of semantic Web services have [12] Sachan, D., Dixit, S.K. and Kumar, S. 2014. QoS aware been discussed with a special focus on SWS discovery approaches. formalized model for semantic Web service selection. Based on the OWL-S ontology language and its graph International Journal of Web & Semantic Technology. 5, representation of semantic Web service model, a new approach has 4 (2014), 83. been proposed. The approach includes using property graphs to model semantic Web services, and discovering the required [13] Srinivasan, N., Paolucci, M. and Sycara, K. 2006. services by executing PGQL queries on that property graph. The Semantic web service discovery in the OWL-S IDE. System Sciences, 2006. HICSS’06. Proceedings of the 39th built property graph can be implemented in graph databases to build a graph database of existing Web services and used for SWS Annual Hawaii International Conference on (2006), 109b- composition process. Future work includes extending the model by -109b. adding Web services methods (operations), and by including and [14] Varga, L.Z. and Sztaki, Á.H. 2005. Semantic Web verifying different QoS parameters against the proposed model. Services Description Based on Web Services Description. W3C Workshop on Frameworks for Semantics in Web Services (2005). 8. REFERENCES [15] Web Services Architecture: 2004. [1] Angles, R. A Foundations of Modern Graph Query https://www.w3.org/TR/ws-arch/. Languages. [16] Web Services Architecture Requirements: 2002. [2] Bitar, I. El, Belouadha, F.-Z. and Roudies, O. 2014. https://www.w3.org/TR/2002/WD-wsa-reqs-20021011. Semantic web service discovery approaches: overview [17] Word Wide Web Consortium 2004. Web Services and limitations. arXiv preprint arXiv:1409.3021. (2014). Architecture. 26 Statecharts representation of program execution flow Nataša Sukur Gordana Rakić Zoran Budimac Faculty of Sciences Faculty of Sciences Faculty of Sciences University of Novi Sad University of Novi Sad University of Novi Sad Trg Dositeja Obradovića 3 Trg Dositeja Obradovića 3 Trg Dositeja Obradovića 3 21000 Novi Sad, Serbia 21000 Novi Sad, Serbia 21000 Novi Sad, Serbia nts@dmi.uns.ac.rs goca@dmi.uns.ac.rs zjb@dmi.uns.ac.rs ABSTRACT 1. INTRODUCTION Source code and software in general is prone to errors. 
ABSTRACT
Source code, and software in general, is prone to errors. This is due to bad design decisions, the lack of experience of developers and the constant need to change the existing software. Because the code changes rapidly and under strict time limitations, it is not always possible to preserve the quality and reliability of the source code. It is very important to detect errors in time and to fix or remove them in an automated or semi-automated manner. The automation of error detection and removal can be accomplished by various tools or platforms. The platform used for software quality analysis in this case is SSQSA (Set of Software Quality Static Analyzers), a static-analysis-oriented, language-independent platform, which is based on a universal intermediate source code representation, eCST (enriched Concrete Syntax Tree). The tree structure is useful for code representation and comprehension; however, its structure is not immediately suitable for representing program control flow. That is why control flow graphs were introduced to SSQSA, and were later represented visually in the form of higher-level automata, statecharts. Apart from introducing a formal representation of control flow graphs, statecharts introduced additional functionalities to SSQSA. Some of them are hierarchical control flow graph representation, the possibility of simulation in one or more parallel work flows (also planned to be expanded to the interprocedural level), and performing various kinds of estimation.

Categories and Subject Descriptors
D.2.4 [Software Engineering]: Software/Program Verification - Formal methods

General Terms
Measurement, Languages

Keywords
software quality, static analysis, formal methods, control flow graphs, statecharts

1. INTRODUCTION
Formal methods are getting more and more attention in the world of software. In the beginning, using formal representation was more usual for hardware than for software. Software is more complex in terms of state components, and the process of producing abstract models is more difficult [1, 3]. Formal verification of software systems is seldom automated. Usually, an abstract model is manually created in order to perform the formal verification of a large system. This requires investing a considerable amount of time and expertise. Moreover, because the model was manually made, the analysis of this model cannot be considered reliable. Systems are also usually large and change quickly. This means that continuously creating new and updating existing abstract models is very complex, as well as expensive, prone to errors and difficult to optimize. For all those reasons, an automated solution to abstract model generation is needed [2, 5]. Also, there has been a lot of foundational work on defining safe abstractions, but research on model reduction has not been explored enough [3].

There are many approaches to software analysis and software quality measurement. The algorithms that are implemented for this purpose usually operate on some internal representations of the code, such as trees, graphs or some meta models [5]. The SSQSA platform [9] uses its own structure, called the enriched Concrete Syntax Tree, and derived graph-based representations. Some of the algorithms for software quality measurement in SSQSA use this intermediate structure to perform their calculations and analysis, such as software metrics analysis, timing analysis and code clone detection.

The enriched Concrete Syntax Tree (eCST) is the key to the language independence of SSQSA. It is a syntax tree which contains all the concrete tokens from the original source code, including comments, enriched by a predefined set of universal nodes. These nodes are created in order to generalize the different structures of various programming languages whose purpose is the same. The set of universal nodes is minimal, and the nodes were carefully selected, so that they are applicable to the structure of all languages supported by SSQSA. They are inserted as imaginary nodes in this tree, together with the original source code tokens. For example, if we represented a semicolon and a comma with a universal node SEPARATOR, we would also have the data that precisely represents the SEPARATOR universal node. The whole source code is represented like this, and it is possible to completely reconstruct the original source code from the eCST.
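As a minimal illustration of this idea (our own simplification; the real eCST and its universal node set are considerably richer), a universal node can wrap the concrete tokens it generalizes, so that both the generalized role and the original text are preserved, and the source remains reconstructible:

import java.util.ArrayList;
import java.util.List;

// Simplified illustration of an eCST node: universal nodes generalize
// language-specific constructs while keeping every concrete token.
public class EcstNode {
    final String universalKind;   // e.g., "SEPARATOR"; null for a plain token node
    final String concreteToken;   // original source text, e.g., ";" or ","
    final List<EcstNode> children = new ArrayList<>();

    EcstNode(String universalKind, String concreteToken) {
        this.universalKind = universalKind;
        this.concreteToken = concreteToken;
    }

    // Reconstructing the source is a left-to-right walk over the concrete tokens.
    void reconstruct(StringBuilder out) {
        if (concreteToken != null) out.append(concreteToken);
        for (EcstNode child : children) child.reconstruct(out);
    }

    public static void main(String[] args) {
        EcstNode separator = new EcstNode("SEPARATOR", null); // imaginary (inserted) node
        separator.children.add(new EcstNode(null, ";"));      // the concrete token it wraps
        StringBuilder out = new StringBuilder();
        separator.reconstruct(out);
        System.out.println(out);                              // prints ";"
    }
}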
Sometimes it is not necessary to have a structure which contains as much data as the eCST, and it is not optimal for all kinds of analysis. That is why it was necessary to introduce Control Flow Graphs [6] to this platform. They were derived from the eCST by extracting only the nodes which were of importance for control flow analysis.

Although this was an upgrade to the SSQSA features, and some additional algorithms were implemented on this structure, there was still a problem in the case of larger pieces of code, where the resulting graphs were quite complex [1]. Representing control flow graphs in a hierarchical manner seemed like a good solution for the reduction of complexity. That is one of the reasons why statecharts were introduced to SSQSA. Apart from solving the complexity issue, they give us the possibility of simulation, parallel execution and estimation.

The rest of the paper is organized as follows: Section 2 reflects on some of the related papers and tools. Section 3 describes the introduction of Control Flow Graphs to SSQSA. Section 4 explains the meaning and purpose of statecharts in SSQSA. In Section 5, an example of a statechart is shown and described in more detail. In Section 6, future work is presented and, finally, we conclude our paper in Section 7.

2. RELATED WORK
There are many tools and papers that deal with visual representations and simulation of source code. However, most of them lack some features. Usually, they focus on some specific programming language. Some of them are able to represent the control flow or the structure of programs in a visual way, but most of them do not have the possibility of simulation and testing. If there are testable representations, then another problem emerges: the state explosion problem [1]. Also, many of them are not formal representations, and not all of them are platform independent.

There have already been some attempts at creating formal representations of source code. Papers such as [2] explain how the authors extracted finite state models from Java source code. Their approach also addresses the state explosion problem and tries to solve it. However, this solution is only applicable to Java code.

The work by [7] deals with interprocedural graphs. The intention is to check whether fixing errors in code actually eliminates them, and whether fixing errors means introducing some new ones. The whole approach is based on static analysis; it generates the graph based on all existing versions of the program and tries to discover and fix faults, and propagate that to newer versions.

A PhD thesis [10] performs static analysis of programs based on their dependence graphs. The idea is that a single statement can affect some other statements and parts of the code. This approach is formal and language independent. It focuses on sequential, imperative programs.

2.1 Related tools
Visustin v7 Flow chart generator (http://www.aivosto.com/visustin.html) is a visualization tool that can represent code in the form of a program flow diagram and an activity diagram. It has support for 43 languages, such as C, COBOL, Fortran, Java, JavaScript, Pascal, PHP and Ruby. It also has support for simulation, but it has a problem when it is presented with large pieces of code - it becomes hard to analyze, and it has no hierarchical representation.

Graphviz (Graph Visualization Software, http://www.graphviz.org/) is open source and is used for graph visualization. Graphviz has wide use in networking, bioinformatics, software engineering, database design, machine learning, etc. This tool performs graph drawing based on specifications in the DOT language (http://www.graphviz.org/doc/info/lang.html). The downside is that the graphs first have to be specified in the DOT language in order to be represented by Graphviz. It is also not able to simulate control flow behavior.

Apart from these, there are also other tools that have some similar features, such as MOOSE (http://www.themoosebook.org/book) and RefactorErl (http://plc.inf.elte.hu/erlang/). However, some of them are not oriented towards formal representation, and some cannot simulate the represented code.
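As a small illustration of that input format, the following Java sketch (our own; all node names are invented) prints a DOT specification for a tiny control flow graph, which Graphviz can then render:

// Illustrative sketch: emitting a DOT specification (the input format of
// Graphviz) for a tiny control flow graph; all node names are invented.
public class DotExport {
    public static void main(String[] args) {
        StringBuilder dot = new StringBuilder("digraph cfg {\n");
        String[][] transitions = {
                {"entry", "condition"}, {"condition", "body"},
                {"body", "condition"}, {"condition", "exit"}};
        for (String[] t : transitions) {
            dot.append("  ").append(t[0]).append(" -> ").append(t[1]).append(";\n");
        }
        dot.append("}\n");
        // Save the output as cfg.dot and render with: dot -Tpng cfg.dot -o cfg.png
        System.out.println(dot);
    }
}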
The purpose of control flow graphs is to track all possible paths of program execution, for important reasons such as detecting dead code or infinite loops. To generate them, it was necessary to extract them from the eCST: a subset of universal nodes was selected, and it was sufficient to represent the program flow accurately.

Some nodes of importance for the control flow representation are statement nodes, such as assignment statements and function calls. Apart from them, the graph also includes branch statements, branches and loop statements, as well as their corresponding condition nodes. Some pieces of information were included because of their importance for generating statecharts: statecharts are highly structured, and it is important to preserve that structure by saving information about nodes such as the compilation unit, block scope and function declaration in the control flow graph.

The control flow graph first created in SSQSA focused on a single function or procedure. The starting node was the entry point of this function/procedure. The rest of the control flow graph was created by extracting the previously defined nodes of interest from the eCST and connecting them so that they represent the control flow of the original source code. This graph is directed and contains cycles, which exist due to the nature of source code. If language-independent condition evaluation is successfully implemented (i.e., the number of repetitions of some cycles is calculated), it will be possible to perform calculations such as worst-case execution time estimation. Currently, we are working on interprocedural graphs [8], described in more detail in Section 6.

4. STATECHARTS
Statecharts are defined as a visual formalism for complex systems [4]. They were later included among UML diagrams. Another name for this diagram is Harel's automaton (after David Harel, the creator of statecharts). The main benefit of using statecharts is the ability to represent parallel states, track history within complex states and track the values of variables throughout the flow. They are highly expressive: they can show a very detailed preview of the system to be created, yet, thanks to their hierarchical nature, they can also be very compact and show the system only at higher levels of abstraction.

Statecharts model hierarchy, concurrency and communication, which makes them important for tracking complex real-time systems. Reactive systems are event-driven: they constantly have to react to various kinds of internal and external events. It used to be very difficult to represent them in a way that was realistic, but also formal and precise. Statecharts are a solution to that problem, since they make the process of specifying and designing such complex systems much easier and more natural. All possible behavior of a reactive system can be represented by a set of allowed in and out events, conditions, actions and time limitations.

The dynamic behavior of a complex system is easily represented using states and events. The system is always in at least one state, and when some event occurs, it transitions to another state under some conditions. A transition from one state to another can also happen entirely inside a complex state. Transitions can be recursive, having the same state as both origin and target. Some transitions cause a complex state to be entered or exited, and transitions can also trigger events, which affect the simulation. If standard finite automata were used for this purpose, they would be very difficult to understand because of the very large number of generated states. Statecharts use hierarchy, provide modularity and good structure, and make it easy to represent independent parallel execution [4].
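The state/event/condition mechanics can be pictured with the following toy interpreter. It is illustrative only: the states, events and guards are invented, and real statechart semantics (hierarchy, history, orthogonal regions, as in [4] or in tools such as Yakindu) go well beyond this flat core.

transitions = [
    # (source state, event, guard over variables, target state)
    ("idle",    "start", lambda v: True,        "looping"),
    ("looping", "tick",  lambda v: v["i"] > 1,  "looping"),  # recursive (self-)transition
    ("looping", "tick",  lambda v: v["i"] <= 1, "done"),
]

def step(state, event, variables):
    """Fire the first transition enabled by the event and its guard."""
    for src, ev, guard, dst in transitions:
        if src == state and ev == event and guard(variables):
            return dst
    return state  # no enabled transition: the system stays in its state

state, variables = "idle", {"i": 3}
state = step(state, "start", variables)
while state == "looping":
    variables["i"] -= 1
    state = step(state, "tick", variables)
print(state)  # done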
4.1 The importance of statecharts for SSQSA
Statecharts introduced a graphical representation of control flow graphs to SSQSA and created a more compact version of them. Also, because statecharts are a formal representation, the system is represented unambiguously, and it becomes a trivial problem to detect which program paths are possible and which are not. This component is useful for testing parts of code in the early stages of development. It has some features of a debugger: it can show whether parts of the code behave oddly and why the control flow changes unexpectedly, and it is much more visual in representing what is currently happening.

4.2 Implementing statecharts in SSQSA
Statecharts consist of two kinds of states: complex states (which can also be orthogonal, with parallel regions) and simple ones. Based on the nature of the universal nodes involved (whether they represent complex or simple program structures), they were transformed into the suitable kind of state. Universal nodes that stand for complex program constructs (i.e., those that contain other program elements) were represented as complex states. Examples are compilation units, which can consist of elements such as functions, or function declarations, which can contain statements. Some statements are also represented as complex states because they contain other statements, such as branch statements (with their branches) and loop statements. These complex states can contain other simple or complex states. Universal nodes represented as simple states can only trigger events or manipulate variable values. Statecharts also have some additional states related to entering and exiting the whole statechart or its complex parts.
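The mapping just described can be pictured as a small recursive transformation: container-like universal nodes become complex states holding substates, while everything else becomes a simple state. The kind names below are illustrative and are not SSQSA's actual universal-node vocabulary.

COMPLEX_KINDS = {"COMPILATION_UNIT", "FUNCTION_DECL",
                 "BRANCH_STATEMENT", "LOOP_STATEMENT"}

def to_state(node):
    """node = (kind, children); returns a nested statechart description."""
    kind, children = node
    if kind in COMPLEX_KINDS:
        return {"state": kind, "complex": True,
                "substates": [to_state(c) for c in children]}
    return {"state": kind, "complex": False}  # simple state

unit = ("COMPILATION_UNIT", [
    ("FUNCTION_DECL", [
        ("ASSIGN_STATEMENT", []),
        ("LOOP_STATEMENT", [("ASSIGN_STATEMENT", [])]),
    ]),
])
loop = to_state(unit)["substates"][0]["substates"][1]
print(loop["complex"], len(loop["substates"]))  # True 1

Collapsing such a complex state is what produces the compact view shown later in Figure 1, while expanding it yields the detailed view of Figure 2.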
For the purpose of testing statecharts, Yakindu SCT (https://www.itemis.com/en/yakindu/state-machine/) was used. Parts of this tool are open source. Some of the limitations are directly related to this tool, such as the lack of complex data types: for now, only integer, real, boolean, string and void are supported. These data types are enough for the proof-of-concept phase, but it will be necessary to additionally implement the missing features or replace this tool with one that can also represent nontrivial pieces of code.

5. EXAMPLE
In Figure 1, we present a statechart generated from a piece of code written in the programming language Modula-2. It represents part of an algorithm which computes the factorial of a given number. The purpose of this figure is to show how the part of the code containing a loop is represented in a simplified way. In Figure 2, we present how this loop statement looks when it is expanded and what is really happening inside this complex state.

Figure 1: Statechart based on sample code in Modula-2, which shows how the factorial of a number is calculated. The loop statement is collapsed.

Figure 2: Statechart based on sample code in Modula-2, which shows the expanded loop statement in the factorial calculation.

The same algorithm was implemented in the Java programming language. The resulting statecharts are identical to the ones in Figures 1 and 2. A more detailed preview and comparison of different algorithms can be found in [11].

6. FUTURE WORK
Although statecharts are currently focused on representing the control flow of one function or procedure, the idea is to create an interprocedural representation, first on the level of a compilation unit and then expanded to represent the complete software. This will be done using graph dependency networks. Once the control flow graphs for all procedures and functions are constructed, function call nodes will be detected and the graphs will be connected into a single graph, which represents the whole system. By implementing this, it will also be possible to improve statecharts in an interprocedural direction. So far, statecharts have been tested mostly on one object-oriented language (Java) and one procedural Pascal-like language (Modula-2); there is therefore also room for improvement in testing how statecharts are generated from source code written in other languages.

Another idea is that, if we succeed in refining statecharts to the lowest level and introduce the environment variable, we would greatly improve the simulation of source code in the evaluator. That would mean having the most realistic representation of how the whole source code, or a part of it, would execute in reality.
7. CONCLUSIONS
The eCST provides us with complete information about the source code. Therefore, possible limitations will not be related to a lack of information about the source code. The true challenge will be to represent everything that is important in a manner suited to the nature of statecharts. Our approach has proven feasible so far, but it will be put under further inspection once more complex pieces of code are introduced.

Once we have a component that can visualize and simulate the complete code under analysis and test existing systems or systems still under development, detecting obvious flaws in source code design and execution will be trivial. The tool will be able to simulate the execution of the code without the need to set up the environment and run the code. Statecharts could also be used to introduce new people to various projects: one will be able to view the system at different levels of abstraction and take a step-by-step approach to getting familiar with it. A statechart is more dynamic than a simple diagram representing the project structure. Statecharts in SSQSA mean the ability to view and simulate systems without having to worry whether parts of the system are written in different languages.

It is also important to note that statecharts were not introduced to SSQSA only to visualize and simulate the system. They are important for predicting different outcomes when some parts of the code are executed, and for evaluating qualities such as correctness and reliability [4].

8. REFERENCES
[1] E. M. Clarke, W. Klieber, M. Nováček, and P. Zuliani. Model checking and the state explosion problem. In Tools for Practical Software Verification, pages 1–30. Springer, 2012.
[2] J. C. Corbett, M. B. Dwyer, J. Hatcliff, S. Laubach, C. S. Pasareanu, H. Zheng, et al. Bandera: Extracting finite-state models from Java source code. In Software Engineering, 2000. Proceedings of the 2000 International Conference on, pages 439–448. IEEE, 2000.
[3] M. B. Dwyer, J. Hatcliff, R. Joehanes, S. Laubach, C. S. Păsăreanu, H. Zheng, and W. Visser. Tool-supported program abstraction for finite-state verification. In Proceedings of the 23rd International Conference on Software Engineering, pages 177–187. IEEE Computer Society, 2001.
[4] D. Harel. Statecharts: A visual formalism for complex systems. Science of Computer Programming, 8(3):231–274, 1987.
[5] G. J. Holzmann and M. H. Smith. Software model checking: Extracting verification models from source code. Software Testing, Verification and Reliability, 11(2):65–79, 2001.
[6] J. Laski and W. Stanley. Software Verification and Analysis: An Integrated, Hands-on Approach. Springer Science & Business Media, 2009.
[7] W. Le and S. D. Pattison. Patch verification via multiversion interprocedural control flow graphs. In Proceedings of the 36th International Conference on Software Engineering, pages 1047–1058. ACM, 2014.
[8] F. Nielson and H. R. Nielson. Interprocedural control flow analysis. In ESOP, volume 99, pages 20–39. Springer, 1999.
[9] G. Rakić. Extendable and Adaptable Framework for Input Language Independent Static Analysis. PhD thesis, Faculty of Sciences, University of Novi Sad, Novi Sad, 2015.
[10] J. A. Stafford and A. L. Wolf. A formal, language-independent, and compositional approach to interprocedural control dependence analysis. PhD thesis, University of Colorado, 2000.
[11] N. Sukur. Reprezentacija toka izvrsavanja programa dijagramom stanja, nezavisna od ulaznog jezika. Master's thesis, Faculty of Sciences, University of Novi Sad, Serbia, September 2016. In Serbian.
Code smell detection: A tool comparison

Tina Beranič, Zlatko Rednjak, Marjan Heričko
Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
tina.beranic@um.si, zlatko.rednjak@gmail.com, marjan.hericko@um.si

ABSTRACT
Technical debt can be identified with different techniques, including code smell detection. Different approaches are available to detect code smells, and some of those approaches are implemented in different tools. In this article, several tools for code smell detection were selected with the goal of comparing their outputs. The compared tools detect different code smells with varying degrees of success, and the intersection of the code smells detected by the tools is very small. Because of this, the connection between detected code smells and the value of technical debt is hard to define. The results are supported by an empirical analysis of 32 software projects, different code smells and 3 code smell detection tools.

Categories and Subject Descriptors
D.2.8 [Software Engineering]: Metrics
D.2.9 [Software Engineering]: Management—Software quality assurance (SQA)

General Terms
Measurement

Keywords
Code smell, technical debt, intersection of detected code smells

1. INTRODUCTION
Since there is a growing need for rapid changes, some decisions have to be made quickly. Those decisions, especially the inadequate ones, affect the whole software development life cycle and can have a significant impact on code quality. We have to be especially careful and pay attention to those program entities that contain irregularities or deficiencies.

Technical debt helps evaluate the resulting problems, and one of the most recognized techniques for identifying technical debt is the detection of code smells. There are different types of code smells, divided into different groups aimed at achieving a better understanding. Many tools that help detect code smells can be found in the literature, but each of them detects code smells with the help of different techniques.

The goal of our research was to look into the available code smell detection tools and, using selected ones, compare the detected code smells while also analyzing the intersection of the results. Since code smell detection is a technique for identifying technical debt, the second part of the research was aimed at finding a connection between detected code smells and the value of the calculated technical debt.

The article is organized as follows. First, the theoretical background on technical debt and code smells is presented (sections 2 and 3). Section 4 presents the case study we carried out, while sections 5 and 6 contain a discussion of the obtained results. The article is concluded with section 7.

2. TECHNICAL DEBT
Technical debt was first named in 1992 by Ward Cunningham as "not-quite-right code" [1]. Over the years, technical debt has gained more recognition and its basic meaning has been extended. Since it represents a metaphor that can be understood subjectively [2], [3], a single generally accepted definition of technical debt still cannot be found.

Nevertheless, authors describe technical debt as a set of decisions taken within a project. Those decisions usually bring short-term success, but in the future they can cause problems, which are resolved with more effort than would have been needed in the beginning [4]–[9].

Over time, different types of technical debt were introduced, for example design debt, architecture debt, documentation debt, test debt, code debt and environment debt [5]–[7], [10], [11]. Soon, based on research and needs, other types of technical debt appeared in the literature, for example build debt, defect debt, requirement debt, test automation debt, service debt, versioning debt, people debt, process debt and usability debt [5]–[7].

2.1 Technical debt identification
One of the technical debt identification methods is source code analysis, where the techniques can be divided into static and dynamic analysis [8], [12].
Each technique focuses on a specific aspect of the source code and does not by itself cover the detection of the whole variety of code smells. In the literature, the most represented techniques are the following:
• Modularity violations [12], [13].
• Design patterns and grime buildup [12], [13].
• Code smells [8], [12]–[14].
• Automatic static analysis [8], [12], [13].
• Source code comments [15].

3. CODE SMELL
One of the techniques for identifying technical debt is code smell detection. Code smells are indicators within source code that point to deeper problems in a software product [16]. A code smell means writing code in a way that violates the principles of best programming practices. The removal of code smells is usually done through source code refactoring, where the need for refactoring increases the likelihood of the existence of code smells in the software [17].

Tufano et al. [18] present an analysis of the occurrence of code smells in software products. Usually, a code smell emerges when adding new functionalities or changing existing ones. An interesting finding was also that refactoring existing source code can introduce new types of code smells.

3.1 Code smell groups and types
Many different types of code smells are defined in the literature. For a better understanding of the different types of code smells, several groups were defined, each containing different code smell types [19]: (1) Bloaters represent something in the code that is so big it cannot be effectively handled; (2) Object-Orientation Abusers contain examples where the possibilities of OO design are not fully exploited; (3) Change Preventers contain code smells that refer to code structures which considerably hinder software modification; (4) Dispensables present parts that are unnecessary and should be removed from the source code; (5) Encapsulators join code smells connected to data communication mechanisms or encapsulation; (6) Couplers express code which is tightly coupled; (7) Other code smells.

3.2 Code smell detection
Code smell detection can be done with the help of software metrics. Different authors have connected selected software metrics with the detection of specific types of code smells [20], [21]. Based on the correlation between software metrics and code smell types, tools that identify some of these types have been developed [16], [22]–[24]. When identifying code smells with a software metric, it is important to use reliable software metric threshold values: if the thresholds are not set properly, a variety of false positives can be detected [25]. Based on this, some other code smell detection techniques have been developed, for example a technique based on a combination of machine learning algorithms, which achieved more than 96% accuracy when detecting different code smells [26].
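To make the metric-threshold idea concrete, here is a minimal Python sketch. The metrics used (WMC, weighted methods per class; ATFD, access to foreign data; TCC, tight class cohesion) appear in well-known God class detection strategies in the literature, but the threshold values and the sample measurements below are illustrative only; as noted above, poorly calibrated thresholds are precisely what produces false positives.

def is_god_class(m, wmc_very_high=47, atfd_few=5, tcc_low=1 / 3):
    """Flag a class whose metrics cross all three illustrative thresholds."""
    return (m["WMC"] >= wmc_very_high   # high functional complexity
            and m["ATFD"] > atfd_few    # uses many attributes of other classes
            and m["TCC"] < tcc_low)     # low cohesion among its own methods

suspect = {"WMC": 58, "ATFD": 9, "TCC": 0.12}
print(is_god_class(suspect))  # True -- all three conditions hold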
4. CASE STUDY
As part of the case study, two research questions were formed:
• Do different open source tools for detecting code smells provide different results?
• What is the connection between the code smells detected by the selected tools and the value of technical debt calculated by the SonarQube tool?

To answer these questions, different software projects were analyzed to gather empirical data. We analyzed the projects gathered in the "Qualitas Corpus". Since we needed the projects compiled to byte code to perform the analysis, we chose the compiled version of the Qualitas Corpus [27], so that the analysis could be carried out without major changes to the source code. In the end, based on the criteria, 32 software projects were analyzed. Criteria were also set to select the appropriate code smell types and code smell detection tools to be used within the case study; these criteria were inspired by data gathered in a preliminary literature review.

4.1 Selected tools and code smells
Based on the criteria, two tools for code smell detection were selected, along with the corresponding code smell types; the information is presented in Table 1. Both selected tools are Eclipse extensions. To be able to answer the second question, we also had to prepare the SonarQube tool, which detects code smells based on predefined rules. Among the 255 rules that indicate the existence of code smells, the 12 that follow Fowler's definitions [28] were selected. Because SonarQube does not enable the classification of code smells into different groups, this step was done by hand: the rules that follow the before-mentioned definitions were combined into a profile and classified into groups.

Table 1. Selected tools and code smells
Tool       | Code smells                                                     | Tool version
JSpIRIT    | God class, Feature envy, Brain method, Brain class              | 4.3.2
JDeodorant | God class, Feature envy, Long method                            | 5.0.64
SonarQube  | God class, Feature envy, Brain method, Long method, Brain class | 5.6.4 LTS

4.2 Analysis of empirical data
In the first step, the selected projects were analyzed using the selected tools. The aim was to detect and count the occurrences of different code smells within the different projects. The distribution of activated rules in SonarQube was also examined. With this, we gained insight into the appropriateness of mapping the rules into groups in SonarQube and of the different code smell types in JSpIRIT and JDeodorant. The appropriateness of the rule mapping was additionally checked through the intersection of the code smells detected by the tools.

All gathered data present a starting point for finding correlations between code smells and technical debt and for the comparison of different code smell detection tools.
5. COMPARISON OF DETECTED CODE SMELLS
The tools used and the code smells considered within the case study are presented in Table 1. In this analysis, we combined the results for God class and Brain class (Graph 1) and for Brain method and Long method (Graph 2), while the Feature envy code smell (Graph 3) was analyzed independently.

Graph 1 – Identification of God class and Brain class code smells within tools

As can be seen in Graphs 1 and 2, SonarQube detected more God class/Brain class and Long method/Brain method code smells than the tools JSpIRIT and JDeodorant did together. However, SonarQube has a problem with detecting the Feature envy code smell, since it was not able to detect it within the software project analysis (Graph 3). For this purpose we looked again at the rules in SonarQube, but it cannot be stated that any of those rules reliably detects the Feature envy code smell; the rule that we selected is, in our opinion, the one with the highest probability of detecting the mentioned code smell. On the other hand, when detecting Feature envy (Graph 3), JSpIRIT prevails, detecting 500 more occurrences than JDeodorant did. The latter shows a tendency towards detecting God class and Brain method. Overall, the most code smells were detected by SonarQube.

Graph 2 – Identification of Long method and Brain method code smells within tools

Graph 3 – Identification of Feature envy code smells within tools

Based on the results, we identified the intersection of the code smells detected by the tools, presented in Figure 1. This was done by analyzing a project that was not selected for the main analysis but is still part of the Qualitas Corpus. When identifying the God class/Brain class code smell, the intersection between all the tools is 2.7%, and when detecting Long method/Brain method it is 6.08%. To identify the causes of the low intersection, we again looked into SonarQube and examined the rules that are activated when detecting the code smells common to the other two tools.
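Once each tool's findings are reduced to a set of flagged entities, the intersection itself is a straightforward set computation, as the sketch below shows on invented data (the real per-tool outputs are what Figure 1 summarizes).

findings = {
    "JSpIRIT":    {"ClassA", "ClassB", "ClassC"},
    "JDeodorant": {"ClassB", "ClassD"},
    "SonarQube":  {"ClassB", "ClassC", "ClassE", "ClassF"},
}

common = set.intersection(*findings.values())  # flagged by all three tools
union = set.union(*findings.values())          # flagged by at least one tool
print(f"{len(common) / len(union):.1%}")       # 16.7% -- only ClassB is shared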
6. THE CONNECTION BETWEEN CODE SMELLS AND TECHNICAL DEBT
To establish a correlation between code smells and technical debt, the data about the time contributed by each of the 12 used rules in SonarQube was acquired. The technical debt when all 255 rules were used was 5,460 days. When we activated just the 12 selected rules, the technical debt was 1,982 days, which is 27% of all technical debt. Since these 12 rules are used for code smell detection, it can be seen how much they contribute to the overall technical debt of a project. But the problem lies in the low intersection between the tools when detecting code smells. An even more detailed analysis of the rules activated within the selected code smells does not bring any clearer results.

7. CONCLUSION
The case study was carried out to compare the code smells detected by three different tools: JSpIRIT, JDeodorant and SonarQube. Since the first two define code smells which are not part of SonarQube, the rules within SonarQube were mapped to groups representing the selected code smells.

Figure 1 – Intersection between tools for code smell detection

The detected code smells were compared within the three identified categories, and the intersection of detected code smells was presented. The intersection between the used tools is very small (Figure 1). This can be attributed to the use of different detection techniques in the different tools. JDeodorant proved to be the best at detecting the Long method code smell, JSpIRIT at detecting the Feature envy code smell, and SonarQube at detecting the God class/Brain class and Long method/Brain method code smells.

The second part of the case study was aimed at finding a connection between the detected code smells and technical debt in SonarQube. The intersection of the code smells detected by the other tools and SonarQube is very small. We can also add the fact that the classified rules are not activated proportionally. Since the rules do not contribute equally to the technical debt calculation, the impact of code smell detection on technical debt cannot be defined.

There are many research opportunities in this area. The rules for code smell detection among the tools could be compared in detail for the purpose of unification. In addition, future work can be oriented towards an attempt to provide a generally accepted definition of technical debt. Finally, the selected tools, JSpIRIT and JDeodorant, could be upgraded with technical debt calculation.

8. ACKNOWLEDGMENTS
The authors acknowledge the financial support from the Slovenian Research Agency under The Young Researchers Programme (SICRIS/SRA code 35512, RO 0796, Programme P2-0057).

9. REFERENCES
[1] W. Cunningham, "The WyCash Portfolio Management System," SIGPLAN OOPS Mess., vol. 4, no. 2, pp. 29–30, Dec. 1992.
[2] P. Kruchten, R. L. Nord, and I. Ozkaya, "Technical Debt: From Metaphor to Theory and Practice," IEEE Software, vol. 29, no. 6, pp. 18–21, 2012.
[3] N. A. Ernst, S. Bellomo, I. Ozkaya, R. L. Nord, and I. Gorton, "Measure It? Manage It? Ignore It? Software Practitioners and Technical Debt," in Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, 2015, pp. 50–60.
[4] J. Yli-Huumo, A. Maglyas, and K. Smolander, "How do software development teams manage technical debt? – An empirical study," J. Syst. Softw., vol. 120, pp. 195–218, 2016.
[5] Z. Li, P. Avgeriou, and P. Liang, "A systematic mapping study on technical debt and its management," J. Syst. Softw., vol. 101, pp. 193–220, 2015.
[6] N. S. R. Alves, T. S. Mendes, M. G. de Mendonça, R. O. Spínola, F. Shull, and C. Seaman, "Identification and management of technical debt: A systematic mapping study," Inf. Softw. Technol., vol. 70, pp. 100–121, Feb. 2016.
[7] N. S. R. Alves, L. F. Ribeiro, V. Caires, T. S. Mendes, and R. O. Spínola, "Towards an Ontology of Terms on Technical Debt," 2014 Sixth International Workshop on Managing Technical Debt, pp. 1–7, 2014.
[8] N. Zazworka, R. O. Spínola, A. Vetrò, F. Shull, and C. Seaman, "A Case Study on Effectively Identifying Technical Debt," in Proceedings of the 17th International Conference on Evaluation and Assessment in Software Engineering, 2013, pp. 42–47.
[9] M. Fowler, "TechnicalDebt," 2003. [Online]. Available: https://martinfowler.com/bliki/TechnicalDebt.html. [Accessed: 14-Sep-2017].
[10] E. Tom, A. Aurum, and R. Vidgen, "An exploration of technical debt," J. Syst. Softw., vol. 86, no. 6, pp. 1498–1516, 2013.
[11] C. Fernández-Sánchez, J. Garbajosa, C. Vidal, and A. Yagüe, "An Analysis of Techniques and Methods for Technical Debt Management: A Reflection from the Architecture Perspective," 2015 IEEE/ACM 2nd International Workshop on Software Architecture and Metrics, pp. 22–28, 2015.
[12] N. Zazworka, A. Vetrò, C. Izurieta, S. Wong, Y. Cai, C. Seaman, and F. Shull, "Comparing Four Approaches for Technical Debt Identification," Softw. Qual. J., vol. 22, no. 3, pp. 403–426, Sep. 2014.
[13] C. Izurieta, A. Vetrò, N. Zazworka, Y. Cai, C. Seaman, and F. Shull, "Organizing the Technical Debt Landscape," in Proceedings of the Third International Workshop on Managing Technical Debt, 2012, pp. 23–26.
[14] N. Zazworka, M. A. Shaw, F. Shull, and C. Seaman, "Investigating the Impact of Design Debt on Software Quality," in Proceedings of the 2nd Workshop on Managing Technical Debt, 2011, pp. 17–23.
[15] E. d. S. Maldonado and E. Shihab, "Detecting and quantifying different types of self-admitted technical debt," 2015 IEEE 7th International Workshop on Managing Technical Debt (MTD), pp. 9–15, 2015.
[16] E. Fernandes, J. Oliveira, G. Vale, T. Paiva, and E. Figueiredo, "A Review-based Comparative Study of Bad Smell Detection Tools," in Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering, 2016, pp. 18:1–18:12.
[17] F. A. Fontana, M. Mangiacavalli, D. Pochiero, and M. Zanoni, "On Experimenting Refactoring Tools to Remove Code Smells," in Scientific Workshop Proceedings of the XP2015, 2015, pp. 7:1–7:8.
[18] M. Tufano, F. Palomba, G. Bavota, R. Oliveto, M. Di Penta, A. De Lucia, and D. Poshyvanyk, "When and Why Your Code Starts to Smell Bad," in Proceedings of the 37th International Conference on Software Engineering - Volume 1, 2015, pp. 403–414.
[19] M. Mantyla, J. Vanhanen, and C. Lassenius, "A taxonomy and an initial empirical study of bad smells in code," International Conference on Software Maintenance (ICSM 2003), Proceedings, pp. 381–384, 2003.
[20] F. A. Fontana, V. Ferme, A. Marino, B. Walter, and P. Martenka, "Investigating the Impact of Code Smells on System's Quality: An Empirical Study on Systems of Different Application Domains," 2013 IEEE International Conference on Software Maintenance, pp. 260–269, 2013.
[21] F. A. Fontana and S. Spinelli, "Impact of Refactoring on Quality Code Evaluation," in Proceedings of the 4th Workshop on Refactoring Tools, 2011, pp. 37–40.
[22] A. Hamid, M. Ilyas, M. Hummayun, and A. Nawaz, "A Comparative Study on Code Smell Detection Tools," Int. J. Adv. Sci. Technol., vol. 60, pp. 25–32, 2013.
[23] A. Chatzigeorgiou and A. Manakos, "Investigating the Evolution of Bad Smells in Object-Oriented Code," 2010 Seventh International Conference on the Quality of Information and Communications Technology, pp. 106–115, 2010.
[24] F. A. Fontana, P. Braione, and M. Zanoni, "Automatic detection of bad smells in code: An experimental assessment," J. Object Technol., vol. 11, no. 2, pp. 5:1–38, 2012.
[25] F. A. Fontana, V. Ferme, M. Zanoni, and A. Yamashita, "Automatic Metric Thresholds Derivation for Code Smell Detection," 2015 IEEE/ACM 6th International Workshop on Emerging Trends in Software Metrics, pp. 44–53, 2015.
[26] F. Arcelli Fontana, M. V. Mäntylä, M. Zanoni, and A. Marino, "Comparing and experimenting machine learning techniques for code smell detection," Empir. Softw. Eng., vol. 21, no. 3, pp. 1143–1191, 2016.
[27] R. Terra, L. F. Miranda, M. T. Valente, and R. S. Bigonha, "Qualitas.class Corpus: A compiled version of the Qualitas Corpus," Softw. Eng. Notes, vol. 38, pp. 1–4, 2013.
[28] M. Fowler and K. Beck, Refactoring: Improving the Design of Existing Code. Addison-Wesley, 1999.
Skills, Competences and Platforms for a Data Scientist

Vili Podgorelec, Sašo Karakatič
Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
vili.podgorelec@um.si, saso.karakatic@um.si

ABSTRACT
In this paper, we identify the core competences and skills of a Data Scientist, building on existing research about practicing Data Scientists and on existing frameworks. We complement this research with a practitioners' survey about popular Data Science platforms and with our own research on search term trends and job posting trends.

Categories and Subject Descriptors
H.4 [Information Systems Applications]: Miscellaneous
I.2.m [Artificial Intelligence]: Miscellaneous

General Terms
Data Science, Framework, Platforms.

Keywords
Data science, data scientist, skills, competences, platforms.

1. INTRODUCTION
In the last few years, Data Science has become one of the most rapidly growing interdisciplinary fields, combining different aspects of computer engineering, mathematics, and managerial skills. The employer and employee review website Glassdoor even rates Data Scientist as the number one best job in America in 2017 (regarding job satisfaction, number of job openings and median base salary) [1].

The precise skill set of a Data Scientist is not yet well defined, as it gets mixed up with other job roles, such as Data Analyst, Machine Learning Engineer, Statistician, Data Engineer, Business Analyst, Data Architect and others. The differences between these titles are not always clear, and they are used interchangeably, especially among people outside the domain of Data Science. The purpose of this paper is not to draw a clear differential line between these job roles, but to define what the core skills of a Data Scientist are. This could help employers identify whether a Data Scientist is the one they need in their organization. Also, a clear list of definitions, skill sets and the most common platforms used by Data Scientists could be used by people striving to become a Data Scientist, who can then work on each skill in the broad spectrum of competences and skills needed and expected of a Data Scientist.
2. DATA SCIENCE COMPETENCES AND SKILLS FRAMEWORKS
With the growing demand for staff with knowledge and skills of Data Science, several more or less commonly accepted frameworks for defining Data Science (alongside general computer science, ICT and similar) competences, skills and subject domain classifications have emerged. These frameworks can, with some alignment, be built upon and re-used for better acceptance by the research and industrial communities. One of the most elaborate is the EDISON Data Science Framework (EDSF), developed within the scope of the European project "Edison – building the data science profession" [2]. The EDSF provides a collection of documents that define the Data Science profession; these have been developed to guide educators and trainers, employers and managers, and Data Scientists themselves, collectively breaking down the complexity of the skills and competences needed to define Data Science as a professional practice.

The EDSF itself, however, builds on existing standard and commonly accepted frameworks, such as the Big Data Interoperability Framework, published by the NIST Big Data Working Group in September 2015 [3]. It provides various definitions, among them for Data Science, Data Scientist and Data Life Cycle, which can be used as a starting point for further analysis.

"Data Science is the extraction of actionable knowledge directly from data through a process of discovery, or hypothesis formulation and hypothesis testing. Data Science can be understood as the activities happening in the processing layer of the system architecture, against data stored in the data layer, in order to extract knowledge from the raw data.

Data Science across the entire data life cycle incorporates principles, techniques, and methods from many disciplines and domains including data cleansing, data management, analytics, visualization, engineering, and in the context of Big Data, now also includes Big Data Engineering. Data Science applications implement data transformation processes from the data life cycle in the context of Big Data Engineering." [3]

"A Data Scientist is a practitioner who has sufficient knowledge in the overlapping regimes of business needs, domain knowledge, analytical skills, and software and systems engineering to manage the end-to-end data processes in the data life cycle.

Data Scientists and Data Science teams solve complex data problems by employing deep expertise in one or more of these disciplines, in the context of business strategy, and under the guidance of domain knowledge. Personal skills in communication, presentation, and inquisitiveness are also very important given the complexity of interactions within Big Data systems." [3]

The main focus of a data scientist is thus to discover meaningful patterns in data and synthesize useful knowledge by performing all the necessary steps throughout the whole data life cycle: the collection of raw data, (pre-)processing of data and transforming it into useful information, performing data analysis via various data analytics algorithms and tools, interpreting and evaluating the discovered patterns in order to produce useful knowledge, and validating the induced knowledge models to produce value. Analytics refers to the methods, their implementations in tools, and the results of the use of the tools as interpreted by the practitioner [4]. The analytics process is the synthesis of knowledge from information.

Figure 1. Data Science definition by NIST BD-WG [3].

To cover the competences required of a Data Scientist, good knowledge of data analytics is needed (the two most important fields of analytics being statistics and machine learning), along with a good understanding of engineering (programming, software engineering and data management, in order to provide analytical applications) and a fair amount of domain expertise. Figure 1 provides a graphical presentation of this multi-factor/multi-domain Data Science definition.
2.1 General/research vs. business profile
As Data Science covers a lot of topics, many different competences and skills are required of a data scientist. Accordingly, data scientists tend to focus on some specialization within the whole Data Science scope. In general, two major profiles can be identified: a general, research-oriented profile, and a business-oriented profile (see Figure 2). For both profiles, a fair amount of analytics and engineering knowledge as well as domain expertise is required. Beyond that, the research-oriented profile concentrates primarily on the use of scientific methods: formulation of test hypotheses, experiment design, data collection and analysis, pattern discovery and explanation of the discovered knowledge. The business-oriented profile, on the other hand, focuses on business process management: monitoring the important data and designing, modelling, optimizing and executing the data-driven business processes.

Figure 2. Relations between the Data Science competence groups for (a) general or research oriented and (b) business oriented professions/profiles [4].

3. DATA SCIENTISTS' SKILL SETS
The existing standard and commonly accepted frameworks for defining Data Science competences are well aligned with several reports and scientific papers which provide research results on what skills a data scientist should have.

In [5] the authors present findings compiled from 50 different reports of research in articles, journals, and books, and conducted via experts' views using the Delphi technique, regarding the data scientist skills required by the industry. They provide a list of 41 data scientists' skills and categorize them into five major categories adapted from [6] – computer science, analytics, data management, decision management, and entrepreneurship:

• Computer Science includes programming, where R and Python are the predominant programming languages, as well as privacy, security and systems architecture.
• Analytics focuses primarily on statistics and machine learning, and includes natural language processing, probability and simulation.
• Data management covers all data handling skills and puts emphasis on databases, data modelling and visualization, data mining, business intelligence and general data processing.
• Decision management focuses on decision making, while encompassing communication and ethics.
• Finally, Entrepreneurship includes business and economics.

On the other hand, the EDSF also categorizes all the skills required of a data scientist into five major categories, namely analytics, engineering, data management, research methods and project management, and business analytics [4]:

• Analytics focuses on the use of machine learning, data mining and text mining techniques, the application of predictive and prescriptive analytics, the use of statistics, operations research, optimization and simulations, and the assessment, evaluation and validation of results.
• Engineering includes the use of ICT systems and software engineering, cloud computing and big data technologies, databases, data security, privacy and intellectual property rights protection, as well as algorithm design.
• Data management puts emphasis on specifying, developing and implementing enterprise data management and data governance strategy and infrastructure, and includes data storage systems, data modeling and design, data lifecycle support, data quality, integration, and digital libraries and open data.
• Research methods and project management encompasses the use of research methods principles in developing data-driven applications and implementing the whole cycle of data handling, the development and implementation of data collection processes, and the consistent application of a project management workflow.
• Business analytics focuses on the use of business intelligence, business process management, econometrics for data analysis and applications, user experience design, data warehouses for data integration, and data-driven marketing.
4. PRACTITIONER PLATFORM SURVEY AND TRENDS
After defining the required skills for a Data Scientist, in this section we look at the current state of those skills among active practitioners of Data Science. So far, no thorough analysis of all the skills of Data Scientists has been done, but there is a good survey about the frameworks they use in their line of work.

In August 2017 KDnuggets, one of the most popular websites about data science based on independent ranking [7], ran a poll for its readers [8]. The poll asked the following question: "Did you use R, Python (along with their packages), both, or other tools for Analytics, Data Science, Machine Learning work in 2016 and 2017?". The poll was completed by 954 people and showed the following results.

The results of the poll clearly indicate that there is a shift from the R programming language to the Python programming language in respect of Data Science, Analytics and Machine Learning (see Figure 3). The usage of the R programming language fell by 6 percentage points, while the usage of Python rose from 34% to 41% (an increase of 7 percentage points) among the readers who completed the poll. The poll also indicates that the use of both R and Python rose from 8.5% to 12% (an increase of 3.5 percentage points), which can be attributed to practitioners slowly switching from R to Python while still using R for some specific parts of their work.

Figure 3. Share of R, Python, both R and Python, or other platforms usage for Analytics, Data Science or Machine Learning for 2016 and 2017 [8].

Next, the poll results also show the transitions from one platform to another for Analytics, Data Science, and Machine Learning (see Figure 4). The chart in Figure 4 clearly shows the following. Python users are more loyal than R users, as 91% of readers stuck with Python from 2016 to 2017, while only 74% of readers stuck with R. Also, only 60% of readers who used other platforms and languages stuck with those from 2016 to 2017.

Figure 4. The transition between different programming languages for Data Science, Analytics and Machine Learning from 2016 to 2017 [8].

As the chart shows, only 5% of Python users switched to R exclusively, while 10% of R users switched to using Python exclusively. There is a clear flow of R users (15%) who switched to using both R and Python, while users of both platforms in 2016 made a major switch to using Python exclusively (38%). There was only a 4% switch by Python users to using both platforms, and only 11% of readers who used both platforms in 2016 switched to using only R. There is also a clear flow of users coming to R or Python for Analytics, Data Science and Machine Learning from other platforms: 17% to using only R, 19% to using only Python and 4% to using both R and Python.
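As a consistency check, the 2017 shares can be recomputed from the 2016 shares and the transition percentages quoted above. In the sketch below, the 2016 shares of R and of other platforms, as well as the matrix cells not quoted in the text (e.g. the share of dual users who stayed with both), are inferred so that each row sums to one; they are assumptions, not published poll values.

shares_2016 = {"Python": 0.34, "R": 0.42, "Both": 0.085, "Other": 0.155}
transition = {  # 2016 platform -> distribution over 2017 platforms
    "Python": {"Python": 0.91, "R": 0.05, "Both": 0.04, "Other": 0.00},
    "R":      {"Python": 0.10, "R": 0.74, "Both": 0.15, "Other": 0.01},
    "Both":   {"Python": 0.38, "R": 0.11, "Both": 0.49, "Other": 0.02},
    "Other":  {"Python": 0.19, "R": 0.17, "Both": 0.04, "Other": 0.60},
}

shares_2017 = {p: sum(shares_2016[q] * transition[q][p] for q in shares_2016)
               for p in shares_2016}
print({p: round(s, 2) for p, s in shares_2017.items()})
# {'Python': 0.41, 'R': 0.36, 'Both': 0.12, 'Other': 0.1}
# close to the published 2017 figures (41% Python, 12% both)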
Figure 7 shows two trends for the same steadily rising and should continue to grow if this trend continues. search terms as before (“Python Data Science” and “R Data Science”) for last five years. Even in the job posting aspect, the Python platform has a clear advantage in comparison to R. 5. CONCLUSION In this paper, we present the definition of a Data Scientist and some frameworks of its required skill set and competences. We presented existing research in the field of identifying the core skills and competences and survey the current state of needed and popular skills among practicing Data Scientists. We may conclude that a Data Scientist requires a diverse set of skills and has to adapt to new platforms as their popularities change throughout the time. It is yet to be seen how these skills and popular frameworks used in the work of a Data Scientist will change in the future, but for now we can conclude that skills of analytics, engineering, data management, research methods, project management and business Figure 5. Platform usage for Analytics, Data Science and analytics using Python and R platforms present a core of skills Machine Learning from 2014 to 2017 [8]. every Data Scientist needs. We made a quick glance at the popularity of R and Python platforms for Data Science, ourselves. Figure 6 shows the Google 6. ACKNOWLEDGMENTS Trend chart, where it shows search term popularity on the The authors acknowledge the financial support from the timeline. We compared two search terms: “Python Data Science” Slovenian Research Agency (research core funding No. P2-0057). (blue trend line), and “R Data Science” (red trend line). 7. REFERENCES [1] ––, 50 Best Jobs in America, Glassdoor [online] https://www.glassdoor.com/List/Best-Jobs-in-America- LST_KQ0,20.htm Accessed on 2017-09-07 [2] EDISON Data Science Framework (EDSF), http://edison- project.eu/edison/edison-data-science-framework-edsf Accessed on 2017-09-07 [3] NIST SP 1500-1 NIST Big Data interoperability Framework (NBDIF): Volume 1: Definitions, September 2015. Figure 6. Google Trends search term popularity for last five http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.S years for terms “Python Data Science” for blue trend line and P.1500-1.pdf Accessed on 2017-09-07 “R Data Science” for red trend line (September 9th, 2017). [4] EDISON Data Science Framework: EDSF Part 1: Data Science Competences Framework (CF-DS) Release 2, July 2017. http://edison-project.eu/data-science-competence- framework-cf-ds Accessed on 2017-09-07 [5] Abidin, W.Z., Ismail, N.A., Maarop, N., Alias, R.A.: Skills Sets Towards Becoming Effective Data Scientists, In: Proceedings of the 12th International Conference, KMO 2017, Beijing, China, August 2017, Communications in Computer and Information Science, vol. 731, Springer, 2017. [6] Stadelmann, T., Stockinger, K., Braschler, M., Cieliebak, M., Baudinot, G., Ruckstuhl, G. Applied data science in Europe – challenges for academia in keeping up with a highly demanded topic. European Computer Science Summit (2013) [7] ––, Top 75 Data Science Blogs And Websites For Data Figure 7. Job posting trends on Indeed.com for last five years Scientists. http://blog.feedspot.com/data_science_blogs/ for terms “Python Data Science” for blue trend line and “R Accessed on 2017-09-08 Data Science” for orange trend line (September 9th, 2017). [8] Piatetsky, G. Python overtakes R, becomes the leader in Data As chart shows, there was the almost even popularity of both Science, Machine Learning platforms. KDnuggets, 2017. 
Towards a Classification of Educational Tools

Kristjan Košič, Alen Rajšp, Jernej Huber
Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
kristjan.kosic@um.si, alen.rajsp@um.si, jernej.huber@um.si

ABSTRACT
As part of the Didakt.UM project, which aims at the exchange of experience and the creation of a platform that would enable the efficient search and selection of suitable ICT solutions used for educational purposes within the University of Maribor, an analysis and classification of such ICT solutions were made. Out of 82 entries, 63 tools were classified into a broad classification, which intends to cover the widest range of ICT solutions used by the students and staff of the University of Maribor.

Categories and Subject Descriptors
K.3 [Computers and education]: General; H.4 [Information systems applications]: General; K.6 [Management of computing and information systems]: General

General Terms
Management, Documentation, Performance, Economics, Human Factors, Standardization, Legal Aspects.

Keywords
Software classification, Software taxonomy, Educational software, Software usage, Learning stack
1. INTRODUCTION
The preparation of an exhaustive list of ICT solutions presents a complex challenge due to the extremely high number and variety of solutions and their respective domains of use. The need to place these solutions within the different levels of the classification makes the challenge even more complex [1].

The purpose and objective of our research was twofold. Firstly, we provided an analysis of the existing situation regarding the usage of ICT solutions used for educational purposes by the students and staff of the University of Maribor (hereinafter referred to as UM). Secondly, a classification of the aforementioned ICT solutions was prepared to lay a foundation for establishing the learning stack [2], which represents a collection of applications, cloud services, content repositories and data sources that can be accessed through a content platform. Such a platform would enable the pedagogical staff to search, comment on and rate suitable ICT tools within the repository and to exchange didactic experience and good practices within the UM environment.

The document consists of the following sections. We provide a short overview of related work covering different approaches to classifying ICT resources in section 2. Secondly, we present our classification proposal with a mind map of the classification with the first-level categories in section 3.1. We continue with a statistical analysis of the survey on the usage of ICT tools within the UM in section 3.2. Lastly, we provide a sneak peek into a project deliverable in the form of a two-level classification table, which offers a more detailed overview of the resulting classification.

2. RELATED WORK
At the highest level, technological infrastructure can be divided into hardware and software resources [3]. Hardware refers to the mechanical, visible and tangible part of information technology, while software presents a set of instructions prepared to obtain an adequate final result [4]. With the rapid development of smartphones and embedded systems, hardware-dependent software has recently been gaining unprecedented use within a wide range of domains, such as medicine, telecommunications, the automotive industry and others [5].

A global classification of educational tools was not found. Various sub-classifications, related to specific use cases or niche domains, were included in the analysis. In general, software is usually divided into application and system solutions [4]. The latter offer an infrastructure environment for running application software and include operating systems, drivers, system utilities, and servers. The category of application solutions can include software that enables information management, education, business infrastructure, simulation, media processing, software development, and solutions that contain the concepts of gamification [6], [7].

Multiple taxonomies for classifying software already exist, one of the best known being the ACM Computing Classification System, which was last revised in 2012 [8]. The main categories of the ACM taxonomy are: General and reference; Hardware; Computer systems organization; Networks; Software and its engineering; Theory of computation; Mathematics of computing; Information systems; Security and privacy; Human-centered computing; Computing methodologies; Applied computing; Social and professional topics; Proper nouns: People, technologies and companies. The purpose of the ACM taxonomy is to provide a categorization of technology-related topics. From the application domain standpoint, it provides relatively poor coverage of some application types, such as information display and consumer-oriented software [9]. On the other hand, the open-source community, with sites such as SourceForge and Google Code, provides a good overview of most types of software developed by such communities. Google's approach to defining application domains avoids a hierarchical structure and relies on tagging [9]. Additionally, many authors have developed their own more or less up-to-date taxonomies, which divide software into categories based on the purpose of use (e.g. data-dominant, systems, control-dominant and computation-dominant software, categories that are further divided into domains of use) [9] or directly by the domain of use [10].
3.2 Statistical analysis of ICT tools usage
Based on data from the survey on the usage of ICT solutions within the UM, we identified 82 entries, of which 19 were defective, with missing data regarding the type of solution, manufacturer, etc. Altogether, we classified the following 63 ICT solutions: Moodle, Geogebra, Sony Virtuoso and Soloist, Expression Studio, CyberLink PowerDirector, Articulate, iSpring, Hype, Sibelius, Adobe Photoshop, Photofiltre Studio X, Audacity, Windows Movie Maker, HandBrake, MKVToolNix, Subtitle Edit, Hot Potatoes, Google Docs, Sheets, Slides, Forms, Poll Maker, Skype, The Jupyter Notebook, matplotlib, WinMIPS64, XAMPP, Usb Web Server, UwAmp, WampServer, SonarQube, Java Web Start, ERPSim, Vox Armes, BIM server, Xerte, Oracle database server, Adonis CE, Pantheon X, Bizagi Business Process Modeler, Microsoft Visio, Microsoft Dynamics NAV, Microsoft Project, Aris Architect & Designer, Aris Express, JDeveloper, Eclipse, SQL Developer, SQL Developer Data Modeler, Greenfoot, Tableau, Orange, SAP Lumira, SAP ERP, Oracle VM VirtualBox, VMware Workstation Player, Linux Ubuntu, Kali Linux, SPSS, AnyLogic, Turning Point, Kahoot, Padlet, Anatomy 4D, Virtual Patient MedU and ThinkDesign Suite.

The column chart in Figure 2 shows the number of solutions by usage domain. It is important to stress that one tool can belong to more than one domain. The majority of tools represented the computer science and informatics domain (28), while 19 tools were general-purpose tools (such as Skype and Google Docs) that can be used within any domain.

Figure 2. Number of solutions by usage domain.

The pie chart in Figure 3 shows the ratio between open-source and proprietary solutions. Most of the documented tools (39 out of 63) were proprietary.

Figure 3. Ratio between open-source and proprietary tools.
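Because one tool can belong to several domains (and, as Figures 6 and 7 show below, to several client and solution types), the per-category counts sum to more than 63. Anyone re-running the tallies should therefore count memberships rather than tools. A minimal sketch of such counting, assuming records shaped as (name, domains) pairs; the sample rows are illustrative, not the actual survey data:

from collections import Counter

# Hypothetical survey records: (tool name, domains it belongs to).
entries = [
    ("Skype", ["general-purpose"]),
    ("Geogebra", ["mathematics"]),
    ("Eclipse", ["computer science and informatics"]),
    ("SPSS", ["computer science and informatics", "social sciences"]),
]

# Every tool contributes one count to each of its domains, so the
# column totals in a chart like Figure 2 may exceed the number of tools.
domain_counts = Counter(d for _, domains in entries for d in domains)
print(domain_counts.most_common())
# [('computer science and informatics', 2), ('general-purpose', 1), ...]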
The column chart in Figure 4 shows the number of tools that support at least one functionality from the categories describing the purpose of use. Most often, a tool was intended for modeling (18); cooperation, communication and coordination (16); multimedia management (16); software development (12); and learning content management (10).

Figure 4. Number of solutions by the purpose of usage.

The pie chart in Figure 5 shows the proportion of solutions in terms of collaboration support: 19 tools from the survey allow groups to work together, while the remaining 44 solutions do not have such support.

Figure 5. Ratio of collaboration-supported tools.

The column chart in Figure 6 shows the number of solutions by type of client: 32 tools permit online use within the browser, 14 tools can be accessed with mobile smartphones, and 51 tools are developed as desktop applications. Again, it is important to stress that each tool can have more than one type of client.

Figure 6. Number of solutions by the type of the usage.

The column chart in Figure 7 shows the number of solutions according to the types of ICT solutions proposed in our classification. Most tools fall under the following types: information management (30), software development (17) and education (16).

Figure 7. Number of solutions by the ICT type.

3.3 Proposed two-level classification
Table 1 presents a more detailed look into our classification proposal. Within this article, we limited the attribute hierarchy to two levels; the actual classification is divided into a three-level hierarchy of classification attributes and is therefore even more comprehensive.

Table 1. Classification of used solutions (to the second level)

1st level of classification | 2nd level of classification
General information | Name of the solution; Description; Manufacturer; Manufacturer's URL; License type; Price; Provider; Provider's URL; Support/service level; Minimum system requirements; General use case; UM use case; UM contact person
Faculty usage (Klasius-P) | 1 - Teacher training and education science; 2 - Humanities and arts; 3 - Social sciences, business and law; 4 - Science, mathematics and computer science; 5 - Engineering, manufacturing and construction; 6 - Agriculture, forestry, fisheries, veterinary; 7 - Health and welfare; 8 - Services
Use case | General-purpose; Specific domain
Type of ICT solutions | System software; Application software
Type of the usage | Web; Mobile; Desktop
Channel of communication | Video; Sound; Text
Type / format of the content | Video material; Graphical material; Sound; Text; Spreadsheet; Presentation; Any file
Group work support | Among the members of the organization; Among the members of the community; Among the team members
The time aspect of collaboration | Asynchronous; Synchronous
Cooperation between the roles within UM | Student; Teacher; Domain expert; Administrator
The purpose of usage | Learning content management; Knowledge testing and evaluation; Polling; Learning analytics; Cooperation, communication and coordination; Multimedia management; Statistical data analysis; Data storage; Software development; Software deployment; Enterprise resource planning; Modeling; Project management; Virtualization; Simulation

The result of our in-depth analysis was a report in which we provide the three-level classification of ICT solutions, a brief description of each classification attribute, and the actual placement of the 63 identified ICT solutions within the proposed classification.
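The two-level scheme in Table 1 can also be read as a mapping from each first-level attribute to its admissible second-level values, which makes the classification easy to check mechanically. The sketch below transcribes a fragment of Table 1 into such a mapping and adds a simple admissibility check; the dictionary contents come from the table, while the validate helper is our own illustration, not part of the project report.

# Fragment of Table 1: first-level attribute -> admissible second-level values.
CLASSIFICATION = {
    "Type of ICT solutions": ["System software", "Application software"],
    "Type of the usage": ["Web", "Mobile", "Desktop"],
    "Channel of communication": ["Video", "Sound", "Text"],
    "The time aspect of collaboration": ["Asynchronous", "Synchronous"],
}

def validate(entry):
    # Return (attribute, value) pairs that the scheme does not admit.
    return [
        (attribute, value)
        for attribute, values in entry.items()
        if attribute in CLASSIFICATION
        for value in values
        if value not in CLASSIFICATION[attribute]
    ]

# A well-formed entry yields no violations:
print(validate({"Type of the usage": ["Web", "Desktop"]}))  # -> []
# A value outside the scheme is flagged:
print(validate({"Channel of communication": ["Voice"]}))  # -> [('Channel of communication', 'Voice')]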
4. CONCLUSION
The classification of educational tools is a broad topic that still offers much room for improvement and research. In the future, we suggest further classification and categorization of tools combined with pedagogical learning approaches related to the specific needs of the instructor. Moreover, the framework could be expanded with pedagogical classifications and requirements related to regional/local pedagogical classifications (i.e. those specific to a particular country).

5. ACKNOWLEDGEMENTS
This research was carried out within the project Didakt.UM, which is financed by the Slovenian Ministry of Education, Science and Sport and the European Union through the European Social Fund.

6. REFERENCES
[1] A. Saito, K. Umemoto, and M. Ikeda, "A strategy-based ontology of knowledge management technologies," Journal of Knowledge Management, vol. 11, no. 1, pp. 97–114, Feb. 2007. DOI=https://doi.org/10.1108/13673270710728268.
[2] J.-M. Lowendahl, "Hype Cycle for Education," Gartner, 2016. [Online]. Available: https://www.gartner.com/doc/3364119/hype-cycle-education-. [Accessed: 07-Sep-2017].
[3] M. Afshari, K. A. Bakar, W. S. Luan, B. A. Samah, and F. S. Fooi, "Factors Affecting Teachers' Use of Information and Communication Technology," International Journal of Instruction, pp. 77–104, 2009.
[4] I. Masic et al., "Information Technologies (ITs) in Medical Education," Acta Informatica Medica, vol. 19, no. 3, p. 161, 2011. DOI=https://doi.org/10.5455/aim.2011.19.161-167.
[5] W. Ecker, W. Müller, and R. Dömer, "Hardware-dependent Software," in Hardware-dependent Software, Dordrecht: Springer Netherlands, 2009, pp. 1–13.
[6] A. Saito, K. Umemoto, and M. Ikeda, "A strategy-based ontology of knowledge management technologies," Journal of Knowledge Management, vol. 11, no. 6, pp. 97–114, 2007.
[7] TechTarget, "What is software?," 2017. [Online]. Available: http://searchmicroservices.techtarget.com/definition/software. [Accessed: 31-Aug-2017].
[8] ACM, "The 2012 ACM Computing Classification System," 2012. [Online]. Available: https://www.acm.org/publications/class-2012. [Accessed: 05-Sep-2017].
[9] A. Forward and T. C. Lethbridge, "A taxonomy of software types to facilitate search and evidence-based software engineering," in Proceedings of the 2008 Conference of the Center for Advanced Studies on Collaborative Research: Meeting of Minds - CASCON '08, 2008, p. 179.
[10] R. L. Glass and I. Vessey, "Contemporary application-domain taxonomies," IEEE Software, vol. 12, no. 4, pp. 63–76, Jul. 1995. DOI=https://doi.org/10.1109/52.391837.
[11] SD Times, "Web, desktop, mobile: What's the difference?," 2017. [Online]. Available: http://sdtimes.com/web-desktop-mobile-whats-the-difference/. [Accessed: 31-Aug-2017].
[12] eduCBA, "What is application software & its types," 2017. [Online]. Available: https://www.educba.com/what-is-application-software-its-types/. [Accessed: 31-Aug-2017].
[13] H. Fuks et al., "The 3C Collaboration Model," in Encyclopedia of E-Collaboration, IGI Global, pp. 637–644.
[14] TechTarget, "Synchronous vs. asynchronous communication: The differences," 2017. [Online]. Available: http://searchmicroservices.techtarget.com/tip/Synchronous-vs-asynchronous-communication-The-differences. [Accessed: 31-Aug-2017].
[15] S. R. Malikowski, M. E. Thompson, and J. G. Theis, "A Model for Research into Course Management Systems: Bridging Technology and Learning Theory," Journal of Educational Computing Research, vol. 36, no. 2, pp. 149–173, Mar. 2007. DOI=https://doi.org/10.2190/1002-1T50-27G2-H3V7.
[16] R. Scapin, "Learning Analytics in Education: Using Student's Big Data to Improve Teaching," 2015.
[17] Statistični urad Republike Slovenije, "Klasius-P," 2017. [Online]. Available: http://www.stat.si/Klasius/Default.aspx?id=5. [Accessed: 05-Sep-2017].
Indeks avtorjev / Author index

Beranič Tina, 31
Budimac Zoran, 27
Catal Çağatay, 7
Heričko Marjan, 19, 31
Heričko Tjaša, 35
Huber Jernej, 15, 43
Kamišalić Aida, 11
Karakatič Sašo, 39
Košič Kristjan, 43
Montoya Edwin, 11
Muratli Can, 7
Pavlinek Miha, 19
Podgorelec Vili, 39
Polančič Gregor, 15
Pušnik Maja, 19
Rajšp Alen, 43
Rakić Gordana, 27
Rednjak Zlatko, 31
Šestak Martina, 23
Sukur Nataša, 27
Tabares S. Marta, 11, 15
Torres Camilo, 11