Zbornik 25. mednarodne multikonference INFORMACIJSKA DRUŽBA – IS 2022, Zvezek A
Proceedings of the 25th International Multiconference INFORMATION SOCIETY – IS 2022, Volume A

Slovenska konferenca o umetni inteligenci
Slovenian Conference on Artificial Intelligence

Uredniki / Editors: Mitja Luštrek, Matjaž Gams, Rok Piltaver

http://is.ijs.si
11. oktober 2022 / 11 October 2022
Ljubljana, Slovenija

Uredniki:
Mitja Luštrek, Odsek za inteligentne sisteme, Institut »Jožef Stefan«, Ljubljana
Matjaž Gams, Odsek za inteligentne sisteme, Institut »Jožef Stefan«, Ljubljana
Rok Piltaver, Outfit7 in Odsek za inteligentne sisteme, Institut »Jožef Stefan«, Ljubljana

Založnik: Institut »Jožef Stefan«, Ljubljana
Priprava zbornika: Mitja Lasič, Vesna Lasič, Lana Zemljak
Oblikovanje naslovnice: Vesna Lasič
Dostop do e-publikacije: http://library.ijs.si/Stacks/Proceedings/InformationSociety

Ljubljana, oktober 2022

Informacijska družba, ISSN 2630-371X
Kataložni zapis o publikaciji (CIP) pripravili v Narodni in univerzitetni knjižnici v Ljubljani
COBISS.SI-ID 127433219
ISBN 978-961-264-241-9 (PDF)

PREDGOVOR MULTIKONFERENCI INFORMACIJSKA DRUŽBA 2022

Petindvajseta multikonferenca Informacijska družba je preživela probleme zaradi korone. Zahvala za skoraj normalno delovanje konference gre predvsem tistim predsednikom konferenc, ki so kljub prvi pandemiji modernega sveta pogumno obdržali visok strokovni nivo. Pandemija v letih od 2020 do danes skoraj v ničemer ni omejila neverjetne rasti IKT-ja, informacijske družbe, umetne inteligence in znanosti nasploh; nasprotno – rast znanja, računalništva in umetne inteligence se nadaljuje z že kar običajno nesluteno hitrostjo. Po drugi strani se nadaljuje razpadanje družbenih vrednot, ob tem pa divja tragična vojna v Ukrajini, ki lahko pljuskne v Evropo. Se pa zavedanje večine ljudi, da je treba podpreti stroko, krepi. Konec koncev je v 2022 v veljavo stopil nov raziskovalni zakon, ki bo izboljšal razmere, predvsem pa leto za letom povečeval sredstva za znanost.

Letos smo v multikonferenco povezali enajst odličnih neodvisnih konferenc, med njimi »Legende računalništva«, s katero postavljamo nov mehanizem promocije informacijske družbe. IS 2022 zajema okoli 200 predstavitev, povzetkov in referatov v okviru samostojnih konferenc in delavnic ter 400 obiskovalcev. Prireditev so spremljale okrogle mize in razprave ter posebni dogodki, kot je svečana podelitev nagrad. Izbrani prispevki bodo izšli tudi v posebni številki revije Informatica (http://www.informatica.si/), ki se ponaša s 46-letno tradicijo odlične znanstvene revije.
Multikonferenco Informacijska družba 2022 sestavljajo naslednje samostojne konference:
• Slovenska konferenca o umetni inteligenci
• Izkopavanje znanja in podatkovna skladišča
• Demografske in družinske analize
• Kognitivna znanost
• Kognitonika
• Legende računalništva
• Vseprisotne zdravstvene storitve in pametni senzorji
• Mednarodna konferenca o prenosu tehnologij
• Vzgoja in izobraževanje v informacijski družbi
• Študentska konferenca o računalniškem raziskovanju
• Matcos 2022

Soorganizatorji in podporniki konference so različne raziskovalne institucije in združenja, med njimi ACM Slovenija, SLAIS, DKZ in druga slovenska nacionalna akademija, Inženirska akademija Slovenije (IAS). V imenu organizatorjev konference se zahvaljujemo združenjem in institucijam, še posebej pa udeležencem za njihove dragocene prispevke in priložnost, da z nami delijo svoje izkušnje o informacijski družbi. Zahvaljujemo se tudi recenzentom za njihovo pomoč pri recenziranju.

S podelitvijo nagrad, še posebej z nagrado Michie-Turing, se avtonomna stroka s področja opredeli do najbolj izstopajočih dosežkov. Nagrado Michie-Turing za izjemen življenjski prispevek k razvoju in promociji informacijske družbe je prejel prof. dr. Jadran Lenarčič. Priznanje za dosežek leta pripada ekipi NIJZ za portal zVEM. »Informacijsko limono« za najmanj primerno informacijsko potezo je prejela cenzura na socialnih omrežjih, »informacijsko jagodo« kot najboljšo potezo pa nova elektronska osebna izkaznica. Čestitke nagrajencem!

Mojca Ciglarič, predsednica programskega odbora
Matjaž Gams, predsednik organizacijskega odbora

FOREWORD – INFORMATION SOCIETY 2022

The 25th Information Society Multiconference (http://is.ijs.si) survived the COVID-19 problems. The multiconference survived thanks to the conference chairs who bravely decided to continue with their conferences despite the first pandemic of the modern era. The COVID-19 pandemic from 2020 till now did not decrease the growth of ICT, the information society, artificial intelligence and science overall; quite the contrary – the progress of computers, knowledge and artificial intelligence continued at a fascinating growth rate. However, the downfall of societal norms seems to slowly but surely continue, along with the tragic war in Ukraine. On the other hand, the awareness of the majority that science and development are the only perspective for a prosperous future is growing substantially. In 2022, a new law regulating Slovenian research came into force, promoting an increase in funding year by year.

The multiconference is running parallel sessions with 200 presentations of scientific papers at eleven conferences, many round tables, workshops and award ceremonies, and 400 attendees. Among the conferences, »Legends of computing« introduces the »Hall of Fame« concept for computer science and informatics. Selected papers will be published in the Informatica journal, with its 46-year tradition of excellent research publishing.
The Information Society 2022 Multiconference consists of the following conferences:
• Slovenian Conference on Artificial Intelligence
• Data Mining and Data Warehouses
• Cognitive Science
• Demographic and family analyses
• Cognitonics
• Legends of computing
• Pervasive health and smart sensing
• International technology transfer conference
• Education in information society
• Student computer science research conference 2022
• Matcos 2022

The multiconference is co-organized and supported by several major research institutions and societies, among them ACM Slovenia, i.e. the Slovenian chapter of the ACM, SLAIS, DKZ and the second national academy, the Slovenian Engineering Academy. In the name of the conference organizers, we thank all the societies and institutions, and particularly all the participants for their valuable contribution and their interest in this event, and the reviewers for their thorough reviews.

The award for life-long outstanding contributions is presented in memory of Donald Michie and Alan Turing. The Michie-Turing award was given to Prof. Dr. Jadran Lenarčič for his life-long outstanding contribution to the development and promotion of the information society in our country. In addition, the yearly recognition for current achievements was awarded to NIJZ for the zVEM platform. The information lemon goes to the censorship on social networks. The information strawberry for the best information service last year went to the electronic identity card. Congratulations!

Mojca Ciglarič, Programme Committee Chair
Matjaž Gams, Organizing Committee Chair

KONFERENČNI ODBORI / CONFERENCE COMMITTEES

International Programme Committee:
Vladimir Bajic, South Africa; Heiner Benking, Germany; Se Woo Cheon, South Korea; Howie Firth, UK; Olga Fomichova, Russia; Vladimir Fomichov, Russia; Vesna Hljuz Dobric, Croatia; Alfred Inselberg, Israel; Jay Liebowitz, USA; Huan Liu, Singapore; Henz Martin, Germany; Marcin Paprzycki, USA; Claude Sammut, Australia; Jiri Wiedermann, Czech Republic; Xindong Wu, USA; Yiming Ye, USA; Ning Zhong, USA; Wray Buntine, Australia; Bezalel Gavish, USA; Gal A. Kaminka, Israel; Mike Bain, Australia; Michela Milano, Italy; Derong Liu, Chicago, USA; Toby Walsh, Australia; Sergio Campos-Cordobes, Spain; Shabnam Farahmand, Finland; Sergio Crovella, Italy

Organizing Committee:
Matjaž Gams, chair; Mitja Luštrek; Lana Zemljak; Vesna Koricki; Mitja Lasič; Blaž Mahnič

Programme Committee:
Mojca Ciglarič, chair; Bojan Orel; Franc Solina; Viljan Mahnič; Cene Bavec; Tomaž Kalin; Jozsef Györkös; Tadej Bajd; Jaroslav Berce; Mojca Bernik; Marko Bohanec; Ivan Bratko; Andrej Brodnik; Dušan Caf; Saša Divjak; Tomaž Erjavec; Bogdan Filipič; Andrej Gams; Matjaž Gams; Mitja Luštrek; Marko Grobelnik; Nikola Guid; Marjan Heričko; Borka Jerman Blažič Džonova; Gorazd Kandus; Urban Kordeš; Marjan Krisper; Andrej Kuščer; Jadran Lenarčič; Borut Likar; Janez Malačič; Olga Markič; Dunja Mladenič; Franc Novak; Vladislav Rajkovič; Grega Repovš; Ivan Rozman; Niko Schlamberger; Stanko Strmčnik; Jurij Šilc; Jurij Tasič; Denis Trček; Andrej Ule; Boštjan Vilfan; Baldomir Zajc; Blaž Zupan; Boris Žemva; Leon Žlajpah; Niko Zimic; Rok Piltaver; Toma Strle; Tine Kolenik; Franci Pivec; Uroš Rajkovič; Borut Batagelj; Tomaž Ogrin; Aleš Ude; Bojan Blažica; Matjaž Kljun; Robert Blatnik; Erik Dovgan; Špela Stres; Anton Gradišek

KAZALO / TABLE OF CONTENTS
Slovenska konferenca o umetni inteligenci / Slovenian Conference on Artificial Intelligence .......... 1
PREDGOVOR / FOREWORD .......... 3
PROGRAMSKI ODBORI / PROGRAMME COMMITTEES .......... 5
Initial Results in Predicting High-Level Features of Constrained Multi-Objective Optimization Problems / Andova Andrejaana, Vodopija Aljoša, Krömer Pavel, Uher Vojtěch, Tušar Tea, Filipič Bogdan .......... 7
Learning the Probabilities in Probabilistic Context-Free Grammars for Arithmetical Expressions from Equation Corpora / Chaushevska Marija, Todorovski Ljupčo, Brence Jure, Džeroski Sašo .......... 11
Prediction of the Inflow in Macedonian Hydropower Plants, Using Machine Learning / Kizhevska Emilija, Gjoreski Hristijan, Luštrek Mitja .......... 15
Naslov / Kolar Žiga, Erzar Blaž, Čelan Nika, Hrastič Aleksander, Leskovec Gašper, Konečnik Martin, Prestor Domen, Susič David, Skobir Matjaž, Gams Matjaž .......... 19
Unified Question Answering in Slovene / Logar Katja, Robnik-Šikonja Marko .......... 23
Social Media Analysis for Assessing Resilience / Osojnik Aljaž, Ženko Bernard, Žnidaršič Martin .......... 27
Urban Mobility Policy Proposal Using Machine-Learning Techniques / Shulajkovska Miljana, Smerkol Maj, Gams Matjaž .......... 31
IMF Quality Assurance of Mammograms Using Deep Convolutional Neural Networks and Transfer Learning / Slapničar Gašper, Us Peter, Alukić Erna, Mekiš Nejc, Mlakar Miha, Žibert Janez .......... 35
Vehicle Axle Distance Detection From Time-series Signals Using Machine Learning / Susič David, Erzar Blaž, Čelan Nika, Leskovec Gašper, Kolar Žiga, Konečnik Martin, Prestor Domen, Skobir Matjaž, Gams Matjaž .......... 39
Študija učinkovitosti algoritma za razporejanje terenskega dela / Tušar Tea, Sever Nace, Vodopija Aljoša, Filipič Bogdan .......... 43
Interaktivno eksperimentiranje z besednimi vložitvami v platformi ClowdFlows / Žnidaršič Martin, Pollak Senja, Podpečan Vid .......... 47
Indeks avtorjev / Author index .......... 51

Zbornik 25. mednarodne multikonference INFORMACIJSKA DRUŽBA – IS 2022, Zvezek A
Proceedings of the 25th International Multiconference INFORMATION SOCIETY – IS 2022, Volume A

Slovenska konferenca o umetni inteligenci
Slovenian Conference on Artificial Intelligence

Uredniki / Editors: Mitja Luštrek, Matjaž Gams, Rok Piltaver
http://is.ijs.si
11. oktober 2022 / 11 October 2022
Ljubljana, Slovenija

PREDGOVOR

Umetna inteligenca še vedno hitro napreduje, so pa glavni dosežki lanskega leta na področjih, kjer smo jih že vajeni.
Avtonomna vozila so vedno bolj avtonomna in se že uporabljajo za prevoz potnikov, čeravno v zelo omejenem obsegu. Jezikovni modeli, kot je izboljšani GPT-3, postajajo zreli za praktično uporabo, zato se njihovi stvaritelji začenjajo ukvarjati s tem, kako jih odvračati od tvorbe politično nekorektnih besedil. Po eni strani razumljivo, po drugi strani pa – ob problemu omejevanja svobode govora na spletu, ki si je letos prislužil nominacijo za informacijsko limono – tudi nekoliko skrb zbujajoče. Modeli za generiranje slik iz opisov, katerih prvi vidnejši predstavnik je bil DALL-E, so se letos namnožili in videli smo več poizkusov njihove uporabe za izdelavo stripov. Potlej pa so tu še aplikacije v robotiki, medicini, računalniški varnosti in seveda zvitem streženju spletnih reklam.

Ko umetna inteligenca postaja vedno zmožnejša in bolj razširjena, se pojavljajo pomisleki o njeni varnosti ter prizadevanja za uporabo, ki bo družbi v korist in ne v škodo. Ta škoda se začne z nepotrebnimi nakupi zaradi preveč zvitih reklam, ki nas spremljajo že dolgo in smo se z njimi sprijaznili, nadaljuje pa s še resnejšimi problemi, kot so denimo slabe medicinske in zaposlovalne odločitve. Zaradi tovrstnih problemov vse več držav sprejema zakonodajo o umetni inteligenci, ki bo raziskovalcem bržkone povzročila nekaj sivih las, a če bo dobra – in k temu skušajmo prispevati, kolikor lahko – bo tudi pomagala, da naše delo ne bo dobilo zloveščega pridiha.

Vse več je tudi razmišljanja o splošni umetni inteligenci z zmožnostmi, ki presegajo človeške. Njen vpliv na človeško družbo utegne biti dramatičen. A če želimo zagotoviti, da bo dramatično dober, se bomo morali v prihodnjih letih resno lotiti raziskovalnega področja zagotavljanja, da kompleksni modeli umetne inteligence zares počno tisto, kar mislimo in želimo, da počno – področja, ki je zaenkrat še precej v povojih.

Za konec pa poglejmo, kako je letos z našo konferenco. Enajst prispevkov, ki smo jih prejeli, sicer ne opisuje tako visokoletečega dela, kot ga obravnavata prejšnja dva odstavka, so pa vseeno kakovostni in morda začetek česa pomembnega. Število je zmerno in Institut »Jožef Stefan« še malo bolj prevladujoč kot običajno, za kar do neke mere krivimo COVID-19 – ne ker bi nas še vedno hudo pestil, ampak ker sta dve konferenčno klavrni leti raziskovalce konferenčenja malo odvadili. A upajmo, da bo tudi to minilo. Prirejamo pa letos v okviru konference Data Science Meetup – dogodek z lepo tradicijo in dobro udeležbo, kjer imajo strokovnjaki iz industrije kratke predstavitve svojega dela. Na to smo ponosni, saj rešuje težavo pomanjkanja prispevkov iz industrije, ki smo se je dotaknili že v preteklih predgovorih.

FOREWORD

Artificial intelligence is still making good progress, but the major achievements of the past year are in the areas where we have grown to expect them. Autonomous vehicles are increasingly autonomous and already being used to carry passengers, albeit in a very limited way. Language models, such as the improved GPT-3, are becoming ready for practical use. Because of that, their authors are starting to work on preventing them from generating politically incorrect texts. This is on one hand understandable, but on the other hand – considering the problem of censorship on the internet, which was nominated for the Information Lemon this year – somewhat concerning. Models that generate images from text descriptions, whose first prominent representative was DALL-E, are proliferating, and we have seen several attempts to use them to generate comics.
There are also applications in robotics, medicine, cybersecurity and, of course, the cunning delivery of online ads.

With artificial intelligence becoming ever more capable and pervasive, concerns about its safety and its use for the benefit of society rather than harm are increasingly raised. The harm starts with unnecessary consumption due to insidious advertising, but this is old news we have become accustomed to. However, there are potentially more serious problems, such as bad medical or employment decisions. Because of these, a number of countries are drafting legislation on artificial intelligence. This will surely be a headache for researchers, but if the legislation is good – and we should help make it such if we can – it will benefit the reputation of our work.

Superhuman general artificial intelligence is also increasingly entering professional and public debate. Its impact on humanity could be dramatic. To ensure it is dramatically good, we will have to tackle the very much open research problem of ensuring that complex artificial-intelligence models indeed do what we think and want them to do.

Let us finally take a look at our conference. The 11 papers we received do not describe work as ambitious as that discussed in the previous paragraphs, but they are nevertheless good and perhaps the beginning of something important. The number is modest and the Jožef Stefan Institute even more overrepresented than usual, which we partially blame on COVID-19. Not that it is still a major problem, but in the two years without truly good conferences the researchers seem to have lost the habit of going to conferences to some degree. We hope that this, too, shall pass. On a brighter note, we are organizing a Data Science Meetup as part of our conference. This is an event with a long-standing tradition and good attendance, in which experts from industry give short talks on their work. We are quite proud of this achievement, since it addresses the lack of papers from industry which we bemoaned in past forewords.

PROGRAMSKI ODBOR / PROGRAMME COMMITTEE

Mitja Luštrek; Matjaž Gams; Rok Piltaver; Cene Bavec; Marko Bohanec; Marko Bonač; Ivan Bratko; Bojan Cestnik; Aleš Dobnikar; Erik Dovgan; Bogdan Filipič; Borka Jerman Blažič; Marjan Krisper; Marjan Mernik; Biljana Mileva Boshkoska; Vladislav Rajkovič; Niko Schlamberger; Tomaž Seljak; Peter Stanovnik; Damjan Strnad; Miha Štajdohar; Vasja Vehovar; Martin Žnidaršič

Initial Results in Predicting High-Level Features of Constrained Multi-Objective Optimization Problems

Andrejaana Andova, Aljoša Vodopija (Jožef Stefan Institute and Jožef Stefan International Postgraduate School, Jamova cesta 39, Ljubljana, Slovenia; andrejaana.andova@ijs.si, aljosa.vodopija@ijs.si)
Pavel Krömer, Vojtěch Uher (Department of Computer Science, VSB – Technical University of Ostrava, 17. listopadu 2172/15, Ostrava-Poruba, Czech Republic; pavel.kromer@vsb.cz, vojtech.uher@vsb.cz)
Tea Tušar, Bogdan Filipič (Jožef Stefan Institute and Jožef Stefan International Postgraduate School, Jamova cesta 39, Ljubljana, Slovenia; tea.tusar@ijs.si, bogdan.filipic@ijs.si)

ABSTRACT

Trying numerous algorithms on an optimization problem that we encounter for the first time in order to find the best-performing algorithm is time-consuming and impractical. To narrow down the number of algorithm choices, high-level features describing important problem characteristics can be related with algorithm performance. However, characterizing optimization problems for this purpose is challenging, especially when they include multiple objectives and constraints. In this work, we use machine learning (ML) to automatically predict high-level features of constrained multi-objective optimization problems (CMOPs) from low-level, exploratory landscape analysis features. The results obtained on the MW benchmark show a significant difference in classification accuracy depending on the applied evaluation approach. The poor performance of the leave-one-problem-out strategy indicates the need for further investigation of the relevance of low-level features in CMOP characterization.

KEYWORDS

constrained multi-objective optimization, exploratory landscape analysis, sampling methods, problem characterization, machine learning

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Information Society 2022, 10–14 October 2022, Ljubljana, Slovenia
© 2022 Copyright held by the owner/author(s).

1 INTRODUCTION

Predicting high-level features of constrained multi-objective optimization problems (CMOPs) is important as it can help decide which algorithm to use when faced with a new (real-world) CMOP. The structure of the objective and constraint functions is usually unknown for such problems. Moreover, the evaluation of problem solutions might be very time-consuming. In such cases, it is beneficial to know certain high-level features of the CMOP, which eases the selection of an appropriate multi-objective optimization algorithm or constraint handling technique to solve the problem efficiently.

Two frequently considered high-level features of CMOPs are the problem type and the connectivity of the feasible region. The problem type characterizes whether and how the constraints change the Pareto front of the problem. As pointed out by Tanabe et al. [8], this feature is useful as it indicates whether the problem needs to be treated as constrained or unconstrained. Moreover, Ma et al. [5] showed which constraint handling techniques are more successful in solving CMOPs, depending on the problem type. Similarly, the connectivity of the feasible region (or problem connectivity for short) defines the multimodality of the problem violation landscape and, therefore, crucially affects the choice of algorithms that can solve the problem efficiently [5].

High-level features of a new problem can be predicted using automatically calculated low-level problem features. The most widely known low-level features in evolutionary optimization are the exploratory landscape analysis (ELA) features. They were initially introduced to characterize single-objective optimization problems and implemented in the flacco package [2]. More recently, Liefooghe et al. [4] proposed a set of ELA features for multi-objective optimization problems, and Vodopija et al. [10] introduced additional ELA features for CMOPs.

In this work, we use the ELA features from [4] and some from [10] to investigate whether they are useful for predicting problem type and connectivity. To the best of our knowledge, this is the first attempt to predict the high-level features of CMOPs. A similar study was performed by Renau et al. [7] on single-objective optimization problems. They used ELA features to classify the optimization problem. When splitting the data into training and test sets, instances from the same problem were used for both training and testing. The first of our three experiments follows this setup. However, because this evaluation methodology is not useful in practice (the class of a new real-world problem is unknown), a second experiment is performed using the leave-one-problem-out methodology. Finally, the third experiment varies the number of target problem instances used for training to gain further insight into the difficult task of predicting high-level features from low-level ones.

The paper is further organized as follows. In Section 2, we introduce the theoretical background of constrained multi-objective optimization.
In Section 3, we explain the features used in this study. In Section 4, we present the considered test problems, and in Section 5 the experimental setup. In Section 6, we report on the obtained results. Finally, in Section 7, we provide a conclusion and present the ideas for future work.

2 THEORETICAL BACKGROUND

A CMOP can be formulated as:

$$\text{minimize } f_m(x), \quad m = 1, \ldots, M$$
$$\text{subject to } g_k(x) \le 0, \quad k = 1, \ldots, K, \tag{1}$$

where $x = (x_1, \ldots, x_D)$ is a search vector of dimension $D$, $f_m : S \to \mathbb{R}$ are objective functions, $g_k : S \to \mathbb{R}$ constraint functions, $S \subseteq \mathbb{R}^D$ is the search space, and $M$ and $K$ are the numbers of objectives and constraints, respectively.

A solution $x$ is feasible if it satisfies all constraints $g_k(x) \le 0$ for $k = 1, \ldots, K$. For each constraint $g_k$ we can define the constraint violation as $v_k(x) = \max(0, g_k(x))$. The overall constraint violation is defined as

$$v(x) = \sum_{k=1}^{K} v_k(x). \tag{2}$$

A solution $x$ is feasible iff $v(x) = 0$.

A feasible solution $x \in S$ is said to dominate another feasible solution $y \in S$ if $f_m(x) \le f_m(y)$ for all $1 \le m \le M$, and $f_m(x) < f_m(y)$ for at least one $1 \le m \le M$. A feasible solution $x^* \in S$ is a Pareto-optimal solution if there exists no feasible solution $x \in S$ that dominates $x^*$. All feasible solutions constitute the feasible region, $F = \{x \in S \mid v(x) = 0\}$, and all nondominated feasible solutions form the Pareto set, $S_o$. The image of the Pareto set in the objective space is the Pareto front, $P_o = \{f(x) \mid x \in S_o\}$.
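To make these definitions concrete, here is a minimal NumPy sketch of the overall constraint violation from Eq. (2) and of the Pareto dominance test; the function names are ours, not from the paper:

```python
import numpy as np

def overall_violation(g_values):
    """Overall constraint violation v(x) = sum_k max(0, g_k(x)), Eq. (2)."""
    return np.sum(np.maximum(0.0, np.asarray(g_values)))

def dominates(f_x, f_y):
    """Pareto dominance between two feasible solutions: f_x dominates f_y
    iff f_x <= f_y in every objective and < in at least one."""
    f_x, f_y = np.asarray(f_x), np.asarray(f_y)
    return bool(np.all(f_x <= f_y) and np.any(f_x < f_y))

# A solution with constraint values g(x) = (-0.5, 0.2) is infeasible,
# since v(x) = max(0, -0.5) + max(0, 0.2) = 0.2 > 0.
print(overall_violation([-0.5, 0.2]))      # 0.2
print(dominates([1.0, 2.0], [1.5, 2.0]))   # True
```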
However, problem type and connectivity of the feasible region. The values the ELA features are generally more accurate for large sample of these two high-level features for each MW problem are listed sizes. The ELA features proposed by Liefooghe et al. [4] and in Table 1. used also in this work can be divided into four categories: global, Many of the ELA features proposed by Liefooghe et al. [4] multimodality, evolvability, and ruggedness features. can only be calculated for bi-objective optimization problems. The global features capture certain global problem properties, Therefore, we investigate only the bi-objective versions of the for example, the correlation between the objective values, average MW problems although three of them are scalable in the number and maximum distance between solutions in the search space and of objectives. All MW problems are also scalable in the num- the objective space, the proportion of non-dominated solutions, ber of variables. We use 5-dimensional problems to match the the average and maximum rank of solutions, etc. experimental setup from [7]. The multimodality features assess the number of local optima in the objective space. They are computed for the bi-objective 5 EXPERIMENTAL SETUP space and also for each objective separately, in both cases by In preliminary experiments, we used six sampling methods from analyzing the neighbourhood of each solution. If a solution domi- the ghalton [1] and scipy [9] Python libraries: gHalton, Halton, nates its neighbors (or has a better objective value than its neigh-Sobol, Latin hypercube sampling, optimized Latin hypercube bors), it is defined as a local optimum. The multimodality features sampling, and uniform sampling [3]. The results have shown comprise the proportion of solutions that are locally optimal, the that similar prediction accuracies are obtained when using data average and maximum distances between local optima, etc. provided by any of these sampling methods. For this reason, The evolvability features describe how fast a local optimizer we only present the results obtained using the Sobol sampling would converge towards an optimum. They are calculated by method in the rest of the paper. analyzing how many neighboring solutions are dominated by, The Sobol sampling method generates a sample set by parti- dominating, or incomparable with a given solution. tioning the search space and filling each partition with a sample The ruggedness features measure the correlation between the solution. We generate additional Sobol sample sets using the information and quality from neighboring solutions – larger cor- Cranley-Patterson rotation [3]. The solutions from the original relation means a smoother landscape. The features are calculated sample set are rotated using a random shift of each dimension, by using Spearman’s correlation coefficient on the evolvability thus creating new sample sets that preserve the properties of features between each pair of neighboring solutions. the Sobol sampling. The modulo operation keeps the shifted val- In addition, we include four ELA features from [10] that de-ues within the unitary interval. This approach was also used by scribe the violation landscape and its relation with the objective Renau et al. [7]. space. The first feature is the feasibility ratio. 
Following this approach, we generate 100 sets of samples, each with 512 solutions, which we then evaluate on all 14 MW benchmark problems. For each problem and sample set pair, we compute 46 ELA features, which represent a single instance in the data. As a result, by evaluating the 100 sample sets on each of the 14 test problems, we get 1400 data instances. We then use these data instances and the corresponding high-level problem features (problem type and connectivity) to train a classifier for predicting the high-level problem features.

We use two widely used machine learning (ML) methods for classification: the Random Forest (RF) classifier and the k-Nearest Neighbors (KNN) classifier. The reason for choosing these classifiers instead of some others is that RF usually performs favorably compared to other ML classifiers, while KNN uses the distance between solutions as a performance metric, which is useful when analyzing the obtained classification results visually. For both RF and KNN, we apply the implementation from the scikit-learn library [6]. For KNN, we keep the default settings, while for RF we train 100 trees.

We perform three experiments that differ in the classifier evaluation methodology. In the first experiment, we base the evaluation methodology on the work by Renau et al. [7], where the data is split by using instances from the same problem for both training and testing. There, 50% of all instances are used for training, and the remaining 50% for testing. Furthermore, we take care of dividing the instances into training and test sets so that the proportion of instances from each problem is equal in both sets.
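A minimal scikit-learn [6] sketch of this first experiment, with random stand-ins for the 1400 × 46 ELA feature matrix (the variable names and dummy data are ours):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Stand-ins for the real data: 1400 instances x 46 ELA features,
# one label per instance, and the MW problem index of each instance.
X = rng.random((1400, 46))
y = rng.integers(0, 4, 1400)              # e.g., problem type I-IV
problems = np.repeat(np.arange(14), 100)  # 100 instances per problem

# 50/50 split, stratified by problem so each problem is equally
# represented in the training and test sets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=0.5, stratify=problems, random_state=0)

for clf in (RandomForestClassifier(n_estimators=100, random_state=0),
            KNeighborsClassifier()):      # KNN with default settings
    clf.fit(X_tr, y_tr)
    print(type(clf).__name__, clf.score(X_te, y_te))
```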
However, this methodology does not correspond to the real-world scenario where we want to learn the high-level features of a problem encountered for the first time. Therefore, we use the leave-one-problem-out evaluation methodology in the second experiment. Here, the instances from a single problem are used for testing, and the instances from all other problems for training. The procedure is repeated for all problems and the classification accuracy is calculated as the average over all train-test splits.

Finally, the third experiment is performed to see how adding target problem data to the training set influences the resulting classification accuracy. In this experiment, we vary the percentage of target problem data that is used for training between 0% and 99% with a step of 1%. When it equals 0%, no target problem data is used for training, which corresponds to the leave-one-problem-out methodology of the second experiment. Note that this setup never equals the one from the first experiment, because here the data of all other (non-target) problems is always used for training. Again, this procedure is repeated for all problems and we report the average classification accuracy.

To better understand the task we are trying to solve, we visualize the classes by first reducing the dimensionality of the feature space from 46-D to 2-D using Pairwise Controlled Manifold Approximation Projection (PaCMAP) [11]. We use the Python package pacmap with default parameter values.
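The leave-one-problem-out methodology of the second experiment maps directly onto scikit-learn's LeaveOneGroupOut splitter; a sketch reusing the arrays from the previous snippet:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# Leave-one-problem-out: all instances of one MW problem form the test
# set, the instances of the remaining 13 problems form the training set.
logo = LeaveOneGroupOut()
scores = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X, y, groups=problems, cv=logo)
print(scores.mean())   # average accuracy over the 14 train-test splits
```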
6 RESULTS

The results of the first experiment, where 50% of all data is used for training and 50% for testing, show that both RF and KNN achieve a classification accuracy above 98% (see Table 2). An explanation for such good results can be derived from the two leftmost plots in Figure 1. Here, we can see that PaCMAP finds many clusters in the data. However, the clusters are highly correlated to the problems themselves. Thus, leaving some instances from the target problem in the training set results in a high classification accuracy, because the classification task is now transformed into identifying to which cluster the new sample belongs, which is a much easier task to perform.

Table 2: Classification accuracy when 50% of all data is used for training and 50% for testing (first experiment).

Learning method   Problem type   Problem connectivity
RF                98%            99%
KNN               100%           100%

The more realistic scenario of having to predict the high-level feature of a yet unseen problem is tested in the second experiment. Here, the classification accuracy drops to only 7–19% for the problem type prediction, and to 41–57% for the problem connectivity prediction (see the leftmost points corresponding to 0% on the plots in Figure 2). This is comparable to the classification accuracy of the stratified classifier, which achieves 19% for the problem type prediction and 45% for the problem connectivity prediction. We can look at the results of the third experiment to help us understand this decline in classification accuracy. As seen from Figure 2, adding just a few instances of the target problem to the training set drastically increases the classification accuracy. When the training data contains no instances from the target problem, the classifier is forced to find information about the high-level feature from other problems. However, this is a much harder task, given that similar problems often have different high-level features (see the middle and right plots in Figure 1).

In the visualizations in Figure 1, the points indicating the correctly classified instances have black edges. As we can see, for many problems, RF has a 0% classification accuracy (top middle and top right plot). There are, however, some problems for which RF finds the correct class for a number of instances. Nonetheless, from these 2-D plots it is hard to understand why certain instances are misclassified by RF. This is because RF detects details in the data that the dimensionality reduction visualization method is unable to capture.

Similar behavior can be observed for KNN. Given that KNN classifies an instance depending on the classes of its most similar instances, the visualization from Figure 1 can help interpret its poor results with the leave-one-problem-out methodology. We can see that the clusters created by PaCMAP are not well aligned with the high-level features of problem type and connectivity. This makes predicting them a hard task for KNN. The clustering by PaCMAP suggests that the applied ELA features are not descriptive enough for predicting problem type and connectivity.

Figure 1: Dimensionality reduction of the ELA feature space using the PaCMAP method. Points are colored based on their true values, with correct classifications denoted by a black point edge. The top and bottom rows show the results for Random Forest and KNN, respectively, while the different classification targets are arranged in columns: the left column displays the results for the problem, the middle for problem type and the right for problem connectivity.

Figure 2: Classification accuracy for different proportions of data from the target problem used for training.
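The 46-D to 2-D projection used for the Figure 1 visualizations can be reproduced along these lines with the pacmap package; a minimal sketch, where X stands for the feature matrix from the earlier snippets:

```python
import pacmap

# Project the 46-D ELA feature space to 2-D with default parameters,
# as described in the experimental setup.
reducer = pacmap.PaCMAP(n_components=2)
X_2d = reducer.fit_transform(X)   # X: the 1400 x 46 ELA feature matrix
```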
7 CONCLUSION AND FUTURE WORK

In this work, we tried to predict high-level features of CMOPs. More specifically, using low-level ELA features, we constructed classifiers to predict the problem type and connectivity. Two ML classifiers were utilized, RF and KNN.

We employed three evaluation methodologies. The first one follows the related work and splits the data into two halves, one serving as the training set and the other as the test set (instances from the same problem are used in both sets). The second evaluation methodology uses all instances from the target problem for testing, and none for training. The third method gradually adds the target problem data to the training set. We achieved excellent classification accuracy with the first evaluation methodology, but a very poor one with the second. The drop in classification accuracy was examined with the third methodology, which has shown that already a small number of instances of the same problem increases the classification accuracy.

Visualizations of the data in the form of 2-D plots show that CMOP instances form clusters that are highly correlated to the problem instances, but not to the high-level problem features. For this reason, by including some instances from the target problem in the training set, the classification task becomes an easier task of recognizing to which cluster an instance belongs. Unfortunately, this is not a realistic scenario, since in the real world we have no information on the characteristics of the newly encountered problem. We therefore recommend using the second evaluation methodology when addressing this task.

However, the initial results obtained using the second evaluation methodology are not so promising. A possible improvement could be considering more ELA features in the learning procedure, either additional ones from [10] or newly created ones. Moreover, using a more representative set of test problems from various benchmark suites may also improve classifier performance.

ACKNOWLEDGMENTS

The authors acknowledge that the project "Constrained Multi-objective Optimization Based on Problem Landscape Analysis" was financially supported by the Slovenian Research Agency (project no. N2-0254) and the Czech Science Foundation (grant no. GF22-34873K). The Slovenian authors acknowledge additional financial support from the Slovenian Research Agency (young researcher program and research core funding no. P2-0209).

REFERENCES

[1] François-Michel De Rainville, Christian Gagné, Olivier Teytaud, and Denis Laurendeau. 2012. Evolutionary optimization of low-discrepancy sequences. ACM Transactions on Modeling and Computer Simulation, 22, 2, 1–25.
[2] Christian Hanster and Pascal Kerschke. 2017. flaccogui: Exploratory landscape analysis for everyone. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO) Companion. ACM, 1215–1222.
[3] Christiane Lemieux. 2009. Monte Carlo and Quasi-Monte Carlo Sampling. Springer Series in Statistics. Springer, New York, NY.
[4] Arnaud Liefooghe, Sébastien Verel, Benjamin Lacroix, Alexandru-Ciprian Zăvoianu, and John McCall. 2021. Landscape features and automated algorithm selection for multi-objective interpolated continuous optimisation problems. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO). ACM, 421–429.
[5] Zhongwei Ma and Yong Wang. 2019. Evolutionary constrained multiobjective optimization: Test suite construction and performance comparisons. IEEE Transactions on Evolutionary Computation, 23, 6, 972–986.
[6] F. Pedregosa et al. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
[7] Quentin Renau, Carola Doerr, Johann Dreo, and Benjamin Doerr. 2020. Exploratory landscape analysis is strongly sensitive to the sampling strategy. In International Conference on Parallel Problem Solving from Nature. Springer, 139–153.
[8] Ryoji Tanabe and Akira Oyama. 2017. A note on constrained multi-objective optimization benchmark problems. In IEEE Congress on Evolutionary Computation (CEC). IEEE, 1127–1134.
[9] Pauli Virtanen et al. 2020. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nature Methods, 17, 261–272.
[10] Aljoša Vodopija, Tea Tušar, and Bogdan Filipič. 2022. Characterization of constrained continuous multiobjective optimization problems: A feature space perspective. Information Sciences, 607, 244–262.
[11] Yingfan Wang, Haiyang Huang, Cynthia Rudin, and Yaron Shaposhnik. 2021. Understanding how dimension reduction tools work: An empirical approach to deciphering t-SNE, UMAP, TriMap, and PaCMAP for data visualization. Journal of Machine Learning Research, 22, 201, 1–73.

Learning the Probabilities in Probabilistic Context-Free Grammars for Arithmetical Expressions from Equation Corpora

Marija Chaushevska (Jožef Stefan International Postgraduate School and Jožef Stefan Institute, Jamova cesta 39, Ljubljana, Slovenia; marija.chaushevska@ijs.si)
Ljupčo Todorovski (Jožef Stefan Institute and Faculty of Mathematics and Physics, Jadranska cesta 21, Ljubljana, Slovenia; ljupco.todorovski@fmf.uni-lj.si)
Jure Brence and Sašo Džeroski (Jožef Stefan Institute and Jožef Stefan International Postgraduate School, Jamova cesta 39, Ljubljana, Slovenia; jure.brence@ijs.si, saso.dzeroski@ijs.si)

ABSTRACT

A core challenge for both physics and artificial intelligence (AI) is symbolic regression: finding a symbolic expression that matches data from an unknown function. Symbolic regression approaches are largely data-driven and search an unconstrained space of mathematical expressions, often employing genetic algorithms.
On the other hand, equation discovery approaches incorporate domain knowledge to constrain the structure space and search it using local or exhaustive search methods. In this paper, we adopt the use of probabilistic context-free grammars (PCFG) in equation discovery and propose a method for learning the probabilities of production rules in such PCFGs. We take a universal PCFG with an initial set of manually assigned probabilities for each production rule. We learn new probabilities by parsing each expression in a given corpus of expressions, such as the Feynman dataset.

KEYWORDS

equation discovery, grammar, probabilistic context-free grammar, parsing, learning probabilities, probability distribution

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Information Society 2022, 10–14 October 2022, Ljubljana, Slovenia
© 2022 Copyright held by the owner/author(s).

1 INTRODUCTION

Equation discovery is an area of machine learning that develops methods for the automated discovery of quantitative laws, expressed in the form of equations, in collections of measured numeric data [5] [11]. More precisely, equation discovery methods seek to automate the identification of equation structure as well as parameters. Traditionally, domain experts derive equation structure based on the theory in the domain and use standard numerical optimization methods to estimate the parameters. Equation discovery methods often use domain knowledge to specify the space of equations they consider. The key questions in the field are how to best represent the symbolic language of mathematics, how to incorporate domain knowledge in the process of equation discovery, and how to perform the search for optimal equation structures. Symbolic regression methods are largely data-driven and search an unconstrained space of mathematical expressions, often employing evolutionary algorithms. On the other hand, equation discovery methods, such as process-based modeling [4], incorporate domain knowledge to constrain the structure space and search using greedy-local [12] or exhaustive search methods on the constrained space. The task of equation discovery is closely related to the task of supervised regression.

Machine learning methods for supervised regression assume a fixed class of models, such as linear regression or neural networks with a particular architecture, and find the one that provides the best fit to the training data. Equation discovery methods typically consider broader classes of mathematical equations. These classes may be vast, and many (often infinitely many) equations can be found that provide an excellent fit to the training data. The challenge of symbolic regression is therefore twofold. On the one hand, one can easily overfit the training data with an unnecessarily complex equation. On the other hand, the space of candidate equations is huge and grows exponentially as equation complexity increases, posing serious computational issues to equation discovery methods.

Equation discovery systems explore the hypothesis space of all equations that can be constructed given a set of arithmetic operators, functions and variables. They search for equations that fit the given input data best. The number of all possible candidate equations can be infinite. Early equation discovery systems used parametric approaches to specify the space of polynomial equations considered. LAGRAMGE [13] uses context-free grammars (CFG) [9] to specify the language of equations considered. The recent system ProGED [2] uses probabilistic context-free grammars (PCFG), where a probability is associated with each production rule. In this paper, we propose a method for learning these probabilities for a given PCFG by using a given corpus of expressions.

2 GRAMMARS FOR EQUATION DISCOVERY

A grammar is a finite specification of a language. A language can contain an infinite number of strings, or, even if it is finite, it can contain so many strings that it is not practical to list them all. Originating from computational linguistics, grammars are used as formal specifications of languages and use a set of production rules to derive valid strings in the language of interest. A grammar mainly consists of a set of production rules, rewriting rules for transforming strings. Each rule specifies a replacement of a particular string (its left-hand side) with another (its right-hand side). A rule can be applied to each string (equation) that contains its left-hand side and produces a string in which an occurrence
of that left-hand side has been replaced with its right-hand side. A grammar further distinguishes between two kinds of symbols: non-terminal and terminal symbols; each left-hand side must contain at least one non-terminal symbol. It also distinguishes a special non-terminal symbol, called the start symbol. In equation discovery, we are interested in using grammars as generative models, as opposed to their common use for parsing, i.e., discriminating between legal and illegal strings in a language.

2.1 Context-Free Grammar (CFG)

In formal language theory, a context-free grammar [9] is a formal grammar defined as a tuple G = (N, T, R, S). It is used to generate all possible patterns of strings in a given formal language. The syntax of the expression on the right-hand side of an equation is prescribed with a context-free grammar. The set T contains terminal symbols, i.e., words for composing sentences in the language or variables in the arithmetical expressions. The terminals are primitive grammar symbols that cannot be further rewritten, i.e., no productions are affiliated with them. Non-terminal symbols (syntactic categories) in the set N represent higher-order terms in the language, such as noun or verb phrases. Each of the non-terminals represents expressions or phrases in a language. The production rules in the set R are rewrite rules of the form A → α, where the left-hand side is a non-terminal, A ∈ N, while the right-hand side is a string of non-terminals and terminals, α ∈ (N ∪ T)*. In natural language, a rule NP → A N specifies that a noun phrase NP consists of an adjective A and a noun N, where A and N represent the subsets of adjectives and nouns, which are both terminals. No matter which symbols surround it, the single non-terminal on the left-hand side can always be replaced by the right-hand side. This is what distinguishes a context-free grammar from a context-sensitive one. When deriving a sentence, a grammar starts with a string containing a single non-terminal S ∈ N and recursively applies production rules to replace non-terminals in the current string with the strings on the right-hand sides of the rules. The final string contains only terminal symbols and belongs to the language defined by G.

In equation discovery, grammars represent sets of expressions that can appear on the right-hand side of equations. These grammars use several symbols with special meanings. For example, the terminal const ∈ T is used to denote a constant parameter in an equation that has to be fitted to the input data.

A simple context-free grammar G_M = (N_M, T_M, R_M, S_M) deriving linear expressions from the variables x, y, z is as follows:

N_M = {E, V}
T_M = {x, y, z, +, *}
R_M = {E → E + V | E * V | V,
       V → x | y | z}                                        (1)
S_M = E

We write multiple production rules with the same non-terminal on the left-hand side using a compact, single-line representation, e.g., E → E + V | E * V | V stands for the set of rules {E → E + V, E → E * V, E → V}.

2.2 Probabilistic Context-Free Grammar (PCFG)

Grammar formalisms are not new to the field of equation discovery [4] [3] [13], but probabilistic grammars are. A probabilistic grammar assigns probabilities to productions and thereby allows one to use the grammar as a stochastic generator [10] [6] [15]. Probabilistic context-free grammars (PCFGs) are a simple model of phrase-structure trees. They extend context-free grammars (CFGs) similarly to how hidden Markov models extend regular grammars. A grammar can be turned into a probabilistic grammar by assigning probabilities to each of its productions, such that for each A ∈ N:

$$\sum_{(A \to \alpha) \in R} P(A \to \alpha) = 1 \tag{2}$$

The probability of a derivation (parse) is defined as the product of the probabilities of all the production rules used to expand each node in the parse tree (derivation). These probabilities can be viewed as parameters of the model. The probabilities of all productions with the same non-terminal symbol on the left-hand side sum up to one, i.e., we impose a probability distribution on the productions with the same symbol on the left-hand side. As each parse tree derived by a grammar G is characterized by a sequence of productions, its probability is simply the product of the probabilities of all productions in the sequence [11].

We can extend the example context-free grammar G_M above to a PCFG by assigning a probability to each of the six productions, given below in brackets after each production:

E → E + V [p] | E * V [q] | V [1 − p − q]
V → x [p_v] | y [q_v] | z [1 − p_v − q_v]                    (3)

Here we have parameterized the probability distributions over the productions for E and V with the parameters 0 < p < 1, 0 < q < 1, 0 < p_v < 1, and 0 < q_v < 1, respectively.
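To illustrate the use of a PCFG as a stochastic generator, the following sketch samples expressions from the grammar of Eq. (3) with illustrative parameter values (p = q = 0.2, p_v = 0.4, q_v = 0.3 are our choice); the implementation is ours, not ProGED's:

```python
import random

# The PCFG of Eq. (3) as a dict: non-terminal -> [(right-hand side, prob)].
rules = {
    "E": [(("E", "+", "V"), 0.2), (("E", "*", "V"), 0.2), (("V",), 0.6)],
    "V": [(("x",), 0.4), (("y",), 0.3), (("z",), 0.3)],
}

def generate(symbol="E"):
    """Recursively expand a non-terminal by sampling one of its
    productions according to the rule probabilities."""
    if symbol not in rules:        # terminal symbols are returned as-is
        return symbol
    rhs_options, weights = zip(*rules[symbol])
    rhs = random.choices(rhs_options, weights=weights)[0]
    return " ".join(generate(s) for s in rhs)

random.seed(3)
for _ in range(3):
    print(generate())              # e.g. "z", "x + y", "y * x + z"
```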
Context-free grammars are typically used to parse sentences. Probabilistic context-free grammars provide an estimate of the probability of a parse tree, in addition to the tree itself. Probabilistic context-free grammars also allow for another type of application – stochastic generation of sentences or, in our case, mathematical expressions. The probabilities assigned to the productions provide a great amount of control over the probability distribution of individual parse trees. In our example in Eq. 3, the parameters p and q control the probability of a larger number of terms in an expression, while the parameters p_v and q_v tune the ratio between the numbers of occurrences of the variables x, y and z.

An important concept to consider when working with grammars is ambiguity. A grammar is formally ambiguous if there exist sentences (expressions) that can be described by more than one parse tree generated by the grammar. Grammars for arithmetic expressions can exhibit another type of ambiguity, called semantic ambiguity. All but the simplest arithmetic expressions can be written in many mathematically equivalent, but grammatically distinct ways. It is generally useful to adopt a canonical representation that each generated equation is converted into. This allows us to compare expressions to each other and check whether they are mathematically equivalent, in addition to comparing their parse trees. In our work, we use the Python symbolic mathematics library SymPy [8] to simplify expressions, convert them into canonical form, and compare them symbolically.
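A small SymPy example of checking semantic equivalence of grammatically distinct expressions via a canonical form (the expressions themselves are illustrative):

```python
import sympy

# Two grammatically distinct derivations of the same expression.
e1 = sympy.sympify("x*(x + y)")
e2 = sympy.sympify("x**2 + x*y")

# Expanding (or simplifying) both yields a canonical form, so mathematical
# equivalence can be checked despite the different parse trees.
print(sympy.expand(e1) == sympy.expand(e2))   # True
print(sympy.simplify(e1 - e2) == 0)           # True
```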
3 LEARNING PROBABILITIES IN PCFGS FOR ARITHMETICAL EXPRESSIONS

Parameter learning approaches for PCFGs assume a fixed set of production rules and try to learn the probabilities assigned to them. Some approaches encourage sparsity and remove rules with very small probabilities. Parameter learning approaches are typically more scalable than structure search approaches, because parameter learning is a continuous optimization problem, which is in general easier than the discrete optimization problem of structure search. Therefore, most of the state-of-the-art algorithms for unsupervised learning of natural language grammars are parameter learning approaches.

3.1 The Approach

In this paper, we propose a parameter learning approach for PCFGs, based on parsing a corpus of expressions. We adopt the universal probabilistic context-free grammar for arithmetic expressions used by ProGED [2]. While ProGED uses manually assigned probabilities in this grammar, we use an initial set of randomly assigned probabilities for each production rule. The universal grammar is composed of production rules that include the four basic operations (+, -, *, /), basic functions (such as sin or log), constant parameters and variables.

Our method for learning probabilities from a given corpus of expressions is designed on the assumption that the probability of a production rule in the grammar is proportional to the incidence of the production in the parse trees for the expressions in the corpus. It uses a parser from the NLTK (Natural Language Toolkit) Python library [1] to parse the expressions in the given corpus using the universal PCFG. NLTK contains classes to work with PCFGs, and there are different types of parsers implemented in the NLTK Python library. In particular, we use the InsideChartParser(), a bottom-up parser for PCFG grammars that tries edges in descending order of the inside probabilities of their trees. The "inside probability" of a tree is simply the probability of the entire tree, ignoring its context. In particular, the inside probability of a tree generated by a production p with children c[1], c[2], ..., c[n] is P(p)P(c[1])P(c[2])...P(c[n]), and the inside probability of a token is 1 if it is present in the text, and 0 if it is absent. For a given string (expression) and a grammar, the parser determines whether the string can be derived using the grammar and, if yes, returns the appropriate parse tree. After parsing the equations, we count the number of times each production rule appears in the set of parse trees for all parsed equations (except for rules directly resulting in terminal symbols (variables)). We then group the production rules by the non-terminal symbol on their left-hand side and derive the probability of each production rule as the number of appearances of the rule divided by the sum of such numbers for all production rules with the same non-terminal.
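The parse-and-count procedure can be sketched with NLTK as follows; the grammar below is a toy stand-in for the universal ProGED grammar, and the two-expression corpus is purely illustrative:

```python
import nltk
from collections import Counter
from nltk.parse.pchart import InsideChartParser

# Toy stand-in for the universal grammar (hypothetical, simplified).
grammar = nltk.PCFG.fromstring("""
    S -> S '+' F [0.2] | S '-' F [0.2] | F [0.6]
    F -> F '*' T [0.2] | F '/' T [0.2] | T [0.6]
    T -> V [0.5] | 'C' [0.5]
    V -> 'x' [0.5] | 'y' [0.5]
""")
parser = InsideChartParser(grammar)

counts = Counter()
for expr in ["x + y * x", "C * x - y"]:        # illustrative corpus
    tree = next(parser.parse(expr.split()))    # most probable parse
    # Count rule occurrences, skipping rules that directly yield
    # variables, as done in the paper.
    counts.update(p for p in tree.productions()
                  if p.lhs() != nltk.Nonterminal("V"))

# Normalize per left-hand-side non-terminal to obtain new probabilities.
totals = Counter()
for prod, n in counts.items():
    totals[prod.lhs()] += n
probs = {prod: n / totals[prod.lhs()] for prod, n in counts.items()}
for prod, p in probs.items():
    print(prod, round(p, 3))
```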
In particular, the inside probability of a tree generated by a production 𝑝 with children 𝑐[1], 𝑐[2], ..., 𝑐[𝑛] is 𝑃(𝑝) 𝑃(𝑐[1]) 𝑃(𝑐[2]) ... 𝑃(𝑐[𝑛]), and the inside probability of a token is 1 if it is present in the text and 0 if it is absent. For a given string (expression) and a grammar, the parser determines whether the string can be derived using the grammar and, if yes, returns the appropriate parse tree. After parsing the equations, we count the number of times each production rule appears in the set of parse trees for all parsed equations (except for rules directly resulting in terminal symbols, i.e., variables). We then group the production rules by the non-terminal symbol on their left-hand side and derive the probability of each production rule as the number of its appearances divided by the sum of such numbers for all production rules with the same non-terminal.
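A minimal sketch of this estimation step is given below, reusing the parser from the previous sketch; `corpus` (a list of tokenized expressions) is assumed to be given, and the exclusion of rules that directly produce variables is omitted for brevity.

    from collections import Counter, defaultdict

    counts = Counter()
    for expression in corpus:                 # corpus: list of token lists (assumed)
        trees = list(parser.parse(expression))
        if trees:                             # skip expressions the grammar cannot derive
            counts.update(trees[0].productions())

    # Group the counted productions by their left-hand-side non-terminal.
    by_lhs = defaultdict(dict)
    for production, n in counts.items():
        by_lhs[production.lhs()][production] = n

    # The probability of a rule is its count divided by the total count of all
    # rules with the same left-hand-side non-terminal.
    probabilities = {production: n / sum(rules.values())
                     for rules in by_lhs.values()
                     for production, n in rules.items()}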
3.2 The Corpora

We apply the proposed approach to two corpora of expressions (that appear on the right-hand side of equations). The first one is the Feynman Symbolic Regression Database, which includes a diverse sample of equations from the three-volume set of physics textbooks by Richard P. Feynman [7] and has been previously used as a benchmark for equation discovery. It was constructed by Udrescu and Tegmark [14] to facilitate the development and testing of algorithms for symbolic regression. The equations from the Feynman database contain between one and nine variables, the four basic operations (+, −, /, ∗), the functions exp, sqrt, sin, cos, tanh, arcsin and ln, as well as a variety of constants – mostly rational, but also 𝑒 and 𝜋.

There are three components to an arithmetic expression: variables, constants and operators. Numerical values and constants are typically treated as free parameters (terminal symbols) to be optimized when evaluating an equation for its fit against given data. We replaced all constants, such as 𝑒, 𝜋 and rational constants, with the terminal 'C' (const), because we treat them as free parameters. The minimum number of constants ('C') in the Feynman database is 0, which means that some equations have only variables as terminal symbols. The maximum number of constants in the Feynman database is five, in only one equation.

The second corpus consists of 4080 scientific expressions from Wikipedia. These mathematical expressions are named after people and were parsed from Wikipedia. Compared to the Feynman dataset, the Wikipedia corpus contains more functions, such as Abs, factorial, tan, sinh, cosh and pow (which do not exist in the Feynman database), as well as irrational constants (𝑒 and 𝜋) and numerical constants, which have to be replaced by the constant 'C' (const) in the grammar. The equations in the Wikipedia dataset contain between one and fifteen variables, which is twice as many as in the Feynman dataset, and the maximum number of 'C' terminal symbols is 16 per equation.

3.3 The Learned Probabilities

By using the proposed approach on the two corpora of arithmetic expressions described above, we obtain two sets of probabilities, with each probability assigned to one of the production rules in the PCFG. More precisely, we now have three universal PCFGs: (1) with the initial probabilities, manually assigned by the authors of ProGED, (2) with probabilities fine-tuned (learned) on the Feynman dataset, and (3) with probabilities fine-tuned (learned) on the Wikipedia corpus of arithmetical expressions.

In this section, we first present the three sets of probabilities, one for each of the above mentioned PCFGs: these are given in Table 1. We then compare the probability distributions across the rules for each non-terminal symbol (𝑆, 𝐹, 𝑇 and 𝑅) in the PCFGs.

As compared to the initial grammar, the grammar learned on the Feynman database reduces the probabilities of the recursive production rules (𝑆 → 𝑆 + 𝐹 and 𝑆 → 𝑆 − 𝐹) and increases the probability of the non-recursive rule (𝑆 → 𝐹): this leads to simpler expressions with fewer additive terms. In contrast, the grammar learned on the Wikipedia corpus has a probability for the rule 𝑆 → 𝑆 + 𝐹 very comparable to the one in the initial grammar. It decreases the probability of the recursive production rule 𝑆 → 𝑆 − 𝐹 and increases the probability of the non-recursive rule 𝑆 → 𝐹 by approximately 0.1 in each case.

The probabilities of the recursive production rules for the 𝐹 non-terminal symbol (𝐹 → 𝐹 ∗ 𝑇 and 𝐹 → 𝐹 / 𝑇) are mostly larger than the ones in the initial grammar. An exception is the rule 𝐹 → 𝐹 / 𝑇 with the Wikipedia corpus. The probability of the non-recursive production rule (𝐹 → 𝑇) is smaller: slightly for the Wikipedia corpus, more substantially for the Feynman dataset.

In the learned probability distributions over the production rules for the non-terminal 𝑇, the probability of the rule 𝑇 → 𝑉 is much higher (it goes from 0.4 to 0.7). In both learned grammars, the probabilities of the 𝑇 → 𝑅 and 𝑇 → 'C' production rules are substantially reduced. This is more noticeable for 𝑇 → 'C', where the probability goes from 0.4 to slightly above 0.1.

We finally discuss the probability distributions over the production rules for the non-terminal symbol 𝑅 in the initial grammar and the two learned grammars. A probability of 0 for a production rule here means that the function in that rule is not present in the respective corpus of mathematical expressions. For example, the functions ln and arcsin are not present in the arithmetical expressions from the Wikipedia dataset, but are present in the Feynman database. On the other hand, the functions log, pow, Abs, sinh, cosh, factorial and tan do not appear in the arithmetic expressions from the Feynman database, which is why their probability is 0. The grammar learned on the Feynman database increases the probabilities of the production rules 𝑅 → (𝑆), 𝑅 → sin(𝑆) and 𝑅 → sqrt(𝑆) as compared to the initial grammar, while the remaining production rules have lower probabilities. The grammar learned on the Wikipedia corpus increases the probability of the 𝑅 → (𝑆) production rule and decreases the probabilities of the remaining rules as compared to the initial grammar.
Table 1: Probabilities of the production rules for the non-terminal symbols in the initial grammar, the grammar trained on the Feynman database and the grammar trained on the Wikipedia corpus of expressions.

    Production rule        Initial   Feynman   Wikipedia
    S → S + F              0.2       0.1034    0.2004
    S → S − F              0.2       0.1552    0.1108
    S → F                  0.6       0.7414    0.6888
    F → F ∗ T              0.2       0.3635    0.3349
    F → F / T              0.2       0.2446    0.1098
    F → T                  0.6       0.3919    0.5553
    T → R                  0.2       0.1554    0.1746
    T → 'C'                0.4       0.1338    0.1174
    T → V                  0.4       0.7108    0.7082
    R → ( S )              0.3       0.5391    0.6841
    R → sin( S )           0.1       0.1130    0.0249
    R → arcsin( S )        0.1       0.0173    0
    R → ln( S )            0.1       0.0087    0
    R → tanh( S )          0.1       0.0087    0.0045
    R → cos( S )           0.1       0.0956    0.0435
    R → sqrt( S )          0.1       0.1304    0.0831
    R → exp( S )           0.1       0.0872    0.0780
    R → log( S )           0         0         0.0479
    R → Abs( S )           0         0         0.0211
    R → ( S ) '**' ( S )   0         0         0.0032
    R → sinh( S )          0         0         0.0032
    R → cosh( S )          0         0         0.0026
    R → factorial( S )     0         0         0.0019
    R → tan( S )           0         0         0.0019

4 CONCLUSIONS AND FURTHER WORK

In this paper, we have proposed an approach to learn the parameters, i.e., production rule probabilities, in probabilistic context-free grammars for arithmetic expressions. We demonstrated the proposed approach by learning the probabilities in a universal grammar for arithmetic expressions from two corpora of expressions. The learned probabilities differ substantially from their initial values. Most notably, the initial settings underestimated the frequency of variables in favor of numerical constants, overestimated the need for recursion with addition and subtraction, and set the probability of recursion with multiplication too low.
These observations show how difficult it is to set probabilities manually and highlight the utility of the learning algorithm.

The comparison of the learned probability values for the two corpora also hints at differing properties of the two collections of equations. The Wikipedia corpus seems to favor multiplication over division to a greater extent than the Feynman dataset. In terms of expression complexity, we observed a preference for high-order terms in the Feynman dataset, in contrast to a preference for higher numbers of low-order terms in the Wikipedia corpus. The observed differences between the properties of the two corpora demonstrate that the universal grammar is expressive enough to encode these properties and that the learning algorithm is able to discover them. The results show a great deal of promise for the goals of inferring domain knowledge from equation corpora and improving the efficiency of grammar-based equation discovery through the fine-tuning of production probabilities.

As further work, we would like to perform equation discovery experiments using the three universal grammars: the universal grammar with the initial (default) probabilities, with the probabilities learned on the Feynman dataset, and with the probabilities learned on the Wikipedia corpus. For this purpose, we will use the equation discovery system ProGED, which uses a Monte-Carlo approach of sampling equation structures from a given PCFG and evaluating their fit to the given data. We expect that the number of successfully reconstructed equations from the Feynman dataset, when using the learned (fine-tuned) universal PCFGs, will be higher than the number of equations successfully reconstructed with the universal grammar with manually set probabilities.

REFERENCES
[1] Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python (1st ed.). O'Reilly Media, Inc.
[2] Jure Brence, Ljupčo Todorovski, and Sašo Džeroski. 2021. Probabilistic grammars for equation discovery. Knowledge-Based Systems, 224 (Apr. 2021), 1–12.
[3] Will Bridewell and Pat Langley. 2010. Two kinds of knowledge in scientific discovery. Topics in Cognitive Science, 2, 1, 36–52.
[4] Will Bridewell, Pat Langley, Ljupčo Todorovski, and Sašo Džeroski. 2008. Inductive process modeling. Machine Learning, 71, 1, 1–32.
[5] Sašo Džeroski, Pat Langley, and Ljupčo Todorovski. 2007. Computational Discovery of Scientific Knowledge. Vol. 4660. (Aug. 2007), 1–14.
[6] Brian C. Falkenhainer and Ryszard S. Michalski. 1990. Integrating quantitative and qualitative discovery: the ABACUS system. In Machine Learning, Yves Kodratoff and Ryszard S. Michalski, editors. Morgan Kaufmann, San Francisco, CA, 153–190.
[7] R. P. Feynman, R. B. Leighton, and M. Sands. 2015. The Feynman Lectures on Physics, Vol. I: The New Millennium Edition: Mainly Mechanics, Radiation, and Heat. Basic Books.
[8] Aaron Meurer et al. 2017. SymPy: symbolic computing in Python. PeerJ Computer Science, 3 (Jan. 2017), e103.
[9] Alan P. Parkes. 2008. A Concise Introduction to Languages and Machines (1st ed.). Springer Publishing Company, Incorporated.
[10] Cullen Schaffer. 1993. Bivariate scientific function finding in a sampled, real-data testbed. In Machine Learning, 167–183.
[11] Michael Schmidt and Hod Lipson. 2009. Distilling free-form natural laws from experimental data. Science, 324, 5923, 81–85.
[12] Jovan Tanevski, Ljupčo Todorovski, and Sašo Džeroski. 2020. Combinatorial search for selecting the structure of models of dynamical systems with equation discovery. Engineering Applications of Artificial Intelligence, 89 (Mar. 2020), 103423.
[13] Ljupčo Todorovski and Sašo Džeroski. 1997. Declarative bias in equation discovery. In Proceedings of the Fourteenth International Conference on Machine Learning (ICML '97). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 376–384.
[14] Silviu-Marian Udrescu and Max Tegmark. 2020. AI Feynman: a physics-inspired method for symbolic regression. Science Advances, 6, 16, eaay2631.
[15] R. Zembowitz and J. Zytkow. 1992. Discovery of equations: experimental evaluation of convergence. In Proceedings of the Tenth National Conference on Artificial Intelligence. Morgan Kaufmann Publishers Inc., San Mateo, CA, USA, 101–117.

Prediction of the Inflow in Macedonian Hydropower Plants, Using Machine Learning

Emilija Kizhevska (emilija.kizhevska@ijs.si), Jožef Stefan Institute and Jožef Stefan International Postgraduate School, Jamova cesta 39, Ljubljana, Slovenia
Hristijan Gjoreski (hristijang@feit.ukim.edu.mk), Ss. Cyril and Methodius University, Faculty of Electrical Engineering and Information Technologies, Rugjer Boskovic 18, Skopje, R. N. Macedonia
Mitja Luštrek (mitja.lustrek@ijs.si), Jožef Stefan Institute and Jožef Stefan International Postgraduate School, Jamova cesta 39, Ljubljana, Slovenia

ABSTRACT
As weather conditions become more complex and unpredictable as a consequence of global warming and air pollution, humans find it increasingly difficult to predict the amount of precipitation in the coming period, and thus the inflow into hydroelectric basins. Different types of hydropower plants (HPP), soil composition, how dry the soil is at the moment, the composition of precipitation, etc., also influence the inflow, making it even more difficult to predict. This research looks into the problem of predicting the inflow into hydroelectric basins in the Republic of North Macedonia and builds machine learning models to do so. The main contribution of this research is the models for the five largest hydropower plants in the Republic of North Macedonia (RM), which could optimize the loss and shortage of purchased electricity. Historical data from the meteorological station closest to each hydropower plant, as well as historical data on the inflows at the hydropower plants, were used to build regression models that predict the inflow one day in advance for each hydropower plant separately.
After deriving 19 new features, the majority of which are statistical, the predictive models' error was reduced. In the final step, we analyze the results empirically and qualitatively, comparing the models generated using different machine learning algorithms. For instance, one of the best models is the model for HPP Vrben, with a mean absolute error of around 8% of the average daily inflow. We built models using eight different regression algorithms for each hydropower plant, with linear regression and gradient boosting regression producing the smallest prediction errors. These models could also help to prevent river and lake overflows in areas where hydropower facilities are located, with timely warnings minimizing the severity of natural disasters.

KEYWORDS
machine learning, regression models, hydropower plants, optimization of hydroelectric energy loss

1 INTRODUCTION
The global trend of producing electricity from renewable sources is increasing at an exponential rate. At the same time, the world aims to reduce electricity losses, as there are more and more electrical devices and fewer and fewer non-renewable electricity sources. One of the solutions is to optimize the electricity losses due to erroneous power consumption forecasts: first, from the standpoint of saving the world's non-renewable electricity sources, such as coal or oil, and second, from the standpoint of public spending, because the later electricity is purchased, the more expensive it becomes [3][5].

Hydropower now provides around 6.5% of the world's electricity needs. In the Republic of North Macedonia the total installed capacity of hydropower is 556.8 MW, which is over 40% of the total capacity, ranking first among renewable energy sources. Hydroelectricity is used the most to meet daily variations in electricity consumption and to provide system services for regulation, allowing the power system to be more flexible and reliable. The peaks of electricity consumption are always regulated (i.e., supplemented) by hydroelectric energy, while coal-fired power plants produce the majority of electricity. Predicting the quantity of electricity available from each hydropower plant in the future (the longer the period, the better) leads not only to the most efficient use of finances, but also to protection from natural disasters such as river and lake overflows. The amount of inflow in the foreseeable time becomes increasingly difficult, if not impossible, for humans to predict, as meteorological conditions become more complex and unpredictable as a result of global warming and air pollution [4][1][9].

Inflow prediction in hydropower plants is mainly based on human judgement; however, it is not always accurate, primarily because it deals with nature. There are two major issues with predicting future inflow accurately:

(1) Weather forecasting inaccuracy in the coming days. The issue is that the forecasting models get less accurate as the forecast gets further out in time [6].

(2) Different types of hydropower plants, especially the construction of pumped-storage hydropower plants, the complexity of geology, etc. In this research, we consider run-of-river and storage hydropower plants. A run-of-river hydropower plant includes a facility that channels flowing water from a river through a canal or penstock to spin a turbine, and rain influences the inflow almost immediately. Storage hydropower plants include a dam and a reservoir to accumulate water, which is stored and released later when needed, providing flexibility to generate electricity on demand and reducing dependence on the variability of inflow. But whatever the hydropower plant type, the inflow is not directly related to weather conditions. For instance, snow and hail do not accumulate straight away; they require a period of melting and soil wetting, and, for storage plants, additionally, conducting the water through the pipes to reach the basin, etc. [7].
The inflow into hydroelectric basins can be predicted more easily using machine learning than by analyzing geology, satellite monitoring, pollution monitoring, and changes in global warming. Developing a machine learning model that connects all of these features allows the prediction to be made. Otherwise, it is quite difficult to perform it empirically, owing to a lack of resources for repeating the method for each existing hydroelectric plant. In this study, we built models using collected data from hydropower plants and the nearest meteorological stations, with the aim of developing an application that would help to monitor the daily inflow one day in advance (Figure 1) [2].

Figure 1: Graphical representation of the process: predicting inflow in the hydroelectric basins

2 METHODOLOGY

2.1 Preprocessing
The daily inflow into the hydroelectric basins as labels and the amount of precipitation observed at the nearest meteorological station as descriptive features are merged by date. Then, because missing values represent less than 1% of the total data, they are filled using the average of each feature's values.

For most hydropower facilities, we could initially find only two descriptive features: the daily amount of precipitation [l] and the date. There were 11 additional available features in the data for HPP Tikvesh: time of moonrise and moonset (timestamp), intensity of precipitation [l/m²], duration of precipitation [min], time of sunrise and sunset (timestamp), the highest and lowest temperature of the day [K], absolute humidity [g/m³], and cloud cover [oktas]. For HPP Tikvesh, missing values in the additional features are filled in with the average of the instances that have the same value in the most meteorologically related feature, for instance, 'humidity' and 'cloud cover'. The procedure is as follows: for a missing value in the feature 'cloud cover', the corresponding value of 'air humidity' is X; we find all the values of the feature 'cloud cover' whose instances have the value X in the feature 'air humidity', calculate their average, and substitute it for the missing value in the 'cloud cover' feature. In conjunction with the 'maximum temperature' feature, the same procedure is used to replace missing values in the 'lowest temperature' feature. Scaling the values of the descriptive features between 0 and 1 was the final step of the preprocessing.
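A sketch of this grouped-mean imputation in pandas is shown below; the column names are placeholders for the actual feature names in the HPP Tikvesh data.

    import pandas as pd

    def impute_by_related(df: pd.DataFrame, target: str, related: str) -> pd.DataFrame:
        # Replace each missing value of `target` with the mean of `target`
        # over the instances sharing the same value of the related feature.
        group_means = df.groupby(related)[target].transform('mean')
        df[target] = df[target].fillna(group_means)
        # Rows without any matching group fall back to the overall mean.
        df[target] = df[target].fillna(df[target].mean())
        return df

    data = impute_by_related(data, target='cloud_cover', related='humidity')
    data = impute_by_related(data, target='lowest_temperature', related='maximum_temperature')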
2.2 Feature Engineering
Using only the date and amount of precipitation as descriptive features, different regression models can be developed, but all of these predictive models have low accuracy, because the model cannot capture the inflow pattern with all the details that affect it. We therefore created 19 new features from the original two. The 'date' feature was divided into three new features, day, month, and year, which were added to the original features. Regarding the amount of precipitation and the inflows into the hydroelectric basins, the derived features are the average, variance, p-variance, minimum and maximum values over the previous five days; the values from the previous day of both original features are also added as new features. Additional features are derived as sine and cosine functions of the days and months. The purpose of the trigonometric functions is to reduce the difference between, for instance, December 31 and January 1.

Random Forest feature ranking demonstrates that the features derived from the latest five days of both original features have the highest scores. The correlation matrix shows the same: the derived statistical features have the highest correlations with the inflow. It is also interesting that while the trigonometrically generated date features have the same correlation with the inflow as the features from which they were derived, they do not correlate with each other with a correlation of 1. For instance, for HPP Vrben, the sine function of the month, as well as the month itself (from which the sine function is derived), have a correlation index of −0.01 with the inflow, but a value of −0.04 with each other (Figure 2).

Figure 2: Correlation matrix after scaling for HPP Vrben
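The feature derivation described above can be sketched as follows, with hypothetical column names; the five-day statistics are shifted by one day so that only past values enter the features.

    import numpy as np
    import pandas as pd

    def derive_features(df: pd.DataFrame) -> pd.DataFrame:
        # df['date'] is assumed to be a datetime column.
        df['day'] = df['date'].dt.day
        df['month'] = df['date'].dt.month
        df['year'] = df['date'].dt.year
        for col in ('precipitation', 'inflow'):
            past = df[col].shift(1)                  # previous day's value
            df[col + '_prev'] = past
            window = past.rolling(5)                 # statistics over the previous five days
            df[col + '_avg5'] = window.mean()
            df[col + '_var5'] = window.var()         # sample variance
            df[col + '_pvar5'] = window.var(ddof=0)  # population variance ("p-variance")
            df[col + '_min5'] = window.min()
            df[col + '_max5'] = window.max()
        # Cyclic encodings bring December 31 and January 1 close together.
        df['month_sin'] = np.sin(2 * np.pi * df['month'] / 12)
        df['month_cos'] = np.cos(2 * np.pi * df['month'] / 12)
        df['day_sin'] = np.sin(2 * np.pi * df['day'] / 31)
        df['day_cos'] = np.cos(2 * np.pi * df['day'] / 31)
        return df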
3 EXPERIMENTS AND RESULTS

3.1 Dataset Description
For each of the five analyzed hydropower plants in the Republic of North Macedonia, the data processing and the approach to the problem are the same. HPP Kozjak, the HES Mavrovo power plants (consisting of HPP Vrutok and HPP Vrben), HPP Tikvesh, and HPP Shpilje are part of this study. The datasets are time-series; they were collected at daily intervals throughout an 11-year period (1/1/09 – 12/31/19) for each hydroelectric facility. ESM (Elektrani na Severna Makedonija, "Power Plants of North Macedonia") and UHMR (Uprava za Hidro-Meteoroloski raboti, the National Hydrometeorological Service of the Republic of North Macedonia) collected the initial two variables, the amount of precipitation [l/m²] and the inflows into the hydropower basin (Figure 3). The inflow is expressed as the maximum amount of electricity [MWh] it could be used to produce.

Figure 3: Top to bottom: (a) Amount of precipitation (meteorological station Debar) and (b) amount of electricity production (HPP Shpilje) during an 11-year period (2009–2019)

Equation 1 describes how to calculate the electricity that could be produced by hydropower plants, knowing that it is a product of power and working time [4]:

    E = ρ Q g H t [Wh]        (1)

where the electricity E is the product of the water density ρ [1000 kg/m³], the water flow (inflow) Q [m³/s], the acceleration of gravity g [m/s²], the gross height drop H [m], and the working time t. Inflow refers to the water flowing into the accumulation basins of hydropower plants. The inflow is measured in cubic meters per second [m³/s], but it can also be expressed as the quantity of electricity that the same amount of inflow could supply. When electricity is a projection of the inflow, the inflow is computed using Equation 1 and expressed in watt-hours [Wh].

3.2 Experimental Setup
To build models for one-day-ahead inflow prediction, we used eight different regression algorithms: Support Vector Machine, Random Forest, Linear Regression, Lasso, Gradient Boosting, Extreme Gradient Boosting, K-nearest neighbours and Decision Tree, as well as a Dummy regressor that always predicts the mean of the training target values. Four evaluation metrics were used to evaluate the regression models for the prediction of inflow in hydropower plants: mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and R-squared (R²) as a standardized version of MSE.

Different numbers of selected features for each hydropower plant were considered using the class SelectKBest from the sklearn library, which selects the best features based on univariate statistical tests; the best numbers of features for each HPP were chosen based on the predictive models' errors. The experiment was divided into four parts: using all features, the best 15 features, the best 10 features, and the best 5 features, out of a total of 20 features.

Time-series can be troublesome for splits where the shuffling process yields different train and test sets across different executions, for cross-validation, or when the test subset lies before or somewhere in the middle of the train subset, etc. For instance, if a pattern appears in year 3 and persists through years 4–6, the model can detect it, even though it was not present in years 1 and 2. Because the datasets we use in this research are continuous time-series at the daily level, we split the evaluation datasets contiguously, without shuffling the subsets [8].

3.3 Results
For each of the four parts of the experiment for selecting the best n features, we calculated and plotted the mean absolute error [MWh] for each model (Figure 4, Figure 5). As we can see in Figure 4 and Figure 5, the differences in the errors are similar regardless of how many features the model is trained with.

The best results are provided by different models developed using various algorithms and various numbers of features for each hydropower plant, but one thing that all of them have in common is that the dummy regressor is the worst, while linear regression or gradient boosting produce the smallest errors. For example, if we choose HPP Vrben as one of the best outcomes, we can see that while the average daily inflow is 3.7 MWh, the error of the model developed using linear regression with the 15 selected features is 0.34 MWh. Because the errors, with the exception of dummy and lasso, are not particularly big, ranging between 0.345 and 0.40, all of the features listed after the five most influential ones contribute a negligible percentage to improving or decreasing the accuracy (Figure 5).
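The evaluation loop can be sketched as below: a contiguous chronological split, SelectKBest selection, and the MAE of a few of the regressors used in the study (X and y are assumed to be arrays with 20 feature columns, ordered by date).

    from sklearn.dummy import DummyRegressor
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.feature_selection import SelectKBest, f_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_absolute_error

    split = int(len(X) * 0.8)                  # contiguous split, no shuffling
    X_train, X_test = X[:split], X[split:]
    y_train, y_test = y[:split], y[split:]

    for k in (20, 15, 10, 5):                  # all features, best 15, 10 and 5
        selector = SelectKBest(f_regression, k=k).fit(X_train, y_train)
        Xtr, Xte = selector.transform(X_train), selector.transform(X_test)
        for model in (DummyRegressor(strategy='mean'),
                      LinearRegression(),
                      GradientBoostingRegressor()):
            mae = mean_absolute_error(y_test, model.fit(Xtr, y_train).predict(Xte))
            print(k, type(model).__name__, round(mae, 3))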
Figure 4: Graphical representation of the mean absolute error of the eight machine learning regressors used in the study, selecting different numbers of best features for HPP Vrben as a run-of-river diversion hydropower plant

4 CONCLUSIONS
Using eight different machine learning regressors, we built models for predicting the inflow into Macedonian hydroelectric basins. A solution for predicting the daily inflow in hydropower plants has been proposed for the case when the daily amount of precipitation and the amounts of precipitation for the previous five days at the nearest meteorological stations are known.

According to the results, the daily amount of precipitation and the other inflow-related features from the preceding days are the most important factors that explain the inflows into hydroelectric basins. Of course, there are other variables to consider, such as temperature, cloud cover or humidity. Considering that the prediction errors for HPP Tikvesh are not much smaller than those for the other HPPs, where we have only the precipitation and inflow as original data and, most importantly, the features derived from the last days, we may conclude that precipitation takes a certain amount of time to reach the basins as inflow, depending on the daily temperatures, the nature of the soil where the hydropower plant is located, and other factors.

We can also confirm the hydrological and geological assumptions that continuity in the data is far more important for storage hydropower plants than for run-of-river diversion hydropower plants. For run-of-river plants, weather characteristics from previous days have no significant impact, but all original and derived attributes related to inflows and precipitation for the same day are crucial, because the rains accumulate immediately. For storage hydropower plants, on the other hand, factors such as the period of snow melting, temperature and soil moisture, and therefore the amount of inflow into the hydropower plant, are significant, because they raise the river level, and thus the amount of inflow into the hydroelectric basins, but not immediately.

Because of the differences in the location, construction, and operation of hydropower plants, we can only build hydropower-plant-specific models. For some, the daily quantity of precipitation or the amount of precipitation from the previous day is the most essential factor, while for others, the amount of precipitation over a longer period is the most important one. Based on this fact, which is also supported by our results for various hydropower facilities, we may conclude that we cannot build a general model that can estimate the inflow for all hydropower plants. Because of the geological properties of the soil along the rivers and the temperature fluctuations in the past, we also cannot create a general model for hydropower plants of the same type.

Linear regression and gradient boosting models produce the best results. We can treat the inflow problem as a linear problem, because the relations between precipitation and inflow into the basins are simple. The precision of the projected weather conditions is the key obstacle to obtaining even lower errors. The more accurate the projected weather conditions are, and the longer the period they cover, the better the prediction of the inflow into hydroelectric basins would be.

In this study, we built predictive models for the next day's inflow. The next step is to create predictive models for as far in the future as possible, so that the model can assist in power management decisions. However, accurate projected meteorological conditions are required to develop such a model, and the further the time point, the greater the error. Hourly or minute-level time-series would allow more precise predictions in terms of time intervals.

Figure 5: Top to bottom: Graphical representation of the mean absolute error of the eight machine learning regressors used in the study, selecting different numbers of best features for HPP Shpilje, HPP Tikvesh and HPP Kozjak as storage hydropower plants

REFERENCES
[1] A. Arsenov. 2003. Proizvodstvo na elektricna energija (1st edition). ETF, Skopje.
[2] A. Kusiak, X. Wei, A. P. Verma, and E. Roz. 2013. Modeling and prediction of rainfall using radar reflectivity data: a data-mining approach. IEEE Transactions on Geoscience and Remote Sensing, 2337–2342.
[3] Jianzhou Wang, Tong Niu, Pei Du, and Wendong Yang. 2019. A hybrid forecasting system based on a dual decomposition strategy and multi-objective optimization for electricity price forecasting. Applied Energy, 1205–1225.
[4] B. Djordjevic. [n. d.]. Koriscenje vodnih snaga: Objekti hidroelektrana (1st edition). Naucna knjiga, Beograd.
[5] A. Yazidi and M. Goodwin. 2014. A pattern recognition approach for peak prediction of electrical consumption. IFIP Advances in Information and Communication Technology. Springer.
[6] S. Makridakis. 1993. Accuracy measures: theoretical and practical concerns. International Journal of Forecasting, 9, 4, 527–529.
[7] J. S. Coblenz. 2015. Using machine learning techniques to improve precipitation forecasting. PLoS One.
[8] D. Fernando, S. De Silva, S. Perera, S. Dissanayake, and W. Rankothge. 2019. Supply and demand planning of electricity power: a comprehensive solution. IEEE Conference on Information and Communication Technology.
[9] V. Stoilkov. 2015. Predavanja po predmetot Osnovi na obnovlivi izvori na energija (2nd edition). FEEIT, Skopje.
Peak Detection for Classification of Number of Axles

Žiga Kolar (ziga.kolar@ijs.si), Blaž Erzar (blaz.erzar@gmail.com), Nika Čelan (nika.celan8@gmail.com), Aleksander Hrastič (ah3001@student.uni-lj.si), Gašper Leskovec (leskovecg@gmail.com), David Susič (david.susic@ijs.si), and Matjaž Gams (matjaz.gams@ijs.si), Jožef Stefan Institute, Jamova cesta 39, Ljubljana, Slovenia
Martin Konečnik (martin.konecnik@cestel.si), Domen Prestor (domen.prestor@cestel.si), and Matjaž Skobir (matjaz.skobir@cestel.si), Cestel Cestni Inženiring d.o.o., Špruha 32, Trzin, Slovenia

ABSTRACT
A common requirement in scientific data processing is to detect peaks in a signal and to measure their positions, heights, widths, and/or areas. In this paper, the problem of peak detection from a raw signal is defined and presented. Through an example, we show how the problem of peak detection can be translated into detecting the number of axles of vehicles. Various algorithms for predicting the number of peaks (axles) are presented. The solution with derivatives, the solution with an encoder and decoder, and the solution with a convolutional neural network produced the best results: 99% accuracy with a certain percentage of skipped instances.

KEYWORDS
peak detection, neural networks, machine learning, signal, sensors, number of axles

1 INTRODUCTION
Identifying and analyzing peaks in a given time-series is important in many applications, because peaks are useful topological features of a time-series. In power distribution data, peaks indicate sudden high demands. In server CPU utilization data, peaks indicate a sharp increase in workload. In network data, peaks correspond to bursts in traffic. In financial data, peaks indicate an abrupt rise in price or volume. Troughs can be considered inverted peaks and are equally important in many applications. Many other application areas – e.g., bioinformatics [2], mass spectrometry [4], signal processing [7, 8], image processing [10], astrophysics [13] – require peak detection.

Peak detection algorithms are also used for classifying the number of peaks or axles. For example, when a vehicle places one of its tyres on a weight sensor, a peak is detected in the signal. Each peak represents one vehicle axle; therefore, an algorithm detecting how many peaks occur in a given signal in this way detects the number of axles. For the purpose of this study, 16 different signals for two driving lanes were provided by the company Cestel. The sensors were placed under a bridge near Obrežje. Sensors 1 and 2 were placed at the beginning and end of the measuring area for lane 1. Sensors 15 and 16 were placed on lane 2 in a similar fashion. The rest were placed perpendicular to the road between the pairs.

The main goal of this paper is to predict the number of axles as accurately as possible with the use of mathematical models and machine learning algorithms, given the signals.
We introduce solutions using deep neural networks (an artificial neural network and a convolutional neural network), regular derivatives, the predefined function find_peaks, and the package tsfresh for peak detection. In theory, peak detection is formally a trivial task; however, in reality the task can be performed only with some degree of accuracy.

The rest of the paper is organized as follows. Section 2 presents related work. The main methodology and algorithms are described in Section 3. Finally, Section 4 concludes the paper with a summary and ideas for future work.

2 RELATED WORK
Peak detection is a common task in time-series analysis and signal processing. Standard approaches to peak detection include (i) smoothing and then fitting a known function (e.g., a polynomial) to the time-series; and (ii) matching a known peak shape to the time-series. Another common approach to peak-trough detection is to detect zero-crossings (i.e., local maxima) in the differences (slope sign changes) between a point and its neighbours. However, this detects all peaks and troughs, whether strong or not. To reduce the effects of noise, it is required that the local signal-to-noise ratio (SNR) be over a certain threshold [8, 11]. The key question then is how to set the correct threshold so as to minimize false positives. Ma, van Genderen and Beukelman [10] compute the threshold automatically by adapting it to the noise levels in the time-series as

    h = (max + abs_avg) / 2 + K ∗ abs_dev,

where max is the maximum value in the time-series, abs_avg is the average of the absolute values in the time-series, abs_dev is the mean absolute deviation, and K is a user-specified constant.
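This threshold is directly computable; a transcription of the reconstructed formula (with K left to the user) could look as follows.

    import numpy as np

    def adaptive_threshold(signal: np.ndarray, K: float = 1.0) -> float:
        abs_avg = np.abs(signal).mean()                   # average of absolute values
        abs_dev = np.abs(signal - signal.mean()).mean()   # mean absolute deviation
        return (signal.max() + abs_avg) / 2 + K * abs_dev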
Azzini et al. [2] analyze peaks in gene expression microarray time-series data (for the malaria parasite Plasmodium falciparum) using multiple methods; each method assigns a score to every point in the time-series. In one method, the score is the rate of change (i.e., the derivative) computed at each point. In another method, the score is computed as the fraction of the area under the candidate peak. The top 10 candidate peaks are selected by each method; peaks detected by multiple methods are chosen as true peaks. The detected peaks are used to identify genes; SVMs are then used to assign a functional group to each identified gene.

Key problems in peak detection are noise in the data and the fact that peaks occur with different amplitudes (strong and weak peaks) and at different scales, which results in a large number of false positives among the detected peaks. Based on the observation that peaks in mass spectroscopy data have characteristic shapes, Du, Kibbe and Lin [5] propose a continuous wavelet transform (CWT) based pattern-matching algorithm for peak detection. A 2D array of CWT coefficients is computed (using a Mexican Hat mother wavelet, whose basic shape is like a peak) for the time-series at multiple scales, and ridges in this wavelet space representation are systematically examined to identify peaks. Coombes et al. [4] and Lange et al. [9] present other approaches for peak detection using wavelets and their applications to analyzing spectroscopy data. Zhu and Shasha [13] propose a wavelet-based burst (not peak) detection algorithm. The wavelet coefficients (as well as window statistics such as averages) for Haar wavelets are organized in a special data structure called the shifted wavelet tree (SWT). Each level in the tree corresponds to a resolution or time scale and each node corresponds to a window. By automatically scanning windows of different sizes and different time resolutions, bursts can be elastically detected (the appropriate window size is decided automatically). Zhu and Shasha [13] apply their technique to detecting Gamma Ray bursts in real time in the Milagro astronomical telescope, which vary widely in their strength and duration (from minutes to days). Harmer et al. [7] propose a momentum-based algorithm to detect peaks. The idea is to compute the velocity (i.e., rate of change) and momentum (i.e., product of value and velocity) at various points. A "ball" dropped from a previously detected peak will gain momentum as it climbs down and lose momentum as it climbs the next peak; the point where it comes to rest (loses all its momentum) is the next peak. Simple analogs of the laws of Newtonian mechanics (e.g., friction) are proposed to compute the changes in momentum as the ball traverses the time-series.

3 METHODOLOGY
In this section, the algorithms for peak detection are described. Each machine learning method uses 62076 samples (vehicles) and up to 16 signals for the classification of the number of axles. Typically, only the first signal was chosen for training the model, because the first signal had less noise than the other signals. Each signal has a different length; therefore, the signals shorter than the maximum time had to be extended to the maximum signal time in order to create features of the same length. To achieve this, zeros were appended at the positions up to the maximum signal time. The maximum time is 6113, which is equal to the number of features. Figure 1 shows a signal from Sensor 1 which has the maximum sensor time.

Each method uses classification accuracy for the evaluation of the model. Classification accuracy is a metric that summarizes the performance of a classification model as the number of correct predictions divided by the total number of predictions. In this study, correct predictions are correctly predicted numbers of peaks (axles).

Figure 1: Signal with maximum sensor time.
3.1 Peak Detection with Derivatives
Since peaks are local maxima, we can use mathematical methods for finding them. But because we are not working with functions but rather with noisy discrete signals, we need to modify them slightly. Let us define s_i as the i-th signal value. First, we remove the noise at the beginning and the end of the signal, where there are no axles. To do so, we find the horizontal line with maximum support, where the support for a line at height y is defined as the number of signal values s_i for which |y − s_i| < margin. For the line height y we take n equally spaced values from the interval [min, max], and half the distance between consecutive y values is used as the margin. Here min and max represent the minimum and maximum value of the signal. To remove the noise and normalize the signal, we now define:

    ŝ_i = 0                            if s_i < y + 2 ∗ margin,
    ŝ_i = (s_i − min) / (max − min)    otherwise.

On the signal ŝ we calculate the first and second derivatives, ṡ and s̈, using finite differences, implemented as convolutions with the kernels [−0.5, 0, 0.5] and [1, −2, 1]. Finally, the peaks are the indices i for which ŝ_i > 0.25, |ṡ_i| < 0.01 and s̈_i < 0, while only taking peaks which are local maxima, i.e., s_i > max{s_{i−1}, s_{i+1}}.

This procedure achieves an accuracy of ≈ 90% when using sensor 1.
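A sketch of this detection step, assuming the noise-removed and normalized signal ŝ from the procedure above, is given below; since only the magnitude of the first derivative is used, the orientation of the convolution kernel does not matter.

    import numpy as np

    def detect_peaks(s_hat: np.ndarray) -> list:
        ds = np.convolve(s_hat, [-0.5, 0.0, 0.5], mode='same')   # first derivative
        dds = np.convolve(s_hat, [1.0, -2.0, 1.0], mode='same')  # second derivative
        return [i for i in range(1, len(s_hat) - 1)
                if s_hat[i] > 0.25 and abs(ds[i]) < 0.01 and dds[i] < 0
                and s_hat[i] > max(s_hat[i - 1], s_hat[i + 1])]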
When using the other sensors, the accuracy is lower, but it can be improved by choosing the correct sensor for every instance. We do this by training nine models: one regression model M_a for predicting the number of axles and eight models M_k, k = 1..8, for predicting whether the prediction on sensor k is correct. We only use sensors from lane 1, since the sensors on the other lane give poor accuracy.

First, we define p^(i) as the correct number of peaks for instance i and p_k^(i) as the number of peaks detected by the procedure described in this section for instance i using sensor k. We then create a matrix X and a vector y on which we train gradient boosting regression. This model predicts the number of axles from the numbers of peaks detected on all sensors. The matrix X contains one row x^(i) = [p_1^(i), p_2^(i), ..., p_8^(i)] for each instance i, with one column for every sensor, while the vector y contains the ground-truth numbers of axles:

    X = [x^(1), x^(2), ..., x^(m)]^T,    y = [p^(1), p^(2), ..., p^(m)]^T.

Here m is the number of all instances. The other eight models use the same matrix X, but a different vector y. The model M_k, which predicts whether detection using sensor k produces the correct number of peaks, uses:

    y = [p_k^(1) = p^(1), p_k^(2) = p^(2), ..., p_k^(m) = p^(m)]^T,

where the equality comparison evaluates to 1 when true and 0 when false. Gradient boosting classifiers are used for these models.

After all nine models are trained, the peaks of a new instance m + 1 can be detected by first running the described peak detection procedure on all eight sensors to obtain the input vector x^(m+1):

    x^(m+1) = [p_1^(m+1), p_2^(m+1), ..., p_8^(m+1)].

This vector is first fed into the model M_a to predict the number of axles, and the result is rounded to the closest integer value to get a. Furthermore, the models M_k are used to get a confidence c_k for each sensor. The valid sensors are the ones on which the correct number of peaks was detected and whose confidence is higher than some threshold T:

    sensors = {k | 1 ≤ k ≤ 8 ∧ p_k^(m+1) = a ∧ c_k > T}.

If sensors = ∅, instance m + 1 is skipped; otherwise a axles are predicted and min{sensors} is the best sensor for detection. For T = 0.95 this system has an accuracy of 99.5% while skipping 20% of the instances.
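The nine-model scheme can be sketched as follows, with notation matching the text; `peak_counts` (the per-sensor detections from the procedure above) and `axles` (the ground truth) are assumed to be given, and each sensor is assumed to be correct on some instances and wrong on others.

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

    # peak_counts: array of shape (m, 8) with p_k for each instance; axles: shape (m,)
    X, y = np.asarray(peak_counts), np.asarray(axles)

    M_a = GradientBoostingRegressor().fit(X, y)            # predicts the number of axles
    M_k = [GradientBoostingClassifier().fit(X, (X[:, k] == y).astype(int))
           for k in range(8)]                              # was sensor k's detection correct?

    def predict(x_new, T=0.95):
        a = int(round(M_a.predict([x_new])[0]))
        confidence = [m.predict_proba([x_new])[0][-1] for m in M_k]  # P(correct)
        sensors = [k for k in range(8) if x_new[k] == a and confidence[k] > T]
        return (a, min(sensors)) if sensors else None      # None means the instance is skipped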
3.2 Peak Detection with Encoder/Decoder
Since we know where the peaks are located in every signal, we can train a model that predicts the locations of the peaks for every instance. Because we are working with time-series data, we can use a one-dimensional convolutional neural network with an autoencoder architecture. This allows us to predict the locations of a variable number of peaks. The inputs and outputs have the same dimensions, while the model consists of two parts: an encoder, which creates a low-dimensional embedding in a latent space, and a decoder, which reconstructs the output from it.

As inputs we use the signals from sensor 1. On the output we want to predict a vector of the same dimension, which has ones in the time slots containing a peak and zeros everywhere else. Because CNNs take inputs of the same length, we pad all input and output vectors to the maximum length. To make the maximum length smaller, we use the noise removal method from Section 3.1 to crop the noise at the beginning and the end of the signals.

The encoder is made of 3 convolutional layers, each followed by batch normalization and max pooling of size 2. The convolutional layers use ReLU activations, 8, 16 and 32 filters, and sizes 5, 2 and 3, respectively. The decoder has the same structure with the numbers of filters and sizes reversed and the max pooling layers replaced with up-sampling. The model is trained using the Adam optimizer and the binary cross-entropy loss function.

After the model is trained, the peaks of a new instance can be detected by feeding its sensor 1 signal s to the model to obtain a prediction vector p. The peaks are located at the indices where the prediction value is a strong enough local maximum and the signal amplitude is high enough:

    peaks = {i | p_i ≥ max{p_{i−5:i+5}} ∧ p̂_i > T_1 ∧ ŝ_i > 0.15},

while skipping the instances for which max{p_i | i ∈ peaks} < T_2. Here p̂ and ŝ are normalized to contain values in [0, 1], and T_1 and T_2 are thresholds that need to be selected. For T_1 = 0.01 and T_2 = 0.5, an accuracy of 99.6% is achieved with 20% of the instances skipped.
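Under the stated architecture, a Keras sketch of the autoencoder could look as follows; `length` is the padded signal length and is assumed to be divisible by 8 so that the decoder mirrors the encoder exactly.

    from tensorflow import keras
    from tensorflow.keras import layers

    def build_autoencoder(length: int) -> keras.Model:
        inputs = keras.Input(shape=(length, 1))
        x = inputs
        for filters, size in ((8, 5), (16, 2), (32, 3)):        # encoder
            x = layers.Conv1D(filters, size, padding='same', activation='relu')(x)
            x = layers.BatchNormalization()(x)
            x = layers.MaxPooling1D(2)(x)
        for filters, size in ((32, 3), (16, 2), (8, 5)):        # decoder, mirrored
            x = layers.Conv1D(filters, size, padding='same', activation='relu')(x)
            x = layers.BatchNormalization()(x)
            x = layers.UpSampling1D(2)(x)
        outputs = layers.Conv1D(1, 1, activation='sigmoid')(x)  # per-slot peak probability
        model = keras.Model(inputs, outputs)
        model.compile(optimizer='adam', loss='binary_crossentropy')
        return model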
3.3 Peak Detection with Artificial Neural Network
Neural networks, also known as artificial neural networks (ANNs) or simulated neural networks (SNNs), are a subset of machine learning and are at the heart of deep learning algorithms. Their name and structure are inspired by the human brain, mimicking the way that biological neurons signal to one another [12].

Artificial neural networks (ANNs) are comprised of node layers, containing an input layer, one or more hidden layers, and an output layer. Each node, or artificial neuron, connects to others and has an associated weight and threshold. If the output of an individual node is above the specified threshold value, that node is activated, sending data to the next layer of the network. Otherwise, no data is passed along to the next layer of the network [12].

In this method, an artificial neural network was used to predict the number of peaks. Neural networks were a viable solution for this problem, because we had enough data at our disposal for deep learning. The whole signal from sensor 1 was provided as the input layer. The architecture of the neural network contains two hidden layers, with 16 and 12 neurons, respectively. The output data (the number of peaks) was one-hot encoded; therefore, the softmax activation function was used in the output layer. The model returned the probability of each class, and in the end an algorithm picked the column with the highest probability (the i-th column depicts i peaks). This model achieved 91% accuracy in predicting the number of peaks.

3.4 Peak Detection with Convolutional Neural Network
Convolutional neural networks are distinguished from other neural networks by their superior performance with image, speech, or audio signal inputs. They have three main types of layers:

• Convolutional layers, which convolve the input and pass the result to the next layer. This is similar to the response of a neuron in the visual cortex to a specific stimulus. Each convolutional neuron processes data only for its receptive field.
• Pooling layers, which reduce the dimensions of the data by combining the outputs of neuron clusters at one layer into a single neuron in the next layer.
• Fully-connected layers, which connect every neuron in one layer to every neuron in another layer.

With each layer, the CNN increases in its complexity, identifying greater portions of the required information [1].

A convolutional neural network was utilized to predict the number of peaks. The motivation for using this type of network comes from the fact that convolutional neural networks work well with time-series data. The whole signal from sensor 1 was used as the input layer. The architecture of the network contains three 1D convolutional layers and three 1D pooling layers. At the end we used a fully connected layer with 100 neurons. The output layer has a softmax activation. Similarly to Subsection 3.3, the model returned the probability of each class and, in the end, an algorithm picked the column with the highest probability. This model achieved 97% accuracy. If we decide to skip the 6.5% of samples that are below the 99% probability threshold, we achieve an accuracy of 99.1%.

3.5 Peak Detection with the Predefined Method find_peaks
Another method for peak detection is to use the predefined function named find_peaks. This function takes a 1-D array and finds all local maxima by a simple comparison of neighbouring values. In the context of this function, a peak or local maximum is defined as any sample whose two direct neighbours have a smaller amplitude [6]. Because each signal has a different maximum height, the parameter of find_peaks named height differs from sample to sample. The height parameter is defined as the minimal required height for peaks to be detected; peaks below that threshold are not detected. The height was calculated by the formula: height = ||max(sensorHeight)| − |min(sensorHeight)|| / 10. The above described method achieved 89% accuracy and also returned the position of every peak. An example can be seen in Figure 2, in which 2 peaks are detected, marked with 2 orange crosses.

Figure 2: Signal with two peaks. The location of each peak is marked with an orange cross.
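In code, this method reduces to a few lines with SciPy; the sketch below applies the height formula described above.

    import numpy as np
    from scipy.signal import find_peaks

    def count_axles(signal: np.ndarray) -> int:
        # The per-sample height threshold from the formula above.
        height = abs(abs(signal.max()) - abs(signal.min())) / 10
        peaks, _ = find_peaks(signal, height=height)
        return len(peaks)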
3.6 Peak Detection with the Library tsfresh
Another alternative approach to peak prediction is to use the Python package named tsfresh. It automatically calculates a large number of time-series characteristics, the so-called features. Furthermore, the package contains methods to evaluate the explaining power and importance of such characteristics for regression or classification tasks. tsfresh is used for systematic feature engineering from time-series and other sequential data. These data have in common that they are ordered by an independent variable, most commonly time (time series) [3]. After the features were extracted, they were used by a gradient boosting classifier to predict the number of peaks. This approach produced 89% accuracy in predicting the number of peaks.

4 CONCLUSION AND DISCUSSION
We defined and presented the problem of peak detection from a raw signal. Through an example, we showed how the problem of peak detection can be translated into detecting the number of axles of vehicles. Various algorithms for predicting the number of peaks (axles) were presented. The solution with derivatives, the solution with an encoder and decoder, and the solution with a convolutional neural network produced the best results: 99% accuracy with a certain percentage of skipped instances. In future work, the mentioned results can be tweaked and improved by using different learning parameters, e.g., a different learning rate, a different number of neurons, or a different activation function. Furthermore, better results can be achieved by changing the architecture of the neural network, e.g., different or more convolutional or pooling layers. The results of peak detection can also be extended to determining the axle distances. Once the axle number is accurately predicted, a new set of algorithms can be implemented to solve this new task.

ACKNOWLEDGMENTS
This study received funding from the company Cestel. The authors acknowledge the funding from the Slovenian Research Agency (ARRS), Grant (PR-10495) and basic core funding P2-0209.

REFERENCES
[1] Saad Albawi, Tareq Abed Mohammed, and Saad Al-Zawi. 2017. Understanding of a convolutional neural network. In 2017 International Conference on Engineering and Technology (ICET). IEEE, 1–6.
[2] Ivano Azzini, Rossana Dell'Anna, Federica Ciocchetta, Francesca Demichelis, Andrea Sboner, Enrico Blanzieri, and A. Malossini. 2004. Simple methods for peak detection in time series microarray data. Proc. CAMDA'04 (Critical Assessment of Microarray Data).
[3] Maximilian Christ, Nils Braun, Julius Neuffer, and Andreas W. Kempa-Liehr. 2018. Time series feature extraction on basis of scalable hypothesis tests (tsfresh – a Python package). Neurocomputing, 307, 72–77.
[4] Kevin R. Coombes, Spiridon Tsavachidis, Jeffrey S. Morris, Keith A. Baggerly, Mien-Chie Hung, and Henry M. Kuerer. 2005. Improved peak detection and quantification of mass spectrometry data acquired from surface-enhanced laser desorption and ionization by denoising spectra with the undecimated discrete wavelet transform. Proteomics, 5, 16, 4107–4117.
[5] Pan Du, Warren A. Kibbe, and Simon M. Lin. 2006. Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching. Bioinformatics, 22, 17, 2059–2065.
[6] 2022. Find peaks. https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks.html. (2022).
[7] Karl Harmer, Gareth Howells, Weiguo Sheng, Michael Fairhurst, and Farzin Deravi. 2008. A peak-trough detection algorithm based on momentum. In 2008 Congress on Image and Signal Processing. Vol. 4. IEEE, 454–458.
[8] Valentin T. Jordanov and Dave L. Hall. 2002. Digital peak detector with noise threshold. In 2002 IEEE Nuclear Science Symposium Conference Record. Vol. 1. IEEE, 140–142.
[9] Eva Lange, Clemens Gröpl, Knut Reinert, Oliver Kohlbacher, and Andreas Hildebrandt. 2006. High-accuracy peak picking of proteomics data using wavelet techniques. In Biocomputing 2006. World Scientific, 243–254.
[10] Meng Ma, Arjan van Genderen, and Peter Beukelman. 2005. Developing and implementing peak detection for real-time image registration. In Proceedings of the 16th Annual Workshop on Circuits, Systems & Signal Processing (ProRISC 2005). Citeseer, 641–652.
[11] G. M. Nijm, A. V. Sahakian, S. Swiryn, and A. C. Larson. 2007. Comparison of signal peak detection algorithms for self-gated cardiac cine MRI. In 2007 Computers in Cardiology. IEEE, 407–410.
[12] Sun-Chong Wang. 2003. Artificial neural network. In Interdisciplinary Computing in Java Programming. Springer, 81–100.
[13] Yunyue Zhu and Dennis Shasha. 2003. Efficient elastic burst detection in data streams. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 336–345.

Unified Question Answering in Slovene

Katja Logar and Marko Robnik-Šikonja
University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia
kl2164@student.uni-lj.si, marko.robnik@fri.uni-lj.si

ABSTRACT
Question answering is one of the most challenging tasks in language understanding. Most approaches are developed for English, while less-resourced languages are much less researched. We adapt a successful English question-answering approach, called UnifiedQA, to the less-resourced Slovene language. Our adaptation uses the encoder-decoder transformer SloT5 and mT5 models to handle four question-answering formats: yes/no, multiple-choice, abstractive, and extractive. We use existing Slovene adaptations of four datasets, and machine translate the MCTest dataset. We show that a general model can answer questions in different formats at least as well as specialized models. The results are further improved using cross-lingual transfer from English. While we produce state-of-the-art results for Slovene, the performance still lags behind English.

KEYWORDS
question answering, Slovene language, deep neural networks, encoder-decoder models, natural language processing

1 INTRODUCTION
Most studies of the question answering (QA) task deal with the English language. This leaves many language specifics, not present in English, potentially inadequately addressed. E.g., some problematic language specifics of the morphologically-rich Slovene language are noun and adverb declension, three different genders, three counts, the person or pronoun being hidden in a verb, etc. An additional problem for less-resourced languages is the lack of suitable datasets for QA.

Khashabi et al. [5] argue that building specialized models for each QA dataset or QA format is unnecessary, as they all require a similar inference capability. Therefore, it is possible to develop one model capable of answering questions in different formats. They call their approach UnifiedQA, and we adapted this approach to Slovene.

The number of QA datasets in Slovene is much lower than that used in the original UnifiedQA. We found four partially human-translated but mostly machine-translated datasets. To improve on that, we machine translate the additional MCTest dataset [9] into Slovene and fix translation errors.

Our method is based on the pretrained Slovene encoder-decoder transformer model SloT5 [11]. We finetune the model on the five QA datasets and analyze its performance. We also test the role of uppercase and lowercase letters, the impact of unanswerable questions, and the contribution of each dataset to the performance of the unified model. Next, we test cross-lingual transfer and train a multilingual question answering model based on the multilingual mT5 model [13], using English and Slovene datasets. Finally, we perform a qualitative analysis of the obtained models. The results show that our system is currently the best performing QA system for Slovene. We make its source code freely accessible at https://github.com/klogar/QAslovene.

The paper is split into four further sections. In Section 2, we outline the related work on QA in Slovene. Section 3 presents our adaptation of the UnifiedQA methodology and the applied Slovene QA datasets, and Section 4 discusses different evaluation settings and their results. In Section 5, we present the findings and ideas for further improvements.

2 RELATED WORK
QA in Slovene is relatively unexplored. In the pre-neural setting, Čeh et al. [1] developed a closed-domain QA system for answering common questions that arise during students' studies at the University of Maribor, Faculty of Electrical Engineering, Computer Science and Informatics. The translation of the SuperGLUE benchmark suite to Slovene in 2021 [14] provided four partially human, partially machine translated QA datasets (BoolQ, COPA, MultiRC, and ReCoRD) and an evaluation of Slovene BERT models. Ulčar et al. [11] adapted the SloT5 model for yes/no and multiple-choice questions. Finally, Zupanič et al. [15] translated the SQuAD 2.0 dataset from English and adapted different multilingual models. They achieved the best result with the SloBERTa 2.0 model [12].
In contrast to the above works, we apply the transfer learning paradigm within the encoder-decoder SloT5 and mT5 models and provide a unified approach to different QA formats, obtaining the best results so far.

3 METHODOLOGY

Our methodology follows the UnifiedQA methodology of Khashabi et al. [5]. The authors define four QA formats (extractive, abstractive, multiple-choice, and yes/no) and unify the learning approach to these formats. The extractive format requires that the answer is directly stated in the supplied context as a substring. The abstractive format requires paraphrasing of the given context, and the answer may require linking information from several sentences. The multiple-choice datasets have the possible answers listed, and the aim is to select the correct option. Finally, the yes/no questions require only yes or no as an answer.

The datasets with different QA formats are converted to a text format, with parts of the input separated by the "\n" separator. Extractive, abstractive and yes/no questions are coded as "question \n context" and multiple-choice questions as "question \n possible choices \n context". Here, the possible choices are labelled with capital letters from A onwards: (A) choice 1 (B) choice 2 ...
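A minimal sketch of this text encoding follows; the "\n" between parts is a literal two-character separator token, as in the original UnifiedQA inputs, and the function name and signature are our own illustration, not the authors' code.

```python
def encode_example(question: str, context: str, choices=None) -> str:
    """Render a QA example in the unified text format described above.
    The parts are joined by a literal "\\n" separator token."""
    if choices is None:
        # extractive, abstractive and yes/no formats
        return f"{question} \\n {context}"
    # multiple-choice format: options labelled (A), (B), ...
    labelled = " ".join(f"({chr(ord('A') + i)}) {c}"
                        for i, c in enumerate(choices))
    return f"{question} \\n {labelled} \\n {context}"
```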
We initially considered four QA datasets. Three stem from the translation of the SuperGLUE benchmark to Slovene [14]: MultiRC [4] (abstractive), COPA [10] (multiple-choice) and BoolQ [2] (yes/no). We also used the Slovene translation [15] of SQuAD 2.0 [8] (extractive). SQuAD 2.0 contains unanswerable questions, and some are also present in MultiRC. As we focus on the reading comprehension task, all selected datasets have a context. COPA is a commonsense reasoning dataset, which is not our primary focus, but we included it because it was human translated into Slovene. BoolQ, MultiRC, and SQuAD 2.0 are partially human translated [14, 15].

To have a non-commonsense multiple-choice dataset, we machine translated the MCTest dataset [9] and fixed some translation errors. To reduce the cost of translation, we partially used the commercial solution DeepL [3] and partially an internal neural machine translator of somewhat lesser quality. Later, we translated the entire MCTest dataset with the DeepL translator and made it publicly available in our repository. However, the reported results are obtained using the initial mixed translation setting.

As the starting model for the monolingual Slovene UnifiedQA models, we used the monolingual Slovene variant of the T5 transformer encoder-decoder model [7], called SloT5 [11]. For the cross-lingual transfer experiments, we applied the multilingual variant of T5, called mT5 [13]. Due to computational time and GPU memory limitations, we used the SloT5 and mT5 models of the smallest size (60M and 300M parameters, respectively). Originally, Khashabi et al. used the T5 model [7] of the largest possible size (11B parameters) and the large BART model [6] as starting points for the UnifiedQA model. However, they also report results for the T5-small model, which we use for comparison, so that all models are of comparable sizes. Table 1 lists the parameters used to finetune our models.

Table 1: Parameters for finetuning UnifiedQA models.

Parameter                      Value
Maximum input size [tokens]    512
Maximum output size [tokens]   100
Number of epochs               25
Batch size                     8
Number of beams                4
Learning rate                  5e-5

4 EXPERIMENTS AND RESULTS

In this section, we report our empirical evaluation. We present the evaluation metrics, the original English results, and the experiments and results in the monolingual Slovene setting and in the cross-lingual transfer setting.

4.1 Evaluation Metrics

For each dataset, we use a different evaluation metric. For BoolQ, we report the classification accuracy (CA); for SQuAD 2.0, the F1 score; for MultiRC, we use ROUGE-L; and for the multiple-choice datasets (MCTest and COPA), we calculate the best match between the generated text and the offered options and compute the classification accuracy. In all cases, the answers are first normalized (removing punctuation and unnecessary spaces and converting the text to lowercase).
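The normalization step takes only a few lines; this is a sketch of the described steps (lowercasing, punctuation removal, whitespace cleanup), not the authors' exact code.

```python
import string

def normalize_answer(text: str) -> str:
    """Normalize an answer before scoring: lowercase, strip punctuation
    and collapse unnecessary whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())
```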
4.2 English UnifiedQA Results Using T5-small

First, we replicated the results of the original English UnifiedQA [5] and also obtained results for the datasets not originally used, i.e. COPA and MultiRC (the latter was only used as a yes/no dataset in [5]). The results are presented in Table 2. The results for BoolQ and MCTest are slightly worse than originally reported, which could be attributed to slightly different parameters for text generation. We achieved a much worse result for the SQuAD 2.0 dataset, with an F1 of only 46.1% rather than 67.6%. Trying to replicate the published scores with the original code (https://github.com/allenai/unifiedqa), we obtained results similar to ours. However, we analyzed the difference and believe that at least some of it is due to unanswerable questions, as the F1 score is 84.5% for questions that have an answer and only 7.8% for unanswerable questions. The UnifiedQA model, therefore, does a poor job of detecting whether a question is unanswerable from the context.

Table 2: Our and published results of the UnifiedQA (UniQA) approach on English datasets using the T5-small model.

Dataset        BoolQ  COPA   MCTest  MultiRC  SQuAD 2.0
Metric         CA     CA     CA      ROUGE-L  F1
UniQA (publ.)  0.771  /      0.800   /        0.676
UniQA (ours)   0.757  0.560  0.762   0.536    0.461

4.3 Slovene Monolingual Results Using SloT5

In the Slovene monolingual setting, we compare different variants of Slovene UnifiedQA models and report the results in Table 3. We adapted the models for each QA format separately and obtained so-called specialized models. These provide a baseline for what can be achieved with each individual QA format. We then trained the SloUnifiedQA model using all available Slovene datasets. We also investigated the impact of unanswerable questions (SloUnifiedQA-NA, SloUnifiedQA-NA2, explained below) and the use of only lower case letters (SloUnifiedQA-LC).

Table 3: Comparing variants of the Slovene UnifiedQA approach (based on the SloT5 model). Besides the unified model, we report the results of specialized models for each QA format (specialized), the best results published so far on these datasets (published), and the default classifier. The effect of unanswerable questions and lowercasing is analyzed in the bottom part of the table. Note that SloUniQA-NA is tested on modified datasets without unanswerable questions, so the results for this model are not comparable.

Dataset        BoolQ  COPA   MCTest  MultiRC  SQuAD 2.0  Avg.
Metric         CA     CA     CA      ROUGE-L  F1
SloUniQA       0.683  0.532  0.463   0.310    0.555      0.509
specialized    0.688  0.486  0.439   0.255    0.554      0.484
published      0.666  0.500  /       /        0.739      /
default        0.623  0.500  0.269   /        /          /
SloUniQA-NA    0.675  0.524  0.454   0.319    0.637      0.522
SloUniQA-NA2   0.695  0.554  0.474   0.321    0.556      0.520
SloUniQA-LC    0.686  0.530  0.449   0.259    0.533      0.491

Comparing the SloUnifiedQA model with the specialized models, the unified model achieves better results for the multiple-choice datasets (COPA and MCTest) and the abstractive dataset (MultiRC). The improvement for the extractive dataset is minimal, and we observe a slight decrease in accuracy for the yes/no dataset (BoolQ). Better results are also obtained compared to the default classifiers.

Comparing SloUnifiedQA on Slovene with the English UnifiedQA model on the English datasets (in Table 2), the English model gives better results for all selected formats except SQuAD 2.0. Interestingly, the English and Slovene models have different problems with SQuAD 2.0. The Slovene one predicts unanswerable questions too often (it has an F1 score of 60.3% for unanswerable questions and only 50.4% for answerable ones, while incorrectly identifying 13% of answerable questions as unanswerable), the English one too rarely; indeed, the English model never wrongly predicts that a question is unanswerable. This is likely due to unanswerable questions making up a larger proportion of the Slovene training data than of the English one. For the other datasets, the biggest difference in metrics can be observed for the MCTest multiple-choice dataset, where the difference is 33%. We attribute the worse result of SloUnifiedQA to machine translations and a much smaller training dataset, especially for the multiple-choice questions, as in the original work the authors use three additional datasets besides MCTest.

Compared to other published works on the same datasets, we achieve better results with SloUnifiedQA on the BoolQ and COPA datasets than Ulčar and Robnik-Šikonja [11], while on the SQuAD 2.0 dataset, Zupanič et al. [15] achieve a significantly better result (by almost 20%). Here, Ulčar and Robnik-Šikonja [11] also use the SloT5 model with textual output, while Zupanič et al. [15] use the SloBERTa model and only predict the span of the answer, which is an easier task.
4.3.1 The Effect of Unanswerable Questions.
Unanswerable questions account for about one-third of all training examples, and models could overfit such questions. To address this issue, we train two models, SloUnifiedQA-NA and SloUnifiedQA-NA2. For the SloUnifiedQA-NA model, we removed all unanswerable questions. As evident from Table 3, the accuracy deteriorates for the yes/no and multiple-choice questions, while the metrics improve for the abstractive and extractive questions. The biggest improvement occurred for the SQuAD 2.0 dataset, where the F1 metric for answerable questions improved to 63.7%.

SloUnifiedQA-NA was the basis for the SloUnifiedQA-NA2 model, which we trained on the complete datasets, including unanswerable questions. The metrics slightly improved for BoolQ, COPA, and MCTest, but this may be due to the longer training time. No improvement is observed for SQuAD 2.0; the F1 for answerable questions even drops to 51.5%.

4.3.2 The Effect of Using Lower Case Letters.
To analyze the effect of using only lower case letters, we trained the SloUnifiedQA-LC model. The results are comparable for BoolQ and COPA, but worse for MCTest, MultiRC, and SQuAD 2.0. Uppercase letters, therefore, carry relevant information in Slovene.

4.3.3 Contribution of Datasets in the Unified Model.
To assess the impact of each dataset in the SloUnifiedQA model, we dropped each training dataset in turn. The results are shown in Table 4. The largest individual performance drop is observed for the model without BoolQ, as the yes/no questions become unanswerable (the CA for the BoolQ dataset is almost 0%). This also strongly affects the average but causes even slight improvements on MCTest, MultiRC, and SQuAD 2.0. The second largest average performance drop is observed for the model without SQuAD 2.0, where a drop occurs on all datasets. For the other models, the drops are observed mainly on the datasets on which the models were not trained. Overall, the COPA dataset contributes the least to the performance of SloUnifiedQA, the corresponding model achieving almost the same performance.

Table 4: Contribution of datasets in the unified model by omitting one dataset at a time. The red color indicates the two largest performance drops for each dataset.

Dataset        BoolQ  COPA   MCTest  MultiRC  SQuAD 2.0  Avg.
Metric         CA     CA     CA      ROUGE-L  F1
SloUniQA       0.683  0.532  0.463   0.310    0.555      0.509
no BoolQ       0.001  0.522  0.486   0.319    0.561      0.378
no SQuAD 2.0   0.664  0.516  0.451   0.258    0.120      0.402
no MCTest      0.676  0.510  0.351   0.317    0.560      0.483
no MultiRC     0.690  0.536  0.457   0.209    0.552      0.489
no COPA        0.683  0.510  0.456   0.319    0.554      0.504

4.4 Cross-Lingual Transfer Using mT5

There are only a few QA datasets in Slovene, so we checked whether transfer from additional English datasets can improve the Slovene results. We used three different collections of datasets:

• SLO: the Slovene datasets BoolQ, COPA, MCTest, MultiRC and SQuAD 2.0 (described in Section 3).
• ANG5: the English datasets BoolQ, COPA, MCTest, MultiRC, and SQuAD 2.0 (the English datasets whose translations form the SLO collection).
• ANG9: the English datasets BoolQ, COPA, MCTest, MultiRC, and SQuAD 2.0, together with all datasets used by Khashabi et al. [5] except SQuAD 1.1, i.e. NarrativeQA, RACE, ARC, and OBQA.
We trained five models using the multilingual mT5 model on these dataset collections and tested them on the SLO test sets. The first model, mSloUnifiedQA, was trained only on the SLO datasets and gives the baseline performance of mT5, also enabling a comparison with the monolingual SloT5. The mSloUnifiedQA1 models were trained on the English and Slovene datasets simultaneously (in a single phase), with the English dataset collection being either ANG5 or ANG9; only the SLO dataset group was used for validation. The mSloUnifiedQA2 models were trained in two phases, first on the English datasets (ANG5 or ANG9), using the ROUGE-L metric to select the best model, after which the obtained model was finetuned on the SLO dataset collection.

The results of the five multilingual models are presented in Table 5. A comparison between the monolingual SloUnifiedQA model (in Table 3) and the multilingual mSloUnifiedQA shows that they perform on average equally well, with SloUnifiedQA performing better on the BoolQ, COPA and MultiRC datasets, and mSloUnifiedQA performing better on the MCTest and SQuAD 2.0 datasets.

Table 5: Results of cross-lingual transfer using additional English datasets and multilingual models based on mT5.

Dataset             BoolQ  COPA   MCTest  MultiRC  SQuAD 2.0  Avg.
Metric              CA     CA     CA      ROUGE-L  F1
mSloUniQA           0.646  0.488  0.515   0.298    0.571      0.504
mSloUniQA1 (ANG5)   0.672  0.486  0.582   0.308    0.587      0.527
mSloUniQA1 (ANG9)   0.676  0.508  0.579   0.340    0.598      0.540
mSloUniQA2 (ANG5)   0.682  0.504  0.564   0.313    0.593      0.531
mSloUniQA2 (ANG9)   0.683  0.486  0.602   0.323    0.604      0.540

Adding additional knowledge in English improved the average metrics by 3-4%, but the training time increased by about four times for the models with the most datasets (ANG9). A slight improvement can be observed for the models using nine English datasets (ANG9) relative to those with only five (ANG5). The additional datasets contribute the most to the MCTest multiple-choice results, but the performance on MultiRC and SQuAD 2.0 also improved. On the other hand, despite the additional datasets, the results for BoolQ and COPA are worse than for the monolingual model. Using one- or two-phase training makes no difference on average, but there are differences on individual datasets.
4.5 Qualitative Analysis

A qualitative analysis of our models showed that the generated answers are mostly substrings of the context or the given choices in multiple-choice questions. The models cannot paraphrase, rephrase or provide answers in the correct Slovene grammatical case. They also have problems with multi-part questions requiring multiple answers that are not listed in the same place in the context. Machine translations, which are not always grammatically correct or do not make it clear what the question is asking for, also make answering the questions difficult. The models performed best on factoid questions that require a short answer.

5 CONCLUSION AND FUTURE WORK

The main contributions of this work are the generative unified QA models based on the SloT5 and mT5 encoder-decoder transformer models, which set new state-of-the-art results for QA in Slovene. An additional contribution is the machine-translated and corrected MCTest dataset.

We identify three possible directions for further work. First, better translations or dedicated Slovene datasets would improve upon the currently mainly machine-translated datasets. Second, larger T5 models and longer training times have shown better performance in English; in our work, we used only the smallest available T5 models due to the limited memory of the GPU, and we limited training to a maximum of 25 epochs. Third, new datasets could help, especially additional multiple-choice datasets, as evidenced by the improvement brought by the introduction of the English multiple-choice datasets. Further, additional abstractive datasets could teach the models to rephrase better, or that answers need not be mere substrings of the provided context.

ACKNOWLEDGMENTS

Marko Robnik-Šikonja received financial support from the Slovenian Research Agency through core research programme P6-0411 and projects J6-2581 and J7-3159, as well as from the Ministry of Culture of the Republic of Slovenia through the project Development of Slovene in Digital Environment (RSDO).
REFERENCES

[1] Ines Čeh and Milan Ojsteršek. "Slovene language question answering system". In: Recent advances in computers: Proceedings of the 13th WSEAS international conference on computers (part of the 13th WSEAS CSCC multiconference). 2009, pp. 502–508.
[2] Christopher Clark et al. "BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions". In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019, pp. 2924–2936.
[3] DeepL Translator. [18 July 2022]. url: https://www.deepl.com/translator.
[4] Daniel Khashabi et al. "Looking beyond the surface: A challenge set for reading comprehension over multiple sentences". In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018, pp. 252–262.
[5] Daniel Khashabi et al. "UnifiedQA: Crossing Format Boundaries With a Single QA System". In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings. 2020, pp. 1896–1907.
[6] Mike Lewis et al. "BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension". In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020, pp. 7871–7880.
[7] Colin Raffel et al. "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". In: Journal of Machine Learning Research 21 (2020), pp. 1–67.
[8] Pranav Rajpurkar, Robin Jia, and Percy Liang. "Know What You Don't Know: Unanswerable Questions for SQuAD". In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2018, pp. 784–789.
[9] Matthew Richardson, Christopher J.C. Burges, and Erin Renshaw. "MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text". In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 2013, pp. 193–203.
[10] Melissa Roemmele, Cosmin Adrian Bejan, and Andrew S Gordon. "Choice of Plausible Alternatives: An Evaluation of Commonsense Causal Reasoning". In: AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning. 2011, pp. 90–95.
[11] Matej Ulčar and Marko Robnik-Šikonja. "Sequence to sequence pretraining for a less-resourced Slovenian language". In: arXiv preprint arXiv:2207.13988 (2022).
[12] Matej Ulčar and Marko Robnik-Šikonja. "SloBERTa: Slovene monolingual large pretrained masked language model". In: Proceedings of the 24th International Multiconference Information Society - IS 2021, Data Mining and Data Warehouses - SiKDD. 2021.
[13] Linting Xue et al. "mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer". In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021, pp. 483–498.
[14] Aleš Žagar and Marko Robnik-Šikonja. "Slovene SuperGLUE Benchmark: Translation and Evaluation". In: Proceedings of the Language Resources and Evaluation Conference. 2022, pp. 2058–2065.
[15] Matjaž Zupanič et al. Cross-lingual Question Answering with Transformers. Assignment in the NLP course, University of Ljubljana, Faculty of Computer and Information Science. [20 June 2022]. url: https://github.com/mtzcorporations/NLP_TeamJodka.

Social Media Analysis for Assessing Resilience

Aljaž Osojnik, Bernard Ženko, Martin Žnidaršič
aljaz.osojnik@ijs.si, bernard.zenko@ijs.si, martin.znidarsic@ijs.si
Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia

ABSTRACT

In this paper, we describe tools developed to investigate the potential use of social media analysis for resilience assessment. We focus on tweets as a data source and apply sentiment analysis, topic detection and filtering approaches. We present computed aggregates with potential information on resilience, and a web application that was made for the use of domain experts. Finally, we discuss preliminary user feedback and lessons learned about the applicability of our approach.

KEYWORDS

social media, sentiment analysis, resilience, disaster management

1 INTRODUCTION

Resilience is generally understood as the ability to adapt and recover after a disruptive event. In this work, we focus on the resilience of communities in the context of disaster management, and adopt the following UN definition: Resilience is the ability of a system, community or society exposed to hazards to resist, absorb, accommodate, adapt to, transform and recover from the effects of a hazard in a timely and efficient manner, including through the preservation and restoration of its essential basic structures and functions through risk management [7]. Resilience research aims to develop strategies focused not on isolated risks, such as earthquakes or fires, but on approaches that subsume and address all relevant risks, both natural and man-made. The strategies need to identify and account for the different human, social, environmental, economic and technological factors that influence the behavior of a community facing a disaster. The goal is that, by adopting these strategies, communities can perform their intended functions in normal and adverse times. A key element of a resilient community is the active involvement of local citizens and their active role in the decision making process.

The EU-funded project RESILOC (Resilient Europe and Societies by Innovating Local Communities, https://www.resilocproject.eu/) aims to develop a holistic framework of studies, methods and software tools that can be used to assess the resilience of a community in practice by Local Resilience Teams (LRTs), i.e., teams in charge of resilience assessment and risk management of a given community, typically organized within a civil protection organization. The final goal is to use this framework to identify new strategies for improving the preparedness of local communities against any kind of hazard.
The key element of the RESILOC framework is a methodology for assessing resilience structured along six dimensions: Governance, Social, Economic, Infrastructure, Disaster Risk Reduction and Environmental. Each dimension is described in terms of its attributes or indicators, the values of which are assessed with proxies, which are empirically measurable quantities. Indicators and proxies need to have associated scales and aggregation functions that allow their calculation within the RESILOC tool [5]. For example, the social dimension of the resilience of a community could be described with indicators such as Community engagement, Social connectedness, Trust in authority and Risk awareness. The Community engagement indicator could then be assessed with proxies such as the percentage of the population who vote in local elections, the number of NGOs for pre- and post-disaster response per capita, and the percentage of the population undertaking voluntary work. Notably, the resilience assessment of different communities may include different indicators and proxies. The RESILOC platform seeks to provide an initial set of indicators and proxies, allow for the addition of new ones, and enable their aggregation and visualization.

As some of the indicators mentioned above can also be assessed through proxies based on social media analysis, we developed a tool for investigating this approach. For example, proxies (or their components) assessing the indicator Trust in authority could be assessed with techniques such as sentiment analysis. In particular, we could hypothesize that a more positive sentiment in social media posts related to public authorities, e.g., disaster response authorities, is related to higher trust in them (and consequently better adoption of any disaster relief measures they introduce). This is the primary motivation for the investigation presented in this paper. As our social media data source, we use tweets, i.e., posts on the Twitter microblogging platform (https://twitter.com/). These are public, abundant and have well supported APIs for collection and filtering.

The rest of the paper is organized as follows. Section 2 briefly presents the related work, Section 3 presents the social media data used in our analysis and Section 4 presents the results of said analyses. Section 5 presents our web tool for resilience assessment, which is part of the RESILOC framework. The final section concludes our presentation, summarizes the lessons learned during our analysis and provides some avenues for further research.

2 RELATED WORK

Our work relates to two main fields of research. The first is research on community resilience in the context of disaster management.
Parker [14] addresses the problems of measuring and assessing resilience and warns that past attempts to define comprehensive resilience assessment frameworks frequently led to simplifications and a focus on a single risk. Particular assessments of resilience can be found in the literature review conducted in the scope of the RESILOC project [11].

The second related field is social media sentiment analysis and the associated techniques. Sentiment analysis [9] is a machine learning field that has benefited from the recent rapid development of natural language processing techniques [6] based on large corpora and deep learning. Sentiment analysis of social media posts has previously been applied to resilience-adjacent domains, such as disaster response and management [12, 1]. Research in the cross-section of both fields, i.e., resilience and social media, has mostly focused on investigations of how social media affects community and self resilience [8], including recent examples during the COVID-19 pandemic [16]. Using social media analysis to assess resilience is, to the best of our knowledge, novel, and we were not able to find any tools similar to the one presented in this paper.

3 DATA

The data used in our analyses and visualizations are tweets that specifically mention the target communities (using a predefined keyword), collected through the Twitter API for Academic Research (https://developer.twitter.com/en/products/twitter-api/academic-research). This allowed us to collect all tweets of interest, including those from the past.

Notably, a tweet is not only the posted text, but rather a metadata-rich data object containing a plethora of information, such as its language, geolocation, author code, etc., and relations to other tweets, e.g., whether it is a response to another tweet or a retweet. For our purposes, however, we only gather the unique tweet code/id (field id), creation time (field created_at), language (field lang) and text (field text).

The selected tweets are collected into a dataset that we use for our analysis. The dataset is recreated during each repetition of the analysis, due to potential tweet removal according to Twitter's privacy mechanisms. Hence, the results shown in the online app are not static but can change with renewed analysis.

In RESILOC, four communities are studied as use-cases: Gorizia (Italy), Catania (Italy), West Achaia (Greece) and Tetovo (Bulgaria). These four communities vary widely in size, which is reflected in the amount of tweets in which they are mentioned. As the latter two communities are mentioned only in a couple of tweets per month, the social media analysis was executed only for the first two. We considered extending the pool of tweets by including tweets that are geo-tagged to the selected communities, but only a small fraction of tweets contain such information: out of 364531 tweets that mention Catania in 2020, only 8777 (approximately 2.4%) were labeled with geo-location metadata.

The data gathering process consists of collection and filtering. In particular, we collect all tweets that contain the name of the community (Gorizia or Catania) in the text of the post during a given time period. These tweets are then filtered based on the language (lang=it) to gather local posts and to filter out some of the noise, e.g., caused by posts mentioning people (particularly celebrities) whose names match the two communities.
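A minimal sketch of this filtering step follows, assuming the collected tweets are dictionaries carrying the four retained fields; the function name and structure are our own illustration, not the project's code.

```python
def filter_tweets(collected, community: str):
    """Keep Italian-language tweets mentioning the community name and
    retain only the four fields used in the analyses."""
    kept = []
    for tweet in collected:
        if tweet.get("lang") != "it":
            continue  # the language filter removes much of the noise
        if community.lower() not in tweet.get("text", "").lower():
            continue
        kept.append({k: tweet[k] for k in ("id", "created_at", "lang", "text")})
    return kept
```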
4 METHODS

The online app presents four main data analysis results: (1) volume and sentiment, (2) frequent tokens, (3) data for specific topics and (4) sentiment aggregates and trends.

Volume is the amount of tweets in a given time period, yearly or monthly, while sentiment refers to the yearly or monthly positive, neutral or negative sentiment detected in the tweets with the approach described below in Section 4.1. Frequent tokens are commonly appearing words, numbers or emojis, which we identify on a monthly basis according to the procedure in Section 4.2. The volume and sentiment aggregates are also calculated for specific subsets of tweets called topics, which can be either automatically inferred by the mechanism in Section 4.3 or provided by the users.

4.1 Sentiment Analysis

The goal of the sentiment analysis is to assess the sentiment of tweets in a given community, its trends, and the variations in sentiment in general and in specific resilience-related topics. To assess the sentiment of tweets we use two machine-learned classifiers. The first is a three-class classifier, denoted LOGREG, that classifies tweets as positive, neutral, or negative, and employs logistic regression. It is trained on high-dimensional vector representations of Italian tweets that include weighted words, pairs of consecutive words, 4-character sequences and emoji characteristics as representation elements, as presented in [10]. The second classifier is the two-class (positive, negative) FEEL-IT sentiment classifier [2] for Italian (available at https://github.com/MilaNLProc/feel-it), which uses word or subword sequences of characters represented in a high-dimensional vector space. It is a fine-tuned BERT-based model [6] for Italian.

4.2 Frequent Tokens

In addition to the information on volume and sentiment, we also provide the most frequent tokens (words, numbers or emojis) that appear during any given month. These are provided separately for the subsets of tweets classified as positive, negative and neutral (when available).

Frequent tokens relate to the concepts mentioned in the tweets and could, as such, be used to discover aspects of resilience relevant to the community, thus potentially serving as proxy candidates. Frequent tokens are computed using the following procedure: (1) the text of all tweets is whitespace-split into tokens and cast to lower case, (2) unwanted tokens are removed, (3) tokens are sorted by frequency of occurrence, and (4) the most frequent tokens are selected for presentation.

The unwanted tokens mentioned in the second step are the Italian stop-words, which are filtered using the nltk library [3], as well as a custom list of unwanted tokens, such as punctuation characters and the various versions of the names of the communities. In the final step, we check that the frequent tokens appear in enough different tweets. Namely, repeated tokens in a single tweet all count towards the token frequency, but count only once in terms of tweet appearance. With an occurrence check using a predefined threshold, we avoid presenting frequent tokens that do not appear in enough individual tweets.
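The four-step procedure, together with the final occurrence check, can be sketched as follows. The nltk Italian stop-word list matches the paper, while the threshold value, argument names and function name are our assumptions.

```python
from collections import Counter
from nltk.corpus import stopwords  # requires nltk.download("stopwords")

def frequent_tokens(texts, custom_unwanted=(), min_tweets=5, top_n=10):
    """(1) whitespace-split and lowercase, (2) remove unwanted tokens,
    (3) sort by frequency, (4) keep the most frequent tokens that also
    appear in at least `min_tweets` different tweets."""
    unwanted = set(stopwords.words("italian")) | set(custom_unwanted)
    counts, appearances = Counter(), Counter()
    for text in texts:
        tokens = [t for t in text.lower().split() if t not in unwanted]
        counts.update(tokens)            # repeats in one tweet all count here
        appearances.update(set(tokens))  # but only once per tweet here
    ranked = [t for t, _ in counts.most_common() if appearances[t] >= min_tweets]
    return ranked[:top_n]
```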
4.3 Topic Modeling

To identify potential resilience-related topics, we wanted to automatically model the topics in the collected tweets. Our motivation was that, given such topics, resilience experts could analyze the corresponding tweets and extract information useful for resilience assessment and proxy construction.

Topic detection or modeling [13] is a common task in natural language processing and aims at discovering topics, e.g., politics, sports, cycling, etc., that appear in a set of text documents, such as news articles or tweets. The assumption is that if a document discusses a specific topic, some words will appear more frequently. Topic modeling is an unsupervised classification method, similar to soft or fuzzy clustering, since a document can belong to more than one topic or cluster. One popular algorithm for topic modeling is Latent Dirichlet Allocation (LDA) [15, 4], which takes the number of topics as an input parameter.

After the standard preprocessing of tweets required for topic modeling (conversion to lower case, removal of URLs, tokenization into words, removal of stopwords and other irrelevant words), we applied the LDA algorithm with a preset number of topics ranging from 3 to 15; we selected 15 topics as an amount that can still be inspected manually. We visualized the resulting topics as word clouds and inspected them manually. We sought to identify topics related to resilience, such as sets of tweets discussing the actions of authorities in response to natural disasters (floods, fires, etc.) or citizens' perception of the authorities' ability to act in case of such disasters. Our inspection was inherently subjective and mostly focused on the top-ranked words, although we also considered standard measures for topic evaluation (perplexity and coherence).
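The LDA step can be reproduced with standard tooling; the sketch below uses scikit-learn's LatentDirichletAllocation (one possible implementation, as the paper does not name a library) on already preprocessed texts.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

def fit_lda(preprocessed_texts, n_topics=15, top_words=10):
    """Fit LDA with a preset number of topics on preprocessed tweet
    texts and return the highest-weighted words of each topic (which
    the study inspected manually, e.g. as word clouds)."""
    vectorizer = CountVectorizer()
    doc_term = vectorizer.fit_transform(preprocessed_texts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    lda.fit(doc_term)
    vocab = vectorizer.get_feature_names_out()
    # for each topic, sort the word weights and keep the strongest ones
    return [[vocab[i] for i in topic.argsort()[::-1][:top_words]]
            for topic in lda.components_]
```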
The most interesting topic is presented in Figure 1 and includes tweets discussing COVID-19 measures (mascherine is Italian for masks) and the conditions (condizioni) for crossing the border between Italy and Slovenia, as Gorizia is a border town. Unfortunately, we did not find any topics directly related to resilience that could be used by resilience experts to assess resilience. This suggests that automatic topic modeling from tweets is likely not very useful for assessing resilience, at least not in the context in which we tried to use it, though this might be due to the nature of topic modeling. Namely, the topics that we obtained were general, in that they cover general rather than resilience-specific concepts, and unpredictable, in the sense that in some circumstances (and locations) we might obtain resilience-related topics and in others not. Ultimately, the automatically constructed topics were not very informative, and, as such, we used topics constructed manually by resilience experts from the involved communities.

Figure 1: An example of a detected topic. It focuses on COVID-19 measures and conditions for crossing the border between Italy and Slovenia.

4.4 Aggregates and Trends

To facilitate the use of the results of our analyses in the RESILOC platform, several aggregates are explicitly calculated. These aggregates seek to capture the overall sentiment and the sentiment trends over various time periods and are available to LRTs, who can use them as inputs for proxy construction. The aggregates are calculated monthly and yearly. The list of aggregates and their descriptions is as follows (a computational sketch follows the list).

Positive ratio. Ratio of positive tweets vs the total number of tweets in the current time period.
Neutral ratio. Ratio of neutral tweets vs the total number of tweets in the current time period.
Negative ratio. Ratio of negative tweets vs the total number of tweets in the current time period.
Relative change of volume. Ratio of the number of tweets in the current and previous time period.
Relative change of positive tweets. Ratio of the number of positive tweets in the current and previous time period.
Relative change of neutral tweets. Ratio of the number of neutral tweets in the current and previous time period.
Relative change of negative tweets. Ratio of the number of negative tweets in the current and previous time period.
Absolute change in volume. Difference of the number of tweets in the current and previous time period.
Absolute change in positive tweets. Difference of the number of positive tweets in the current and previous time period.
Absolute change in neutral tweets. Difference of the number of neutral tweets in the current and previous time period.
Absolute change in negative tweets. Difference of the number of negative tweets in the current and previous time period.
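The announced sketch follows, assuming per-period sentiment counts are available as dictionaries; the names are illustrative and zero counts in the previous period would need guarding.

```python
def sentiment_aggregates(current: dict, previous: dict) -> dict:
    """Compute the aggregates listed above from sentiment counts,
    e.g. current = {"positive": 120, "neutral": 310, "negative": 45}."""
    agg = {}
    total_cur, total_prev = sum(current.values()), sum(previous.values())
    for label in ("positive", "neutral", "negative"):
        agg[f"{label} ratio"] = current[label] / total_cur
        agg[f"relative change of {label} tweets"] = current[label] / previous[label]
        agg[f"absolute change in {label} tweets"] = current[label] - previous[label]
    agg["relative change of volume"] = total_cur / total_prev
    agg["absolute change in volume"] = total_cur - total_prev
    return agg
```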
5 WEB APPLICATION

We make the collected summaries of the volume and sentiment analyses, the frequent tokens and the topic data available through a simple web interface. To access the tool, which is intended for internal use, a user provides a security access token, which determines which community data the user is privileged to view.

All community data is split into sections based on the available classifiers and time intervals. The application provides two options for the time interval, i.e., monthly and yearly views.

The monthly view is composed of the following sections: a total tweet count, the tweet count by sentiment accompanied by a corresponding pie chart (as seen in Figure 2), a table of frequent tokens per classified sentiment (as seen in Table 1), a table with the calculated aggregates and trends and, finally, a topics section that shows the values for particular topics. In the final section, each defined topic has a subsection where the user can see which tokens define the topic, its sentiment distribution and a corresponding pie chart, as shown in Figure 3.

Figure 2: An example of a sentiment distribution as displayed in the web application.

Table 1: An example set of detected frequent tokens.

(a) Positive              (b) Negative
Token      Occurrence     Token     Occurrence
edizioni   174            stato     102
europea    141            recarsi   77
oggi       139            solo      57
nova       128            x         47
capitale   119            ospedali  39

Figure 3: An example of a COVID-19 related topic.

The yearly view is composed of the same sections as the monthly view; however, it concerns the data for the entire year. In addition to the global pie charts (for the sentiment distribution), graphs of the progression of absolute and relative sentiment are shown based on month-by-month data, as shown in Figure 4.

Figure 4: An example of a yearly view of relative sentiment.

6 CONCLUSION

We propose a novel approach to resilience assessment based on social media datasets. The analyses and tools described in the paper were developed and presented to potential users in preliminary try-out sessions, i.e., as sprints in Agile software development, and were discussed with the project consortium's domain experts. The approach is currently being evaluated in the project trials, to quantify its usefulness based on expert feedback.

While automatic topic modeling resulted in some meaningful topics, these were mostly general and very few of them were related to resilience. The users expressed a preference for more focused topics, which can now be defined manually, and find the new results interesting and potentially relevant. Interestingly, there is a general preference not to use the automatically calculated aggregates of volume and sentiment directly as inputs to the resilience assessment models, but to have them considered and prepared for use by the users, i.e., employed with human oversight.

Analyses such as the ones presented in this paper are only useful for large enough communities that are mentioned in tweets frequently. Furthermore, tweets often do not represent the opinion of the population at large. While they are suitable for analysis, even in real time, their representativity is an issue that needs to be considered when using such data.

ACKNOWLEDGMENTS

This work has been supported by the RESILOC project, which has received funding from the European Union's Horizon 2020 research and innovation program under grant agreement No 833671. We also acknowledge the financial support from the Slovenian Research Agency for research core funding for the programme Knowledge Technologies (No. P2-0103).
REFERENCES

[1] Ghazaleh Beigi, Xia Hu, Ross Maciejewski, and Huan Liu. 2016. An overview of sentiment analysis in social media and its applications in disaster relief. Sentiment Analysis and Ontology Engineering, 313–340.
[2] Federico Bianchi, Debora Nozza, and Dirk Hovy. 2021. FEEL-IT: Emotion and Sentiment Classification for the Italian Language. In Proceedings of the 11th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. Association for Computational Linguistics.
[3] Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O'Reilly Media.
[4] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, 3, (Mar. 2003), 993–1022.
[5] Uberto Delprato, Joe Cullen, Thomas Spielhofer, Daniele Del Bianco, Rajendra Akerkar, Nadia Miteva, and Zlatka Gospodinova. 2022. RESILOC Project Deliverable 3.1 – RESILOC Resilience Indicators. Tech. rep. The RESILOC Consortium. Retrieved Aug. 24, 2022 from https://www.resilocproject.eu/wp-content/uploads/2022/05/RESILOC_D3.1_V7.0.pdf.
[6] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, (June 2019), 4171–4186. doi: 10.18653/v1/N19-1423.
[7] United Nations Office for Disaster Risk Reduction. 2012. Terminology. Retrieved Aug. 24, 2022 from https://www.undrr.org/terminology/resilience.
[8] Manon Jurgens and Ira Helsloot. 2018. The effect of social media on the dynamics of (self) resilience during disasters: a literature review. Journal of Contingencies and Crisis Management, 26, 1, 79–88. doi: 10.1111/1468-5973.12212.
[9] Bing Liu. 2015. Sentiment Analysis: Mining Opinions, Sentiments, and Emotions. Cambridge University Press.
[10] Matej Martinc, Iza Škrjanec, Katja Zupan, and Senja Pollak. 2017. PAN 2017: Author profiling – gender and language variety prediction. In CLEF (Working Notes).
[11] Sjirk Meijer, Jon Hall, Rut Erdelyiova, Marcello Sabanes, Abby Onencan, and Kerstin Junge. 2022. RESILOC Project Deliverable 2.6 – Analysis of different approaches to resilience also outside EU, Section 6. Tech. rep. The RESILOC Consortium. Retrieved Aug. 24, 2022 from https://www.resilocproject.eu/wp-content/uploads/2021/04/RESILOC_D2.6-v6.0_Final.pdf.
[12] Ahmed Nagy and Jeannie A Stamberger. 2012. Crowd sentiment detection during disasters and crises. In ISCRAM.
[13] Christos H. Papadimitriou, Hisao Tamaki, Prabhakar Raghavan, and Santosh Vempala. 1998. Latent semantic indexing: a probabilistic analysis. In Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS '98). Association for Computing Machinery, Seattle, Washington, USA, 159–168. doi: 10.1145/275487.275505.
[14] Dennis J Parker. 2020. Disaster resilience – a challenged science. Environmental Hazards, 19, 1, 1–9.
[15] Jonathan K Pritchard, Matthew Stephens, and Peter Donnelly. 2000. Inference of Population Structure Using Multilocus Genotype Data. Genetics, 155, 2, (June 2000), 945–959. doi: 10.1093/genetics/155.2.945.
[16] Lola Xie, Juliet Pinto, and Bu Zhong. 2022. Building community resilience on social media to help recover from the COVID-19 pandemic. Computers in Human Behavior, 134, 107294. doi: 10.1016/j.chb.2022.107294.

Urban Mobility Policy Proposal Using Machine-Learning Techniques

Miljana Shulajkovska, Maj Smerkol, Matjaž Gams
miljana.sulajkovska@ijs.si, maj.smerkol@ijs.si, matjaz.gams@ijs.si
Jožef Stefan Institute, Jamova cesta 39, Ljubljana, Slovenia

ABSTRACT

The world's population is constantly increasing and most people move to big cities. This growth in urbanization affects people's daily activities through congestion, air and noise pollution, water and energy usage, etc. To deal with such issues, city authorities take various actions in order to provide the most suitable solutions. Testing, evaluating and implementing different scenarios is therefore of great importance, but costs both money and time. In the era of artificial intelligence, different approaches can be used to automate this process. In this paper, we propose a system that has the potential to automatically propose mobility policies based on previously defined city changes. The decision-makers input the required city changes, while the system outputs a mobility policy that satisfies that specification. To implement the idea, machine-learning algorithms are trained on data produced by a microscopic traffic simulator. The system is tested on data representing the city of Bilbao, where the policies are related to closing the Moyua square in the city centre at a specific time and for different durations, while the city changes are related to air pollution and the usage of different means of transport.

KEYWORDS

traffic simulation, artificial intelligence, mobility policy

1 INTRODUCTION

According to a United Nations report [7], by 2050 two in three people will live in urban areas. However, as cities continue to grow, they may face many challenges that affect daily mobility services and people's movement in general. Therefore, finding a solution that satisfies people's needs is a crucial step toward building a more sustainable mobility system. Currently, many traditional approaches exist that rely on experts' knowledge and results from microscopic simulations. Those approaches include the simulation of different mobility policies, which are then analysed by the decision-makers. The key issue is how to choose the most appropriate mobility policy that satisfies the requirements defined by the decision-makers and meets the users' needs. Achieving this goal is impossible without the help of modern technologies such as machine learning.

In this paper, we propose a system that uses machine learning methods to automate the process of mobility policy suggestion. As decision-makers are interested in achieving a particular set of goals (KPIs), such as reducing CO2 emissions or increasing the usage of public transport, the system outputs the most appropriate policy that satisfies those requirements.

The rest of the paper is organised as follows. First, an overview of the proposed system is given. Then Section 3 describes the process of collecting the data by giving a brief overview of the simulation tool used, the implemented scenarios and the key performance indicators (KPIs). In Section 4 the machine learning approach is discussed, followed by a description of the applied methods and the experimental results. Finally, a conclusion and future work are given in Sections 5 and 6.
2 SYSTEM

The key concept of the proposed system is to collect data from a microscopic traffic simulator. All the components are shown in Figure 1. The microscopic traffic simulator emulates the behaviour of all the people interacting on the mobility infrastructure. To run the simulator, several input files are required, such as the network and the travel demand. The output of the simulator is then used to calculate the KPIs, and later both are used to train the ML model as part of the ML module. The ML module takes as input a required city change defined by the decision-makers and, using the trained model, outputs a mobility policy that satisfies those requirements.

Figure 1: System description.

3 DATA COLLECTION

Building an ML module/model that predicts the most suitable mobility policy requires a sufficient amount of data related to a particular set of policy actions and their consequences. The main source of data for the proposed system is therefore a microscopic traffic simulator, a very common research method for testing and evaluating different mobility situations. In the following sections, a brief overview of the simulation tool is given, then the different scenarios representing specific policies are discussed, and finally the KPIs used in this study are presented.

Figure 2: System for urban mobility policy design: architecture.
3.1 Simulation

In this study, MATSim was used as the most suitable microscopic simulation tool [3]. The main concept behind this tool is shown in Figure 3. It consists of an iterative process where the plans from the travel demand are executed simultaneously in the mobsim and scored in the scoring module. The score is a metric used to evaluate whether a plan is good or bad; it takes into account travelling and waiting time, activity duration, etc. More about scoring can be found in [5]. A certain number of plans are then chosen and modified in the replanning step. After a sufficient number of iterations (in our case 200), an equilibrium is reached where plans no longer evolve to produce higher scores.

Figure 3: MATSim cycle.

A crucial step before running the simulator is to provide data representative of the study area of interest. The input data comprise the city network map, public transit schedules, travel demand, etc. The most challenging part is constructing the travel demand, as demographic and other data on people's movement are hard to find; different techniques exist to solve this issue. One approach is to replicate the real population (construct a synthetic population) using sample data and marginal distributions, and then assign activity locations using origin-destination (OD) matrices. The iterative proportional fitting (IPF) algorithm [1] is used to construct the synthetic population from sample data from EU-SILC [2] and demographic data provided by the city. The IPF algorithm is one of the most widely used algorithms for synthetic population reconstruction; it combines both datasets (the sample data and the marginal distributions) to produce weights that show the number of replications of a specific person from the sample data in a specific geographical zone while maintaining the marginal distributions.
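IPF itself is a short alternating-scaling loop. The sketch below fits a two-dimensional weight table to target row (e.g., person type) and column (e.g., zone) marginals; it is a minimal illustration of the algorithm, not the project's implementation.

```python
import numpy as np

def ipf(seed, row_targets, col_targets, iters=100, tol=1e-8):
    """Iterative proportional fitting: alternately rescale the rows and
    columns of a (strictly positive) seed table until both marginal
    distributions are matched."""
    weights = np.asarray(seed, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)
    for _ in range(iters):
        weights *= (row_targets / weights.sum(axis=1))[:, None]  # fit rows
        weights *= col_targets / weights.sum(axis=0)             # fit columns
        if np.allclose(weights.sum(axis=1), row_targets, atol=tol):
            break
    return weights
```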
The simulation outputs a large amount of data that describes people's movements in the form of events. Each event has a type, such as "vehicle enters traffic", "vehicle enters/leaves link" or "person starts activity", a time, and the id of the person that performs it. Other output files also exist, such as histograms of travel/wait times, the usage of different transport modes, etc. All these data are used in the calculation of the KPIs for the ML module.

3.2 Scenarios

We have tested 40 different situations of one policy, represented by the closure of the Moyua square starting between 8 am and 5 pm with durations from one to four hours. The results of the applied policy are shown in Figure 4 and Figure 5.

On both plots, the x-axis represents the start time of the closure, while the y-axis represents the duration of closing the square for transport, in hours. On the first bar plot, the z-axis shows the change in percent of CO2 warm emissions, and on the second bar plot the z-axis shows the change in percent of public transport usage, both compared to the situation when no changes in the city are applied. In terms of reducing the CO2 emissions, the best scenario is to close the square at 9 am for one hour; in that case, the CO2 emissions are lowered by 0.9%. On the other hand, if the square is closed to traffic from 5 pm to 6 pm, the city gets congested in the surrounding areas and the CO2 warm emissions increase by 26%, even though public transport usage increases and car usage decreases, as shown on the second bar plot.

Figure 4: Scenarios: CO2.

Figure 5: Scenarios: public transport.

3.3 KPIs Calculation

The key performance indicators (KPIs) represent the objectives defined by the decision makers that need to be achieved. In collaboration with the city, a set of KPIs was defined as follows:

• Air pollution: CO2, NOx, PM cold/warm emissions.
• Usage of different modes: car, bicycle, public transport, walk.
• Bike safety, bikeability.

The first set of KPIs, related to air pollution, is modelled using the additional MATSim emission package [6]. The emissions are calculated using the HBEFA (Handbook on Emission Factors for Road Transport) database [4] in combination with the simulation output. As air pollution is caused by different contributions of road traffic, the emissions module considers both warm and cold emissions. Warm emissions are emitted while driving and depend on the driving speed, stop duration and vehicle characteristics, while cold emissions are emitted during the warm-up phase and depend on the engine's temperature.

The second set of KPIs, related to the usage of different modes of transport, is produced during the simulation, while bike safety and bikeability are calculated from the simulation results.
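The scenario results in Section 3.2 are reported as percent changes of these KPIs against the do-nothing baseline; a small illustrative helper (our naming, not the system's code) makes that convention explicit.

```python
def kpi_percent_change(scenario: dict, baseline: dict) -> dict:
    """Percent change of each KPI relative to the do-nothing baseline,
    the quantity shown on the z-axes of Figures 4 and 5."""
    return {name: 100.0 * (scenario[name] - baseline[name]) / baseline[name]
            for name in scenario}
```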
4 MACHINE LEARNING FOR POLICY PROPOSAL

The developed system proposes a ML module that helps decision-makers by suggesting mobility policies that satisfy a set of predefined KPIs. As mentioned before, the main source of data to train the ML models is the microscopic traffic simulator. The data used to train the models is shown in Table 1. The KPIs represent the features, while the policies, i.e., the scenarios, are treated as target variables, where the start time and duration of the closure are discretized into 30- and 15-minute intervals, respectively. By doing so, the most suitable scenario can be predicted according to the pre-selected KPI values.

Table 1: Features and target variables

Features                    Target variables
CO2 warm, CO2 cold          start time of closure
NOx warm, NOx cold          duration of closure
PM warm, PM cold
car trips, bike trips
PT trips, walk
bikeability, bike safety

4.1 Methods and Results

Since the target variables are continuous and there are multiple of them, we deal with a multi-output regression problem. This involves predicting several numerical values at the same time, which rules out many algorithms that are designed to predict only a single numerical value. However, several solutions to this problem exist:

• Multi-output regression algorithms
• Wrapper multi-output regression algorithms
  – Direct multi-output regression
  – Chained multi-output regression

One approach is to use regression algorithms that support multiple outputs directly, such as linear regression, k-nearest neighbours, decision trees, random forests etc. The other approach is to divide the multi-output regression problem into multiple sub-problems (wrapper multi-output regression) and then deal with single-output regression problems. In direct multi-output regression, independent models are developed for the prediction of each numerical value, whereas chained multi-output regression consists of dependent models for predicting each numerical value.
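The three strategies can be sketched as follows with scikit-learn (our illustration, not the authors' exact code; the data shapes and model choices are placeholders mirroring the tables below):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.multioutput import MultiOutputRegressor, RegressorChain
from sklearn.svm import LinearSVR

# X: KPI features, y: [start time, duration] targets (dummy data here).
rng = np.random.default_rng(0)
X, y = rng.normal(size=(40, 12)), rng.normal(size=(40, 2))

models = {
    # Inherently multi-output: predicts both targets at once.
    "inherent": RandomForestRegressor(random_state=0),
    # Direct wrapper: one independent model per target.
    "direct": MultiOutputRegressor(LinearSVR()),
    # Chained wrapper: later models also see earlier predictions.
    "chained": RegressorChain(LinearSVR()),
}
cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=0)
for name, model in models.items():
    scores = -cross_val_score(model, X, y, cv=cv,
                              scoring="neg_mean_absolute_error")
    print(f"{name}: MAE {scores.mean():.3f} (SD {scores.std():.3f})")
```

Reporting the mean and SD of the MAE over all folds and repeats, as in this sketch, corresponds to the numbers listed in Tables 2-4.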
To evaluate the models, a cross-validation technique was used, with the mean absolute error (MAE) as the performance metric. The mean and standard deviation of the MAE are calculated across all folds and all repeats. Table 2 shows the results of the inherently multi-output regression algorithms. K-nearest neighbours achieved the best results, with a mean MAE of 3.743 and a standard deviation (SD) of 0.852, which means that the difference between the predicted and the true value is approximately four time intervals, or 2 hours for the start hour and 1 hour for the duration.

Table 2: Inherently multi-output regression algorithms

Model                      MAE (mean)   MAE (SD)
Linear Regression          4.472        2.565
K-Nearest Neighbours       3.743        0.852
Decision Tree Regression   4.367        1.083

The plot in Figure 6 depicts the difference between true (blue dots) and predicted (red dots) values for 30% of the data. The x-axis represents the start hour, while the y-axis represents the duration of the closure. The smallest error (the best-predicted instance) between a true and a predicted value is marked with a black and a yellow dot, respectively. Table 3 and Table 4 show the results for direct and chained multi-output regression, respectively, where random forest and k-nearest neighbours achieved the best results in both cases.

Figure 6: True vs predicted data.

Table 3: Direct multi-output regression

Model                              MAE (mean)   MAE (SD)
Linear Support Vector Regression   5.684        4.709
Random Forest Regression           3.590        0.967
Linear Regression                  4.472        2.565
K-Nearest Neighbors Regression     3.743        0.852
Decision Tree Regression           4.333        1.274

Table 4: Chained multi-output regression

Model                              MAE (mean)   MAE (SD)
Linear Support Vector Regression   5.718        4.364
Random Forest Regression           3.607        0.931
Linear Regression                  4.472        2.565
K-Nearest Neighbors Regression     3.690        0.829
Decision Tree Regression           4.325        1.118

By applying different ML models, we examined which algorithm gives the best results in predicting a mobility policy that satisfies a set of predefined city changes. On the other hand, when discussing the simulation results in Section 3.2, we concluded that closing the Moyua square in the afternoon (from 4 pm to 7 pm) decreases the CO2 by 1%. Also, if the square is closed at 11 am, car usage decreases by 0.12%, reducing the CO2 emissions by 0.3%. Both situations can be implemented to achieve a more sustainable city, but which one is better depends on the goals that the city wants to accomplish. If the aim is to decrease the CO2 in the afternoon peak hours, the first situation is better. The latter situation provides better results if the city is interested in reducing car usage and increasing the usage of public transport. Selecting the best option when multiple criteria need to be satisfied is therefore hard when only human knowledge is involved. Additionally, the number of scenarios included in this work is limited, as not every possible situation can be tested due to computational and time costs.

5 CONCLUSION

In this paper, we presented an approach for proposing mobility policies in an automatic way. First, an overview of the system was given and the simulation tool was described. 40 variations of one policy, closing the Moyua square in the centre of Bilbao for transport, were simulated and evaluated. The variations refer to the start hour and the duration of the closure. On the simulation data, the desired KPIs were calculated, which, together with other simulation output data, were used as input to the ML models. Since the ML models output multiple variables (start hour and duration of the closure), the problem becomes multi-output regression, which limits the usage of many ML algorithms that are developed to deal with a single target. Therefore, two approaches were presented: inherently multi-output and wrapper multi-output regression algorithms. Applying both sets of algorithms, the best results were achieved by random forest regression using the chained method from the second approach.

6 FUTURE WORK

In order to provide the decision-makers with more options when implementing different strategies, in future work different areas around the Moyua square will be closed. Closing multiple streets around the square might help reduce the air pollution in the centre and in the city in general. It can also contribute to reducing congestion and improving other relevant KPIs during peak hours. Therefore, additional target variables, such as the city areas to be closed and the length of the streets inside them, will be added. Since more simulations need to be executed, three servers will be used to reduce the computational time. By doing this, the dataset will be expanded and more options will be available to the decision-makers for implementing the best scenario that meets the people's needs. Moreover, a mobility expert validation will be included in testing the ML module, which will contribute to a more detailed analysis of the ML results.

ACKNOWLEDGMENTS

This work is part of a project that has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 870338. The authors also acknowledge the financial support from the Slovenian Research Agency (research core funding No. P2-0209).

REFERENCES

[1] Abdoul-Ahad Choupani and Amir Reza Mamdoohi. 2016. Population synthesis using iterative proportional fitting (IPF): a review and future research. Transportation Research Procedia, 17, 223–233. International Conference on Transportation Planning and Implementation Methodologies for Developing Countries (12th TPMDC) Selected Proceedings, IIT Bombay, Mumbai, India, 10–12 December 2014. doi: 10.1016/j.trpro.2016.11.078.
[2] Eurostat. 2020. EU statistics on income and living conditions microdata 2004–2019, release 2 in 2020. (2020). doi: 10.2907/EUSILC2004-2019V.1.
[3] 2016. Introducing MATSim. The Multi-Agent Transport Simulation MATSim. Ubiquity Press, (Aug. 2016), 3–8.
[4] Mario Keller, Stefan Hausberger, Claus Matzer, Philipp Wüthrich, and Benedikt Notter. 2017. HBEFA version 3.3. Background documentation, Berne, 12.
[5] Kai Nagel, Benjamin Kickhöfer, Andreas Horni, and David Charypar. 2016. A closer look at scoring. (Aug. 2016). doi: 10.5334/baw.3.
[6] 2016. Emission modeling. The Multi-Agent Transport Simulation MATSim. Ubiquity Press, (Aug. 2016), 247–252. doi: 10.5334/baw.36.
[7] 2017. Population Facts No. 2017/4, October 2017: The impact of population momentum on future population growth. https://population.un.org/wpp/Publications/Files/PopFacts_2017-4_Population-Momentum.pdf.
IMF Quality Assurance of Mammograms Using Deep Convolutional Neural Networks and Transfer Learning

Gašper Slapničar (gasper.slapnicar@ijs.si), Jožef Stefan Institute and Jožef Stefan International Postgraduate School, Jamova cesta 39, Ljubljana, Slovenia
Peter Us (peter@u-s.si), UpDev d.o.o., Kranj, Slovenia
Erna Alukić (erna.alukic@zf.uni-lj.si), Faculty of Health Sciences, University of Ljubljana, Ljubljana, Slovenia
Nejc Mekiš (nejc.mekis@zf.uni-lj.si), Faculty of Health Sciences, University of Ljubljana, Ljubljana, Slovenia
Miha Mlakar (miha@kaiber.si), kAIber d.o.o., Ljubljana, Slovenia
Janez Žibert (janez.zibert@zf.uni-lj.si), Faculty of Health Sciences, University of Ljubljana, Ljubljana, Slovenia

Figure 1: Pipeline of our proposed system. MLO mammograms are taken as input, fed into a chosen CNN architecture, which then outputs a quality grade in an ordinal regression task.

ABSTRACT

Quality assurance (QA) of mammograms is of vital importance, since they are the de-facto method used by doctors for detection of breast cancer and other tissue abnormalities. Despite this, there is a distinct lack of both experts and tools for this task. We thus investigated a deep-learning-based approach using convolutional neural networks for prediction of the inframammary fold (IMF) quality grade, which cannot be measured and determined quantitatively with rules (e.g., if some point is x cm from the edge, then the grade is y). We showed in a 5-fold cross-validation experiment that a relatively simple model can achieve respectable performance in terms of root mean squared error (RMSE), area under the ROC curve (AUC) and accuracy, predicting the IMF grade with 3 possible values. Finally, we also showed that the model in fact derives features from the relevant ROI also looked at by the experts, hinting at the real-world usefulness of such a QA model.

KEYWORDS

mammography, quality assurance, ordinal regression, neural networks, deep learning

1 INTRODUCTION

Mammography is the process of using low-energy X-rays to examine human breast tissue for diagnosis and screening, with the typical goal being early detection of breast cancer through detection of anomalies in the tissue [3]. The procedure consists of compression of the breast tissue using a dedicated mammography device with the aim of reducing and evening out the tissue thickness that X-rays must penetrate, in turn reducing the required radiation dose. There are two common views in which a mammogram is recorded, namely craniocaudal (CC) and mediolateral oblique (MLO). The former captures the breast tissue in a top-down direction along the pectoral muscle plane, while the latter captures it at an angle sideways.

The importance of regular mammographic screening cannot be overstated, as it is the de-facto method used by doctors for early detection of breast cancer and other tissue anomalies (e.g., tumors). However, the procedure itself is rather involved and can be tedious for the patient. Consequently, it is of utmost importance to ensure high quality of the taken images, as it is highly undesirable for a patient to have to revisit and repeat the procedure. To this end, a number of guidelines are available and followed by radiologists with the aim of minimizing the amount of low-quality images taken.
Some quality metrics can be quantified and measured precisely, while others are more subtle and often left to the expertise of the professionals. One such elusive metric is the inframammary fold (IMF). The IMF is the inferior border of the breast and the crease between the breast and abdominal tissue. It serves as an important anatomical landmark on an MLO mammogram, providing assurance to the radiographer that all of the posterior breast tissue has been included.

In Slovenia, a single radiographer is responsible for the weekly grading of randomly sampled mammograms from that week, as part of the DORA oncology program. This is inefficient, and the grade itself can be subjective, especially for the metrics that are not clearly defined, such as the IMF. There is thus a need for automated tools that would help radiologists with quality assurance (QA), optimizing the process while also potentially serving as a training tool for improving the quality of the mammograms being collected.

In this paper we highlight the importance and lack of QA methods for mammograms, and develop two deep convolutional-neural-network (CNN) computer vision models, aiming to recognize and successfully predict the IMF quality metric, which is known to be especially tricky and subjective. We evaluate and compare the performance of our proposed models on a custom dataset collected in Slovenia.

The rest of this paper is organized as follows: we first discuss related work about QA in mammography, together with existing computer vision methods that are important for QA, in
Section 2; in Section 3 we describe our dataset; Section 4 details our methodology and experimental results; and in Section 5 we summarize and discuss the implications and future directions for our work.

2 RELATED WORK

To ensure that the key goals of mammography are achieved, quality assurance must be adopted in order for the mammograms to be suitable for diagnosis. In the past decades, several standards have been developed nationally and internationally to this end. A review study by Reis et al. [3] presents an overview of these, showing the importance of both technical and clinical aspects, especially with the development of digital mammography.

In terms of QA, there are specific keypoints or region segmentations that are often required for individual grades. One such is the segmentation of the pectoral muscle in the MLO view, which was traditionally segmented using pixel thresholding and region growing algorithms. Recently, however, with the rise of deep learning, this task was successfully resolved. Soleimani et al. [6] proposed a two-stage algorithm that predicts a precise pectoral muscle boundary using a CNN. Evaluating on three datasets, they achieved average dice similarity and accuracy of 97% and 99%, respectively.

Other researchers focused directly on the final task - prediction of anomalies related to cancer. For instance, Abel et al. [1] proposed a CNN architecture for the detection of abnormal axillary lymph nodes, which is a specific abnormality in the tissue. They reported accuracies of 96% for the detection of suspicious lymph nodes. Shen et al. [5] have importantly shown that such end-to-end networks not only perform well, but can be transferred between different datasets, for instance CBIS-DDSM and INBreast, hinting at good generalization capabilities.

Despite all the existing work, we noticed a distinct lack of research dealing with machine-learning QA prediction, meaning models that output grades of mammograms rather than trying to predict some other subsequent outcome. The quality itself is, however, vital both for physicians and for other models taking mammograms as inputs, as good-quality mammograms are useful for detection, while poor quality not only makes diagnosis difficult, but requires a repeat measurement, which is "expensive" for everyone involved.

3 DATA DESCRIPTION AND PREPROCESSING

The dataset used in our work was collected in the previously mentioned DORA program between 2011 and 2013 and graded by the single expert using a modified PGMI (Perfect, Good, Moderate, Inadequate) grading system with just three grades (Good, Moderate, Inadequate/Poor). All the mammograms were anonymized and held by the Faculty of Health Sciences. In total there were 4928 mammograms, 2424 in the CC view and 2424 in the MLO view, each view in turn having a left (L) and right (R) image, each with a corresponding ground-truth label. Example images can be seen as input in Figure 1.

Initially, these images were saved in the widely used Digital Imaging and Communications in Medicine (DICOM) format, which contains the image itself along with a lot of corresponding metadata. These initial images are grayscale and come in varying resolutions, which is problematic for CNN inputs, which are expected to be of a constant shape. We therefore first extracted the images and standardized the resolution to 256 x 256 pixels, re-saving them in the lossless .png format. Some information loss cannot be avoided, however, since we substantially decrease the resolution in this step.

Each image is by default also equipped with letters in the top-right or top-left corner, denoting the view and side (e.g., CC-L or MLO-R). Since our computer vision models might use this information to learn some data-specific pattern between the view and quality, we semi-manually removed these letters by zeroing pixels in an empirically determined region, leaving only the breast tissue in each image.

Finally, some of the images have important keypoints or areas that are very difficult to see, especially with the naked eye. We thus investigated adaptive histogram equalization methods, more specifically contrast-limited AHE (CLAHE), which was reported in related work to help substantially with the visibility of important regions in mammograms [2]. The effects can be seen in Figure 2.

Figure 2: Effects of CLAHE image preprocessing on mammograms in both views. Top row are originals, bottom row are preprocessed.
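The CLAHE step can be applied with a few lines of OpenCV (a minimal sketch of the preprocessing described above; the file path and the parameter values are illustrative, not the ones used in the paper):

```python
import cv2

# Load a standardized 256 x 256 grayscale mammogram (hypothetical path).
img = cv2.imread("mammogram.png", cv2.IMREAD_GRAYSCALE)
# Contrast-limited adaptive histogram equalization on local tiles.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(img)
cv2.imwrite("mammogram_clahe.png", enhanced)
```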
For our class label we used the one given by the expert for the IMF. It is a numeric value from 1 to 3, where 1 = good, 2 = moderate and 3 = poor. It is problematic since it is known to be subjective, but it is still the best we could obtain. It is also quite difficult to visualize clearly for a non-expert, but we show the relevant region of interest (ROI) looked at by the experts in Figure 1. Ideally, the ground-truth grade would be a voted value obtained from several experts, but that was not feasible. Since the IMF is only evaluated on the MLO view, we used just those mammograms in further analysis.

4 METHODOLOGY AND RESULTS

The learning problem itself seems like a typical classification; however, after giving it some thought, we realized it is better to set it up as an ordinal regression problem, as we are predicting a range of grades. We thus mapped our class label values from the discrete 1, 2 and 3 to the [0.0, 1.0] interval, where grade 1 = 0.0, grade 2 = 0.5 and grade 3 = 1.0. We thus obtain a numeric value from the network which gives us not only the class information but also the distance between the prediction and the ground truth (e.g., a prediction of 0.96 is much better than a prediction of 0.76, given a ground truth of 1.0).

Once our data was finalized in terms of inputs and outputs, we focused our attention on a model. Related work dealing with visual tasks in general, as well as with mammograms specifically, shows a convincing dominance of CNNs in the past decade, so we decided to investigate such architectures. Our aim was to start with a simple architecture consisting of three 2D convolutional layers with 16, 16 and 8 kernels (each of size 9 x 9), intermediate max-pooling layers with kernel size 2 and stride 2, and one fully connected layer with 1000 neurons on top. We used batch normalization and dropout as commonly used mechanisms to prevent overfitting. ReLU was used as the activation function and the network was trained for 100 epochs.

We then wanted to compare such a simple architecture with a more complex CNN. We decided to attempt a transfer learning approach based on the known VGG19 model trained on ImageNet. We replaced the final layer with two fully connected layers to instead predict our IMF grade, while keeping the bulk of the model intact with the existing weights. Hyperparameters were mostly left at default values, except for the learning rate, which had a linear decay implemented in our simple architecture.
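A sketch of the described simple architecture in tf.keras follows (our reconstruction from the description above; details the text does not specify, such as the dropout rate, the loss and the sigmoid output for the [0.0, 1.0] ordinal target, are our assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(256, 256, 1)),      # standardized MLO mammogram
    layers.Conv2D(16, 9, activation="relu"),
    layers.BatchNormalization(),
    layers.MaxPooling2D(pool_size=2, strides=2),
    layers.Conv2D(16, 9, activation="relu"),
    layers.BatchNormalization(),
    layers.MaxPooling2D(pool_size=2, strides=2),
    layers.Conv2D(8, 9, activation="relu"),
    layers.BatchNormalization(),
    layers.MaxPooling2D(pool_size=2, strides=2),
    layers.Flatten(),
    layers.Dense(1000, activation="relu"),
    layers.Dropout(0.5),                     # assumed rate
    layers.Dense(1, activation="sigmoid"),   # ordinal grade in [0, 1]
])
model.compile(optimizer="adam", loss="mse")  # regression on 0.0/0.5/1.0
```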
Initially, the full data was split into training (80%), validation (10%) and test (10%) sets, as is traditional. However, a random train-validation-test split can be volatile and is undesirable when drawing any conclusions about the robustness of the model. We thus instead used the 5-fold cross-validation (CV) evaluation setup, where k-1 groups are taken for training and the remaining group is left for testing in each of the 5 iterations. Additionally, it was ensured that the distribution of labels is kept in each group, meaning we did a stratified split. This was important in our experiment, since the class label distribution is relatively skewed (good / 1 / 0.0 = 65%, moderate / 2 / 0.5 = 30%, poor / 3 / 1.0 = 5%) and we are especially interested in the bad examples, which are in a vast minority.

For the evaluation metrics we used the root mean squared error (RMSE) and the area under the ROC curve (AUC), while also looking at the classification accuracy (transforming from ordinal regression back to classification), as it is the most intuitive metric. In terms of AUC, we always compared all possible pairs of grades, meaning 1v2, 1v3 and 2v3. The most important is a good separation between poor images and everything else, since those are the most problematic, while moderate images could be close to either good or poor. Numeric results are given in Table 1 and the ROC curves for all three cases (of the better performing model) are given in Figures 3, 4 and 5.

Table 1: Numeric results from the 5-fold CV using our simple model and transfer learning with VGG19.

Simple model
Fold Nr.   RMSE   AUC 1v3   AUC 1v2   AUC 2v3   Accuracy
1          0.26   0.90      0.79      0.78      0.93
2          0.25   0.98      0.80      0.92      0.95
3          0.24   0.98      0.79      0.95      0.91
4          0.23   0.97      0.78      0.89      0.96
5          0.24   0.94      0.78      0.84      0.95
Avg.       0.24   0.95      0.79      0.88      0.94

Transfer VGG19 model
Fold Nr.   RMSE   AUC 1v3   AUC 1v2   AUC 2v3   Accuracy
1          0.29   0.78      0.73      0.60      0.79
2          0.27   0.89      0.73      0.77      0.75
3          0.28   0.85      0.70      0.71      0.66
4          0.28   0.94      0.65      0.92      0.70
5          0.30   0.85      0.69      0.73      0.77
Avg.       0.28   0.86      0.70      0.75      0.73

Figure 3: ROC curve of the better performing (simple) model for the 1v2 class combination.

Figure 4: ROC curve of the better performing (simple) model for the 1v3 class combination.

Figure 5: ROC curve of the better performing (simple) model for the 2v3 class combination.
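The stratified 5-fold protocol described above can be sketched with scikit-learn as follows (our illustration; the class proportions are taken from the text, while the feature placeholder stands in for the image tensors):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Discrete grades 1/2/3 used for stratification (65/30/5 split as above),
# and the same labels mapped to 0.0/0.5/1.0 for the ordinal regression.
grades = np.array([1] * 65 + [2] * 30 + [3] * 5)
y = (grades - 1) / 2.0
X = np.zeros((len(grades), 1))  # placeholder for the image tensors

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, grades), start=1):
    # Train on X[train_idx], y[train_idx]; evaluate on the test fold,
    # whose grade ratio stays roughly 65/30/5 thanks to stratification.
    print(fold, np.bincount(grades[test_idx])[1:])
```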
5 DISCUSSION AND CONCLUSION

Looking at the results, we can initially observe the overall better performance of our simple model compared to the pre-trained VGG19 transferred to our domain. All the metrics are more stable across folds while also achieving better overall values. Since we were especially interested in the separation of the good and poor mammograms (grade 1 vs. grade 3), we can also see that this separation is indeed consistently successful and seems relatively robust. As expected, the separation of moderate (grade 2) mammograms from the others is more challenging, but can still be done reasonably well.

Deep learning models are often criticized for being black-box and not offering explainability of their decisions. We also wanted to do a quick investigation of this by using the Grad-CAM approach [4], which is a popular technique for producing "visual explanations" for decisions from a large class of CNN-based models, in the form of a heatmap showing where on an image the model focused the most. An example is shown in Figure 6, where we can see that the model commonly focused on the relevant IMF ROI that is also focused on by the experts. However, this focus was not exclusive, meaning it did not focus on just that region, and it was not the same on all images, but it was still rather consistent, which is a good indicator that the model actually learned the relevant features for IMF QA.

Figure 6: Grad-CAM heatmap showing the areas on the mammogram that were focused on by the model to derive features.

To summarize, we investigated the possibility of CNN-based IMF QA for mammograms, looking at a simple CNN model and a transferred, slightly modified VGG19 model. The simple CNN architecture achieved respectable results in terms of several metrics and, importantly, also focused on the correct part of the image without any guidance, hinting that it learned the relevant features also looked at by the experts. Extensions to the prediction of other QA grades might allow for a system that could help with continuous QA, as well as expert training.

ACKNOWLEDGMENTS

We would like to acknowledge the DORA program for providing us with the dataset. Additionally, we would like to thank colleagues from the Faculty of Health Sciences for helping with the understanding and interpretation of the data. Finally, a special thanks goes to kAIber d.o.o. and UpDev d.o.o. for participating in the data analysis and modelling.

REFERENCES

[1] Frederik Abel, Anna Landsmann, Patryk Hejduk, Carlotta Ruppert, Karol Borkowski, Alexander Ciritsis, Cristina Rossi, and Andreas Boss. 2022. Detecting abnormal axillary lymph nodes on mammograms using a deep convolutional neural network. Diagnostics, 12, 6, 1347.
[2] Etta D Pisano, Shuquan Zong, Bradley M Hemminger, Marla DeLuca, R Eugene Johnston, Keith Muller, M Patricia Braeuning, and Stephen M Pizer. 1998. Contrast limited adaptive histogram equalization image processing to improve the detection of simulated spiculations in dense mammograms. Journal of Digital Imaging, 11, 4, 193–200.
[3] Cláudia Reis, Ana Pascoal, Taxiarchis Sakellaris, and Manthos Koutalonis. 2013. Quality assurance and quality control in mammography: a review of available guidance worldwide. Insights into Imaging, 4, 5, 539–553.
[4] Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. 2019. Grad-CAM: visual explanations from deep networks via gradient-based localization. International Journal of Computer Vision, 128, 2, (Oct. 2019), 336–359. doi: 10.1007/s11263-019-01228-7.
[5] Li Shen, Laurie R Margolies, Joseph H Rothstein, Eugene Fluder, Russell McBride, and Weiva Sieh. 2019. Deep learning to improve breast cancer detection on screening mammography. Scientific Reports, 9, 1, 1–12.
[6] Hossein Soleimani and Oleg V. Michailovich. 2020. On segmentation of pectoral muscle in digital mammograms by means of deep learning. IEEE Access, 8, 204173–204182. doi: 10.1109/ACCESS.2020.3036662.

Vehicle Axle Distance Detection From Time-series Signals Using Machine Learning

David Susič (david.susic@ijs.si), Blaž Erzar (blaz.erzar@gmail.com), Nika Čelan (nika.celan8@gmail.com), Gašper Leskovec (leskovecg@gmail.com), Žiga Kolar (ziga.kolar@ijs.si), Jožef Stefan Institute, Jamova cesta 39, Ljubljana, Slovenia
Martin Konečnik (martin.konecnik@cestel.si), Domen Prestor (domen.prestor@cestel.si), Matjaž Skobir (matjaz.skobir@cestel.si), Cestel cestni inženiring d.o.o., Špruha 32, Trzin, Slovenia
Matjaž Gams (matjaz.gams@ijs.si), Jožef Stefan Institute, Jamova cesta 39, Ljubljana, Slovenia

ABSTRACT

In this study, we compare a signal decomposition and a convolutional autoencoder approach for determining vehicle axle distances. Our dataset consists of 62076 instances of vehicles crossing a bridge. Each vehicle is detected by eight identical sensors placed under the bridge that record time-series vibration data.
We compare our results to those computed using an expert's model, which we consider to be the ground truth. The signal decomposition approach achieves accuracies of up to 0.89, 0.98, and 1, with the calculated distances matching the expert model within 2%, 5%, and 10%, respectively. The convolutional autoencoder, on the other hand, achieves accuracies of up to 0.97, 0.99, and 1 with the same error margins compared to the expert model.

KEYWORDS

vehicle detection, axle distance, neural network, peak detection, machine learning

1 INTRODUCTION

In order to accurately weigh the vehicles crossing a bridge and determine whether they weigh too much and damage the road, the vehicle speed, the number of vehicle axles, and their in-between distances must first be determined.

In recent years, many types of sensors have been used for vehicle detection. These include acoustic sensors, inductive loop sensors, strain sensors, magnetic sensors, and imaging sensors. Researchers around the world have developed various methods for using sensor data for vehicle axle detection, weight detection, and classification. Marszalek et al. [5] measured vehicle axle distances based on multifrequency impedance measurement of a slim inductive-loop sensor. Using test vehicles, they were able to confirm that their method can successfully determine the distances. Chatterjee et al. [1] used data from sensors on the bridge and a wavelet-based analysis to determine the axle distances. In the work of Khalili et al. [3], piezoelectric elements were used in a system to detect the weight of vehicles in motion. They used the weigh-in-motion system to determine both the axle distances and vehicle weights with sufficient accuracy. Ma et al. [4] developed a deep learning system for vehicle recognition based on strain sensor data. They were able to classify 11 different vehicle types with a very high average precision.

In this work, we use data collected from a single bridge to test two machine learning approaches for vehicle axle distance detection. The first approach is based on signal decomposition and the second approach is based on a convolutional neural network autoencoder.

We begin in Section 2 with a description of the dataset used in this study. In Section 3, we explain our approaches and illustrate them with examples. The results are presented and discussed in Section 4. The paper concludes with Section 5.

2 DATASET

Our dataset consists of sensor data from vehicles crossing the bridge. The sensors are placed under the bridge in the configuration shown in Figure 1. The sensors are identical and record the vibrations of the crossing vehicles at a sampling rate of 512 Hz. In this study, only data from vehicles travelling in lane 1 were used (orange sensors 1–8 in Figure 1). The vibration data from the sensors in the first and last columns (1 and 2) are also used to calculate the vehicle speed, which is obtained by superimposing the signals using a cross-correlation method. Our dataset consists of 62076 instances, where each instance contains the data for one vehicle. In addition, each instance also contains the axle distances calculated by the expert model and the times at which each vehicle axle crossed the sensor.

Figure 1: Placement of the sensors on the bridge.
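The cross-correlation speed estimate can be sketched as follows (our illustration, not the project's code; `sensor_a`/`sensor_b` stand for signals from the first and last sensor columns and `spacing_m` for their known distance along the bridge):

```python
import numpy as np

def vehicle_speed(sensor_a, sensor_b, spacing_m, fs=512):
    """Estimate speed from the lag that best aligns the two signals
    (assumes the vehicle produces a non-zero lag between the sensors)."""
    a = sensor_a - sensor_a.mean()
    b = sensor_b - sensor_b.mean()
    corr = np.correlate(b, a, mode="full")
    lag = np.argmax(corr) - (len(a) - 1)  # delay in samples
    dt = lag / fs                         # delay in seconds
    return spacing_m / dt                 # speed in metres per second
```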
An example of the sensor data for a vehicle with 5 axles is shown in Figure 2. The signal peaks correspond to the vehicle axles crossing the sensor, while the amplitude corresponds to the weight of the vehicle on that axle. Although the peaks in the signal correspond to the vehicle axles, the spacing between the peaks generally does not represent the actual axle spacing. Due to interference between the signals of the individual axles, the peaks shift relative to the positions they would have if the signals of the individual axles were isolated. A small effect of this can be seen in the peak triplet in the right part of the signal in Figure 2. In some cases, where two adjacent vehicle axles are very close to each other, the peaks may overlap and become indistinguishable from each other, resulting in a single peak.

Figure 2: Example sensor data for a vehicle with 5 axles. The green markings correspond to the crossing points of the vehicle axles as calculated by the expert model.

3 METHODOLOGY

The objective of this study is to evaluate the performance of the signal decomposition and convolutional autoencoder approaches in computing vehicle axle distances from sensor data. In both approaches, the signal timing of the axles is first determined and then their in-between distances are computed based on the vehicle speed. We compare our results with those of the expert model, which has a measured accuracy of 98% in practice. Our results are considered correct if all calculated axle distances match those of the expert model within the specified margin of error. In addition, we have the option to skip the predictions whose confidence level is below a certain threshold.

3.1 Signal Decomposition

The first step of the signal decomposition approach was to determine the most appropriate signal for each instance. For this purpose, peak detection was performed on all eight signals by calculating the first and second derivatives. A 62076 x 8 matrix was then created, with the eight columns indicating the number of detected peaks in each of the signals. A gradient boosting regression model was then trained on this matrix, the output of which was an array with the correct number of peaks. In addition, eight gradient boosting classifiers were trained, one for each of the signals. The output of each classifier was an array giving the probabilities that the number of detected peaks was correct for that signal.

For each test instance, the regression model first predicted the number of axles, rounding the result to the nearest integer. All eight signals were then processed, starting with the signal from sensor one. For each signal, a check was made to see whether the number of detected peaks matched the number predicted by the regression model. If it did, we checked whether the probability that the number of detected peaks was correct, as predicted by the classifier for that signal, was above a confidence threshold. The iteration was stopped if both criteria were met, and the signal was selected as the most appropriate for that instance. This means that in most cases not all signals were checked. Although in principle there could be a signal that would be even more suitable, we found experimentally that in more than 90% of the cases signal 1 was the best, followed by signals 2, 5, and 6, in that order. If none of the eight signals met the criteria, the instance was skipped. In our experiments, we used confidence values between 0 and 0.997.

After the best signal for each instance was determined, the signals were decomposed into so-called base waves. A base wave is a function designed to have the form of an isolated wave, and can be constructed with three parameters: the x-location, the scaling in the x-direction, and the scaling in the y-direction. The signal decomposition can be defined as an optimization problem where we want to find the best parameters of the base waves. The objective function we want to minimize is the mean squared error between the original signal and the sum of the base waves.
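The decomposition can be sketched as a least-squares fit (our illustration; the paper does not specify the exact base-wave shape, so a Gaussian bell with the three parameters named above is assumed here):

```python
import numpy as np
from scipy.optimize import least_squares

def base_wave(t, loc, width, height):
    """An isolated bell-shaped wave: x-location, x-scaling, y-scaling."""
    return height * np.exp(-0.5 * ((t - loc) / width) ** 2)

def decompose(signal, n_peaks, fs=512):
    t = np.arange(len(signal)) / fs
    # Initial guesses: peak locations spread evenly over the signal.
    p0 = np.concatenate([np.linspace(t[0], t[-1], n_peaks),  # locations
                         np.full(n_peaks, 0.05),             # widths
                         np.full(n_peaks, signal.max())])    # heights

    def residual(p):
        locs, widths, heights = np.split(p, 3)
        model = sum(base_wave(t, l, w, h)
                    for l, w, h in zip(locs, widths, heights))
        return model - signal  # minimized in the least-squares sense

    fit = least_squares(residual, p0)
    return np.sort(np.split(fit.x, 3)[0])  # axle times = base-wave peaks
```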
Once the signal was optimally decomposed, the peaks of the base waves were used to calculate the axle distances. The base wave peaks now correspond to the actual axle times and represent isolated waves, so their peaks are not shifted by interference. This can also be seen in Figure 3: the green and black vertical lines do not exactly coincide, which is most obvious in the triplet on the right. Peak detection and model training were performed using Python 3.7 and the libraries scikit-learn 0.24.2 [7] and NumPy 1.18.5 [2].

Figure 3: Example of signal decomposition. The black vertical lines represent the peaks as detected during peak detection, while the green lines represent the peaks after decomposition.

3.2 Convolutional Autoencoder

The second approach is a convolutional neural network set up as an autoencoder. The schematic of the model is shown in Figure 5. The first layer is the input layer. It consists of 4000 nodes, since this was the number of samples of the longest signal, and takes the raw signal as input. Signals with a length of less than 4000 were padded with zeros. For all instances, signal 1 was chosen as the input signal, since it worked best in most cases. The encoding part of the autoencoder consists of three convolutional layers with kernel sizes 5, 2 and 3 and 8, 16 and 32 filters, respectively. Each convolutional layer is followed by batch normalization and max-pooling of size 2. The decoder has the opposite structure to the encoder, with the max-pooling layers replaced by up-sampling layers. The output layer has the same size as the input layer. The loss function used for model training was binary cross-entropy.

Figure 5: Scheme of the convolutional autoencoder structure.
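A sketch of this architecture in tf.keras follows (our reconstruction from the description above; the "same" padding and the final sigmoid layer are assumptions where the text does not specify them):

```python
import tensorflow as tf
from tensorflow.keras import layers

inp = tf.keras.Input(shape=(4000, 1))  # zero-padded raw signal
x = inp
# Encoder: kernel sizes 5, 2, 3 with 8, 16, 32 filters.
for filters, kernel in [(8, 5), (16, 2), (32, 3)]:
    x = layers.Conv1D(filters, kernel, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling1D(2)(x)
# Decoder: mirrored structure with up-sampling instead of pooling.
for filters, kernel in [(32, 3), (16, 2), (8, 5)]:
    x = layers.UpSampling1D(2)(x)
    x = layers.Conv1D(filters, kernel, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
# Per-sample peak probabilities, same length as the input.
out = layers.Conv1D(1, 1, activation="sigmoid")(x)

model = tf.keras.Model(inp, out)
model.compile(optimizer="adam", loss="binary_crossentropy")
```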
The output for each training instance was a binary array, with ones at the sample locations containing a peak and zeros everywhere else. Thus, for the unseen (test) instances, the model outputs the probability of each input signal sample containing a peak. An example output for a test instance is shown in Figure 4. It can be seen that the probabilities for the peaks are almost always less than one, but the number of probabilities that are not zero (or not very close to zero) is equal to the number of actual peaks. It can also be seen that the model has learned to shift the peaks where necessary (triplet on the right).

Figure 4: Example output of the convolutional autoencoder.

After decoding, we selected lower and upper probability thresholds, T1 and T2 (red and blue dashed lines in Figure 4). If all peaks had a probability of at least T1 and the highest probability was at least T2, the axle distances were calculated; otherwise the instance was skipped. In our experiments, T1 was fixed at 0.01, while we tried values between 0 and 0.7 for T2. The convolutional autoencoder was implemented using Python 3.7 and the TensorFlow 2.9.1 library [6].

4 RESULTS

Both approaches were tested with 5-fold cross-validation, and the folds were the same in both experiments. The results are shown in Tables 1 and 2. They are given for percentages of skipped instances between 0 and 50%. The "± x" values in the brackets of the accuracy columns represent the percentages within which the calculated axle distances must match those given by the expert model for a prediction to be considered correct. In Table 1, the confidence column corresponds to the confidence threshold of the prediction model for the number of peaks, while the T2 column in Table 2 corresponds to the minimum probability of the peak with the highest probability. If these criteria are not met, an instance is skipped. The corresponding amounts of skipped instances are given in the skipped columns.

We see that the signal decomposition approach achieves accuracies of up to 0.89, 0.98, and 1, with the calculated distances matching the expert model within 2%, 5%, and 10%, respectively. The convolutional autoencoder, on the other hand, achieves accuracies of up to 0.97, 0.99, and 1 compared to the expert model, with the same error margins. It can also be seen that for both approaches the accuracies start to converge when about 15% of the instances are skipped, and do not improve significantly even when the percentage of skipped instances is 50. The convolutional autoencoder generally has a higher accuracy than the signal decomposition approach, except in the cases where the margin of error is 10%, in which case the performances of both approaches are similar.
Table 1: Results for the signal decomposition approach.

Confidence   Skipped   Accuracy (± 2%)   Accuracy (± 5%)   Accuracy (± 10%)
0.0          0.01      0.79              0.88              0.91
0.75         0.05      0.82              0.92              0.95
0.865        0.1       0.84              0.95              0.97
0.915        0.15      0.86              0.96              0.98
0.95         0.2       0.87              0.96              0.99
0.974        0.25      0.88              0.98              0.99
0.977        0.3       0.88              0.98              0.99
0.982        0.35      0.88              0.98              0.99
0.988        0.4       0.88              0.98              0.99
0.996        0.46      0.89              0.98              1.0
0.997        0.5       0.89              0.98              1.0

Table 2: Results for the convolutional autoencoder approach.

T2      Skipped   Accuracy (± 2%)   Accuracy (± 5%)   Accuracy (± 10%)
0.0     0.0       0.85              0.88              0.89
0.021   0.06      0.91              0.94              0.95
0.12    0.1       0.94              0.97              0.98
0.431   0.15      0.95              0.98              0.99
0.488   0.2       0.96              0.99              0.99
0.53    0.25      0.96              0.99              0.99
0.566   0.3       0.96              0.99              0.99
0.601   0.35      0.96              0.99              0.99
0.636   0.4       0.97              0.99              1.0
0.665   0.45      0.97              0.99              1.0
0.7     0.5       0.97              0.99              1.0

5 CONCLUSION

In this work, we tested a signal decomposition and a convolutional autoencoder approach for vehicle axle distance detection using data from eight sensors mounted under a bridge. We used a dataset of 62076 vehicles travelling in the same lane. We compared our results to those computed by an expert's model, which we considered to be the ground truth. Using the signal decomposition approach, we achieved accuracies of up to 0.89, 0.98, and 1 for the cases where each vehicle axle distance matched the expert model within 2%, 5%, and 10%, respectively. For the convolutional autoencoder, the accuracies obtained were 0.97, 0.99, and 1 for the same error margins compared to the expert model. The models will be improved in future work to include the detection of vehicle axle weights.

ACKNOWLEDGMENTS

This study received funding from the company Cestel. The authors acknowledge the funding from the Slovenian Research Agency (ARRS), Grant (PR-10495) and basic core funding P2-0209.

REFERENCES

[1] Pranesh Chatterjee, Eugene OBrien, Yingyan Li, and Arturo González. 2006. Wavelet domain analysis for identification of vehicle axles from bridge measurements. Computers & Structures, 84, 28, 1792–1801. doi: 10.1016/j.compstruc.2006.04.013.
[2] Charles R. Harris, Jarrod K. Millman, Stefan J. van der Walt, Ralf Gommers, Pauli Virtanen, and David Cournapeau. 2020. Array programming with NumPy. Nature, 585, 357–362. doi: 10.1038/s41586-020-2649-2.
[3] Mohamadreza Khalili, Gopal Vishwakarma, Sara Ahmed, and Athanassios Thomas Papagiannakis. 2022. Development of a low-power weigh-in-motion system using cylindrical piezoelectric elements. International Journal of Transportation Science and Technology, 11, 3, 496–508. doi: 10.1016/j.ijtst.2021.06.004.
[4] Rujin Ma, Zhen Zhang, Yiqing Dong, and Yue Pan. 2020. Deep learning based vehicle detection and classification methodology using strain sensors under bridge deck. Sensors, 20, 18. https://www.mdpi.com/1424-8220/20/18/5051.
[5] Zbigniew Marszalek, Waclaw Gawedzki, and Krzysztof Duda. 2021. A reliable moving vehicle axle-to-axle distance measurement system based on multi-frequency impedance measurement of a slim inductive-loop sensor. Measurement, 169, 108525. doi: 10.1016/j.measurement.2020.108525.
[6] Martín Abadi et al. 2015. TensorFlow: large-scale machine learning on heterogeneous systems. Software available from tensorflow.org. (2015). https://www.tensorflow.org/.
[7] F. Pedregosa et al. 2011. Scikit-learn: machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830. http://www.jmlr.org/papers/volume12/pedregosa11a/pedregosa11a.pdf.

Študija učinkovitosti algoritma za razporejanje terenskega dela
A Study of the Performance of a Fieldwork Scheduling Algorithm

Tea Tušar (tea.tusar@ijs.si), Jožef Stefan Institute and Jožef Stefan International Postgraduate School, Jamova cesta 39, Ljubljana, Slovenia
Nace Sever (nace.sever@gmail.com), Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, Ljubljana, Slovenia
Aljoša Vodopija (aljosa.vodopija@ijs.si), Jožef Stefan Institute and Jožef Stefan International Postgraduate School, Jamova cesta 39, Ljubljana, Slovenia
Bogdan Filipič (bogdan.filipic@ijs.si), Jožef Stefan Institute and Jožef Stefan International Postgraduate School, Jamova cesta 39, Ljubljana, Slovenia

ABSTRACT

Fieldwork scheduling is a demanding optimization problem. To solve it, we developed a three-level algorithm. At the first level, an evolutionary algorithm distributes tasks to workers; at the second level, a heuristic algorithm distributes the tasks of each worker over days; and at the third level, a branch-and-bound algorithm solves the problem in the form of mixed-integer linear programming (MILP), where the starting times of the tasks need to be scheduled for each worker and each day separately. In this paper, we study the efficiency of the algorithm at the third level. Because it cannot schedule the tasks fast enough for practical use, we try to increase its efficiency by simplifying the MILP. Experimental results show that the simplification improves the performance of the algorithm at the third level, while the effect on the overall algorithm is in principle favorable, but depends on the problem.

KEYWORDS

scheduling problem, evolutionary algorithm, heuristic algorithm, branch-and-bound algorithm, mixed-integer linear programming, efficiency

1 INTRODUCTION

Fieldwork scheduling is an optimization problem that requires assigning a worker and a starting time to each field task so that all scheduling constraints are satisfied and the cost of the complete schedule is as low as possible.
There are numerous variants of this problem, which differ both in the constraints and in the way the cost of a schedule is computed. Consequently, different approaches to solving it also exist [6]. The study [3] compares two formulations of the problem, namely as a vehicle routing problem (VRP) and as a mixed-integer linear programming (MILP) problem. Its experimental results indicate that the MILP form is more suitable for formulating fieldwork task scheduling than the VRP form, so our approach also uses the MILP form.

However, the efficiency of solving such combinatorial problems depends strongly on their size: even relatively small problems often cannot be solved in a reasonable time. In our approach we therefore follow similar ideas from related work (see, e.g., [1]) and divide the problem into smaller, more manageable subproblems. We thus solve the fieldwork scheduling problem with a three-level optimization algorithm: at the first level, an evolutionary algorithm distributes the tasks among the workers; at the second level, a heuristic distributes the tasks of each worker over the days; and at the third level, a branch-and-bound algorithm solves the problem in the MILP form, i.e., it assigns starting times to the tasks for each worker and each day separately.

Such a three-level algorithm can satisfactorily solve even somewhat larger problems (e.g., with 20 workers, 20 days and several hundred tasks) within an hour, but this is too long for practical use, so we want to speed the algorithm up. The bottleneck is solving the MILP, since the evolutionary algorithm and the heuristic are very fast; the largest speed-up of the whole algorithm can therefore be achieved by speeding up the third level.
In the remainder of the paper, Section 2 first presents our variant of the fieldwork scheduling problem, and Section 3 briefly describes the three-level algorithm for solving it. Section 4 analyzes the efficiency of the algorithm at the third level, while Section 5 proposes a simplification of the MILP and examines its effect, first on the algorithm at the third level and finally on the whole three-level algorithm. The paper closes with conclusions in Section 6.

2 THE FIELDWORK SCHEDULING PROBLEM

We describe the fieldwork scheduling problem with a scheduling scenario, the problem variables, the constraints and the optimization criterion (detailed formal definitions cannot be given here due to space limitations). We consider the most general variant of the problem, in which we want to schedule the majority of the tasks, since it also covers the special case where, due to a last-minute change, only a few tasks need to be rescheduled.

2.1 Scheduling Scenario

The scheduling time period is divided into days, within which time is treated continuously. For each day we know the start and end of the regular working hours and the duration of possible overtime (either at the beginning or at the end of the day). We are also given a set of locations, the travel times for each pair of locations, and a set of competences that are shared by tasks and workers. The scheduling scenario also contains data about the workers, namely the competences, the allowed amount of overtime, and the starting and ending location of each worker. The data about the tasks comprise, for each task, its duration, its desired and mandatory time window, its priority, the required competences and possibly the desired workers. Lunch breaks are special tasks for which no location is defined (a lunch break always takes place at the same location as the preceding task and cannot overlap with other tasks).

Additionally, the scheduling scenario can contain prepared schedules of individual tasks, which are of two types. Mandatory schedules must not be changed, but they still have to be taken into account, since they impose constraints on the scheduling of the remaining tasks. Desired schedules, on the other hand, can be changed, but this affects the cost of the final schedule.
2.2 Variables

The variables of the optimization problem completely determine the schedule, since they state for each task whether it is scheduled or not (not all tasks need to be scheduled) and, if it is, which worker will perform it and when it will start.

2.3 Constraints

A schedule representing a solution to the problem is feasible only if it fulfills all of the following constraints:

• A worker can perform only one task at a time (when scheduling tasks in time, the travel times between locations must also be respected).
• A worker can perform tasks only within the working hours and has a limited amount of overtime.
• A worker must have the competences required to perform a task.
• Tasks must be scheduled within their mandatory time windows.
• Tasks with a mandatory schedule must not be rescheduled.

2.4 Optimization Criterion

The optimization criterion, i.e., the cost of the schedule that we want to minimize, is defined as a weighted sum of the following partial criteria (the first three items contradict each other, so it makes sense to consider only one of them at a time):

• All workers should be loaded as evenly as possible.
• As few workers as possible should be active each day.
• The active workers should be loaded as evenly as possible.
• The total duration of travel between locations should be as short as possible.
• As many tasks as possible should be performed.
• Tasks should be performed as early as possible.
• Workers should have as little inactive time as possible.
• There should be as little overtime as possible.
• Tasks with a higher priority should start before tasks with a lower priority.
• Tasks that are due earlier should start before tasks that are due later.
• Tasks should be performed as close as possible to their desired time window.
• A task with desired workers should be performed by one of the desired workers.
• A task with a desired schedule should be performed as close as possible to that schedule.

The weights of the individual partial criteria are very important, since they determine the criteria's mutual relations and drastically influence the obtained solutions. We set them using expert knowledge and experiments on numerous different scenarios.

3 THE THREE-LEVEL OPTIMIZATION ALGORITHM

In the following, we briefly present all three levels of the optimization algorithm.

3.1 First Level: Distributing Tasks Among Workers

At this level, an evolutionary algorithm [5] assigns to each task the worker who will perform it. The evolutionary algorithm creates the initial population of N_p solutions randomly, but in such a way that all solutions comply with the worker constraints (the first three constraints in Section 2.3). The algorithm then performs the following steps for at most N_g generations. In each generation, it first selects N_p parents with tournament selection. It then crosses and mutates pairs of parents (for mutation, we use different strategies tailored to the partial criteria (see Section 2.4), chosen so that the frequency of their use matches the criteria's weights). The evolutionary algorithm evaluates the population obtained in this way by running the second and third levels of the algorithm for each solution. It then overwrites the old population with the new one (keeping the best old solution) and continues with the same steps.

3.2 Second Level: Distributing a Worker's Tasks over Days

The heuristic at the second level distributes each worker's tasks over the days. All tasks, ordered increasingly by the number of days on which they can be performed, are assigned the most favorable day in turn. The favorability of a day is determined by voting: the different partial criteria (see Section 2.4) vote for the days that are most favorable for them. The votes are weighted by the weights of the partial criteria, and the task is assigned the day with the most votes.

3.3 Third Level: Determining the Starting Times of the Tasks of One Worker on One Day

At the third level, a branch-and-bound algorithm assigns a starting interval to the tasks of one worker on one day. The problem is thus written in the MILP form by considering only those constraints and partial criteria that are still meaningful at this level (e.g., at this level we no longer deal with competences, the even loading of workers and similar partial criteria, since these are taken care of at the first two levels). A much-simplified sketch of such a subproblem is given below.
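The following sketch (our illustration; the task data, the constant travel time and the objective are made up, and the real formulation includes many more constraints and partial criteria) schedules the starting times of three tasks of one worker on one day with OR-Tools and the SCIP backend, which are also used in our experiments:

```python
from ortools.linear_solver import pywraplp

durations = [60, 30, 90]   # task durations in minutes (illustrative)
travel = 10                # constant travel time (simplification)
day_end = 600              # length of the working day in minutes
M = day_end + travel       # big-M for the ordering constraints

solver = pywraplp.Solver.CreateSolver("SCIP")
start = [solver.NumVar(0, day_end - d, f"start_{i}")
         for i, d in enumerate(durations)]
for i in range(len(durations)):
    for j in range(i + 1, len(durations)):
        # order = 1 iff task i is scheduled before task j; either i
        # finishes (plus travel) before j starts, or vice versa.
        order = solver.BoolVar(f"order_{i}_{j}")
        solver.Add(start[i] + durations[i] + travel
                   <= start[j] + M * (1 - order))
        solver.Add(start[j] + durations[j] + travel
                   <= start[i] + M * order)
# Illustrative objective: start all tasks as early as possible.
solver.Minimize(sum(start))
if solver.Solve() == pywraplp.Solver.OPTIMAL:
    print([v.solution_value() for v in start])
```

The pairwise ordering variables are the reason the formulation grows quadratically with the number of tasks, which is quantified next.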
As with the problem presentation, the complete MILP formulation cannot be given here due to space limitations. For understanding the remainder of the paper, it is most important to know that in this formulation, a problem with n tasks has 3n^2 + O(n) variables and 5n^2 + O(n) constraints, as shown in Figure 1.

Figure 1: The number of variables and constraints in the third-level MILP formulation depending on the number of tasks.

4 EFFICIENCY EXPERIMENTS

As evident from Figure 1, the number of variables of the MILP is very large even for problems with a small number of tasks, which makes the optimization algorithm's job harder. We examine the time it needs to find the optimal solution in an experiment on a set of test problems whose properties resemble problems from practice.

This set contains 180 test problems (20 for each problem size from two to ten tasks), in which the execution times of the tasks of one worker on one day have to be determined. Some problems contain only ordinary tasks, while others can also include a lunch break, one task with a mandatory schedule, or both. The duration of the lunch break is always half an hour, while the durations of the other tasks are drawn randomly from a distribution that tries to mimic problems from practice: most tasks are shorter than 90 minutes, while a few last up to four hours. The priority of each task is chosen randomly between 1 and 9. Likewise, the task locations and the worker's starting location are chosen randomly among the locations of some larger Slovenian cities. Most tasks have an unlimited time window, while for some the window is shortened at the beginning or at the end of the day. The duration of the working hours is chosen randomly between 6 and 10 hours, and the worker can always perform up to two hours of overtime.

All test problems are solved with the branch-and-bound algorithm from the SCIP solver [2] through the OR-Tools library [4]. We record the time the algorithm needs to find the optimal solution. The experiments run on a personal computer with 16 GB of memory and a processor frequency of 3.60 GHz. The results of the experiment are shown in Figure 2.

Figure 2: The average time needed to solve the problem to optimality depending on its size.
We can see that the algorithm typically needs exponentially more time as each task is added. If the whole three-level algorithm is to be useful in practice, we can afford only a few seconds for solving the MILP problem, which means that for our purposes the algorithm is inefficient already for problems with seven or more tasks.

We also examine how well the algorithm performs if we limit the time it has available for finding solutions. We run the experiments with the following time limits: 0.5 s, 1 s, 2 s, 4 s, 8 s, 16 s and 32 s. We observe how many tasks the algorithm has scheduled and compare this number with the optimal number of scheduled tasks (obtained in the previous experiment, where the algorithm was not time-limited). Although the goal of the algorithm is not merely to schedule as many tasks as possible, the number of scheduled tasks is a good indicator of the quality of its performance.

Figure 3 shows that, given too little time, the algorithm fails on problems with many tasks (it rejects most tasks even though it could schedule them). For example, when the algorithm has only 0.5 s available, it performs adequately only on problems with up to four tasks; on larger problems its success drops, and with eight or more tasks it schedules on average only one task. The algorithm performs somewhat better when it has more time, but only at 32 s does the number of scheduled tasks on problems with nine and ten tasks approach the optimal number of scheduled tasks.

Figure 3: Average number of scheduled tasks for different time limits and problem sizes (the dashed black line shows the optimal number of scheduled tasks).
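The time-limit experiment itself then reduces to a simple harness; a sketch is shown below, assuming a hypothetical helper solve_day(problem, time_limit_ms) that wraps a solve such as the one above and returns the number of tasks it managed to schedule.

```python
# Hypothetical experiment harness: `problems` maps a problem size n to its
# 20 test instances, and solve_day() returns the number of scheduled tasks.
TIME_LIMITS_MS = [500, 1000, 2000, 4000, 8000, 16000, 32000, None]  # None = unlimited

def run_time_limit_experiment(problems, solve_day):
    averages = {}  # (n, limit) -> average number of scheduled tasks
    for n, instances in problems.items():
        for limit in TIME_LIMITS_MS:
            counts = [solve_day(p, time_limit_ms=limit) for p in instances]
            averages[(n, limit)] = sum(counts) / len(counts)
    return averages
```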
5 PROBLEM SIMPLIFICATION

Since we are not satisfied with the algorithm's performance, we try to simplify the problem. Expressing the partial criteria for priority and due time requires 2n² + O(n) variables and 4n² + O(n) constraints, which is a lot, especially since these two partial criteria are not very important. We therefore test how the algorithm performs if we omit them (they can be taken care of, to some extent, at the upper two levels of the optimization algorithm). The number of variables and constraints still grows quadratically with the number of tasks, but the coefficients of the quadratic term are reduced from 3 and 5, respectively, to 1.

For the simplified problem we carry out a test similar to the one in Section 4, except that we test only with the time limits 0.5 s, 2 s, 8 s and 32 s. In Figure 4 we compare the number of scheduled tasks for the original and the simplified problem. We can see that on larger simplified problems the algorithm performs much better.

Figure 4: Comparison of the algorithm's performance on the original (solid lines) and the simplified (dashed lines) problem formulation for different time limits and problem sizes (the dashed black line shows the optimal number of scheduled tasks for the original problem formulation).

The simplification thus makes the third-level algorithm perform satisfactorily for practical needs as well. However, this does not necessarily mean that the simplification improves the performance of the whole three-level algorithm. We verify this with an experiment on two test problems, P1 and P2, where we once give a larger weight to the partial criterion of travel duration and once to even worker load. In this way we obtain four different test problems. P1 comprises 220 tasks, ten workers and seven days, while P2 comprises 114 tasks (it also contains lunch breaks), five workers and three days. For each test problem we run four variants of the three-level algorithm that differ only at the third level: it uses either the original or the simplified problem, and its execution is limited to either 1 s or 2 s.

Figure 5 shows the results of these experiments. On problem P1 (the top two plots in the figure) we can clearly see that simplifying the problem at the third level benefits the efficiency of the whole algorithm. We cannot claim the same for problem P2, where the original and the simplified variants perform very similarly, but we do see much better performance when the execution is limited to 2 s.

Figure 5: Optimization results for four variants of the algorithm on two problems (P1 top and P2 bottom) with two different partial criteria (travel duration left and even worker load right). Smaller values are better.

6 CONCLUSIONS

In this paper we analyzed the efficiency of an algorithm for scheduling field work. We focused only on the most time-consuming part of the three-level algorithm: solving the MILP problem at the third level. With two experiments we showed that the branch-and-bound algorithm is not efficient enough for solving practical problems, so we simplified the MILP problem. This does not change the criteria of the overall problem, but it enables the third-level algorithm to efficiently solve problems with ten tasks as well (we do not expect more in practice). We also compared how the simplification affects the performance of the whole algorithm and found that, although there are problems for which the simplification is not beneficial, it generally gives good results and we will use it in practice as well.

ACKNOWLEDGMENTS

This work was carried out within the project Inteligentno in okolju prijazno razporejanje terenskega dela – MF-Scheduler (Intelligent and Environmentally Friendly Field Work Scheduling), commissioned by Comland d.o.o. and co-financed by the Ministry of Economic Development and Technology of the Republic of Slovenia and the European Regional Development Fund of the European Union, and within the research program P2-0209, financed by the Slovenian Research Agency from the state budget.

REFERENCES
[1] S. Bertels and T. Fahle. 2006. A hybrid setup for a hybrid scenario: Combining heuristics for the home health care problem. Computers & Operations Research, 33, 10, 2866–2890. doi: 10.1016/j.cor.2005.01.015.
[2] K. Bestuzheva et al. 2021. The SCIP Optimization Suite 8.0. Technical Report. Optimization Online, (Dec. 2021). http://www.optimization-online.org/DB_HTML/2021/12/8728.html.
[3] J. A. Castillo-Salazar, D. Landa-Silva and R. Qu. 2016. Workforce scheduling and routing problems: Literature survey and computational study. Annals of Operations Research, 239, 1, 39–67. doi: 10.1007/s10479-014-1687-2.
[4] Google Developers. 2021. About OR-Tools. Retrieved 19 Aug. 2022 from https://developers.google.com/optimization/introduction/overview.
[5] A. E. Eiben and J. E. Smith. 2015. Introduction to Evolutionary Computing. (2nd ed.). Springer. doi: 10.1007/978-3-662-44874-8.
[6] D. C. Paraskevopoulos, G. Laporte, P. P. Repoussis and C. D. Tarantilis. 2017. Resource constrained routing and scheduling: Review and research prospects. European Journal of Operational Research, 263, 3, 737–754. doi: 10.1016/j.ejor.2017.05.035.
Interaktivno eksperimentiranje z besednimi vložitvami v platformi ClowdFlows
Interactive Experimentation with Word Embeddings in the ClowdFlows platform

Martin Žnidaršič, Senja Pollak, Vid Podpečan
martin.znidarsic@ijs.si, senja.pollak@ijs.si, vid.podpecan@ijs.si
Institut "Jožef Stefan", Jamova cesta 39, 1000 Ljubljana, Slovenija

ABSTRACT
The paper presents the ClowdFlows web platform for machine learning and data analysis using interactive workflows. In particular, we highlight selected features that facilitate its use by non-programmers as well as selected elements of the platform that enable text analysis using state-of-the-art word embedding approaches. We also report on a hands-on evaluation of the usability of the platform and its word embedding components in a selected group of end users from the fields of humanities and social sciences.

KEYWORDS
natural language processing, word embeddings, web application, workflows

1 INTRODUCTION
Fields related to artificial intelligence methods, such as data mining, machine learning and natural language processing, have seen a surge in practical use in recent years. The latest methodological advances are usually first available in the form of programming libraries or web services, later in solution-development platforms with a convenient user interface, and usually later still in dedicated tools that use the methodology internally and make it available, with little or no control over how it operates, also to users without a computer science background. For such users, independent use of these methods is hampered, among other things, by the prior knowledge required to apply them meaningfully, and sometimes also by the software installation and configuration procedures. The prototype research tool ClowdFlows, which we are developing at the Department of Knowledge Technologies of the Jožef Stefan Institute, addresses these two obstacles and shows potential for practical use. Within the EMBEDDIA project [14, 13, 16] we have extended the capabilities of this tool primarily in the area of natural language analysis, so in this paper we focus on methods and end users from this field. More precisely, we present an example of training and using word embedding models, and the experience of new users from the humanities and social sciences.
Section 2 presents the basic related work. The ClowdFlows platform is described in Section 3. Section 4 presents an example of the use of word embeddings and the user experience. Conclusions are given in Section 5.

2 BACKGROUND AND RELATED WORK
2.1 Platforms for visual programming and solution sharing
The ClowdFlows software tool presented and used in this paper is similar to some other tools for managing data mining workflows. Slovenian users are probably most familiar with the Orange tool [2]; the Weka [18] and RapidMiner [8, 5] tools are similar. All of these tools support visual programming with software components and the management of programs built in this way. Solutions for sharing workflows are less widespread. This is offered, for instance, by the myExperiment portal [15] or the website of the OpenML initiative [17]. The usefulness of these solutions, however, is limited mainly to well-supported public sharing of solutions; to run or edit the workflows, the user still has to install the dedicated software in which they were designed.
ClowdFlows, on the other hand, supports the creation as well as the sharing and execution of workflows.

2.2 Word embeddings
Word embeddings, machine-learned using neural networks, are representations of words in a space where every word is described by a vector with many dimensions (typically from a few dozen to a few hundred). Words that are close in the vector space (which can be measured with the cosine distance) are also semantically similar. Embeddings also support computing relations that go beyond simple word similarity, e.g., through analogies. For example, the relation Madrid:Spain is similar to the relation Paris:France [10]. In static embeddings, such as the word2vec [9] and fastText [1] models, each word in the corpus is represented by a single vector. In the fastText method, each word is represented as the sum of the embeddings of the character n-grams it contains. In practice this means that the method also takes the morphological similarity of words into account when modeling semantic closeness, which makes it particularly useful for computing word embeddings in morphologically rich languages such as Slovene. In contrast to static embeddings, in contextual embeddings, such as the ELMo [12] and BERT [3] models, every occurrence of a word is described by its own vector. This matters above all for polysemous words, but also in cases where we analyze differences between words in different contexts. For many languages, models pretrained on large language corpora are available [4, 3] and can be adapted to individual domains and tasks.
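Neighborhood and analogy queries of this kind can also be reproduced outside ClowdFlows, for instance with the gensim library; below is a minimal sketch in which the model file name is a placeholder and a lowercase English vocabulary is assumed.

```python
from gensim.models import KeyedVectors

# Load pretrained static embeddings in the textual word2vec format
# (the file name is a placeholder).
wv = KeyedVectors.load_word2vec_format("embeddings.vec")

# Words close in the vector space (by cosine similarity) are semantically similar.
print(wv.most_similar("paris", topn=5))

# Analogy via vector arithmetic: madrid - spain + france ≈ paris,
# i.e., the relation Madrid:Spain resembles Paris:France [10].
print(wv.most_similar(positive=["madrid", "france"], negative=["spain"], topn=1))
```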
Figure 1: The main view in ClowdFlows.

3 CLOWDFLOWS
ClowdFlows [6, 7] is a web platform for data analysis and machine learning with a graphical user interface that runs in the browser without requiring any local software installation, and it also offers simple public sharing of the solutions that are built. It is an open-source research tool whose latest stable version, ClowdFlows 3, is available at: https://cf3.ijs.si/.
The graphical way of composing workflows and the use of publicly shared solutions without installing additional software are features that ease its use also for users who have no programming background but do have interesting data and research problems for which the methods available in ClowdFlows would come in handy. For researchers, the simple sharing and the ease of repeating or extending existing experiments are also important.
The components in ClowdFlows 3 include a range of widgets for working with word embeddings. They include pretrained static and contextual models for several languages, as well as some tools built on top of them, such as classifiers for news sentiment analysis [11] and hate speech detection [11].
After logging in to ClowdFlows, we can take a short course on the basics, create a new workflow or browse the publicly available solutions. The main view is intended for creating, inspecting and running workflows. It is shown in Figure 1. Most of this view is taken up by the canvas onto which we can drag (or place with a double click) the desired widget from the list of available widgets on the left-hand side of the view. Meaningfully connected widgets constitute a workflow, which can be run with the Play control button.
Connections between widgets are established by clicking the output of one widget and the input of another. Inputs are shown as light blue rectangles on the left side of a widget, and outputs as such rectangles on the right. A connection can be removed by right-clicking it and choosing the Remove option. Workflows are saved automatically, but they can also be saved explicitly by pressing the save control, which also lets us name the saved work. Saved workflows can be inspected, copied, deleted, exported or publicly shared in the view that appears when choosing Your workflows. Publicly published workflows receive an immutable URL that can be shared and lets every ClowdFlows user create their own copy of the shared work.

4 USING WORD EMBEDDINGS
4.1 Training an embedding model in ClowdFlows
To obtain a representation of text in the form of word embeddings, we can use prepared models or machine-train such models ourselves. Such machine learning is usually computationally very demanding and needs large amounts of data for meaningful results. In practice, pretrained models are therefore often used. ClowdFlows offers several pretrained models, for example ELMo, Word2Vec and various models of the BERT and fastText approaches, also for Slovene.
Training one's own models makes sense when dealing with special texts or tasks for which we want to use them. The high computational demands, the large amounts of data and the correspondingly longer processing times are, namely, not compatible with the interactive use that is characteristic of ClowdFlows. Among users in the digital humanities and social sciences, we identified a need to train embeddings on small, specific corpora, such as poetry collections, specialized news articles and the like. Such corpora are often substantially smaller than the typical corpora used for training word embeddings.
Given the users' needs and the platform's capabilities, we decided to implement the model-training widget train fastText, since the fastText algorithm is among the most efficient and least computationally demanding. The widget's implementation in ClowdFlows also contains hints on how to adapt the default parameters for training on small corpora. For acceptably fast interactive work, we nevertheless recommend that the input corpus not exceed two million words or roughly 10 MB of raw text.
The train fastText widget uses the fastText algorithm to train a new vector model on the input corpus. Such a model can then be passed on to other widgets. The input to train fastText is a text corpus, such as the output of the Load Corpus from CSV widget. The corpus can be tokenized, lemmatized, or used without such preprocessing.
train fastText lets the user set the following parameters:
bucket - the number of buckets (the features of word and character n-grams are hashed into a fixed number of buckets);
epoch - the number of training epochs;
lr - the learning rate;
dimension - the size of the word vectors;
window - the size of the context window;
model - the type of unsupervised fastText model (cbow or skipgram); and
min_count - the minimal number of occurrences of a word for the word to still be considered.
Where appropriate, a parameter's description includes a hint on whether it is advisable to increase or decrease its value in the case of small training data.
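These parameters correspond directly to the unsupervised training API of the fastText library itself; a minimal sketch of such a training run follows (the corpus path and the parameter values are illustrative, not the widget's actual defaults).

```python
import fasttext

# Train a static embedding model on a plain-text corpus file.
# The values below are illustrative; as the widget's hints suggest for
# small corpora, the number of epochs is increased and min_count lowered.
model = fasttext.train_unsupervised(
    "corpus.txt",
    model="skipgram",  # unsupervised model type: cbow or skipgram
    dim=100,           # size of the word vectors
    ws=5,              # size of the context window
    epoch=25,          # number of training epochs
    lr=0.05,           # learning rate
    minCount=2,        # minimal number of occurrences of a word
    bucket=200000,     # number of buckets for word/character n-gram features
)

print(model.get_nearest_neighbors("ljubljana", k=5))  # neighboring words
```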
4.2 User experience
We evaluated the usability of the ClowdFlows platform and its most important components for natural language analysis, from the perspective of the target end users, in a one-day workshop held (remotely) on 27 January 2022. The workshop was aimed at one of our primary target groups: researchers from various fields of the humanities and social sciences who are (presumably) not skilled in programming.
For the workshop we prepared an example workflow for analyzing texts with word embeddings. It is shown in Figure 2. The workflow starts with two possible ways of entering the input data, continues with the optional use of a tokenizer and a lemmatizer (the latter also accepts a language tag as input), followed by the training of a fastText model. The trained model is then used in the workflow in two ways: in the fastText neighboring words widget we inspect the neighborhood (neighboring words) of selected words, and in the Evaluate word expressions with fastText widget we experiment with expressions (addition, subtraction) over the vector representations of words. In both cases, the results are displayed with the Object viewer widget.

Figure 2: The workflow used at the workshop with the target users. Available at: https://cf3.ijs.si/workflow/283.

The workshop began with a joint introductory presentation of the ClowdFlows platform and the example workflow from Figure 2, which lasted 20 minutes and in which we demonstrated selected examples using the text of the novel Deseti brat by Josip Jurčič.
This was followed by eight 20-minute sessions in which each user created their own instance of the workflow, loaded their own corpus and tried out selected ClowdFlows components. Each session was dedicated to one user and their data, while the other users could attend as observers. We helped the users in their work if they had trouble using the platform or preparing their input data. Attendance at the workshop was by invitation. The invited participants are researchers from the fields of literary studies, sociology, social work and related fields. They prepared their own corpora from their fields, such as thematic corpora on migration, a corpus of works by Slovenian writers, a corpus of French poetry, LGBT topics, news reporting on social work, and the like. Some participants were invited within the interdisciplinary projects SOVRAG and CANDAS. None of the participants had any prior experience with ClowdFlows. Due to the great interest, we increased the number of sessions from the planned 8 to 10.
The corpora used were very diverse, and the participants were interested in various aspects of text processing. In most cases, additional widgets were added to the initial workflow to solve a particular problem or satisfy specific interests. For example, the participants looked for similarities and differences in word neighborhoods based on corpora from different periods or by different authors. They were also interested in basic characteristics of such corpora, such as the most frequently used words, which also involved other basic operations such as word filtering. Two specific technical issues were discovered during the workshop: (I) errors occurred when entering text with special characters that was not encoded in a UTF code table, and (II) some widgets that contain calls to web services reported exceeding the time limit.
An anonymous questionnaire was prepared for the participants, and a link to it was sent out after the workshop. Most participants found the presented workflow very useful. A large majority (80%) had never tried using word embeddings before.
The ClowdFlows user interface was mostly reported as simple to use (simple: 60%, very simple: 30%), and only one participant found it complicated. These results should, however, be considered in the context of the fact that the responses were collected shortly after using ClowdFlows, with help available. Without the introduction and help, the answers might not have been so positive, but we have not tested this yet. Most participants felt that they would use ClowdFlows again if a prepared workflow for their problem were provided to them.

5 CONCLUSION
We have presented the ClowdFlows web platform, selected elements that enable advanced use of approaches for training word embeddings, and the experience of some of the target users of these tools.
One of the goals of the ClowdFlows platform is to bring the use of the latest data analysis and machine learning methods also to users who are not skilled in programming. The experience of our workshop with some of the potential users showed that this definitely makes sense: starting from prepared workflows, the users (experts in other fields) were able to perform analyses on their own data and even look for and propose further analysis procedures that are meaningful and useful in their work. In the future, in addition to methodological extensions and technical improvements of the platform, we will therefore also devote more attention to developing example solutions for the target users, primarily for researchers from fields not related to computer science.

ACKNOWLEDGMENTS
This paper is a result of the research projects Računalniško podprta večjezična analiza novičarskega diskurza s kontekstualnimi besednimi vložitvami (no. J6-2581) and Sovražni govor v sodobnih konceptualizacijah nacionalizma, rasizma, spola in migracij (no. J5-3102) and of the research program Tehnologije znanja (no. P2-0103), co-financed by the Slovenian Research Agency from the state budget, as well as of the European project EMBEDDIA (no. 825153), funded by the EU within the Horizon 2020 research and innovation framework program.
REFERENCES
[1] Piotr Bojanowski, Edouard Grave, Armand Joulin and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135–146.
[2] Janez Demšar et al. 2013. Orange: data mining toolbox in Python. Journal of Machine Learning Research, 14, 1, 2349–2353.
[3] Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. 2019. BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4171–4186.
[4] Edouard Grave, Piotr Bojanowski, Prakhar Gupta, Armand Joulin and Tomas Mikolov. 2018. Learning word vectors for 157 languages. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018).
[5] Markus Hofmann and Ralf Klinkenberg. 2016. RapidMiner: Data Mining Use Cases and Business Analytics Applications. CRC Press.
[6] Janez Kranjc. 2017. Web Workflows for Data Mining in the Cloud. PhD thesis. Jožef Stefan International Postgraduate School.
[7] Janez Kranjc, Vid Podpečan and Nada Lavrač. 2012. ClowdFlows: a cloud based scientific workflow platform. In Machine Learning and Knowledge Discovery in Databases. Lecture Notes in Computer Science. Vol. 7524. Peter A. Flach, Tijl De Bie and Nello Cristianini, editors. Springer Berlin Heidelberg, 816–819.
[8] Ingo Mierswa, Michael Wurst, Ralf Klinkenberg, Martin Scholz and Timm Euler. 2006. YALE: rapid prototyping for complex data mining tasks. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, USA, 935–940.
[9] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, 3111–3119.
[10] Tomas Mikolov, Wen-tau Yih and Geoffrey Zweig. 2013. Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the ACL: Human Language Technologies. ACL, 746–751.
[11] Andraž Pelicon, Ravi Shekhar, Matej Martinc, Blaž Škrlj, Matthew Purver and Senja Pollak. 2021. Zero-shot cross-lingual content filtering: offensive language and hate speech detection. In Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation. Association for Computational Linguistics, Online, (Apr. 2021), 30–34. https://aclanthology.org/2021.hackashop-1.5.
[12] Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 2227–2237.
[13] Senja Pollak and Andraž Pelicon. 2022. EMBEDDIA project: cross-lingual embeddings for less-represented languages in European news media. In Proceedings of the 23rd Annual Conference of the European Association for Machine Translation. European Association for Machine Translation, Ghent, Belgium, (Jun. 2022), 293–294.
[14] Senja Pollak et al. 2021. EMBEDDIA tools, datasets and challenges: resources and hackathon contributions. In Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation. Association for Computational Linguistics, Online, (Apr. 2021), 99–109. https://aclanthology.org/2021.hackashop-1.14.
[15] David De Roure, Carole Goble and Robert Stevens. 2009. The design and realisation of the myExperiment Virtual Research Environment for social sharing of workflows. Future Generation Computer Systems, 25, (Feb. 2009), 561–567.
[16] Matej Ulčar, Aleš Žagar, Carlos S. Armendariz, Andraž Repar, Senja Pollak, Matthew Purver and Marko Robnik-Šikonja. 2021. Evaluation of contextual embeddings on less-resourced languages. (2021). https://arxiv.org/abs/2107.10614.
[17] Joaquin Vanschoren, Jan N Van Rijn, Bernd Bischl and Luis Torgo. 2014. OpenML: networked science in machine learning. ACM SIGKDD Explorations Newsletter, 15, 2, 49–60.
[18] Ian H Witten and Eibe Frank. 2005. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann.

Indeks avtorjev / Author index

Alukić Erna .......... 35
Andova Andrejaana .......... 7
Brence Jure .......... 11
Čelan Nika .......... 19, 39
Chaushevska Marija .......... 11
Džeroski Sašo .......... 11
Erzar Blaž .......... 19, 39
Filipič Bogdan .......... 7, 43
Gams Matjaž .......... 19, 31, 39
Gjoreski Hristijan .......... 15
Hrastič Aleksander .......... 19
Kizhevska Emilija .......... 15
Kolar Žiga .......... 19, 39
Konečnik Martin .......... 19, 39
Krömer Pavel .......... 7
Leskovec Gašper .......... 19, 39
Logar Katja .......... 23
Luštrek Mitja .......... 15
Mekiš Nejc .......... 35
Mlakar Miha .......... 35
Osojnik Aljaž .......... 27
Podpečan Vid .......... 47
Pollak Senja .......... 47
Prestor Domen .......... 19, 39
Robnik-Šikonja Marko .......... 23
Sever Nace .......... 43
Shulajkovska Miljana .......... 31
Skobir Matjaž .......... 19, 39
Slapničar Gašper .......... 35
Smerkol Maj .......... 31
Susič David .......... 19, 39
Todorovski Ljupčo .......... 11
Tušar Tea .......... 7, 43
Uher Vojtěch .......... 7
Us Peter .......... 35
Vodopija Aljoša .......... 7, 43
Ženko Bernard .......... 27
Žibert Janez .......... 35
Žnidaršič Martin .......... 27, 47