Zbornik 25. mednarodne multikonference INFORMACIJSKA DRUŽBA – IS 2022, Zvezek A
Proceedings of the 25th International Multiconference INFORMATION SOCIETY – IS 2022, Volume A

Slovenska konferenca o umetni inteligenci
Slovenian Conference on Artificial Intelligence

Uredniki / Editors: Mitja Luštrek, Matjaž Gams, Rok Piltaver

http://is.ijs.si
11. oktober 2022 / 11 October 2022
Ljubljana, Slovenija

Uredniki:
Mitja Luštrek, Odsek za inteligentne sisteme, Institut »Jožef Stefan«, Ljubljana
Matjaž Gams, Odsek za inteligentne sisteme, Institut »Jožef Stefan«, Ljubljana
Rok Piltaver, Outfit7 in Odsek za inteligentne sisteme, Institut »Jožef Stefan«, Ljubljana

Založnik: Institut »Jožef Stefan«, Ljubljana
Priprava zbornika: Mitja Lasič, Vesna Lasič, Lana Zemljak
Oblikovanje naslovnice: Vesna Lasič
Dostop do e-publikacije: http://library.ijs.si/Stacks/Proceedings/InformationSociety

Ljubljana, oktober 2022

Informacijska družba, ISSN 2630-371X
Kataložni zapis o publikaciji (CIP) pripravili v Narodni in univerzitetni knjižnici v Ljubljani
COBISS.SI-ID 127433219
ISBN 978-961-264-241-9 (PDF)

PREDGOVOR MULTIKONFERENCI INFORMACIJSKA DRUŽBA 2022

Petindvajseta multikonferenca Informacijska družba je preživela probleme zaradi korone. Zahvala za skoraj normalno delovanje konference gre predvsem tistim predsednikom konferenc, ki so kljub prvi pandemiji modernega sveta pogumno obdržali visok strokovni nivo. Pandemija v letih od 2020 do danes skoraj v ničemer ni omejila neverjetne rasti IKT-ja, informacijske družbe, umetne inteligence in znanosti nasploh; nasprotno – rast znanja, računalništva in umetne inteligence se nadaljuje z že kar običajno nesluteno hitrostjo. Po drugi strani se nadaljuje razpadanje družbenih vrednot, ob tem pa divja tragična vojna v Ukrajini, ki lahko pljuskne v Evropo. Se pa zavedanje večine ljudi, da je treba podpreti stroko, krepi. Konec koncev je v 2022 v veljavo stopil nov raziskovalni zakon, ki bo izboljšal razmere, predvsem pa leto za letom povečeval sredstva za znanost.

Letos smo v multikonferenco povezali enajst odličnih neodvisnih konferenc, med njimi »Legende računalništva«, s katero postavljamo nov mehanizem promocije informacijske družbe. IS 2022 zajema okoli 200 predstavitev, povzetkov in referatov v okviru samostojnih konferenc in delavnic ter 400 obiskovalcev. Prireditev so spremljale okrogle mize in razprave ter posebni dogodki, kot je svečana podelitev nagrad. Izbrani prispevki bodo izšli tudi v posebni številki revije Informatica (http://www.informatica.si/), ki se ponaša s 46-letno tradicijo odlične znanstvene revije.
Multikonferenco Informacijska družba 2022 sestavljajo naslednje samostojne konference:
• Slovenska konferenca o umetni inteligenci
• Izkopavanje znanja in podatkovna skladišča
• Demografske in družinske analize
• Kognitivna znanost
• Kognitonika
• Legende računalništva
• Vseprisotne zdravstvene storitve in pametni senzorji
• Mednarodna konferenca o prenosu tehnologij
• Vzgoja in izobraževanje v informacijski družbi
• Študentska konferenca o računalniškem raziskovanju
• Matcos 2022

Soorganizatorji in podporniki konference so različne raziskovalne institucije in združenja, med njimi ACM Slovenija, SLAIS, DKZ in druga slovenska nacionalna akademija, Inženirska akademija Slovenije (IAS). V imenu organizatorjev konference se zahvaljujemo združenjem in institucijam, še posebej pa udeležencem za njihove dragocene prispevke in priložnost, da z nami delijo svoje izkušnje o informacijski družbi. Zahvaljujemo se tudi recenzentom za njihovo pomoč pri recenziranju.

S podelitvijo nagrad, še posebej z nagrado Michie-Turing, se avtonomna stroka s področja opredeli do najbolj izstopajočih dosežkov. Nagrado Michie-Turing za izjemen življenjski prispevek k razvoju in promociji informacijske družbe je prejel prof. dr. Jadran Lenarčič. Priznanje za dosežek leta pripada ekipi NIJZ za portal zVEM. »Informacijsko limono« za najmanj primerno informacijsko potezo je prejela cenzura na socialnih omrežjih, »informacijsko jagodo« kot najboljšo potezo pa nova elektronska osebna izkaznica. Čestitke nagrajencem!

Mojca Ciglarič, predsednica programskega odbora
Matjaž Gams, predsednik organizacijskega odbora

FOREWORD – INFORMATION SOCIETY 2022

The 25th Information Society Multiconference (http://is.ijs.si) survived the COVID-19 problems. The multiconference survived thanks to the conference chairs who bravely decided to continue with their conferences despite the first pandemic of the modern era. The COVID-19 pandemic from 2020 till now did not decrease the growth of ICT, the information society, artificial intelligence and science overall; quite the contrary – the progress of computers, knowledge and artificial intelligence continued at a fascinating growth rate. However, the downfall of societal norms seems to slowly but surely continue, along with the tragic war in Ukraine. On the other hand, the awareness of the majority that science and development are the only perspective for a prosperous future is growing substantially. In 2022, a new law regulating Slovenian research came into force, promoting an increase in funding year by year.

The multiconference is running parallel sessions with 200 presentations of scientific papers at eleven conferences, many round tables, workshops and award ceremonies, and 400 attendees. Among the conferences, »Legends of computing« introduces the »Hall of Fame« concept for computer science and informatics. Selected papers will be published in the Informatica journal, with its 46-year tradition of excellent research publishing.
The Information Society 2022 Multiconference consists of the following conferences:
• Slovenian Conference on Artificial Intelligence
• Data Mining and Data Warehouses
• Cognitive Science
• Demographic and family analyses
• Cognitonics
• Legends of computing
• Pervasive health and smart sensing
• International technology transfer conference
• Education in information society
• Student computer science research conference 2022
• Matcos 2022

The multiconference is co-organized and supported by several major research institutions and societies, among them ACM Slovenia, i.e. the Slovenian chapter of the ACM, SLAIS, DKZ and the second national academy, the Slovenian Engineering Academy. In the name of the conference organizers, we thank all the societies and institutions, and particularly all the participants for their valuable contribution and their interest in this event, and the reviewers for their thorough reviews.

The award for life-long outstanding contributions is presented in memory of Donald Michie and Alan Turing. The Michie-Turing award was given to Prof. Dr. Jadran Lenarčič for his life-long outstanding contribution to the development and promotion of the information society in our country. In addition, the yearly recognition for current achievements was awarded to NIJZ for the zVEM platform. The information lemon goes to the censorship on social networks. The information strawberry for the best information service last year went to the electronic identity card. Congratulations!

Mojca Ciglarič, Programme Committee Chair
Matjaž Gams, Organizing Committee Chair

KONFERENČNI ODBORI / CONFERENCE COMMITTEES

International Programme Committee:
Vladimir Bajic, South Africa; Heiner Benking, Germany; Se Woo Cheon, South Korea; Howie Firth, UK; Olga Fomichova, Russia; Vladimir Fomichov, Russia; Vesna Hljuz Dobric, Croatia; Alfred Inselberg, Israel; Jay Liebowitz, USA; Huan Liu, Singapore; Henz Martin, Germany; Marcin Paprzycki, USA; Claude Sammut, Australia; Jiri Wiedermann, Czech Republic; Xindong Wu, USA; Yiming Ye, USA; Ning Zhong, USA; Wray Buntine, Australia; Bezalel Gavish, USA; Gal A. Kaminka, Israel; Mike Bain, Australia; Michela Milano, Italy; Derong Liu, Chicago, USA; Toby Walsh, Australia; Sergio Campos-Cordobes, Spain; Shabnam Farahmand, Finland; Sergio Crovella, Italy

Organizing Committee:
Matjaž Gams, chair; Mitja Luštrek; Lana Zemljak; Vesna Koricki; Mitja Lasič; Blaž Mahnič

Programme Committee:
Mojca Ciglarič, chair; Bojan Orel; Franc Solina; Viljan Mahnič; Cene Bavec; Tomaž Kalin; Jozsef Györkös; Tadej Bajd; Jaroslav Berce; Mojca Bernik; Marko Bohanec; Ivan Bratko; Andrej Brodnik; Dušan Caf; Saša Divjak; Tomaž Erjavec; Bogdan Filipič; Andrej Gams; Matjaž Gams; Mitja Luštrek; Marko Grobelnik; Nikola Guid; Marjan Heričko; Borka Jerman Blažič Džonova; Gorazd Kandus; Urban Kordeš; Marjan Krisper; Andrej Kuščer; Jadran Lenarčič; Borut Likar; Janez Malačič; Olga Markič; Dunja Mladenič; Franc Novak; Vladislav Rajkovič; Grega Repovš; Ivan Rozman; Niko Schlamberger; Stanko Strmčnik; Jurij Šilc; Jurij Tasič; Denis Trček; Andrej Ule; Boštjan Vilfan; Baldomir Zajc; Blaž Zupan; Boris Žemva; Leon Žlajpah; Niko Zimic; Rok Piltaver; Toma Strle; Tine Kolenik; Franci Pivec; Uroš Rajkovič; Borut Batagelj; Tomaž Ogrin; Aleš Ude; Bojan Blažica; Matjaž Kljun; Robert Blatnik; Erik Dovgan; Špela Stres; Anton Gradišek

KAZALO / TABLE OF CONTENTS
Slovenska konferenca o umetni inteligenci / Slovenian Conference on Artificial Intelligence .......... 1
PREDGOVOR / FOREWORD .......... 3
PROGRAMSKI ODBORI / PROGRAMME COMMITTEES .......... 5
Initial Results in Predicting High-Level Features of Constrained Multi-Objective Optimization Problems / Andova Andrejaana, Vodopija Aljoša, Krömer Pavel, Uher Vojtěch, Tušar Tea, Filipič Bogdan .......... 7
Learning the Probabilities in Probabilistic Context-Free Grammars for Arithmetical Expressions from Equation Corpora / Chaushevska Marija, Todorovski Ljupčo, Brence Jure, Džeroski Sašo .......... 11
Prediction of the Inflow in Macedonian Hydropower Plants, Using Machine Learning / Kizhevska Emilija, Gjoreski Hristijan, Luštrek Mitja .......... 15
Naslov / Kolar Žiga, Erzar Blaž, Čelan Nika, Hrastič Aleksander, Leskovec Gašper, Konečnik Martin, Prestor Domen, Susič David, Skobir Matjaž, Gams Matjaž .......... 19
Unified Question Answering in Slovene / Logar Katja, Robnik-Šikonja Marko .......... 23
Social Media Analysis for Assessing Resilience / Osojnik Aljaž, Ženko Bernard, Žnidaršič Martin .......... 27
Urban Mobility Policy Proposal Using Machine-Learning Techniques / Shulajkovska Miljana, Smerkol Maj, Gams Matjaž .......... 31
IMF Quality Assurance of Mammograms Using Deep Convolutional Neural Networks and Transfer Learning / Slapničar Gašper, Us Peter, Alukić Erna, Mekiš Nejc, Mlakar Miha, Žibert Janez .......... 35
Vehicle Axle Distance Detection From Time-series Signals Using Machine Learning / Susič David, Erzar Blaž, Čelan Nika, Leskovec Gašper, Kolar Žiga, Konečnik Martin, Prestor Domen, Skobir Matjaž, Gams Matjaž .......... 39
Študija učinkovitosti algoritma za razporejanje terenskega dela / Tušar Tea, Sever Nace, Vodopija Aljoša, Filipič Bogdan .......... 43
Interaktivno eksperimentiranje z besednimi vložitvami v platformi ClowdFlows / Žnidaršič Martin, Pollak Senja, Podpečan Vid .......... 47
Indeks avtorjev / Author index .......... 51

Zbornik 25. mednarodne multikonference INFORMACIJSKA DRUŽBA – IS 2022, Zvezek A
Proceedings of the 25th International Multiconference INFORMATION SOCIETY – IS 2022, Volume A

Slovenska konferenca o umetni inteligenci
Slovenian Conference on Artificial Intelligence

Uredniki / Editors: Mitja Luštrek, Matjaž Gams, Rok Piltaver
http://is.ijs.si
11. oktober 2022 / 11 October 2022
Ljubljana, Slovenija

PREDGOVOR

Umetna inteligenca še vedno hitro napreduje, so pa glavni dosežki lanskega leta na področjih, kjer smo jih že vajeni.
Avtonomna vozila so vedno bolj avtonomna in se že uporabljajo za prevoz potnikov, čeravno v zelo omejenem obsegu. Jezikovni modeli, kot je izboljšani GPT-3, postajajo zreli za praktično uporabo, zato se njihovi stvaritelji začenjajo ukvarjati s tem, kako jih odvračati od tvorbe politično nekorektnih besedil. Po eni strani razumljivo, po drugi strani pa – ob problemu omejevanja svobode govora na spletu, ki si je letos prislužil nominacijo za informacijsko limono – tudi nekoliko skrb zbujajoče. Modeli za generiranje slik iz opisov, katerih prvi vidnejši predstavnik je bil DALL-E, so se letos namnožili in videli smo več poizkusov njihove uporabe za izdelavo stripov. Potlej pa so tu še aplikacije v robotiki, medicini, računalniški varnosti in seveda zvitem streženju spletnih reklam.

Ko umetna inteligenca postaja vedno zmožnejša in bolj razširjena, se pojavljajo pomisleki o njeni varnosti ter prizadevanja za uporabo, ki bo družbi v korist in ne v škodo. Ta škoda se začne z nepotrebnimi nakupi zaradi preveč zvitih reklam, ki nas spremljajo že dolgo in smo se z njimi sprijaznili, nadaljuje pa s še resnejšimi problemi, kot so denimo slabe medicinske in zaposlovalne odločitve. Zaradi tovrstnih problemov vse več držav sprejema zakonodajo o umetni inteligenci, ki bo raziskovalcem bržkone povzročila nekaj sivih las, a če bo dobra – in k temu skušajmo prispevati, kolikor lahko – bo tudi pomagala, da naše delo ne bo dobilo zloveščega pridiha.

Vse več je tudi razmišljanja o splošni umetni inteligenci z zmožnostmi, ki presegajo človeške. Njen vpliv na človeško družbo utegne biti dramatičen. A če želimo zagotoviti, da bo dramatično dober, se bomo morali v prihodnjih letih resno lotiti raziskovalnega področja zagotavljanja, da kompleksni modeli umetne inteligence zares počno tisto, kar mislimo in želimo, da počno – področja, ki je zaenkrat še precej v povojih.

Za konec pa poglejmo, kako je letos z našo konferenco. Enajst prispevkov, ki smo jih prejeli, sicer ne opisuje tako visokoletečega dela, kot ga obravnavata prejšnja dva odstavka, so pa vseeno kakovostni in morda začetek česa pomembnega. Število je zmerno in Institut »Jožef Stefan« še malo bolj prevladujoč kot običajno, za kar do neke mere krivimo COVID-19 – ne ker bi nas še vedno hudo pestil, ampak ker sta dve konferenčno klavrni leti raziskovalce konferenčenja malo odvadili. A upajmo, da bo tudi to minilo. Prirejamo pa letos v okviru konference Data Science Meetup – dogodek z lepo tradicijo in dobro udeležbo, kjer imajo strokovnjaki iz industrije kratke predstavitve svojega dela. Na to smo ponosni, saj rešuje težavo pomanjkanja prispevkov iz industrije, ki smo se je dotaknili že v preteklih predgovorih.

FOREWORD

Artificial intelligence is still making good progress, but the major achievements of the past year are in the areas where we have grown to expect them. Autonomous vehicles are increasingly autonomous and already being used to carry passengers, albeit in a very limited way. Language models, such as the improved GPT-3, are becoming ready for practical use. Because of that, their authors are starting to work on preventing them from generating politically incorrect texts. This is on one hand understandable, but on the other hand – considering the problem of censorship on the internet, which was nominated for the Information Lemon this year – somewhat concerning. Models that generate images from text descriptions, whose first prominent representative was DALL-E, are proliferating, and we have seen several attempts to use them to generate comics.
There are also applications in robotics, medicine, cybersecurity and, of course, the cunning delivery of online ads.

With artificial intelligence becoming ever more capable and pervasive, concerns about its safety and its use for the benefit of society rather than harm are increasingly raised. The harm starts with unnecessary consumption due to insidious advertising, but this is old news we have become accustomed to. However, there are potentially more serious problems, such as bad medical or employment decisions. Because of these, a number of countries are drafting legislation on artificial intelligence. This will surely be a headache for researchers, but if the legislation is good – and we should help make it such if we can – it will benefit the reputation of our work.

Superhuman general artificial intelligence is also increasingly entering professional and public debate. Its impact on humanity could be dramatic. To ensure it is dramatically good, we will have to tackle the very much open research problem of ensuring that complex artificial-intelligence models indeed do what we think and want them to do.

Let us finally take a look at our conference. The 11 papers we received do not describe work as ambitious as that discussed in the previous paragraphs, but they are nevertheless good and perhaps the beginning of something important. The number is modest and the Jožef Stefan Institute even more overrepresented than usual, which we partially blame on COVID-19. Not that it is still a major problem, but in the two years without truly good conferences the researchers seem to have lost the habit of going to conferences to some degree. We hope that this, too, shall pass. On a brighter note, we are organizing a Data Science Meetup as part of our conference. This is an event with a long-standing tradition and good attendance, in which experts from industry give short talks on their work. We are quite proud of this achievement, since it addresses the lack of papers from industry which we bemoaned in past forewords.

PROGRAMSKI ODBOR / PROGRAMME COMMITTEE

Mitja Luštrek; Matjaž Gams; Rok Piltaver; Cene Bavec; Marko Bohanec; Marko Bonač; Ivan Bratko; Bojan Cestnik; Aleš Dobnikar; Erik Dovgan; Bogdan Filipič; Borka Jerman Blažič; Marjan Krisper; Marjan Mernik; Biljana Mileva Boshkoska; Vladislav Rajkovič; Niko Schlamberger; Tomaž Seljak; Peter Stanovnik; Damjan Strnad; Miha Štajdohar; Vasja Vehovar; Martin Žnidaršič

Initial Results in Predicting High-Level Features of Constrained Multi-Objective Optimization Problems

Andrejaana Andova, Aljoša Vodopija (Jožef Stefan Institute and Jožef Stefan International Postgraduate School, Jamova cesta 39, Ljubljana, Slovenia; andrejaana.andova@ijs.si, aljosa.vodopija@ijs.si)
Pavel Krömer, Vojtěch Uher (Department of Computer Science, VSB – Technical University of Ostrava, 17. listopadu 2172/15, Ostrava-Poruba, Czech Republic; pavel.kromer@vsb.cz, vojtech.uher@vsb.cz)
Tea Tušar, Bogdan Filipič (Jožef Stefan Institute and Jožef Stefan International Postgraduate School, Jamova cesta 39, Ljubljana, Slovenia; tea.tusar@ijs.si, bogdan.filipic@ijs.si)

ABSTRACT

Trying numerous algorithms on an optimization problem that we encounter for the first time in order to find the best-performing algorithm is time-consuming and impractical. To narrow down the number of algorithm choices, high-level features describing important problem characteristics can be related with algorithm performance. However, characterizing optimization problems for this purpose is challenging, especially when they include multiple objectives and constraints. In this work, we use machine learning (ML) to automatically predict high-level features of constrained multi-objective optimization problems (CMOPs) from low-level, exploratory landscape analysis features. The results obtained on the MW benchmark show a significant difference in classification accuracy depending on the applied evaluation approach. The poor performance of the leave-one-problem-out strategy indicates the need for further investigation of the relevance of low-level features in CMOP characterization.

KEYWORDS

constrained multi-objective optimization, exploratory landscape analysis, sampling methods, problem characterization, machine learning

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Information Society 2022, 10–14 October 2022, Ljubljana, Slovenia
© 2022 Copyright held by the owner/author(s).

1 INTRODUCTION

Predicting high-level features of constrained multi-objective optimization problems (CMOPs) is important as it can help decide which algorithm to use when faced with a new (real-world) CMOP. The structure of the objective and constraint functions is usually unknown for such problems. Moreover, the evaluation of problem solutions might be very time-consuming. In such cases, it is beneficial to know certain high-level features of the CMOP, which eases the selection of an appropriate multi-objective optimization algorithm or constraint handling technique to solve the problem efficiently.

Two frequently considered high-level features of CMOPs are the problem type and the connectivity of the feasible region. The problem type characterizes whether and how the constraints change the Pareto front of the problem. As pointed out by Tanabe et al. [8], this feature is useful as it indicates whether the problem needs to be treated as constrained or unconstrained. Moreover, Ma et al. [5] showed which constraint handling techniques are more successful in solving CMOPs, depending on the problem type. Similarly, the connectivity of the feasible region (or problem connectivity for short) defines the multimodality of the problem violation landscape and, therefore, crucially affects the choice of algorithms that can solve the problem efficiently [5].

High-level features of a new problem can be predicted using automatically calculated low-level problem features. The most widely known low-level features in evolutionary optimization are the exploratory landscape analysis (ELA) features. They were initially introduced to characterize single-objective optimization problems and implemented in the flacco package [2]. More recently, Liefooghe et al. [4] proposed a set of ELA features for multi-objective optimization problems, and Vodopija et al. [10] introduced additional ELA features for CMOPs.

In this work, we use the ELA features from [4] and some from [10] to investigate whether they are useful for predicting problem type and connectivity. To the best of our knowledge, this is the first attempt to predict the high-level features of CMOPs. A similar study was performed by Renau et al. [7] on single-objective optimization problems. They used ELA features to classify the optimization problem. When splitting the data into training and test sets, instances from the same problem were used for both training and testing. The first of our three experiments follows this setup. However, because this evaluation methodology is not useful in practice (the class of a new real-world problem is unknown), a second experiment is performed using the leave-one-problem-out methodology. Finally, the third experiment varies the number of target problem instances used for training to gain further insight into the difficult task of predicting high-level features from low-level ones.

The paper is further organized as follows. In Section 2, we introduce the theoretical background of constrained multi-objective optimization.
In Section 3, we explain the features used in this study. In Section 4, we present the considered test problems, and in Section 5 the experimental setup. In Section 6, we report on the obtained results. Finally, in Section 7, we provide a conclusion and present the ideas for future work.

2 THEORETICAL BACKGROUND

A CMOP can be formulated as:

$$\text{minimize } f_m(x), \quad m = 1, \ldots, M$$
$$\text{subject to } g_k(x) \le 0, \quad k = 1, \ldots, K, \tag{1}$$

where $x = (x_1, \ldots, x_D)$ is a search vector of dimension $D$, $f_m : S \to \mathbb{R}$ are objective functions, $g_k : S \to \mathbb{R}$ constraint functions, $S \subseteq \mathbb{R}^D$ is the search space, and $M$ and $K$ are the numbers of objectives and constraints, respectively.

A solution $x$ is feasible if it satisfies all constraints $g_k(x) \le 0$ for $k = 1, \ldots, K$. For each constraint $g_k$ we can define the constraint violation as $v_k(x) = \max(0, g_k(x))$. The overall constraint violation is defined as

$$v(x) = \sum_{k=1}^{K} v_k(x). \tag{2}$$

A solution $x$ is feasible iff $v(x) = 0$.

A feasible solution $x \in S$ is said to dominate another feasible solution $y \in S$ if $f_m(x) \le f_m(y)$ for all $1 \le m \le M$, and $f_m(x) < f_m(y)$ for at least one $1 \le m \le M$. A feasible solution $x^* \in S$ is a Pareto-optimal solution if there exists no feasible solution $x \in S$ that dominates $x^*$. All feasible solutions constitute the feasible region, $F = \{x \in S \mid v(x) = 0\}$, and all nondominated feasible solutions form the Pareto set, $S_o$. The image of the Pareto set in the objective space is the Pareto front, $P_o = \{f(x) \mid x \in S_o\}$.
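To make these definitions concrete, here is a minimal NumPy sketch of the overall constraint violation from Eq. (2) and of the Pareto dominance test; the function names are ours, not from the paper:

```python
import numpy as np

def overall_violation(g_values):
    """Overall constraint violation v(x) = sum_k max(0, g_k(x)), Eq. (2)."""
    return np.sum(np.maximum(0.0, np.asarray(g_values)))

def dominates(f_x, f_y):
    """Pareto dominance between two feasible solutions: f_x dominates f_y
    iff f_x <= f_y in every objective and < in at least one."""
    f_x, f_y = np.asarray(f_x), np.asarray(f_y)
    return bool(np.all(f_x <= f_y) and np.any(f_x < f_y))

# A solution with constraint values g(x) = (-0.5, 0.2) is infeasible,
# since v(x) = max(0, -0.5) + max(0, 0.2) = 0.2 > 0.
print(overall_violation([-0.5, 0.2]))      # 0.2
print(dominates([1.0, 2.0], [1.5, 2.0]))   # True
```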
However, problem type and connectivity of the feasible region. The values the ELA features are generally more accurate for large sample of these two high-level features for each MW problem are listed sizes. The ELA features proposed by Liefooghe et al. [4] and in Table 1. used also in this work can be divided into four categories: global, Many of the ELA features proposed by Liefooghe et al. [4] multimodality, evolvability, and ruggedness features. can only be calculated for bi-objective optimization problems. The global features capture certain global problem properties, Therefore, we investigate only the bi-objective versions of the for example, the correlation between the objective values, average MW problems although three of them are scalable in the number and maximum distance between solutions in the search space and of objectives. All MW problems are also scalable in the num- the objective space, the proportion of non-dominated solutions, ber of variables. We use 5-dimensional problems to match the the average and maximum rank of solutions, etc. experimental setup from [7]. The multimodality features assess the number of local optima in the objective space. They are computed for the bi-objective 5 EXPERIMENTAL SETUP space and also for each objective separately, in both cases by In preliminary experiments, we used six sampling methods from analyzing the neighbourhood of each solution. If a solution domi- the ghalton [1] and scipy [9] Python libraries: gHalton, Halton, nates its neighbors (or has a better objective value than its neigh-Sobol, Latin hypercube sampling, optimized Latin hypercube bors), it is defined as a local optimum. The multimodality features sampling, and uniform sampling [3]. The results have shown comprise the proportion of solutions that are locally optimal, the that similar prediction accuracies are obtained when using data average and maximum distances between local optima, etc. provided by any of these sampling methods. For this reason, The evolvability features describe how fast a local optimizer we only present the results obtained using the Sobol sampling would converge towards an optimum. They are calculated by method in the rest of the paper. analyzing how many neighboring solutions are dominated by, The Sobol sampling method generates a sample set by parti- dominating, or incomparable with a given solution. tioning the search space and filling each partition with a sample The ruggedness features measure the correlation between the solution. We generate additional Sobol sample sets using the information and quality from neighboring solutions – larger cor- Cranley-Patterson rotation [3]. The solutions from the original relation means a smoother landscape. The features are calculated sample set are rotated using a random shift of each dimension, by using Spearman’s correlation coefficient on the evolvability thus creating new sample sets that preserve the properties of features between each pair of neighboring solutions. the Sobol sampling. The modulo operation keeps the shifted val- In addition, we include four ELA features from [10] that de-ues within the unitary interval. This approach was also used by scribe the violation landscape and its relation with the objective Renau et al. [7]. space. The first feature is the feasibility ratio. 
Following this approach, we generate 100 sets of samples, each with 512 solutions, which we then evaluate on all 14 MW benchmark problems. For each problem and sample set pair, we compute 46 ELA features, which represent a single instance in the data. As a result, by evaluating the 100 sample sets on each of the 14 test problems, we get 1400 data instances. We then use these data instances and the corresponding high-level problem features (problem type and connectivity) to train a classifier for predicting the high-level problem features.

We use two widely used machine learning (ML) methods for classification: the Random Forest (RF) classifier and the k-Nearest Neighbors (KNN) classifier. The reason for choosing these classifiers instead of some others is that RF usually performs favorably compared to other ML classifiers, while KNN uses the distance between solutions as a performance metric, which is useful when analyzing the obtained classification results visually. For both RF and KNN, we apply the implementation from the scikit-learn library [6]. For KNN, we keep the default settings, while for RF we train 100 trees.

We perform three experiments that differ in the classifier evaluation methodology. In the first experiment, we base the evaluation methodology on the work by Renau et al. [7], where the data is split by using instances from the same problem for both training and testing. There, 50% of all instances are used for training, and the remaining 50% for testing. Furthermore, we take care of dividing the instances into training and test sets so that the proportion of instances from each problem is equal in both sets.
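A minimal scikit-learn [6] sketch of this first experiment, with random stand-ins for the 1400 × 46 ELA feature matrix (the variable names and dummy data are ours):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Stand-ins for the real data: 1400 instances x 46 ELA features,
# one label per instance, and the MW problem index of each instance.
X = rng.random((1400, 46))
y = rng.integers(0, 4, 1400)              # e.g., problem type I-IV
problems = np.repeat(np.arange(14), 100)  # 100 instances per problem

# 50/50 split, stratified by problem so each problem is equally
# represented in the training and test sets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=0.5, stratify=problems, random_state=0)

for clf in (RandomForestClassifier(n_estimators=100, random_state=0),
            KNeighborsClassifier()):      # KNN with default settings
    clf.fit(X_tr, y_tr)
    print(type(clf).__name__, clf.score(X_te, y_te))
```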
However, this methodology does not correspond to the real-world scenario where we want to learn the high-level features of a problem encountered for the first time. Therefore, we use the leave-one-problem-out evaluation methodology in the second experiment. Here, the instances from a single problem are used for testing, and the instances from all other problems for training. The procedure is repeated for all problems and the classification accuracy is calculated as the average over all train-test splits.

Finally, the third experiment is performed to see how adding target problem data to the training set influences the resulting classification accuracy. In this experiment, we vary the percentage of target problem data that is used for training between 0% and 99% with a step of 1%. When it equals 0%, no target problem data is used for training, which corresponds to the leave-one-problem-out methodology of the second experiment. Note that this setup never equals the one from the first experiment, because here the data of all other (non-target) problems is always used for training. Again, this procedure is repeated for all problems and we report the average classification accuracy.

To better understand the task we are trying to solve, we visualize the classes by first reducing the dimensionality of the feature space from 46-D to 2-D using Pairwise Controlled Manifold Approximation Projection (PaCMAP) [11]. We use the Python package pacmap with default parameter values.
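The leave-one-problem-out methodology of the second experiment maps directly onto scikit-learn's LeaveOneGroupOut splitter; a sketch reusing the arrays from the previous snippet:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# Leave-one-problem-out: all instances of one MW problem form the test
# set, the instances of the remaining 13 problems form the training set.
logo = LeaveOneGroupOut()
scores = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X, y, groups=problems, cv=logo)
print(scores.mean())   # average accuracy over the 14 train-test splits
```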
6 RESULTS

The results of the first experiment, where 50% of all data is used for training and 50% for testing, show that both RF and KNN achieve a classification accuracy above 98% (see Table 2). An explanation for such good results can be derived from the two leftmost plots in Figure 1. Here, we can see that PaCMAP finds many clusters in the data. However, the clusters are highly correlated to the problems themselves. Thus, leaving some instances from the target problem in the training set results in a high classification accuracy, because the classification task is now transformed into identifying to which cluster the new sample belongs, which is a much easier task to perform.

Table 2: Classification accuracy when 50% of all data is used for training and 50% for testing (first experiment).

Learning method   Problem type   Problem connectivity
RF                98%            99%
KNN               100%           100%

The more realistic scenario of having to predict the high-level feature of a yet unseen problem is tested in the second experiment. Here, the classification accuracy drops to only 7–19% for the problem type prediction, and to 41–57% for the problem connectivity prediction (see the leftmost points corresponding to 0% on the plots in Figure 2). This is comparable to the classification accuracy of the stratified classifier, which achieves 19% for the problem type prediction and 45% for the problem connectivity prediction. We can look at the results of the third experiment to help us understand this decline in classification accuracy. As seen from Figure 2, adding just a few instances of the target problem to the training set drastically increases the classification accuracy. When the training data contains no instances from the target problem, the classifier is forced to find information about the high-level feature from other problems. However, this is a much harder task, given that similar problems often have different high-level features (see the middle and right plots in Figure 1).

In the visualizations in Figure 1, the points indicating the correctly classified instances have black edges. As we can see, for many problems, RF has a 0% classification accuracy (top middle and top right plot). There are, however, some problems for which RF finds the correct class for a number of instances. Nonetheless, from these 2-D plots it is hard to understand why certain instances are misclassified by RF. This is because RF detects details in the data that the dimensionality reduction visualization method is unable to capture.

Similar behavior can be observed for KNN. Given that KNN classifies an instance depending on the classes of its most similar instances, the visualization from Figure 1 can help interpret its poor results with the leave-one-problem-out methodology. We can see that the clusters created by PaCMAP are not well aligned with the high-level features of problem type and connectivity. This makes predicting them a hard task for KNN. The clustering by PaCMAP suggests that the applied ELA features are not descriptive enough for predicting problem type and connectivity.

Figure 1: Dimensionality reduction of the ELA feature space using the PaCMAP method. Points are colored based on their true values, with correct classifications denoted by a black point edge. The top and bottom rows show the results for Random Forest and KNN, respectively, while the different classification targets are arranged in columns: the left column displays the results for the problem, the middle for problem type and the right for problem connectivity.

Figure 2: Classification accuracy for different proportions of data from the target problem used for training.
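The 46-D to 2-D projection used for the Figure 1 visualizations can be reproduced along these lines with the pacmap package; a minimal sketch, where X stands for the feature matrix from the earlier snippets:

```python
import pacmap

# Project the 46-D ELA feature space to 2-D with default parameters,
# as described in the experimental setup.
reducer = pacmap.PaCMAP(n_components=2)
X_2d = reducer.fit_transform(X)   # X: the 1400 x 46 ELA feature matrix
```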
7 CONCLUSION AND FUTURE WORK

In this work, we tried to predict high-level features of CMOPs. More specifically, using low-level ELA features, we constructed classifiers to predict the problem type and connectivity. Two ML classifiers were utilized, RF and KNN.

We employed three evaluation methodologies. The first one follows the related work and splits the data into two halves, one serving as the training set and the other as the test set (instances from the same problem are used in both sets). The second evaluation methodology uses all instances from the target problem for testing, and none for training. The third method gradually adds the target problem data to the training set. We achieved excellent classification accuracy with the first evaluation methodology, but a very poor one with the second. The drop in classification accuracy was examined with the third methodology, which has shown that already a small number of instances of the same problem increases the classification accuracy.

Visualizations of the data in the form of 2-D plots show that CMOP instances form clusters that are highly correlated to the problem instances, but not to the high-level problem features. For this reason, by including some instances from the target problem in the training set, the classification task becomes an easier task of recognizing to which cluster an instance belongs. Unfortunately, this is not a realistic scenario, since in the real world we have no information on the characteristics of the newly encountered problem. We therefore recommend using the second evaluation methodology when addressing this task.

However, the initial results obtained using the second evaluation methodology are not so promising. A possible improvement could be considering more ELA features in the learning procedure, either additional ones from [10] or newly created ones. Moreover, using a more representative set of test problems from various benchmark suites may also improve classifier performance.

ACKNOWLEDGMENTS

The authors acknowledge that the project "Constrained Multi-objective Optimization Based on Problem Landscape Analysis" was financially supported by the Slovenian Research Agency (project no. N2-0254) and the Czech Science Foundation (grant no. GF22-34873K). The Slovenian authors acknowledge additional financial support from the Slovenian Research Agency (young researcher program and research core funding no. P2-0209).

REFERENCES

[1] François-Michel De Rainville, Christian Gagné, Olivier Teytaud, and Denis Laurendeau. 2012. Evolutionary optimization of low-discrepancy sequences. ACM Transactions on Modeling and Computer Simulation, 22, 2, 1–25.
[2] Christian Hanster and Pascal Kerschke. 2017. flaccogui: Exploratory landscape analysis for everyone. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO) Companion. ACM, 1215–1222.
[3] Christiane Lemieux. 2009. Monte Carlo and Quasi-Monte Carlo Sampling. Springer Series in Statistics. Springer, New York, NY.
[4] Arnaud Liefooghe, Sébastien Verel, Benjamin Lacroix, Alexandru-Ciprian Zăvoianu, and John McCall. 2021. Landscape features and automated algorithm selection for multi-objective interpolated continuous optimisation problems. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO). ACM, 421–429.
[5] Zhongwei Ma and Yong Wang. 2019. Evolutionary constrained multiobjective optimization: Test suite construction and performance comparisons. IEEE Transactions on Evolutionary Computation, 23, 6, 972–986.
[6] F. Pedregosa et al. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
[7] Quentin Renau, Carola Doerr, Johann Dreo, and Benjamin Doerr. 2020. Exploratory landscape analysis is strongly sensitive to the sampling strategy. In International Conference on Parallel Problem Solving from Nature. Springer, 139–153.
[8] Ryoji Tanabe and Akira Oyama. 2017. A note on constrained multi-objective optimization benchmark problems. In IEEE Congress on Evolutionary Computation (CEC). IEEE, 1127–1134.
[9] Pauli Virtanen et al. 2020. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nature Methods, 17, 261–272.
[10] Aljoša Vodopija, Tea Tušar, and Bogdan Filipič. 2022. Characterization of constrained continuous multiobjective optimization problems: A feature space perspective. Information Sciences, 607, 244–262.
[11] Yingfan Wang, Haiyang Huang, Cynthia Rudin, and Yaron Shaposhnik. 2021. Understanding how dimension reduction tools work: An empirical approach to deciphering t-SNE, UMAP, TriMap, and PaCMAP for data visualization. Journal of Machine Learning Research, 22, 201, 1–73.

Learning the Probabilities in Probabilistic Context-Free Grammars for Arithmetical Expressions from Equation Corpora

Marija Chaushevska (Jožef Stefan International Postgraduate School and Jožef Stefan Institute, Jamova cesta 39, Ljubljana, Slovenia; marija.chaushevska@ijs.si)
Ljupčo Todorovski (Jožef Stefan Institute and Faculty of Mathematics and Physics, Jadranska cesta 21, Ljubljana, Slovenia; ljupco.todorovski@fmf.uni-lj.si)
Jure Brence and Sašo Džeroski (Jožef Stefan Institute and Jožef Stefan International Postgraduate School, Jamova cesta 39, Ljubljana, Slovenia; jure.brence@ijs.si, saso.dzeroski@ijs.si)

ABSTRACT

A core challenge for both physics and artificial intelligence (AI) is symbolic regression: finding a symbolic expression that matches data from an unknown function. Symbolic regression approaches are largely data-driven and search an unconstrained space of mathematical expressions, often employing genetic algorithms.
On the other hand, equation discovery approaches incorporate domain knowledge to constrain the structure space and search it using local or exhaustive search methods. In this paper, we adopt the use of probabilistic context-free grammars (PCFG) in equation discovery and propose a method for learning the probabilities of production rules in such PCFGs. We take a universal PCFG with an initial set of manually assigned probabilities for each production rule. We learn new probabilities by parsing each expression in a given corpus of expressions, such as the Feynman dataset.

KEYWORDS

equation discovery, grammar, probabilistic context-free grammar, parsing, learning probabilities, probability distribution

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Information Society 2022, 10–14 October 2022, Ljubljana, Slovenia
© 2022 Copyright held by the owner/author(s).

1 INTRODUCTION

Equation discovery is an area of machine learning that develops methods for the automated discovery of quantitative laws, expressed in the form of equations, in collections of measured numeric data [5] [11]. More precisely, equation discovery methods seek to automate the identification of equation structure as well as parameters. Traditionally, domain experts derive equation structure based on the theory in the domain and use standard numerical optimization methods to estimate the parameters. Equation discovery methods often use domain knowledge to specify the space of equations they consider. The key questions in the field are how to best represent the symbolic language of mathematics, how to incorporate domain knowledge in the process of equation discovery, and how to perform the search for optimal equation structures. Symbolic regression methods are largely data-driven and search an unconstrained space of mathematical expressions, often employing evolutionary algorithms. On the other hand, equation discovery methods, such as process-based modeling [4], incorporate domain knowledge to constrain the structure space and search using greedy-local [12] or exhaustive search methods on the constrained space. The task of equation discovery is closely related to the task of supervised regression.

Machine learning methods for supervised regression assume a fixed class of models, such as linear regression or neural networks with a particular architecture, and find the one that provides the best fit to the training data. Equation discovery methods typically consider broader classes of mathematical equations. These classes may be vast, and many (often infinitely many) equations can be found that provide an excellent fit to the training data. The challenge of symbolic regression is therefore twofold. On the one hand, one can easily overfit the training data with an unnecessarily complex equation. On the other hand, the space of candidate equations is huge and grows exponentially as equation complexity increases, posing serious computational issues to equation discovery methods.

Equation discovery systems explore the hypothesis space of all equations that can be constructed given a set of arithmetic operators, functions and variables. They search for equations that fit the given input data best. The number of all possible candidate equations can be infinite. Early equation discovery systems used parametric approaches to specify the space of polynomial equations considered. LAGRAMGE [13] uses context-free grammars (CFG) [9] to specify the language of equations considered. The recent system ProGED [2] uses probabilistic context-free grammars (PCFG), where a probability is associated with each production rule. In this paper, we propose a method for learning these probabilities for a given PCFG by using a given corpus of expressions.

2 GRAMMARS FOR EQUATION DISCOVERY

A grammar is a finite specification of a language. A language can contain an infinite number of strings, or, even if it is finite, it can contain so many strings that it is not practical to list them all. Originating from computational linguistics, grammars are used as formal specifications of languages and use a set of production rules to derive valid strings in the language of interest. A grammar mainly consists of a set of production rules, rewriting rules for transforming strings. Each rule specifies a replacement of a particular string (its left-hand side) with another (its right-hand side). A rule can be applied to each string (equation) that contains its left-hand side and produces a string in which an occurrence
of that left-hand side has been replaced with its right-hand side. A grammar further distinguishes between two kinds of symbols: non-terminal and terminal symbols; each left-hand side must contain at least one non-terminal symbol. It also distinguishes a special non-terminal symbol, called the start symbol. In equation discovery, we are interested in using grammars as generative models, as opposed to their common use for parsing, i.e., discriminating between legal and illegal strings in a language.

2.1 Context-Free Grammar (CFG)

In formal language theory, a context-free grammar [9] is a formal grammar defined as a tuple G = (N, T, R, S). It is used to generate all possible patterns of strings in a given formal language. The syntax of the expression on the right-hand side of an equation is prescribed with a context-free grammar. The set T contains terminal symbols, i.e., words for composing sentences in the language or variables in the arithmetical expressions. The terminals are primitive grammar symbols that cannot be further rewritten, i.e., no productions are affiliated with them. Non-terminal symbols (syntactic categories) in the set N represent higher-order terms in the language, such as noun or verb phrases. Each of the non-terminals represents expressions or phrases in a language. The production rules in the set R are rewrite rules of the form A → α, where the left-hand side is a non-terminal, A ∈ N, while the right-hand side is a string of non-terminals and terminals, α ∈ (N ∪ T)*. In natural language, a rule NP → A N specifies that a noun phrase NP consists of an adjective A and a noun N, where A and N represent the subsets of adjectives and nouns, which are both terminals. No matter which symbols surround it, the single non-terminal on the left-hand side can always be replaced by the right-hand side. This is what distinguishes a context-free grammar from a context-sensitive one. When deriving a sentence, a grammar starts with a string containing a single non-terminal S ∈ N and recursively applies production rules to replace non-terminals in the current string with the strings on the right-hand sides of the rules. The final string contains only terminal symbols and belongs to the language defined by G.

In equation discovery, grammars represent sets of expressions that can appear on the right-hand side of equations. These grammars use several symbols with special meanings. For example, the terminal const ∈ T is used to denote a constant parameter in an equation that has to be fitted to the input data.

A simple context-free grammar G_M = (N_M, T_M, R_M, S_M) deriving linear expressions from the variables x, y, z is as follows:

N_M = {E, V}
T_M = {x, y, z, +, *}
R_M = {E → E + V | E * V | V,
       V → x | y | z}                                        (1)
S_M = E

We write multiple production rules with the same non-terminal on the left-hand side using a compact, single-line representation, e.g., E → E + V | E * V | V stands for the set of rules {E → E + V, E → E * V, E → V}.

2.2 Probabilistic Context-Free Grammar (PCFG)

Grammar formalisms are not new to the field of equation discovery [4] [3] [13], but probabilistic grammars are. A probabilistic grammar assigns probabilities to productions and thereby allows one to use the grammar as a stochastic generator [10] [6] [15]. Probabilistic context-free grammars (PCFGs) are a simple model of phrase-structure trees. They extend context-free grammars (CFGs) similarly to how hidden Markov models extend regular grammars. A grammar can be turned into a probabilistic grammar by assigning probabilities to each of its productions, such that for each A ∈ N:

$$\sum_{(A \to \alpha) \in R} P(A \to \alpha) = 1 \tag{2}$$

The probability of a derivation (parse) is defined as the product of the probabilities of all the production rules used to expand each node in the parse tree (derivation). These probabilities can be viewed as parameters of the model. The probabilities of all productions with the same non-terminal symbol on the left-hand side sum up to one, i.e., we impose a probability distribution on the productions with the same symbol on the left-hand side. As each parse tree derived by a grammar G is characterized by a sequence of productions, its probability is simply the product of the probabilities of all productions in the sequence [11].

We can extend the example context-free grammar G_M above to a PCFG by assigning a probability to each of the six productions, given below in brackets after each production:

E → E + V [p] | E * V [q] | V [1 − p − q]
V → x [p_v] | y [q_v] | z [1 − p_v − q_v]                    (3)

Here we have parameterized the probability distributions over the productions for E and V with the parameters 0 < p < 1, 0 < q < 1, 0 < p_v < 1, and 0 < q_v < 1, respectively.
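To illustrate the use of a PCFG as a stochastic generator, the following sketch samples expressions from the grammar of Eq. (3) with illustrative parameter values (p = q = 0.2, p_v = 0.4, q_v = 0.3 are our choice); the implementation is ours, not ProGED's:

```python
import random

# The PCFG of Eq. (3) as a dict: non-terminal -> [(right-hand side, prob)].
rules = {
    "E": [(("E", "+", "V"), 0.2), (("E", "*", "V"), 0.2), (("V",), 0.6)],
    "V": [(("x",), 0.4), (("y",), 0.3), (("z",), 0.3)],
}

def generate(symbol="E"):
    """Recursively expand a non-terminal by sampling one of its
    productions according to the rule probabilities."""
    if symbol not in rules:        # terminal symbols are returned as-is
        return symbol
    rhs_options, weights = zip(*rules[symbol])
    rhs = random.choices(rhs_options, weights=weights)[0]
    return " ".join(generate(s) for s in rhs)

random.seed(3)
for _ in range(3):
    print(generate())              # e.g. "z", "x + y", "y * x + z"
```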
Context-free grammars are typically used to parse sentences. Probabilistic context-free grammars provide an estimate of the probability of a parse tree, in addition to the tree itself. Probabilistic context-free grammars also allow for another type of application – stochastic generation of sentences or, in our case, mathematical expressions. The probabilities assigned to the productions provide a great amount of control over the probability distribution of individual parse trees. In our example in Eq. 3, the parameters p and q control the probability of a larger number of terms in an expression, while the parameters p_v and q_v tune the ratio between the numbers of occurrences of the variables x, y and z.

An important concept to consider when working with grammars is ambiguity. A grammar is formally ambiguous if there exist sentences (expressions) that can be described by more than one parse tree generated by the grammar. Grammars for arithmetic expressions can exhibit another type of ambiguity, called semantic ambiguity. All but the simplest arithmetic expressions can be written in many mathematically equivalent, but grammatically distinct ways. It is generally useful to adopt a canonical representation that each generated equation is converted into. This allows us to compare expressions to each other and check whether they are mathematically equivalent, in addition to comparing their parse trees. In our work, we use the Python symbolic mathematics library SymPy [8] to simplify expressions, convert them into canonical form, and compare them symbolically.
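A small SymPy example of checking semantic equivalence of grammatically distinct expressions via a canonical form (the expressions themselves are illustrative):

```python
import sympy

# Two grammatically distinct derivations of the same expression.
e1 = sympy.sympify("x*(x + y)")
e2 = sympy.sympify("x**2 + x*y")

# Expanding (or simplifying) both yields a canonical form, so mathematical
# equivalence can be checked despite the different parse trees.
print(sympy.expand(e1) == sympy.expand(e2))   # True
print(sympy.simplify(e1 - e2) == 0)           # True
```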
3 LEARNING PROBABILITIES IN PCFGS FOR ARITHMETICAL EXPRESSIONS

Parameter learning approaches for PCFGs assume a fixed set of production rules and try to learn the probabilities assigned to them. Some approaches encourage sparsity and remove rules with very small probabilities. Parameter learning approaches are typically more scalable than structure search approaches, because parameter learning is a continuous optimization problem, which is in general easier than the discrete optimization problem of structure search. Therefore, most of the state-of-the-art algorithms for unsupervised learning of natural language grammars are parameter learning approaches.

3.1 The Approach

In this paper, we propose a parameter learning approach for PCFGs, based on parsing a corpus of expressions. We adopt the universal probabilistic context-free grammar for arithmetic expressions used by ProGED [2]. While ProGED uses manually assigned probabilities in this grammar, we use an initial set of randomly assigned probabilities for each production rule. The universal grammar is composed of production rules that include the four basic operations (+, -, *, /), basic functions (such as sin or log), constant parameters and variables.

Our method for learning probabilities from a given corpus of expressions is designed on the assumption that the probability of a production rule in the grammar is proportional to the incidence of the production in the parse trees for the expressions in the corpus. It uses a parser from the NLTK (Natural Language Toolkit) Python library [1] to parse the expressions in the given corpus using the universal PCFG. NLTK contains classes to work with PCFGs, and there are different types of parsers implemented in the NLTK Python library. In particular, we use the InsideChartParser(), a bottom-up parser for PCFG grammars that tries edges in descending order of the inside probabilities of their trees. The "inside probability" of a tree is simply the probability of the entire tree, ignoring its context. In particular, the inside probability of a tree generated by a production p with children c[1], c[2], ..., c[n] is P(p)P(c[1])P(c[2])...P(c[n]), and the inside probability of a token is 1 if it is present in the text, and 0 if it is absent. For a given string (expression) and a grammar, the parser determines whether the string can be derived using the grammar and, if yes, returns the appropriate parse tree. After parsing the equations, we count the number of times each production rule appears in the set of parse trees for all parsed equations (except for rules directly resulting in terminal symbols (variables)). We then group the production rules by the non-terminal symbol on their left-hand side and derive the probability of each production rule as the number of appearances of the rule divided by the sum of such numbers for all production rules with the same non-terminal.
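The parse-and-count procedure can be sketched with NLTK as follows; the grammar below is a toy stand-in for the universal ProGED grammar, and the two-expression corpus is purely illustrative:

```python
import nltk
from collections import Counter
from nltk.parse.pchart import InsideChartParser

# Toy stand-in for the universal grammar (hypothetical, simplified).
grammar = nltk.PCFG.fromstring("""
    S -> S '+' F [0.2] | S '-' F [0.2] | F [0.6]
    F -> F '*' T [0.2] | F '/' T [0.2] | T [0.6]
    T -> V [0.5] | 'C' [0.5]
    V -> 'x' [0.5] | 'y' [0.5]
""")
parser = InsideChartParser(grammar)

counts = Counter()
for expr in ["x + y * x", "C * x - y"]:        # illustrative corpus
    tree = next(parser.parse(expr.split()))    # most probable parse
    # Count rule occurrences, skipping rules that directly yield
    # variables, as done in the paper.
    counts.update(p for p in tree.productions()
                  if p.lhs() != nltk.Nonterminal("V"))

# Normalize per left-hand-side non-terminal to obtain new probabilities.
totals = Counter()
for prod, n in counts.items():
    totals[prod.lhs()] += n
probs = {prod: n / totals[prod.lhs()] for prod, n in counts.items()}
for prod, p in probs.items():
    print(prod, round(p, 3))
```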
In particular, the inside probability of a tree generated by a production 𝑝 with children 𝑐[1], 𝑐[2], ..., 𝑐[𝑛] is 𝑃(𝑝) 𝑃(𝑐[1]) 𝑃(𝑐[2]) ... 𝑃(𝑐[𝑛]), and the inside probability of a token is 1 if it is present in the text and 0 if it is absent. For a given string (expression) and a grammar, the parser determines whether the string can be derived using the grammar and, if yes, returns the appropriate parse tree. After parsing the equations, we count the number of times each production rule appears in the set of parse trees for all parsed equations (except for rules directly resulting in terminal symbols, i.e., variables). We then group the production rules by the non-terminal symbol on their left-hand side and derive the probability of each production rule as the number of its appearances divided by the sum of such numbers for all production rules with the same non-terminal.
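A minimal sketch of this estimation step is given below, reusing the parser from the previous sketch; `corpus` (a list of tokenized expressions) is assumed to be given, and the exclusion of rules that directly produce variables is omitted for brevity.

    from collections import Counter, defaultdict

    counts = Counter()
    for expression in corpus:                 # corpus: list of token lists (assumed)
        trees = list(parser.parse(expression))
        if trees:                             # skip expressions the grammar cannot derive
            counts.update(trees[0].productions())

    # Group the counted productions by their left-hand-side non-terminal.
    by_lhs = defaultdict(dict)
    for production, n in counts.items():
        by_lhs[production.lhs()][production] = n

    # The probability of a rule is its count divided by the total count of all
    # rules with the same left-hand-side non-terminal.
    probabilities = {production: n / sum(rules.values())
                     for rules in by_lhs.values()
                     for production, n in rules.items()}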
3.2 The Corpora

We apply the proposed approach to two corpora of expressions (that appear on the right-hand side of equations). The first one is the Feynman Symbolic Regression Database, which includes a diverse sample of equations from the three-volume set of physics textbooks by Richard P. Feynman [7] and has been previously used as a benchmark for equation discovery. It was constructed by Udrescu and Tegmark [14] to facilitate the development and testing of algorithms for symbolic regression. The equations from the Feynman database contain between one and nine variables, the four basic operations (+, −, /, ∗), the functions exp, sqrt, sin, cos, tanh, arcsin and ln, as well as a variety of constants – mostly rational, but also 𝑒 and 𝜋.

There are three components to an arithmetic expression: variables, constants and operators. Numerical values and constants are typically treated as free parameters (terminal symbols) to be optimized when evaluating an equation for its fit against given data. We replaced all constants, such as 𝑒, 𝜋 and rational constants, with the terminal 'C' (const), because we treat them as free parameters. The minimum number of constants ('C') in the Feynman database is 0, which means that some equations have only variables as terminal symbols. The maximum number of constants in the Feynman database is five, in only one equation.

The second corpus consists of 4080 scientific expressions from Wikipedia. These mathematical expressions are named after people and were parsed from Wikipedia. Compared to the Feynman dataset, the Wikipedia corpus contains more functions, such as Abs, factorial, tan, sinh, cosh and pow (which do not exist in the Feynman database), as well as irrational constants (𝑒 and 𝜋) and numerical constants, which have to be replaced by the constant 'C' (const) in the grammar. The equations in the Wikipedia dataset contain between one and fifteen variables, which is twice as many as in the Feynman dataset, and the maximum number of 'C' terminal symbols is 16 per equation.

3.3 The Learned Probabilities

By using the proposed approach on the two corpora of arithmetic expressions described above, we obtain two sets of probabilities, with each probability assigned to one of the production rules in the PCFG. More precisely, we now have three universal PCFGs: (1) with the initial probabilities, manually assigned by the authors of ProGED, (2) with probabilities fine-tuned (learned) on the Feynman dataset, and (3) with probabilities fine-tuned (learned) on the Wikipedia corpus of arithmetical expressions.

In this section, we first present the three sets of probabilities, one for each of the above mentioned PCFGs: these are given in Table 1. We then compare the probability distributions across the rules for each non-terminal symbol (𝑆, 𝐹, 𝑇 and 𝑅) in the PCFGs.

As compared to the initial grammar, the grammar learned on the Feynman database reduces the probabilities of the recursive production rules (𝑆 → 𝑆 + 𝐹 and 𝑆 → 𝑆 − 𝐹) and increases the probability of the non-recursive rule (𝑆 → 𝐹): this leads to simpler expressions with fewer additive terms. In contrast, the grammar learned on the Wikipedia corpus has a probability for the rule 𝑆 → 𝑆 + 𝐹 very comparable to the one in the initial grammar. It decreases the probability of the recursive production rule 𝑆 → 𝑆 − 𝐹 and increases the probability of the non-recursive rule 𝑆 → 𝐹 by approximately 0.1 in each case.

The probabilities of the recursive production rules for the 𝐹 non-terminal symbol (𝐹 → 𝐹 ∗ 𝑇 and 𝐹 → 𝐹 / 𝑇) are mostly larger than the ones in the initial grammar. An exception is the rule 𝐹 → 𝐹 / 𝑇 with the Wikipedia corpus. The probability of the non-recursive production rule (𝐹 → 𝑇) is smaller: slightly for the Wikipedia corpus, more substantially for the Feynman dataset.

In the learned probability distributions over the production rules for the non-terminal 𝑇, the probability of the rule 𝑇 → 𝑉 is much higher (it goes from 0.4 to 0.7). In both learned grammars, the probabilities of the 𝑇 → 𝑅 and 𝑇 → 'C' production rules are substantially reduced. This is more noticeable for 𝑇 → 'C', where the probability goes from 0.4 to slightly above 0.1.

We finally discuss the probability distributions over the production rules for the non-terminal symbol 𝑅 in the initial grammar and the two learned grammars. A probability of 0 for a production rule here means that the function in that rule is not present in the respective corpus of mathematical expressions. For example, the functions ln and arcsin are not present in the arithmetical expressions from the Wikipedia dataset, but are present in the Feynman database. On the other hand, the functions log, pow, Abs, sinh, cosh, factorial and tan do not appear in the arithmetic expressions from the Feynman database, which is why their probability is 0. The grammar learned on the Feynman database increases the probabilities of the production rules 𝑅 → (𝑆), 𝑅 → sin(𝑆) and 𝑅 → sqrt(𝑆) as compared to the initial grammar, while the remaining production rules have lower probabilities. The grammar learned on the Wikipedia corpus increases the probability of the 𝑅 → (𝑆) production rule and decreases the probabilities of the remaining rules as compared to the initial grammar.
Table 1: Probabilities of the production rules for the non-terminal symbols in the initial grammar, the grammar trained on the Feynman database and the grammar trained on the Wikipedia corpus of expressions.

    Production rule        Initial   Feynman   Wikipedia
    S → S + F              0.2       0.1034    0.2004
    S → S − F              0.2       0.1552    0.1108
    S → F                  0.6       0.7414    0.6888
    F → F ∗ T              0.2       0.3635    0.3349
    F → F / T              0.2       0.2446    0.1098
    F → T                  0.6       0.3919    0.5553
    T → R                  0.2       0.1554    0.1746
    T → 'C'                0.4       0.1338    0.1174
    T → V                  0.4       0.7108    0.7082
    R → ( S )              0.3       0.5391    0.6841
    R → sin( S )           0.1       0.1130    0.0249
    R → arcsin( S )        0.1       0.0173    0
    R → ln( S )            0.1       0.0087    0
    R → tanh( S )          0.1       0.0087    0.0045
    R → cos( S )           0.1       0.0956    0.0435
    R → sqrt( S )          0.1       0.1304    0.0831
    R → exp( S )           0.1       0.0872    0.0780
    R → log( S )           0         0         0.0479
    R → Abs( S )           0         0         0.0211
    R → ( S ) '**' ( S )   0         0         0.0032
    R → sinh( S )          0         0         0.0032
    R → cosh( S )          0         0         0.0026
    R → factorial( S )     0         0         0.0019
    R → tan( S )           0         0         0.0019

4 CONCLUSIONS AND FURTHER WORK

In this paper, we have proposed an approach to learn the parameters, i.e., production rule probabilities, in probabilistic context-free grammars for arithmetic expressions. We demonstrated the proposed approach by learning the probabilities in a universal grammar for arithmetic expressions from two corpora of expressions. The learned probabilities differ substantially from their initial values. Most notably, the initial settings underestimated the frequency of variables in favor of numerical constants, overestimated the need for recursion with addition and subtraction, and set the probability of recursion with multiplication too low.
These observations show how difficult it is to set probabilities manually and highlight the utility of the learning algorithm.

The comparison of the learned probability values for the two corpora also hints at differing properties of the two collections of equations. The Wikipedia corpus seems to favor multiplication over division to a greater extent than the Feynman dataset. In terms of expression complexity, we observed a preference for high-order terms in the Feynman dataset, in contrast to a preference for higher numbers of low-order terms in the Wikipedia corpus. The observed differences between the properties of the two corpora demonstrate that the universal grammar is expressive enough to encode these properties and that the learning algorithm is able to discover them. The results show a great deal of promise for the goals of inferring domain knowledge from equation corpora and improving the efficiency of grammar-based equation discovery through the fine-tuning of production probabilities.

As further work, we would like to perform equation discovery experiments using the three universal grammars: the universal grammar with the initial (default) probabilities, with the probabilities learned on the Feynman dataset, and with the probabilities learned on the Wikipedia corpus. For this purpose, we will use the equation discovery system ProGED, which uses a Monte-Carlo approach of sampling equation structures from a given PCFG and evaluating their fit to the given data. We expect that the number of successfully reconstructed equations from the Feynman dataset, when using the learned (fine-tuned) universal PCFGs, will be higher than the number of equations successfully reconstructed with the universal grammar with manually set probabilities.

REFERENCES
[1] Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python (1st ed.). O'Reilly Media, Inc.
[2] Jure Brence, Ljupčo Todorovski, and Sašo Džeroski. 2021. Probabilistic grammars for equation discovery. Knowledge-Based Systems, 224 (Apr. 2021), 1–12.
[3] Will Bridewell and Pat Langley. 2010. Two kinds of knowledge in scientific discovery. Topics in Cognitive Science, 2, 1, 36–52.
[4] Will Bridewell, Pat Langley, Ljupčo Todorovski, and Sašo Džeroski. 2008. Inductive process modeling. Machine Learning, 71, 1, 1–32.
[5] Sašo Džeroski, Pat Langley, and Ljupčo Todorovski. 2007. Computational Discovery of Scientific Knowledge. Vol. 4660. (Aug. 2007), 1–14.
[6] Brian C. Falkenhainer and Ryszard S. Michalski. 1990. Integrating quantitative and qualitative discovery: the ABACUS system. In Machine Learning, Yves Kodratoff and Ryszard S. Michalski, editors. Morgan Kaufmann, San Francisco, CA, 153–190.
[7] R. P. Feynman, R. B. Leighton, and M. Sands. 2015. The Feynman Lectures on Physics, Vol. I: The New Millennium Edition: Mainly Mechanics, Radiation, and Heat. Basic Books.
[8] Aaron Meurer et al. 2017. SymPy: symbolic computing in Python. PeerJ Computer Science, 3 (Jan. 2017), e103.
[9] Alan P. Parkes. 2008. A Concise Introduction to Languages and Machines (1st ed.). Springer Publishing Company, Incorporated.
[10] Cullen Schaffer. 1993. Bivariate scientific function finding in a sampled, real-data testbed. In Machine Learning, 167–183.
[11] Michael Schmidt and Hod Lipson. 2009. Distilling free-form natural laws from experimental data. Science, 324, 5923, 81–85.
[12] Jovan Tanevski, Ljupčo Todorovski, and Sašo Džeroski. 2020. Combinatorial search for selecting the structure of models of dynamical systems with equation discovery. Engineering Applications of Artificial Intelligence, 89 (Mar. 2020), 103423.
[13] Ljupčo Todorovski and Sašo Džeroski. 1997. Declarative bias in equation discovery. In Proceedings of the Fourteenth International Conference on Machine Learning (ICML '97). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 376–384.
[14] Silviu-Marian Udrescu and Max Tegmark. 2020. AI Feynman: a physics-inspired method for symbolic regression. Science Advances, 6, 16, eaay2631.
[15] R. Zembowitz and J. Zytkow. 1992. Discovery of equations: experimental evaluation of convergence. In Proceedings of the Tenth National Conference on Artificial Intelligence. Morgan Kaufmann Publishers Inc., San Mateo, CA, USA, 101–117.

Prediction of the Inflow in Macedonian Hydropower Plants, Using Machine Learning

Emilija Kizhevska (emilija.kizhevska@ijs.si), Jožef Stefan Institute and Jožef Stefan International Postgraduate School, Jamova cesta 39, Ljubljana, Slovenia
Hristijan Gjoreski (hristijang@feit.ukim.edu.mk), Ss. Cyril and Methodius University, Faculty of Electrical Engineering and Information Technologies, Rugjer Boskovic 18, Skopje, R. N. Macedonia
Mitja Luštrek (mitja.lustrek@ijs.si), Jožef Stefan Institute and Jožef Stefan International Postgraduate School, Jamova cesta 39, Ljubljana, Slovenia

ABSTRACT
As weather conditions become more complex and unpredictable as a consequence of global warming and air pollution, humans find it increasingly difficult to predict the amount of precipitation in the coming period, and thus the inflow into hydroelectric basins. Different types of hydropower plants (HPP), soil composition, how dry the soil is at the moment, the composition of precipitation, etc., also influence the inflow, making it even more difficult to predict. This research looks into the problem of predicting the inflow into hydroelectric basins in the Republic of North Macedonia and builds machine learning models to do so. The main contribution of this research is the models for the five largest hydropower plants in the Republic of North Macedonia (RM), which could optimize the loss and shortage of purchased electricity. Historical data from the meteorological station closest to each hydropower plant, as well as historical data on the inflows at the hydropower plants, were used to build regression models that predict the inflow one day in advance for each hydropower plant separately.
After deriving 19 new features, the majority of which are statistical, the predictive models' error was reduced. In the final step, we analyze the results empirically and qualitatively, comparing the models generated using different machine learning algorithms. For instance, one of the best models is the model for HPP Vrben, with a mean absolute error of around 8% of the average daily inflow. We built models using eight different regression algorithms for each hydropower plant, with linear regression and gradient boosting regression producing the smallest prediction errors. These models could also help to prevent river and lake overflows in areas where hydropower facilities are located, with timely warnings minimizing the severity of natural disasters.

KEYWORDS
machine learning, regression models, hydropower plants, optimization of hydroelectric energy loss

1 INTRODUCTION
The global trend of producing electricity from renewable sources is increasing at an exponential rate. At the same time, the world aims to reduce electricity losses, as there are more and more electrical devices and fewer and fewer non-renewable electricity sources. One of the solutions is to optimize the electricity losses due to erroneous power consumption forecasts: first, from the standpoint of saving the world's non-renewable electricity sources, such as coal or oil, and second, from the standpoint of public spending, because the later electricity is purchased, the more expensive it becomes [3][5].

Hydropower now provides around 6.5% of the world's electricity needs. In the Republic of North Macedonia the total installed capacity of hydropower is 556.8 MW, which is over 40% of the total capacity, ranking first among renewable energy sources. Hydroelectricity is used the most to meet daily variations in electricity consumption and to provide system services for regulation, allowing the power system to be more flexible and reliable. The peaks of electricity consumption are always regulated (i.e., supplemented) by hydroelectric energy, while coal-fired power plants produce the majority of electricity. Predicting the quantity of electricity available from each hydropower plant in the future (the longer the period, the better) leads not only to the most efficient use of finances, but also to protection from natural disasters such as river and lake overflows. The amount of inflow in the foreseeable time becomes increasingly difficult, if not impossible, for humans to predict, as meteorological conditions become more complex and unpredictable as a result of global warming and air pollution [4][1][9].

Inflow prediction in hydropower plants is mainly based on human judgement; however, it is not always accurate, primarily because it deals with nature. There are two major issues with predicting future inflow accurately:

(1) Weather forecasting inaccuracy in the coming days. The issue is that the forecasting models get less accurate as the forecast gets further out in time [6].

(2) Different types of hydropower plants, especially the construction of pumped-storage hydropower plants, the complexity of geology, etc. In this research, we consider run-of-river and storage hydropower plants. A run-of-river hydropower plant includes a facility that channels flowing water from a river through a canal or penstock to spin a turbine, and rain influences the inflow almost immediately. Storage hydropower plants include a dam and a reservoir to accumulate water, which is stored and released later when needed, providing flexibility to generate electricity on demand and reducing dependence on the variability of inflow. But whatever the hydropower plant type, the inflow is not directly related to weather conditions. For instance, snow and hail do not accumulate straight away; they require a period of melting and soil wetting, and, for storage plants, additionally, conducting the water through the pipes to reach the basin, etc. [7].
The inflow into hydroelectric basins can be predicted more easily using machine learning than by analyzing geology, satellite monitoring, pollution monitoring, and changes in global warming. Developing a machine learning model that connects all of these features allows the prediction to be made. Otherwise, it is quite difficult to perform it empirically, owing to a lack of resources for repeating the method for each existing hydroelectric plant. In this study, we built models using collected data from hydropower plants and the nearest meteorological stations, with the aim of developing an application that would help to monitor the daily inflow one day in advance (Figure 1) [2].

Figure 1: Graphical representation of the process: predicting inflow in the hydroelectric basins

2 METHODOLOGY

2.1 Preprocessing
The daily inflow into the hydroelectric basins as labels and the amount of precipitation observed at the nearest meteorological station as descriptive features are merged by date. Then, because missing values represent less than 1% of the total data, they are filled using the average of each feature's values.

For most hydropower facilities, we could initially find only two descriptive features: the daily amount of precipitation [l] and the date. There were 11 additional available features in the data for HPP Tikvesh: time of moonrise and moonset (timestamp), intensity of precipitation [l/m²], duration of precipitation [min], time of sunrise and sunset (timestamp), the highest and lowest temperature of the day [K], absolute humidity [g/m³], and cloud cover [oktas]. For HPP Tikvesh, missing values in the additional features are filled in with the average of the instances that have the same value in the most meteorologically related feature, for instance, 'humidity' and 'cloud cover'. The procedure is as follows: for a missing value in the feature 'cloud cover', the corresponding value of 'air humidity' is X; we find all the values of the feature 'cloud cover' whose instances have the value X in the feature 'air humidity', calculate their average, and substitute it for the missing value in the 'cloud cover' feature. In conjunction with the 'maximum temperature' feature, the same procedure is used to replace missing values in the 'lowest temperature' feature. Scaling the values of the descriptive features between 0 and 1 was the final step of the preprocessing.
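A sketch of this grouped-mean imputation in pandas is shown below; the column names are placeholders for the actual feature names in the HPP Tikvesh data.

    import pandas as pd

    def impute_by_related(df: pd.DataFrame, target: str, related: str) -> pd.DataFrame:
        # Replace each missing value of `target` with the mean of `target`
        # over the instances sharing the same value of the related feature.
        group_means = df.groupby(related)[target].transform('mean')
        df[target] = df[target].fillna(group_means)
        # Rows without any matching group fall back to the overall mean.
        df[target] = df[target].fillna(df[target].mean())
        return df

    data = impute_by_related(data, target='cloud_cover', related='humidity')
    data = impute_by_related(data, target='lowest_temperature', related='maximum_temperature')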
2.2 Feature Engineering
Using only the date and amount of precipitation as descriptive features, different regression models can be developed, but all of these predictive models have low accuracy, because the model cannot capture the inflow pattern with all the details that affect it. We therefore created 19 new features from the original two. The 'date' feature was divided into three new features, day, month, and year, which were added to the original features. Regarding the amount of precipitation and the inflows into the hydroelectric basins, the derived features are the average, variance, p-variance, minimum and maximum values over the previous five days; the values from the previous day of both original features are also added as new features. Additional features are derived as sine and cosine functions of the days and months. The purpose of the trigonometric functions is to reduce the difference between, for instance, December 31 and January 1.

Random Forest feature ranking demonstrates that the features derived from the latest five days of both original features have the highest scores. The correlation matrix shows the same: the derived statistical features have the highest correlations with the inflow. It is also interesting that while the trigonometrically generated date features have the same correlation with the inflow as the features from which they were derived, they do not correlate with each other with a correlation of 1. For instance, for HPP Vrben, the sine function of the month, as well as the month itself (from which the sine function is derived), have a correlation index of −0.01 with the inflow, but a value of −0.04 with each other (Figure 2).

Figure 2: Correlation matrix after scaling for HPP Vrben
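The feature derivation described above can be sketched as follows, with hypothetical column names; the five-day statistics are shifted by one day so that only past values enter the features.

    import numpy as np
    import pandas as pd

    def derive_features(df: pd.DataFrame) -> pd.DataFrame:
        # df['date'] is assumed to be a datetime column.
        df['day'] = df['date'].dt.day
        df['month'] = df['date'].dt.month
        df['year'] = df['date'].dt.year
        for col in ('precipitation', 'inflow'):
            past = df[col].shift(1)                  # previous day's value
            df[col + '_prev'] = past
            window = past.rolling(5)                 # statistics over the previous five days
            df[col + '_avg5'] = window.mean()
            df[col + '_var5'] = window.var()         # sample variance
            df[col + '_pvar5'] = window.var(ddof=0)  # population variance ("p-variance")
            df[col + '_min5'] = window.min()
            df[col + '_max5'] = window.max()
        # Cyclic encodings bring December 31 and January 1 close together.
        df['month_sin'] = np.sin(2 * np.pi * df['month'] / 12)
        df['month_cos'] = np.cos(2 * np.pi * df['month'] / 12)
        df['day_sin'] = np.sin(2 * np.pi * df['day'] / 31)
        df['day_cos'] = np.cos(2 * np.pi * df['day'] / 31)
        return df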
3 EXPERIMENTS AND RESULTS

3.1 Dataset Description
For each of the five analyzed hydropower plants in the Republic of North Macedonia, the data processing and the approach to the problem are the same. HPP Kozjak, the HES Mavrovo power plants (consisting of HPP Vrutok and HPP Vrben), HPP Tikvesh, and HPP Shpilje are part of this study. The datasets are time-series; they were collected at daily intervals throughout an 11-year period (1/1/09 – 12/31/19) for each hydroelectric facility. ESM (Elektrani na Severna Makedonija, "Power Plants of North Macedonia") and UHMR (Uprava za Hidro-Meteoroloski raboti, the National Hydrometeorological Service of the Republic of North Macedonia) collected the initial two variables, the amount of precipitation [l/m²] and the inflows into the hydropower basin (Figure 3). The inflow is expressed as the maximum amount of electricity [MWh] it could be used to produce.

Figure 3: Top to bottom: (a) Amount of precipitation (meteorological station Debar) and (b) amount of electricity production (HPP Shpilje) during an 11-year period (2009–2019)

Equation 1 describes how to calculate the electricity that could be produced by hydropower plants, knowing that it is a product of power and working time [4]:

    E = ρ Q g H t [Wh]        (1)

where the electricity E is the product of the water density ρ [1000 kg/m³], the water flow (inflow) Q [m³/s], the acceleration of gravity g [m/s²], the gross height drop H [m], and the working time t. Inflow refers to the water flowing into the accumulation basins of hydropower plants. The inflow is measured in cubic meters per second [m³/s], but it can also be expressed as the quantity of electricity that the same amount of inflow could supply. When electricity is a projection of the inflow, the inflow is computed using Equation 1 and expressed in watt-hours [Wh].

3.2 Experimental Setup
To build models for one-day-ahead inflow prediction, we used eight different regression algorithms: Support Vector Machine, Random Forest, Linear Regression, Lasso, Gradient Boosting, Extreme Gradient Boosting, K-nearest neighbours and Decision Tree, as well as a Dummy regressor that always predicts the mean of the training target values. Four evaluation metrics were used to evaluate the regression models for the prediction of inflow in hydropower plants: mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and R-squared (R²) as a standardized version of MSE.

Different numbers of selected features for each hydropower plant were considered using the class SelectKBest from the sklearn library, which selects the best features based on univariate statistical tests; the best numbers of features for each HPP were chosen based on the predictive models' errors. The experiment was divided into four parts: using all features, the best 15 features, the best 10 features, and the best 5 features, out of a total of 20 features.

Time-series can be troublesome for splits where the shuffling process yields different train and test sets across different executions, for cross-validation, or when the test subset lies before or somewhere in the middle of the train subset, etc. For instance, if a pattern appears in year 3 and persists through years 4–6, the model can detect it, even though it was not present in years 1 and 2. Because the datasets we use in this research are continuous time-series at the daily level, we split the evaluation datasets contiguously, without shuffling the subsets [8].

3.3 Results
For each of the four parts of the experiment for selecting the best n features, we calculated and plotted the mean absolute error [MWh] for each model (Figure 4, Figure 5). As we can see in Figure 4 and Figure 5, the differences in the errors are similar regardless of how many features the model is trained with.

The best results are provided by different models developed using various algorithms and various numbers of features for each hydropower plant, but one thing that all of them have in common is that the dummy regressor is the worst, while linear regression or gradient boosting produce the smallest errors. For example, if we choose HPP Vrben as one of the best outcomes, we can see that while the average daily inflow is 3.7 MWh, the error of the model developed using linear regression with the 15 selected features is 0.34 MWh. Because the errors, with the exception of dummy and lasso, are not particularly big, ranging between 0.345 and 0.40, all of the features listed after the five most influential ones contribute a negligible percentage to improving or decreasing the accuracy (Figure 5).
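The evaluation loop can be sketched as below: a contiguous chronological split, SelectKBest selection, and the MAE of a few of the regressors used in the study (X and y are assumed to be arrays with 20 feature columns, ordered by date).

    from sklearn.dummy import DummyRegressor
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.feature_selection import SelectKBest, f_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_absolute_error

    split = int(len(X) * 0.8)                  # contiguous split, no shuffling
    X_train, X_test = X[:split], X[split:]
    y_train, y_test = y[:split], y[split:]

    for k in (20, 15, 10, 5):                  # all features, best 15, 10 and 5
        selector = SelectKBest(f_regression, k=k).fit(X_train, y_train)
        Xtr, Xte = selector.transform(X_train), selector.transform(X_test)
        for model in (DummyRegressor(strategy='mean'),
                      LinearRegression(),
                      GradientBoostingRegressor()):
            mae = mean_absolute_error(y_test, model.fit(Xtr, y_train).predict(Xte))
            print(k, type(model).__name__, round(mae, 3))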
Figure 4: Graphical representation of the mean absolute error of the eight machine learning regressors used in the study, selecting different numbers of best features for HPP Vrben as a run-of-river diversion hydropower plant

4 CONCLUSIONS
Using eight different machine learning regressors, we built models for predicting the inflow into Macedonian hydroelectric basins. A solution for predicting the daily inflow in hydropower plants has been proposed for the case when the daily amount of precipitation and the amounts of precipitation for the previous five days at the nearest meteorological stations are known.

According to the results, the daily amount of precipitation and the other inflow-related features from the preceding days are the most important factors that explain the inflows into hydroelectric basins. Of course, there are other variables to consider, such as temperature, cloud cover or humidity. Considering that the prediction errors for HPP Tikvesh are not much smaller than those for the other HPPs, where we have only the precipitation and inflow as original data and, most importantly, the features derived from the last days, we may conclude that precipitation takes a certain amount of time to reach the basins as inflow, depending on the daily temperatures, the nature of the soil where the hydropower plant is located, and other factors.

We can also confirm the hydrological and geological assumptions that continuity in the data is far more important for storage hydropower plants than for run-of-river diversion hydropower plants. For run-of-river plants, weather characteristics from previous days have no significant impact, but all original and derived attributes related to inflows and precipitation for the same day are crucial, because the rains accumulate immediately. For storage hydropower plants, on the other hand, factors such as the period of snow melting, temperature and soil moisture, and therefore the amount of inflow into the hydropower plant, are significant, because they raise the river level, and thus the amount of inflow into the hydroelectric basins, but not immediately.

Because of the differences in the location, construction, and operation of hydropower plants, we can only build hydropower-plant-specific models. For some, the daily quantity of precipitation or the amount of precipitation from the previous day is the most essential factor, while for others, the amount of precipitation over a longer period is the most important one. Based on this fact, which is also supported by our results for various hydropower facilities, we may conclude that we cannot build a general model that can estimate the inflow for all hydropower plants. Because of the geological properties of the soil along the rivers and the temperature fluctuations in the past, we also cannot create a general model for hydropower plants of the same type.

Linear regression and gradient boosting models produce the best results. We can treat the inflow problem as a linear problem, because the relations between precipitation and inflow into the basins are simple. The precision of the projected weather conditions is the key obstacle to obtaining even lower errors. The more accurate the projected weather conditions are, and the longer the period they cover, the better the prediction of the inflow into hydroelectric basins would be.

In this study, we built predictive models for the next day's inflow. The next step is to create predictive models for as far in the future as possible, so that the model can assist in power management decisions. However, accurate projected meteorological conditions are required to develop such a model, and the further the time point, the greater the error. Hourly or minute-level time-series would allow more precise predictions in terms of time intervals.

Figure 5: Top to bottom: Graphical representation of the mean absolute error of the eight machine learning regressors used in the study, selecting different numbers of best features for HPP Shpilje, HPP Tikvesh and HPP Kozjak as storage hydropower plants

REFERENCES
[1] A. Arsenov. 2003. Proizvodstvo na elektricna energija (1st edition). ETF, Skopje.
[2] A. Kusiak, X. Wei, A. P. Verma, and E. Roz. 2013. Modeling and prediction of rainfall using radar reflectivity data: a data-mining approach. IEEE Transactions on Geoscience and Remote Sensing, 2337–2342.
[3] Jianzhou Wang, Tong Niu, Pei Du, and Wendong Yang. 2019. A hybrid forecasting system based on a dual decomposition strategy and multi-objective optimization for electricity price forecasting. Applied Energy, 1205–1225.
[4] B. Djordjevic. [n. d.]. Koriscenje vodnih snaga: Objekti hidroelektrana (1st edition). Naucna knjiga, Beograd.
[5] A. Yazidi and M. Goodwin. 2014. A pattern recognition approach for peak prediction of electrical consumption. IFIP Advances in Information and Communication Technology. Springer.
[6] S. Makridakis. 1993. Accuracy measures: theoretical and practical concerns. International Journal of Forecasting, 9, 4, 527–529.
[7] J. S. Coblenz. 2015. Using machine learning techniques to improve precipitation forecasting. PLoS One.
[8] D. Fernando, S. De Silva, S. Perera, S. Dissanayake, and W. Rankothge. 2019. Supply and demand planning of electricity power: a comprehensive solution. IEEE Conference on Information and Communication Technology.
[9] V. Stoilkov. 2015. Predavanja po predmetot Osnovi na obnovlivi izvori na energija (2nd edition). FEEIT, Skopje.
Peak Detection for Classification of Number of Axles

Žiga Kolar (ziga.kolar@ijs.si), Blaž Erzar (blaz.erzar@gmail.com), Nika Čelan (nika.celan8@gmail.com), Aleksander Hrastič (ah3001@student.uni-lj.si), Gašper Leskovec (leskovecg@gmail.com), David Susič (david.susic@ijs.si), and Matjaž Gams (matjaz.gams@ijs.si), Jožef Stefan Institute, Jamova cesta 39, Ljubljana, Slovenia
Martin Konečnik (martin.konecnik@cestel.si), Domen Prestor (domen.prestor@cestel.si), and Matjaž Skobir (matjaz.skobir@cestel.si), Cestel Cestni Inženiring d.o.o., Špruha 32, Trzin, Slovenia

ABSTRACT
A common requirement in scientific data processing is to detect peaks in a signal and to measure their positions, heights, widths, and/or areas. In this paper, the problem of peak detection from a raw signal is defined and presented. Through an example, we show how the problem of peak detection can be translated into detecting the number of axles of vehicles. Various algorithms for predicting the number of peaks (axles) are presented. The solution with derivatives, the solution with an encoder and decoder, and the solution with a convolutional neural network produced the best results: 99% accuracy with a certain percentage of skipped instances.

KEYWORDS
peak detection, neural networks, machine learning, signal, sensors, number of axles

1 INTRODUCTION
Identifying and analyzing peaks in a given time-series is important in many applications, because peaks are useful topological features of a time-series. In power distribution data, peaks indicate sudden high demands. In server CPU utilization data, peaks indicate a sharp increase in workload. In network data, peaks correspond to bursts in traffic. In financial data, peaks indicate an abrupt rise in price or volume. Troughs can be considered inverted peaks and are equally important in many applications. Many other application areas – e.g., bioinformatics [2], mass spectrometry [4], signal processing [7, 8], image processing [10], astrophysics [13] – require peak detection.

Peak detection algorithms are also used for classifying the number of peaks or axles. For example, when a vehicle places one of its tyres on a weight sensor, a peak is detected in the signal. Each peak represents one vehicle axle; therefore, an algorithm detecting how many peaks occur in a given signal in this way detects the number of axles. For the purpose of this study, 16 different signals for two driving lanes were provided by the company Cestel. The sensors were placed under a bridge near Obrežje. Sensors 1 and 2 were placed at the beginning and end of the measuring area for lane 1. Sensors 15 and 16 were placed on lane 2 in a similar fashion. The rest were placed perpendicular to the road between the pairs.

The main goal of this paper is to predict the number of axles as accurately as possible with the use of mathematical models and machine learning algorithms, given the signals.
We introduce solutions using deep neural networks (an artificial neural network and a convolutional neural network), regular derivatives, the predefined function find_peaks, and the package tsfresh for peak detection. In theory, peak detection is formally a trivial task; however, in reality the task can be performed only with some degree of accuracy.

The rest of the paper is organized as follows. Section 2 presents related work. The main methodology and algorithms are described in Section 3. Finally, Section 4 concludes the paper with a summary and ideas for future work.

2 RELATED WORK
Peak detection is a common task in time-series analysis and signal processing. Standard approaches to peak detection include (i) smoothing and then fitting a known function (e.g., a polynomial) to the time-series; and (ii) matching a known peak shape to the time-series. Another common approach to peak-trough detection is to detect zero-crossings (i.e., local maxima) in the differences (slope sign changes) between a point and its neighbours. However, this detects all peaks and troughs, whether strong or not. To reduce the effects of noise, it is required that the local signal-to-noise ratio (SNR) be over a certain threshold [8, 11]. The key question then is how to set the correct threshold so as to minimize false positives. Ma, van Genderen and Beukelman [10] compute the threshold automatically by adapting it to the noise levels in the time-series as

    h = (max + abs_avg) / 2 + K ∗ abs_dev,

where max is the maximum value in the time-series, abs_avg is the average of the absolute values in the time-series, abs_dev is the mean absolute deviation, and K is a user-specified constant.
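This threshold is directly computable; a transcription of the reconstructed formula (with K left to the user) could look as follows.

    import numpy as np

    def adaptive_threshold(signal: np.ndarray, K: float = 1.0) -> float:
        abs_avg = np.abs(signal).mean()                   # average of absolute values
        abs_dev = np.abs(signal - signal.mean()).mean()   # mean absolute deviation
        return (signal.max() + abs_avg) / 2 + K * abs_dev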
Azzini et al. [2] analyze peaks in gene expression microarray time-series data (for the malaria parasite Plasmodium falciparum) using multiple methods; each method assigns a score to every point in the time-series. In one method, the score is the rate of change (i.e., the derivative) computed at each point. In another method, the score is computed as the fraction of the area under the candidate peak. The top 10 candidate peaks are selected by each method; peaks detected by multiple methods are chosen as true peaks. The detected peaks are used to identify genes; SVMs are then used to assign a functional group to each identified gene.

Key problems in peak detection are noise in the data and the fact that peaks occur with different amplitudes (strong and weak peaks) and at different scales, which results in a large number of false positives among the detected peaks. Based on the observation that peaks in mass spectroscopy data have characteristic shapes, Du, Kibbe and Lin [5] propose a continuous wavelet transform (CWT) based pattern-matching algorithm for peak detection. A 2D array of CWT coefficients is computed (using a Mexican Hat mother wavelet, whose basic shape is like a peak) for the time-series at multiple scales, and ridges in this wavelet space representation are systematically examined to identify peaks. Coombes et al. [4] and Lange et al. [9] present other approaches for peak detection using wavelets and their applications to analyzing spectroscopy data. Zhu and Shasha [13] propose a wavelet-based burst (not peak) detection algorithm. The wavelet coefficients (as well as window statistics such as averages) for Haar wavelets are organized in a special data structure called the shifted wavelet tree (SWT). Each level in the tree corresponds to a resolution or time scale and each node corresponds to a window. By automatically scanning windows of different sizes and different time resolutions, bursts can be elastically detected (the appropriate window size is decided automatically). Zhu and Shasha [13] apply their technique to detecting Gamma Ray bursts in real time in the Milagro astronomical telescope, which vary widely in their strength and duration (from minutes to days). Harmer et al. [7] propose a momentum-based algorithm to detect peaks. The idea is to compute the velocity (i.e., rate of change) and momentum (i.e., product of value and velocity) at various points. A "ball" dropped from a previously detected peak will gain momentum as it climbs down and lose momentum as it climbs the next peak; the point where it comes to rest (loses all its momentum) is the next peak. Simple analogs of the laws of Newtonian mechanics (e.g., friction) are proposed to compute the changes in momentum as the ball traverses the time-series.

3 METHODOLOGY
In this section, the algorithms for peak detection are described. Each machine learning method uses 62076 samples (vehicles) and up to 16 signals for the classification of the number of axles. Typically, only the first signal was chosen for training the model, because the first signal had less noise than the other signals. Each signal has a different length; therefore, the signals shorter than the maximum time had to be extended to the maximum signal time in order to create features of the same length. To achieve this, zeros were appended at the positions up to the maximum signal time. The maximum time is 6113, which is equal to the number of features. Figure 1 shows a signal from Sensor 1 which has the maximum sensor time.

Each method uses classification accuracy for the evaluation of the model. Classification accuracy is a metric that summarizes the performance of a classification model as the number of correct predictions divided by the total number of predictions. In this study, correct predictions are correctly predicted numbers of peaks (axles).

Figure 1: Signal with maximum sensor time.
3.1 Peak Detection with Derivatives
Since peaks are local maxima, we can use mathematical methods for finding them. But because we are not working with functions but rather with noisy discrete signals, we need to modify them slightly. Let us define s_i as the i-th signal value. First, we remove the noise at the beginning and the end of the signal, where there are no axles. To do so, we find the horizontal line with maximum support, where the support for a line at height y is defined as the number of signal values s_i for which |y − s_i| < margin. For the line height y we take n equally spaced values from the interval [min, max], and half the distance between consecutive y values is used as the margin. Here min and max represent the minimum and maximum value of the signal. To remove the noise and normalize the signal, we now define:

    ŝ_i = 0                            if s_i < y + 2 ∗ margin,
    ŝ_i = (s_i − min) / (max − min)    otherwise.

On the signal ŝ we calculate the first and second derivatives, ṡ and s̈, using finite differences, implemented as convolutions with the kernels [−0.5, 0, 0.5] and [1, −2, 1]. Finally, the peaks are the indices i for which ŝ_i > 0.25, |ṡ_i| < 0.01 and s̈_i < 0, while only taking peaks which are local maxima, i.e., s_i > max{s_{i−1}, s_{i+1}}.

This procedure achieves an accuracy of ≈ 90% when using sensor 1.
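A sketch of this detection step, assuming the noise-removed and normalized signal ŝ from the procedure above, is given below; since only the magnitude of the first derivative is used, the orientation of the convolution kernel does not matter.

    import numpy as np

    def detect_peaks(s_hat: np.ndarray) -> list:
        ds = np.convolve(s_hat, [-0.5, 0.0, 0.5], mode='same')   # first derivative
        dds = np.convolve(s_hat, [1.0, -2.0, 1.0], mode='same')  # second derivative
        return [i for i in range(1, len(s_hat) - 1)
                if s_hat[i] > 0.25 and abs(ds[i]) < 0.01 and dds[i] < 0
                and s_hat[i] > max(s_hat[i - 1], s_hat[i + 1])]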
When using the other sensors, the accuracy is lower, but it can be improved by choosing the correct sensor for every instance. We do this by training nine models: one regression model M_a for predicting the number of axles and eight models M_k, k = 1..8, for predicting whether the prediction on sensor k is correct. We only use sensors from lane 1, since the sensors on the other lane give poor accuracy.

First, we define p^(i) as the correct number of peaks for instance i and p_k^(i) as the number of peaks detected by the procedure described in this section for instance i using sensor k. We then create a matrix X and a vector y on which we train gradient boosting regression. This model predicts the number of axles from the numbers of peaks detected on all sensors. The matrix X contains one row x^(i) = [p_1^(i), p_2^(i), ..., p_8^(i)] for each instance i, with one column for every sensor, while the vector y contains the ground-truth numbers of axles:

    X = [x^(1), x^(2), ..., x^(m)]^T,    y = [p^(1), p^(2), ..., p^(m)]^T.

Here m is the number of all instances. The other eight models use the same matrix X, but a different vector y. The model M_k, which predicts whether detection using sensor k produces the correct number of peaks, uses:

    y = [p_k^(1) = p^(1), p_k^(2) = p^(2), ..., p_k^(m) = p^(m)]^T,

where the equality comparison evaluates to 1 when true and 0 when false. Gradient boosting classifiers are used for these models.

After all nine models are trained, the peaks of a new instance m + 1 can be detected by first running the described peak detection procedure on all eight sensors to obtain the input vector x^(m+1):

    x^(m+1) = [p_1^(m+1), p_2^(m+1), ..., p_8^(m+1)].

This vector is first fed into the model M_a to predict the number of axles, and the result is rounded to the closest integer value to get a. Furthermore, the models M_k are used to get a confidence c_k for each sensor. The valid sensors are the ones on which the correct number of peaks was detected and whose confidence is higher than some threshold T:

    sensors = {k | 1 ≤ k ≤ 8 ∧ p_k^(m+1) = a ∧ c_k > T}.

If sensors = ∅, instance m + 1 is skipped; otherwise a axles are predicted and min{sensors} is the best sensor for detection. For T = 0.95 this system has an accuracy of 99.5% while skipping 20% of the instances.
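The nine-model scheme can be sketched as follows, with notation matching the text; `peak_counts` (the per-sensor detections from the procedure above) and `axles` (the ground truth) are assumed to be given, and each sensor is assumed to be correct on some instances and wrong on others.

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

    # peak_counts: array of shape (m, 8) with p_k for each instance; axles: shape (m,)
    X, y = np.asarray(peak_counts), np.asarray(axles)

    M_a = GradientBoostingRegressor().fit(X, y)            # predicts the number of axles
    M_k = [GradientBoostingClassifier().fit(X, (X[:, k] == y).astype(int))
           for k in range(8)]                              # was sensor k's detection correct?

    def predict(x_new, T=0.95):
        a = int(round(M_a.predict([x_new])[0]))
        confidence = [m.predict_proba([x_new])[0][-1] for m in M_k]  # P(correct)
        sensors = [k for k in range(8) if x_new[k] == a and confidence[k] > T]
        return (a, min(sensors)) if sensors else None      # None means the instance is skipped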
3.2 Peak Detection with Encoder/Decoder
Since we know where the peaks are located in every signal, we can train a model that predicts the locations of the peaks for every instance. Because we are working with time-series data, we can use a one-dimensional convolutional neural network with an autoencoder architecture. This allows us to predict the locations of a variable number of peaks. The inputs and outputs have the same dimensions, while the model consists of two parts: an encoder, which creates a low-dimensional embedding in a latent space, and a decoder, which reconstructs the output from it.

As inputs we use the signals from sensor 1. On the output we want to predict a vector of the same dimension, which has ones in the time slots containing a peak and zeros everywhere else. Because CNNs take inputs of the same length, we pad all input and output vectors to the maximum length. To make the maximum length smaller, we use the noise removal method from Section 3.1 to crop the noise at the beginning and the end of the signals.

The encoder is made of 3 convolutional layers, each followed by batch normalization and max pooling of size 2. The convolutional layers use ReLU activations, 8, 16 and 32 filters, and sizes 5, 2 and 3, respectively. The decoder has the same structure with the numbers of filters and sizes reversed and the max pooling layers replaced with up-sampling. The model is trained using the Adam optimizer and the binary cross-entropy loss function.

After the model is trained, the peaks of a new instance can be detected by feeding its sensor 1 signal s to the model to obtain a prediction vector p. The peaks are located at the indices where the prediction value is a strong enough local maximum and the signal amplitude is high enough:

    peaks = {i | p_i ≥ max{p_{i−5:i+5}} ∧ p̂_i > T_1 ∧ ŝ_i > 0.15},

while skipping the instances for which max{p_i | i ∈ peaks} < T_2. Here p̂ and ŝ are normalized to contain values in [0, 1], and T_1 and T_2 are thresholds that need to be selected. For T_1 = 0.01 and T_2 = 0.5, an accuracy of 99.6% is achieved with 20% of the instances skipped.
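Under the stated architecture, a Keras sketch of the autoencoder could look as follows; `length` is the padded signal length and is assumed to be divisible by 8 so that the decoder mirrors the encoder exactly.

    from tensorflow import keras
    from tensorflow.keras import layers

    def build_autoencoder(length: int) -> keras.Model:
        inputs = keras.Input(shape=(length, 1))
        x = inputs
        for filters, size in ((8, 5), (16, 2), (32, 3)):        # encoder
            x = layers.Conv1D(filters, size, padding='same', activation='relu')(x)
            x = layers.BatchNormalization()(x)
            x = layers.MaxPooling1D(2)(x)
        for filters, size in ((32, 3), (16, 2), (8, 5)):        # decoder, mirrored
            x = layers.Conv1D(filters, size, padding='same', activation='relu')(x)
            x = layers.BatchNormalization()(x)
            x = layers.UpSampling1D(2)(x)
        outputs = layers.Conv1D(1, 1, activation='sigmoid')(x)  # per-slot peak probability
        model = keras.Model(inputs, outputs)
        model.compile(optimizer='adam', loss='binary_crossentropy')
        return model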
3.3 Peak Detection with Artificial Neural Network
Neural networks, also known as artificial neural networks (ANNs) or simulated neural networks (SNNs), are a subset of machine learning and are at the heart of deep learning algorithms. Their name and structure are inspired by the human brain, mimicking the way that biological neurons signal to one another [12].

Artificial neural networks (ANNs) are comprised of node layers, containing an input layer, one or more hidden layers, and an output layer. Each node, or artificial neuron, connects to others and has an associated weight and threshold. If the output of an individual node is above the specified threshold value, that node is activated, sending data to the next layer of the network. Otherwise, no data is passed along to the next layer of the network [12].

In this method, an artificial neural network was used to predict the number of peaks. Neural networks were a viable solution for this problem, because we had enough data at our disposal for deep learning. The whole signal from sensor 1 was provided as the input layer. The architecture of the neural network contains two hidden layers, with 16 and 12 neurons, respectively. The output data (the number of peaks) was one-hot encoded; therefore, the softmax activation function was used in the output layer. The model returned the probability of each class, and in the end an algorithm picked the column with the highest probability (the i-th column depicts i peaks). This model achieved 91% accuracy in predicting the number of peaks.

3.4 Peak Detection with Convolutional Neural Network
Convolutional neural networks are distinguished from other neural networks by their superior performance with image, speech, or audio signal inputs. They have three main types of layers:

• Convolutional layers, which convolve the input and pass the result to the next layer. This is similar to the response of a neuron in the visual cortex to a specific stimulus. Each convolutional neuron processes data only for its receptive field.
• Pooling layers, which reduce the dimensions of the data by combining the outputs of neuron clusters at one layer into a single neuron in the next layer.
• Fully-connected layers, which connect every neuron in one layer to every neuron in another layer.

With each layer, the CNN increases in its complexity, identifying greater portions of the required information [1].

A convolutional neural network was utilized to predict the number of peaks. The motivation for using this type of network comes from the fact that convolutional neural networks work well with time-series data. The whole signal from sensor 1 was used as the input layer. The architecture of the network contains three 1D convolutional layers and three 1D pooling layers. At the end we used a fully connected layer with 100 neurons. The output layer has a softmax activation. Similarly to Subsection 3.3, the model returned the probability of each class and, in the end, an algorithm picked the column with the highest probability. This model achieved 97% accuracy. If we decide to skip the 6.5% of samples that are below the 99% probability threshold, we achieve an accuracy of 99.1%.

3.5 Peak Detection with the Predefined Method find_peaks
Another method for peak detection is to use the predefined function named find_peaks. This function takes a 1-D array and finds all local maxima by a simple comparison of neighbouring values. In the context of this function, a peak or local maximum is defined as any sample whose two direct neighbours have a smaller amplitude [6]. Because each signal has a different maximum height, the parameter of find_peaks named height differs from sample to sample. The height parameter is defined as the minimal required height for peaks to be detected; peaks below that threshold are not detected. The height was calculated by the formula: height = ||max(sensorHeight)| − |min(sensorHeight)|| / 10. The above described method achieved 89% accuracy and also returned the position of every peak. An example can be seen in Figure 2, in which 2 peaks are detected, marked with 2 orange crosses.

Figure 2: Signal with two peaks. The location of each peak is marked with an orange cross.
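In code, this method reduces to a few lines with SciPy; the sketch below applies the height formula described above.

    import numpy as np
    from scipy.signal import find_peaks

    def count_axles(signal: np.ndarray) -> int:
        # The per-sample height threshold from the formula above.
        height = abs(abs(signal.max()) - abs(signal.min())) / 10
        peaks, _ = find_peaks(signal, height=height)
        return len(peaks)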
3.6 Peak Detection with the Library tsfresh
Another alternative approach to peak prediction is to use the Python package named tsfresh. It automatically calculates a large number of time-series characteristics, the so-called features. Furthermore, the package contains methods to evaluate the explaining power and importance of such characteristics for regression or classification tasks. tsfresh is used for systematic feature engineering from time-series and other sequential data. These data have in common that they are ordered by an independent variable, most commonly time (time series) [3]. After the features were extracted, they were used by a gradient boosting classifier to predict the number of peaks. This approach produced 89% accuracy in predicting the number of peaks.

4 CONCLUSION AND DISCUSSION
We defined and presented the problem of peak detection from a raw signal. Through an example, we showed how the problem of peak detection can be translated into detecting the number of axles of vehicles. Various algorithms for predicting the number of peaks (axles) were presented. The solution with derivatives, the solution with an encoder and decoder, and the solution with a convolutional neural network produced the best results: 99% accuracy with a certain percentage of skipped instances. In future work, the mentioned results can be tweaked and improved by using different learning parameters, e.g., a different learning rate, a different number of neurons, or a different activation function. Furthermore, better results can be achieved by changing the architecture of the neural network, e.g., different or more convolutional or pooling layers. The results of peak detection can also be extended to determining the axle distances. Once the axle number is accurately predicted, a new set of algorithms can be implemented to solve this new task.

ACKNOWLEDGMENTS
This study received funding from the company Cestel. The authors acknowledge the funding from the Slovenian Research Agency (ARRS), Grant (PR-10495) and basic core funding P2-0209.

REFERENCES
[1] Saad Albawi, Tareq Abed Mohammed, and Saad Al-Zawi. 2017. Understanding of a convolutional neural network. In 2017 International Conference on Engineering and Technology (ICET). IEEE, 1–6.
[2] Ivano Azzini, Rossana Dell'Anna, Federica Ciocchetta, Francesca Demichelis, Andrea Sboner, Enrico Blanzieri, and A. Malossini. 2004. Simple methods for peak detection in time series microarray data. Proc. CAMDA'04 (Critical Assessment of Microarray Data).
[3] Maximilian Christ, Nils Braun, Julius Neuffer, and Andreas W. Kempa-Liehr. 2018. Time series feature extraction on basis of scalable hypothesis tests (tsfresh – a Python package). Neurocomputing, 307, 72–77.
[4] Kevin R. Coombes, Spiridon Tsavachidis, Jeffrey S. Morris, Keith A. Baggerly, Mien-Chie Hung, and Henry M. Kuerer. 2005. Improved peak detection and quantification of mass spectrometry data acquired from surface-enhanced laser desorption and ionization by denoising spectra with the undecimated discrete wavelet transform. Proteomics, 5, 16, 4107–4117.
[5] Pan Du, Warren A. Kibbe, and Simon M. Lin. 2006. Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching. Bioinformatics, 22, 17, 2059–2065.
[6] 2022. Find peaks. https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks.html. (2022).
[7] Karl Harmer, Gareth Howells, Weiguo Sheng, Michael Fairhurst, and Farzin Deravi. 2008. A peak-trough detection algorithm based on momentum. In 2008 Congress on Image and Signal Processing. Vol. 4. IEEE, 454–458.
[8] Valentin T. Jordanov and Dave L. Hall. 2002. Digital peak detector with noise threshold. In 2002 IEEE Nuclear Science Symposium Conference Record. Vol. 1. IEEE, 140–142.
[9] Eva Lange, Clemens Gröpl, Knut Reinert, Oliver Kohlbacher, and Andreas Hildebrandt. 2006. High-accuracy peak picking of proteomics data using wavelet techniques. In Biocomputing 2006. World Scientific, 243–254.
[10] Meng Ma, Arjan van Genderen, and Peter Beukelman. 2005. Developing and implementing peak detection for real-time image registration. In Proceedings of the 16th Annual Workshop on Circuits, Systems & Signal Processing (ProRISC 2005). Citeseer, 641–652.
[11] G. M. Nijm, A. V. Sahakian, S. Swiryn, and A. C. Larson. 2007. Comparison of signal peak detection algorithms for self-gated cardiac cine MRI. In 2007 Computers in Cardiology. IEEE, 407–410.
[12] Sun-Chong Wang. 2003. Artificial neural network. In Interdisciplinary Computing in Java Programming. Springer, 81–100.
[13] Yunyue Zhu and Dennis Shasha. 2003. Efficient elastic burst detection in data streams. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 336–345.

Unified Question Answering in Slovene

Katja Logar and Marko Robnik-Šikonja
University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia
kl2164@student.uni-lj.si, marko.robnik@fri.uni-lj.si

ABSTRACT
Question answering is one of the most challenging tasks in language understanding. Most approaches are developed for English, while less-resourced languages are much less researched. We adapt a successful English question-answering approach, called UnifiedQA, to the less-resourced Slovene language. Our adaptation uses the encoder-decoder transformer SloT5 and mT5 models to handle four question-answering formats: yes/no, multiple-choice, abstractive, and extractive. We use existing Slovene adaptations of four datasets, and machine translate the MCTest dataset. We show that a general model can answer questions in different formats at least as well as specialized models. The results are further improved using cross-lingual transfer from English. While we produce state-of-the-art results for Slovene, the performance still lags behind English.

KEYWORDS
question answering, Slovene language, deep neural networks, encoder-decoder models, natural language processing

1 INTRODUCTION
Most studies of the question answering (QA) task deal with the English language. This leaves many language specifics, not present in English, potentially inadequately addressed. E.g., some problematic language specifics of the morphologically-rich Slovene language are noun and adverb declension, three different genders, three counts, the person or pronoun being hidden in a verb, etc. An additional problem for less-resourced languages is the lack of suitable datasets for QA.

Khashabi et al. [5] argue that building specialized models for each QA dataset or QA format is unnecessary, as they all require a similar inference capability. Therefore, it is possible to develop one model capable of answering questions in different formats. They call their approach UnifiedQA, and we adapted this approach to Slovene.

The number of QA datasets in Slovene is much lower than that used in the original UnifiedQA. We found four partially human-translated but mostly machine-translated datasets. To improve on that, we machine translate the additional MCTest dataset [9] into Slovene and fix translation errors.

Our method is based on the pretrained Slovene encoder-decoder transformer model SloT5 [11]. We finetune the model on the five QA datasets and analyze its performance. We also test the role of uppercase and lowercase letters, the impact of unanswerable questions, and the contribution of each dataset to the performance of the unified model. Next, we test cross-lingual transfer and train a multilingual question answering model based on the multilingual mT5 model [13], using English and Slovene datasets. Finally, we perform a qualitative analysis of the obtained models. The results show that our system is currently the best performing QA system for Slovene. We make its source code freely accessible at https://github.com/klogar/QAslovene.

The paper is split into four further sections. In Section 2, we outline the related work on QA in Slovene. Section 3 presents our adaptation of the UnifiedQA methodology and the applied Slovene QA datasets, and Section 4 discusses different evaluation settings and their results. In Section 5, we present the findings and ideas for further improvements.

2 RELATED WORK
QA in Slovene is relatively unexplored. In the pre-neural setting, Čeh et al. [1] developed a closed-domain QA system for answering common questions that arise during students' studies at the University of Maribor, Faculty of Electrical Engineering, Computer Science and Informatics. The translation of the SuperGLUE benchmark suite to Slovene in 2021 [14] provided four partially human, partially machine translated QA datasets (BoolQ, COPA, MultiRC, and ReCoRD) and an evaluation of Slovene BERT models. Ulčar et al. [11] adapted the SloT5 model for yes/no and multiple-choice questions. Finally, Zupanič et al. [15] translated the SQuAD 2.0 dataset from English and adapted different multilingual models. They achieved the best result with the SloBERTa 2.0 model [12].
In contrast to the above works, we apply the transfer learning paradigm within the encoder-decoder SloT5 and mT5 models and provide a unified approach to different QA formats, obtaining the best results so far.

3 METHODOLOGY

Our methodology follows the UnifiedQA methodology of Khashabi et al. [5]. The authors define four QA formats (extractive, abstractive, multiple-choice, and yes/no) and unify the learning approach to these formats. The extractive format requires that the answer is directly stated in the supplied context as a substring. The abstractive format requires paraphrasing of the given context, and the answer may require linking information from several sentences. The multiple-choice datasets have the possible answers listed, and the aim is to select the correct option. Finally, the yes/no questions require only yes or no as an answer.

The datasets with different QA formats are converted to a text format, with parts of the input separated by the "\n" separator. Extractive, abstractive and yes/no questions are coded as "question \n context" and multiple-choice questions as "question \n possible choices \n context". Here, the possible choices are labelled with capital letters from A onwards: (A) choice 1 (B) choice 2 ...
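A minimal sketch of this text encoding follows; the "\n" between parts is a literal two-character separator token, as in the original UnifiedQA inputs, and the function name and signature are our own illustration, not the authors' code.

```python
def encode_example(question: str, context: str, choices=None) -> str:
    """Render a QA example in the unified text format described above.
    The parts are joined by a literal "\\n" separator token."""
    if choices is None:
        # extractive, abstractive and yes/no formats
        return f"{question} \\n {context}"
    # multiple-choice format: options labelled (A), (B), ...
    labelled = " ".join(f"({chr(ord('A') + i)}) {c}"
                        for i, c in enumerate(choices))
    return f"{question} \\n {labelled} \\n {context}"
```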
We initially considered four QA datasets. Three stem from the translation of the SuperGLUE benchmark to Slovene [14]: MultiRC [4] (abstractive), COPA [10] (multiple-choice) and BoolQ [2] (yes/no). We also used the Slovene translation [15] of SQuAD 2.0 [8] (extractive). SQuAD 2.0 contains unanswerable questions, and some are also present in MultiRC. As we focus on the reading comprehension task, all selected datasets have a context. COPA is a commonsense reasoning dataset, which is not our primary focus, but we included it because it was human translated into Slovene. BoolQ, MultiRC, and SQuAD 2.0 are partially human translated [14, 15].

To have a non-commonsense multiple-choice dataset, we machine translated the MCTest dataset [9] and fixed some translation errors. To reduce the cost of translation, we partially used the commercial solution DeepL [3] and partially an internal neural machine translator of somewhat lesser quality. Later, we translated the entire MCTest dataset with the DeepL translator and made it publicly available in our repository. However, the reported results are obtained using the initial mixed translation setting.

As the starting model for the monolingual Slovene UnifiedQA models, we used the monolingual Slovene variant of the T5 transformer encoder-decoder model [7], called SloT5 [11]. For the cross-lingual transfer experiments, we applied the multilingual variant of T5, called mT5 [13]. Due to computational time and GPU memory limitations, we used the SloT5 and mT5 models of the smallest size (60M and 300M parameters, respectively). Originally, Khashabi et al. used the T5 model [7] of the largest possible size (11B parameters) and the large BART model [6] as starting points for the UnifiedQA model. However, they also report results for the T5-small model, which we use for comparison, so that all models are of comparable sizes. Table 1 lists the parameters used to finetune our models.

Table 1: Parameters for finetuning UnifiedQA models.

Parameter                      Value
Maximum input size [tokens]    512
Maximum output size [tokens]   100
Number of epochs               25
Batch size                     8
Number of beams                4
Learning rate                  5e-5

4 EXPERIMENTS AND RESULTS

In this section, we report our empirical evaluation. We present the evaluation metrics, the original English results, and the experiments and results in the monolingual Slovene setting and in the cross-lingual transfer setting.

4.1 Evaluation Metrics

For each dataset, we use a different evaluation metric. For BoolQ, we report the classification accuracy (CA); for SQuAD 2.0, the F1 score; for MultiRC, we use ROUGE-L; and for the multiple-choice datasets (MCTest and COPA), we calculate the best match between the generated text and the offered options and compute the classification accuracy. In all cases, the answers are first normalized (removing punctuation and unnecessary spaces and converting the text to lowercase).
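The normalization step takes only a few lines; this is a sketch of the described steps (lowercasing, punctuation removal, whitespace cleanup), not the authors' exact code.

```python
import string

def normalize_answer(text: str) -> str:
    """Normalize an answer before scoring: lowercase, strip punctuation
    and collapse unnecessary whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())
```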
4.2 English UnifiedQA Results Using T5-small

First, we replicated the results of the original English UnifiedQA [5] and also obtained results for the datasets not originally used, i.e. COPA and MultiRC (the latter was only used as a yes/no dataset in [5]). The results are presented in Table 2. The results for BoolQ and MCTest are slightly worse than originally reported, which could be attributed to slightly different parameters for text generation. We achieved a much worse result for the SQuAD 2.0 dataset, with an F1 of only 46.1% rather than 67.6%. Trying to replicate the published scores with the original code (https://github.com/allenai/unifiedqa), we obtained results similar to ours. However, we analyzed the difference and believe that at least some of it is due to unanswerable questions, as the F1 score is 84.5% for questions that have an answer and only 7.8% for unanswerable questions. The UnifiedQA model, therefore, does a poor job of detecting whether a question is unanswerable from the context.

Table 2: Our and published results of the UnifiedQA (UniQA) approach on English datasets using the T5-small model.

Dataset        BoolQ  COPA   MCTest  MultiRC  SQuAD 2.0
Metric         CA     CA     CA      ROUGE-L  F1
UniQA (publ.)  0.771  /      0.800   /        0.676
UniQA (ours)   0.757  0.560  0.762   0.536    0.461

4.3 Slovene Monolingual Results Using SloT5

In the Slovene monolingual setting, we compare different variants of Slovene UnifiedQA models and report the results in Table 3. We adapted the models for each QA format separately and obtained so-called specialized models. These provide a baseline for what can be achieved with each individual QA format. We then trained the SloUnifiedQA model using all available Slovene datasets. We also investigated the impact of unanswerable questions (SloUnifiedQA-NA, SloUnifiedQA-NA2, explained below) and the use of only lower case letters (SloUnifiedQA-LC).

Table 3: Comparing variants of the Slovene UnifiedQA approach (based on the SloT5 model). Besides the unified model, we report the results of specialized models for each QA format (specialized), the best results published so far on these datasets (published), and the default classifier. The effect of unanswerable questions and lowercasing is analyzed in the bottom part of the table. Note that SloUniQA-NA is tested on modified datasets without unanswerable questions, so the results for this model are not comparable.

Dataset        BoolQ  COPA   MCTest  MultiRC  SQuAD 2.0  Avg.
Metric         CA     CA     CA      ROUGE-L  F1
SloUniQA       0.683  0.532  0.463   0.310    0.555      0.509
specialized    0.688  0.486  0.439   0.255    0.554      0.484
published      0.666  0.500  /       /        0.739      /
default        0.623  0.500  0.269   /        /          /
SloUniQA-NA    0.675  0.524  0.454   0.319    0.637      0.522
SloUniQA-NA2   0.695  0.554  0.474   0.321    0.556      0.520
SloUniQA-LC    0.686  0.530  0.449   0.259    0.533      0.491

Comparing the SloUnifiedQA model with the specialized models, the unified model achieves better results for the multiple-choice datasets (COPA and MCTest) and the abstractive dataset (MultiRC). The improvement for the extractive dataset is minimal, and we observe a slight decrease in accuracy for the yes/no dataset (BoolQ). Better results are also obtained compared to the default classifiers.

Comparing SloUnifiedQA on Slovene with the English UnifiedQA model on the English datasets (in Table 2), the English model gives better results for all selected formats except SQuAD 2.0. Interestingly, the English and Slovene models have different problems with SQuAD 2.0. The Slovene one predicts unanswerable questions too often (it has an F1 score of 60.3% for unanswerable questions and only 50.4% for answerable ones, while incorrectly identifying 13% of answerable questions as unanswerable), the English one too rarely; indeed, the English model never wrongly predicts that a question is unanswerable. This is likely due to unanswerable questions making up a larger proportion of the Slovene training data than of the English one. For the other datasets, the biggest difference in metrics can be observed for the MCTest multiple-choice dataset, where the difference is 33%. We attribute the worse result of SloUnifiedQA to machine translations and a much smaller training dataset, especially for the multiple-choice questions, as in the original work the authors use three additional datasets besides MCTest.

Compared to other published works on the same datasets, we achieve better results with SloUnifiedQA on the BoolQ and COPA datasets than Ulčar and Robnik-Šikonja [11], while on the SQuAD 2.0 dataset, Zupanič et al. [15] achieve a significantly better result (by almost 20%). Here, Ulčar and Robnik-Šikonja [11] also use the SloT5 model with textual output, while Zupanič et al. [15] use the SloBERTa model and only predict the span of the answer, which is an easier task.
4.3.1 The Effect of Unanswerable Questions.
Unanswerable questions account for about one-third of all training examples, and models could overfit such questions. To address this issue, we train two models, SloUnifiedQA-NA and SloUnifiedQA-NA2. For the SloUnifiedQA-NA model, we removed all unanswerable questions. As evident from Table 3, the accuracy deteriorates for the yes/no and multiple-choice questions, while the metrics improve for the abstractive and extractive questions. The biggest improvement occurred for the SQuAD 2.0 dataset, where the F1 metric for answerable questions improved to 63.7%.

SloUnifiedQA-NA was the basis for the SloUnifiedQA-NA2 model, which we trained on the complete datasets, including unanswerable questions. The metrics slightly improved for BoolQ, COPA, and MCTest, but this may be due to the longer training time. No improvement is observed for SQuAD 2.0; the F1 for answerable questions even drops to 51.5%.

4.3.2 The Effect of Using Lower Case Letters.
To analyze the effect of using only lower case letters, we trained the SloUnifiedQA-LC model. The results are comparable for BoolQ and COPA, but worse for MCTest, MultiRC, and SQuAD 2.0. Uppercase letters, therefore, carry relevant information in Slovene.

4.3.3 Contribution of Datasets in the Unified Model.
To assess the impact of each dataset in the SloUnifiedQA model, we dropped each training dataset in turn. The results are shown in Table 4. The largest individual performance drop is observed for the model without BoolQ, as the yes/no questions become unanswerable (the CA for the BoolQ dataset is almost 0%). This also strongly affects the average but causes even slight improvements on MCTest, MultiRC, and SQuAD 2.0. The second largest average performance drop is observed for the model without SQuAD 2.0, where a drop occurs on all datasets. For the other models, the drops are observed mainly on the datasets on which the models were not trained. Overall, the COPA dataset contributes the least to the performance of SloUnifiedQA, the corresponding model achieving almost the same performance.

Table 4: Contribution of datasets in the unified model by omitting one dataset at a time. The red color indicates the two largest performance drops for each dataset.

Dataset        BoolQ  COPA   MCTest  MultiRC  SQuAD 2.0  Avg.
Metric         CA     CA     CA      ROUGE-L  F1
SloUniQA       0.683  0.532  0.463   0.310    0.555      0.509
no BoolQ       0.001  0.522  0.486   0.319    0.561      0.378
no SQuAD 2.0   0.664  0.516  0.451   0.258    0.120      0.402
no MCTest      0.676  0.510  0.351   0.317    0.560      0.483
no MultiRC     0.690  0.536  0.457   0.209    0.552      0.489
no COPA        0.683  0.510  0.456   0.319    0.554      0.504

4.4 Cross-Lingual Transfer Using mT5

There are only a few QA datasets in Slovene, so we checked whether transfer from additional English datasets can improve the Slovene results. We used three different collections of datasets:

• SLO: the Slovene datasets BoolQ, COPA, MCTest, MultiRC and SQuAD 2.0 (described in Section 3).
• ANG5: the English datasets BoolQ, COPA, MCTest, MultiRC, and SQuAD 2.0 (the English datasets whose translations form the SLO collection).
• ANG9: the English datasets BoolQ, COPA, MCTest, MultiRC, and SQuAD 2.0, together with all datasets used by Khashabi et al. [5] except SQuAD 1.1, i.e. NarrativeQA, RACE, ARC, and OBQA.
We trained five models using the multilingual mT5 model on these dataset collections and tested them on the SLO test sets. The first model, mSloUnifiedQA, was trained only on the SLO datasets and gives the baseline performance of mT5, also enabling a comparison with the monolingual SloT5. The mSloUnifiedQA1 models were trained on the English and Slovene datasets simultaneously (in a single phase), with the English dataset collection being either ANG5 or ANG9; only the SLO dataset group was used for validation. The mSloUnifiedQA2 models were trained in two phases, first on the English datasets (ANG5 or ANG9), using the ROUGE-L metric to select the best model, after which the obtained model was finetuned on the SLO dataset collection.

The results of the five multilingual models are presented in Table 5. A comparison between the monolingual SloUnifiedQA model (in Table 3) and the multilingual mSloUnifiedQA shows that they perform on average equally well, with SloUnifiedQA performing better on the BoolQ, COPA and MultiRC datasets, and mSloUnifiedQA performing better on the MCTest and SQuAD 2.0 datasets.

Table 5: Results of cross-lingual transfer using additional English datasets and multilingual models based on mT5.

Dataset             BoolQ  COPA   MCTest  MultiRC  SQuAD 2.0  Avg.
Metric              CA     CA     CA      ROUGE-L  F1
mSloUniQA           0.646  0.488  0.515   0.298    0.571      0.504
mSloUniQA1 (ANG5)   0.672  0.486  0.582   0.308    0.587      0.527
mSloUniQA1 (ANG9)   0.676  0.508  0.579   0.340    0.598      0.540
mSloUniQA2 (ANG5)   0.682  0.504  0.564   0.313    0.593      0.531
mSloUniQA2 (ANG9)   0.683  0.486  0.602   0.323    0.604      0.540

Adding additional knowledge in English improved the average metrics by 3-4%, but the training time increased by about four times for the models with the most datasets (ANG9). A slight improvement can be observed for the models using nine English datasets (ANG9) relative to those with only five (ANG5). The additional datasets contribute the most to the MCTest multiple-choice results, but the performance on MultiRC and SQuAD 2.0 also improved. On the other hand, despite the additional datasets, the results for BoolQ and COPA are worse than for the monolingual model. Using one- or two-phase training makes no difference on average, but there are differences on individual datasets.
4.5 Qualitative Analysis

A qualitative analysis of our models showed that the generated answers are mostly substrings of the context or the given choices in multiple-choice questions. The models cannot paraphrase, rephrase or provide answers in the correct Slovene grammatical case. They also have problems with multi-part questions requiring multiple answers that are not listed in the same place in the context. Machine translations, which are not always grammatically correct or do not make it clear what the question is asking for, also make answering the questions difficult. The models performed best on factoid questions that require a short answer.

5 CONCLUSION AND FUTURE WORK

The main contributions of this work are the generative unified QA models based on the SloT5 and mT5 encoder-decoder transformer models, which set new state-of-the-art results for QA in Slovene. An additional contribution is the machine-translated and corrected MCTest dataset.

We identify three possible directions for further work. First, better translations or dedicated Slovene datasets would improve upon the currently mainly machine-translated datasets. Second, larger T5 models and longer training times have shown better performance in English; in our work, we used only the smallest available T5 models due to the limited memory of the GPU, and we limited training to a maximum of 25 epochs. Third, new datasets could help, especially additional multiple-choice datasets, as evidenced by the improvement brought by the introduction of the English multiple-choice datasets. Further, additional abstractive datasets could teach the models to rephrase better, or that answers need not be mere substrings of the provided context.

ACKNOWLEDGMENTS

Marko Robnik-Šikonja received financial support from the Slovenian Research Agency through core research programme P6-0411 and projects J6-2581 and J7-3159, as well as from the Ministry of Culture of the Republic of Slovenia through the project Development of Slovene in Digital Environment (RSDO).
REFERENCES

[1] Ines Čeh and Milan Ojsteršek. "Slovene language question answering system". In: Recent advances in computers: Proceedings of the 13th WSEAS international conference on computers (part of the 13th WSEAS CSCC multiconference). 2009, pp. 502–508.
[2] Christopher Clark et al. "BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions". In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019, pp. 2924–2936.
[3] DeepL Translator. [18 July 2022]. url: https://www.deepl.com/translator.
[4] Daniel Khashabi et al. "Looking beyond the surface: A challenge set for reading comprehension over multiple sentences". In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018, pp. 252–262.
[5] Daniel Khashabi et al. "UnifiedQA: Crossing Format Boundaries With a Single QA System". In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings. 2020, pp. 1896–1907.
[6] Mike Lewis et al. "BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension". In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020, pp. 7871–7880.
[7] Colin Raffel et al. "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". In: Journal of Machine Learning Research 21 (2020), pp. 1–67.
[8] Pranav Rajpurkar, Robin Jia, and Percy Liang. "Know What You Don't Know: Unanswerable Questions for SQuAD". In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2018, pp. 784–789.
[9] Matthew Richardson, Christopher J.C. Burges, and Erin Renshaw. "MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text". In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 2013, pp. 193–203.
[10] Melissa Roemmele, Cosmin Adrian Bejan, and Andrew S Gordon. "Choice of Plausible Alternatives: An Evaluation of Commonsense Causal Reasoning". In: AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning. 2011, pp. 90–95.
[11] Matej Ulčar and Marko Robnik-Šikonja. "Sequence to sequence pretraining for a less-resourced Slovenian language". In: arXiv preprint arXiv:2207.13988 (2022).
[12] Matej Ulčar and Marko Robnik-Šikonja. "SloBERTa: Slovene monolingual large pretrained masked language model". In: Proceedings of the 24th International Multiconference Information Society - IS 2021, Data Mining and Data Warehouses - SiKDD. 2021.
[13] Linting Xue et al. "mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer". In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021, pp. 483–498.
[14] Aleš Žagar and Marko Robnik-Šikonja. "Slovene SuperGLUE Benchmark: Translation and Evaluation". In: Proceedings of the Language Resources and Evaluation Conference. 2022, pp. 2058–2065.
[15] Matjaž Zupanič et al. Cross-lingual Question Answering with Transformers. Assignment in the NLP course, University of Ljubljana, Faculty of Computer and Information Science. [20 June 2022]. url: https://github.com/mtzcorporations/NLP_TeamJodka.

Social Media Analysis for Assessing Resilience

Aljaž Osojnik, Bernard Ženko, Martin Žnidaršič
aljaz.osojnik@ijs.si, bernard.zenko@ijs.si, martin.znidarsic@ijs.si
Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia

ABSTRACT

In this paper, we describe tools developed to investigate the potential use of social media analysis for resilience assessment. We focus on tweets as a data source and apply sentiment analysis, topic detection and filtering approaches. We present computed aggregates with potential information on resilience, and a web application that was made for the use of domain experts. Finally, we discuss preliminary user feedback and lessons learned about the applicability of our approach.

KEYWORDS

social media, sentiment analysis, resilience, disaster management

1 INTRODUCTION

Resilience is generally understood as the ability to adapt and recover after a disruptive event. In this work, we focus on the resilience of communities in the context of disaster management, and adopt the following UN definition: Resilience is the ability of a system, community or society exposed to hazards to resist, absorb, accommodate, adapt to, transform and recover from the effects of a hazard in a timely and efficient manner, including through the preservation and restoration of its essential basic structures and functions through risk management [7]. Resilience research aims to develop strategies focused not on isolated risks, such as earthquakes or fires, but on approaches that subsume and address all relevant risks, both natural and man-made. The strategies need to identify and account for the different human, social, environmental, economic and technological factors that influence the behavior of a community facing a disaster. The goal is that, by adopting these strategies, communities can perform their intended functions in normal and adverse times. A key element of a resilient community is the active involvement of local citizens and their active role in the decision making process.

The EU-funded project RESILOC (Resilient Europe and Societies by Innovating Local Communities, https://www.resilocproject.eu/) aims to develop a holistic framework of studies, methods and software tools that can be used to assess the resilience of a community in practice by Local Resilience Teams (LRTs), i.e., teams in charge of resilience assessment and risk management of a given community, typically organized within a civil protection organization. The final goal is to use this framework to identify new strategies for improving the preparedness of local communities against any kind of hazard.
The key element of the RESILOC framework is a methodology for assessing resilience structured along six dimensions: Governance, Social, Economic, Infrastructure, Disaster Risk Reduction and Environmental. Each dimension is described in terms of its attributes or indicators, the values of which are assessed with proxies, which are empirically measurable quantities. Indicators and proxies need to have associated scales and aggregation functions that allow their calculation within the RESILOC tool [5]. For example, the social dimension of the resilience of a community could be described with indicators such as Community engagement, Social connectedness, Trust in authority and Risk awareness. The Community engagement indicator could then be assessed with proxies such as the percentage of the population who vote in local elections, the number of NGOs for pre- and post-disaster response per capita, and the percentage of the population undertaking voluntary work. Notably, the resilience assessment of different communities may include different indicators and proxies. The RESILOC platform seeks to provide an initial set of indicators and proxies, allow for the addition of new ones, and enable their aggregation and visualization.

As some of the indicators mentioned above can also be assessed through proxies based on social media analysis, we developed a tool for investigating this approach. For example, proxies (or their components) assessing the indicator Trust in authority could be assessed with techniques such as sentiment analysis. In particular, we could hypothesize that a more positive sentiment in social media posts related to public authorities, e.g., disaster response authorities, is related to higher trust in them (and consequently better adoption of any disaster relief measures they introduce). This is the primary motivation for the investigation presented in this paper. As our social media data source, we use tweets, i.e., posts on the Twitter microblogging platform (https://twitter.com/). These are public, abundant and have well supported APIs for collection and filtering.

The rest of the paper is organized as follows. Section 2 briefly presents the related work, Section 3 presents the social media data used in our analysis and Section 4 presents the results of said analyses. Section 5 presents our web tool for resilience assessment, which is part of the RESILOC framework. The final section concludes our presentation, summarizes the lessons learned during our analysis and provides some avenues for further research.

2 RELATED WORK

Our work relates to two main fields of research. The first is research on community resilience in the context of disaster management.
Parker [14] addresses the problems of measuring and assessing resilience and warns that past attempts to define comprehensive resilience assessment frameworks frequently led to simplifications and a focus on a single risk. Particular assessments of resilience can be found in the literature review conducted in the scope of the RESILOC project [11].

The second related field is social media sentiment analysis and the associated techniques. Sentiment analysis [9] is a machine learning field that has benefited from the recent rapid development of natural language processing techniques [6] based on large corpora and deep learning. Sentiment analysis of social media posts has previously been applied to resilience-adjacent domains, such as disaster response and management [12, 1]. Research in the cross-section of both fields, i.e., resilience and social media, has mostly focused on investigations of how social media affects community and self resilience [8], including recent examples during the COVID-19 pandemic [16]. Using social media analysis to assess resilience is, to the best of our knowledge, novel, and we were not able to find any tools similar to the one presented in this paper.

3 DATA

The data used in our analyses and visualizations are tweets that specifically mention the target communities (using a predefined keyword), collected through the Twitter API for Academic Research (https://developer.twitter.com/en/products/twitter-api/academic-research). This allowed us to collect all tweets of interest, including those from the past.

Notably, a tweet is not only the posted text, but rather a metadata-rich data object containing a plethora of information, such as its language, geolocation, author code, etc., and relations to other tweets, e.g., whether it is a response to another tweet or a retweet. For our purposes, however, we only gather the unique tweet code/id (field id), creation time (field created_at), language (field lang) and text (field text).

The selected tweets are collected into a dataset that we use for our analysis. The dataset is recreated during each repetition of the analysis, due to potential tweet removal according to Twitter's privacy mechanisms. Hence, the results shown in the online app are not static but can change with renewed analysis.

In RESILOC, four communities are studied as use-cases: Gorizia (Italy), Catania (Italy), West Achaia (Greece) and Tetovo (Bulgaria). These four communities vary widely in size, which is reflected in the amount of tweets in which they are mentioned. As the latter two communities are mentioned only in a couple of tweets per month, the social media analysis was executed only for the first two. We considered extending the pool of tweets by including tweets that are geo-tagged to the selected communities, but only a small fraction of tweets contain such information: out of 364531 tweets that mention Catania in 2020, only 8777 (approximately 2.4%) were labeled with geo-location metadata.

The data gathering process consists of collection and filtering. In particular, we collect all tweets that contain the name of the community (Gorizia or Catania) in the text of the post during a given time period. These tweets are then filtered based on the language (lang=it) to gather local posts and to filter out some of the noise, e.g., caused by posts mentioning people (particularly celebrities) whose names match the two communities.
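A minimal sketch of this filtering step follows, assuming the collected tweets are dictionaries carrying the four retained fields; the function name and structure are our own illustration, not the project's code.

```python
def filter_tweets(collected, community: str):
    """Keep Italian-language tweets mentioning the community name and
    retain only the four fields used in the analyses."""
    kept = []
    for tweet in collected:
        if tweet.get("lang") != "it":
            continue  # the language filter removes much of the noise
        if community.lower() not in tweet.get("text", "").lower():
            continue
        kept.append({k: tweet[k] for k in ("id", "created_at", "lang", "text")})
    return kept
```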
4 METHODS

The online app presents four main data analysis results: (1) volume and sentiment, (2) frequent tokens, (3) data for specific topics and (4) sentiment aggregates and trends.

Volume is the amount of tweets in a given time period, yearly or monthly, while sentiment refers to the yearly or monthly positive, neutral or negative sentiment detected in the tweets with the approach described below in Section 4.1. Frequent tokens are commonly appearing words, numbers or emojis, which we identify on a monthly basis according to the procedure in Section 4.2. The volume and sentiment aggregates are also calculated for specific subsets of tweets called topics, which can be either automatically inferred by the mechanism in Section 4.3 or provided by the users.

4.1 Sentiment Analysis

The goal of the sentiment analysis is to assess the sentiment of tweets in a given community, its trends, and the variations in sentiment in general and in specific resilience-related topics. To assess the sentiment of tweets we use two machine-learned classifiers. The first is a three-class classifier, denoted LOGREG, that classifies tweets as positive, neutral, or negative, and employs logistic regression. It is trained on high-dimensional vector representations of Italian tweets that include weighted words, pairs of consecutive words, 4-character sequences and emoji characteristics as representation elements, as presented in [10]. The second classifier is the two-class (positive, negative) FEEL-IT sentiment classifier [2] for Italian (available at https://github.com/MilaNLProc/feel-it), which uses word or subword sequences of characters represented in a high-dimensional vector space. It is a fine-tuned BERT-based model [6] for Italian.

4.2 Frequent Tokens

In addition to the information on volume and sentiment, we also provide the most frequent tokens (words, numbers or emojis) that appear during any given month. These are provided separately for the subsets of tweets classified as positive, negative and neutral (when available).

Frequent tokens relate to the concepts mentioned in the tweets and could, as such, be used to discover aspects of resilience relevant to the community, thus potentially serving as proxy candidates. Frequent tokens are computed using the following procedure: (1) the text of all tweets is whitespace-split into tokens and cast to lower case, (2) unwanted tokens are removed, (3) tokens are sorted by frequency of occurrence, and (4) the most frequent tokens are selected for presentation.

The unwanted tokens mentioned in the second step are the Italian stop-words, which are filtered using the nltk library [3], as well as a custom list of unwanted tokens, such as punctuation characters and the various versions of the names of the communities. In the final step, we check that the frequent tokens appear in enough different tweets. Namely, repeated tokens in a single tweet all count towards the token frequency, but count only once in terms of tweet appearance. With an occurrence check using a predefined threshold, we avoid presenting frequent tokens that do not appear in enough individual tweets.
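The four-step procedure, together with the final occurrence check, can be sketched as follows. The nltk Italian stop-word list matches the paper, while the threshold value, argument names and function name are our assumptions.

```python
from collections import Counter
from nltk.corpus import stopwords  # requires nltk.download("stopwords")

def frequent_tokens(texts, custom_unwanted=(), min_tweets=5, top_n=10):
    """(1) whitespace-split and lowercase, (2) remove unwanted tokens,
    (3) sort by frequency, (4) keep the most frequent tokens that also
    appear in at least `min_tweets` different tweets."""
    unwanted = set(stopwords.words("italian")) | set(custom_unwanted)
    counts, appearances = Counter(), Counter()
    for text in texts:
        tokens = [t for t in text.lower().split() if t not in unwanted]
        counts.update(tokens)            # repeats in one tweet all count here
        appearances.update(set(tokens))  # but only once per tweet here
    ranked = [t for t, _ in counts.most_common() if appearances[t] >= min_tweets]
    return ranked[:top_n]
```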
4.3 Topic Modeling

To identify potential resilience-related topics, we wanted to automatically model the topics in the collected tweets. Our motivation was that, given such topics, resilience experts could analyze the corresponding tweets and extract information useful for resilience assessment and proxy construction.

Topic detection or modeling [13] is a common task in natural language processing and aims at discovering topics, e.g., politics, sports, cycling, etc., that appear in a set of text documents, such as news articles or tweets. The assumption is that if a document discusses a specific topic, some words will appear more frequently. Topic modeling is an unsupervised classification method, similar to soft or fuzzy clustering, since a document can belong to more than one topic or cluster. One popular algorithm for topic modeling is Latent Dirichlet Allocation (LDA) [15, 4], which takes the number of topics as an input parameter.

After the standard preprocessing of tweets required for topic modeling (conversion to lower case, removal of URLs, tokenization into words, removal of stopwords and other irrelevant words), we applied the LDA algorithm with a preset number of topics ranging from 3 to 15; we selected 15 topics as an amount that can still be inspected manually. We visualized the resulting topics as word clouds and inspected them manually. We sought to identify topics related to resilience, such as sets of tweets discussing the actions of authorities in response to natural disasters (floods, fires, etc.) or citizens' perception of the authorities' ability to act in case of such disasters. Our inspection was inherently subjective and mostly focused on the top-ranked words, although we also considered standard measures for topic evaluation (perplexity and coherence).
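The LDA step can be reproduced with standard tooling; the sketch below uses scikit-learn's LatentDirichletAllocation (one possible implementation, as the paper does not name a library) on already preprocessed texts.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

def fit_lda(preprocessed_texts, n_topics=15, top_words=10):
    """Fit LDA with a preset number of topics on preprocessed tweet
    texts and return the highest-weighted words of each topic (which
    the study inspected manually, e.g. as word clouds)."""
    vectorizer = CountVectorizer()
    doc_term = vectorizer.fit_transform(preprocessed_texts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    lda.fit(doc_term)
    vocab = vectorizer.get_feature_names_out()
    # for each topic, sort the word weights and keep the strongest ones
    return [[vocab[i] for i in topic.argsort()[::-1][:top_words]]
            for topic in lda.components_]
```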
The most interesting topic is presented in Figure 1 and includes tweets discussing COVID-19 measures (mascherine is Italian for masks) and the conditions (condizioni) for crossing the border between Italy and Slovenia, as Gorizia is a border town. Unfortunately, we did not find any topics directly related to resilience that could be used by resilience experts to assess resilience. This suggests that automatic topic modeling from tweets is likely not very useful for assessing resilience, at least not in the context in which we tried to use it, though this might be due to the nature of topic modeling. Namely, the topics that we obtained were general, in that they cover general rather than resilience-specific concepts, and unpredictable, in the sense that in some circumstances (and locations) we might obtain resilience-related topics and in others not. Ultimately, the automatically constructed topics were not very informative, and, as such, we used topics constructed manually by resilience experts from the involved communities.

Figure 1: An example of a detected topic. It focuses on COVID-19 measures and conditions for crossing the border between Italy and Slovenia.

4.4 Aggregates and Trends

To facilitate the use of the results of our analyses in the RESILOC platform, several aggregates are explicitly calculated. These aggregates seek to capture the overall sentiment and the sentiment trends over various time periods and are available to LRTs, who can use them as inputs for proxy construction. The aggregates are calculated monthly and yearly. The list of aggregates and their descriptions is as follows (a computational sketch follows the list).

Positive ratio. Ratio of positive tweets vs the total number of tweets in the current time period.
Neutral ratio. Ratio of neutral tweets vs the total number of tweets in the current time period.
Negative ratio. Ratio of negative tweets vs the total number of tweets in the current time period.
Relative change of volume. Ratio of the number of tweets in the current and previous time period.
Relative change of positive tweets. Ratio of the number of positive tweets in the current and previous time period.
Relative change of neutral tweets. Ratio of the number of neutral tweets in the current and previous time period.
Relative change of negative tweets. Ratio of the number of negative tweets in the current and previous time period.
Absolute change in volume. Difference of the number of tweets in the current and previous time period.
Absolute change in positive tweets. Difference of the number of positive tweets in the current and previous time period.
Absolute change in neutral tweets. Difference of the number of neutral tweets in the current and previous time period.
Absolute change in negative tweets. Difference of the number of negative tweets in the current and previous time period.
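The announced sketch follows, assuming per-period sentiment counts are available as dictionaries; the names are illustrative and zero counts in the previous period would need guarding.

```python
def sentiment_aggregates(current: dict, previous: dict) -> dict:
    """Compute the aggregates listed above from sentiment counts,
    e.g. current = {"positive": 120, "neutral": 310, "negative": 45}."""
    agg = {}
    total_cur, total_prev = sum(current.values()), sum(previous.values())
    for label in ("positive", "neutral", "negative"):
        agg[f"{label} ratio"] = current[label] / total_cur
        agg[f"relative change of {label} tweets"] = current[label] / previous[label]
        agg[f"absolute change in {label} tweets"] = current[label] - previous[label]
    agg["relative change of volume"] = total_cur / total_prev
    agg["absolute change in volume"] = total_cur - total_prev
    return agg
```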
5 WEB APPLICATION

We make the collected summaries of the volume and sentiment analyses, the frequent tokens and the topic data available through a simple web interface. To access the tool, which is intended for internal use, a user provides a security access token, which determines which community data the user is privileged to view.

All community data is split into sections based on the available classifiers and time intervals. The application provides two options for the time interval, i.e., monthly and yearly views.

The monthly view is composed of the following sections: a total tweet count, the tweet count by sentiment accompanied by a corresponding pie chart (as seen in Figure 2), a table of frequent tokens per classified sentiment (as seen in Table 1), a table with the calculated aggregates and trends and, finally, a topics section that shows the values for particular topics. In the final section, each defined topic has a subsection where the user can see which tokens define the topic, its sentiment distribution and a corresponding pie chart, as shown in Figure 3.

Figure 2: An example of a sentiment distribution as displayed in the web application.

Table 1: An example set of detected frequent tokens.

(a) Positive              (b) Negative
Token      Occurrence     Token     Occurrence
edizioni   174            stato     102
europea    141            recarsi   77
oggi       139            solo      57
nova       128            x         47
capitale   119            ospedali  39

Figure 3: An example of a COVID-19 related topic.

The yearly view is composed of the same sections as the monthly view; however, it concerns the data for the entire year. In addition to the global pie charts (for the sentiment distribution), graphs of the progression of absolute and relative sentiment are shown based on month-by-month data, as shown in Figure 4.

Figure 4: An example of a yearly view of relative sentiment.

6 CONCLUSION

We propose a novel approach to resilience assessment based on social media datasets. The analyses and tools described in the paper were developed and presented to potential users in preliminary try-out sessions, i.e., as sprints in Agile software development, and were discussed with the project consortium's domain experts. The approach is currently being evaluated in the project trials, to quantify its usefulness based on expert feedback.

While automatic topic modeling resulted in some meaningful topics, these were mostly general and very few of them were related to resilience. The users expressed a preference for more focused topics, which can now be defined manually, and find the new results interesting and potentially relevant. Interestingly, there is a general preference not to use the automatically calculated aggregates of volume and sentiment directly as inputs to the resilience assessment models, but to have them considered and prepared for use by the users, i.e., employed with human oversight.

Analyses such as the ones presented in this paper are only useful for large enough communities that are mentioned in tweets frequently. Furthermore, tweets often do not represent the opinion of the population at large. While they are suitable for analysis, even in real time, their representativity is an issue that needs to be considered when using such data.

ACKNOWLEDGMENTS

This work has been supported by the RESILOC project, which has received funding from the European Union's Horizon 2020 research and innovation program under grant agreement No 833671. We also acknowledge the financial support from the Slovenian Research Agency for research core funding for the programme Knowledge Technologies (No. P2-0103).
REFERENCES

[1] Ghazaleh Beigi, Xia Hu, Ross Maciejewski, and Huan Liu. 2016. An overview of sentiment analysis in social media and its applications in disaster relief. Sentiment Analysis and Ontology Engineering, 313–340.
[2] Federico Bianchi, Debora Nozza, and Dirk Hovy. 2021. FEEL-IT: Emotion and Sentiment Classification for the Italian Language. In Proceedings of the 11th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. Association for Computational Linguistics.
[3] Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O'Reilly Media.
[4] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, 3, (Mar. 2003), 993–1022.
[5] Uberto Delprato, Joe Cullen, Thomas Spielhofer, Daniele Del Bianco, Rajendra Akerkar, Nadia Miteva, and Zlatka Gospodinova. 2022. RESILOC Project Deliverable 3.1 – RESILOC Resilience Indicators. Tech. rep. The RESILOC Consortium. Retrieved Aug. 24, 2022 from https://www.resilocproject.eu/wp-content/uploads/2022/05/RESILOC_D3.1_V7.0.pdf.
[6] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, (June 2019), 4171–4186. doi: 10.18653/v1/N19-1423.
[7] United Nations Office for Disaster Risk Reduction. 2012. Terminology. Retrieved Aug. 24, 2022 from https://www.undrr.org/terminology/resilience.
[8] Manon Jurgens and Ira Helsloot. 2018. The effect of social media on the dynamics of (self) resilience during disasters: a literature review. Journal of Contingencies and Crisis Management, 26, 1, 79–88. doi: 10.1111/1468-5973.12212.
[9] Bing Liu. 2015. Sentiment Analysis: Mining Opinions, Sentiments, and Emotions. Cambridge University Press.
[10] Matej Martinc, Iza Škrjanec, Katja Zupan, and Senja Pollak. 2017. PAN 2017: Author profiling – gender and language variety prediction. In CLEF (Working Notes).
[11] Sjirk Meijer, Jon Hall, Rut Erdelyiova, Marcello Sabanes, Abby Onencan, and Kerstin Junge. 2022. RESILOC Project Deliverable 2.6 – Analysis of different approaches to resilience also outside EU, Section 6. Tech. rep. The RESILOC Consortium. Retrieved Aug. 24, 2022 from https://www.resilocproject.eu/wp-content/uploads/2021/04/RESILOC_D2.6-v6.0_Final.pdf.
[12] Ahmed Nagy and Jeannie A Stamberger. 2012. Crowd sentiment detection during disasters and crises. In ISCRAM.
[13] Christos H. Papadimitriou, Hisao Tamaki, Prabhakar Raghavan, and Santosh Vempala. 1998. Latent semantic indexing: a probabilistic analysis. In Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS '98). Association for Computing Machinery, Seattle, Washington, USA, 159–168. doi: 10.1145/275487.275505.
[14] Dennis J Parker. 2020. Disaster resilience – a challenged science. Environmental Hazards, 19, 1, 1–9.
[15] Jonathan K Pritchard, Matthew Stephens, and Peter Donnelly. 2000. Inference of Population Structure Using Multilocus Genotype Data. Genetics, 155, 2, (June 2000), 945–959. doi: 10.1093/genetics/155.2.945.
[16] Lola Xie, Juliet Pinto, and Bu Zhong. 2022. Building community resilience on social media to help recover from the COVID-19 pandemic. Computers in Human Behavior, 134, 107294. doi: 10.1016/j.chb.2022.107294.

Urban Mobility Policy Proposal Using Machine-Learning Techniques

Miljana Shulajkovska, Maj Smerkol, Matjaž Gams
miljana.sulajkovska@ijs.si, maj.smerkol@ijs.si, matjaz.gams@ijs.si
Jožef Stefan Institute, Jamova cesta 39, Ljubljana, Slovenia

ABSTRACT

The world's population is constantly increasing and most people move to big cities. This growth in urbanization affects people's daily activities through congestion, air and noise pollution, water and energy usage, etc. To deal with such issues, city authorities take various actions in order to provide the most suitable solutions. Testing, evaluating and implementing different scenarios is therefore of great importance, but costs both money and time. In the era of artificial intelligence, different approaches can be used to automate this process. In this paper, we propose a system that has the potential to automatically propose mobility policies based on previously defined city changes. The decision-makers input the required city changes, while the system outputs a mobility policy that satisfies that specification. To implement the idea, machine-learning algorithms are trained on data produced by a microscopic traffic simulator. The system is tested on data representing the city of Bilbao, where the policies are related to closing the Moyua square in the city centre at a specific time and for different durations, while the city changes are related to air pollution and the usage of different means of transport.

KEYWORDS

traffic simulation, artificial intelligence, mobility policy

1 INTRODUCTION

According to a United Nations report [7], by 2050 two in three people will live in urban areas. However, as cities continue to grow, they may face many challenges that affect daily mobility services and people's movement in general. Therefore, finding a solution that satisfies people's needs is a crucial step toward building a more sustainable mobility system. Currently, many traditional approaches exist that rely on experts' knowledge and results from microscopic simulations. Those approaches include the simulation of different mobility policies, which are then analysed by the decision-makers. The key issue is how to choose the most appropriate mobility policy that satisfies the requirements defined by the decision-makers and meets the users' needs. Achieving this goal is impossible without the help of modern technologies such as machine learning.

In this paper, we propose a system that uses machine learning methods to automate the process of mobility policy suggestion. As decision-makers are interested in achieving a particular set of goals (KPIs), such as reducing CO2 emissions or increasing the usage of public transport, the system outputs the most appropriate policy that satisfies those requirements.

The rest of the paper is organised as follows. First, an overview of the proposed system is given. Then Section 3 describes the process of collecting the data by giving a brief overview of the simulation tool used, the implemented scenarios and the key performance indicators (KPIs). In Section 4 the machine learning approach is discussed, followed by a description of the applied methods and the experimental results. Finally, a conclusion and future work are given in Sections 5 and 6.
2 SYSTEM

The key concept of the proposed system is to collect data from a microscopic traffic simulator. All the components are shown in Figure 1. The microscopic traffic simulator emulates the behaviour of all the people interacting on the mobility infrastructure. To run the simulator, several input files are required, such as the network and the travel demand. The output of the simulator is then used to calculate the KPIs, and later both are used to train the ML model as part of the ML module. The ML module takes as input a required city change defined by the decision-makers and, using the trained model, outputs a mobility policy that satisfies those requirements.

Figure 1: System description.

3 DATA COLLECTION

Building an ML module/model that predicts the most suitable mobility policy requires a sufficient amount of data related to a particular set of policy actions and their consequences. The main source of data for the proposed system is therefore a microscopic traffic simulator, a very common research method for testing and evaluating different mobility situations. In the following sections, a brief overview of the simulation tool is given, then the different scenarios representing specific policies are discussed, and finally the KPIs used in this study are presented.

Figure 2: System for urban mobility policy design: architecture.
3.1 Simulation

In this study, MATSim was used as the most suitable microscopic simulation tool [3]. The main concept behind this tool is shown in Figure 3. It consists of an iterative process where the plans from the travel demand are executed simultaneously in the mobsim and scored in the scoring module. The score is a metric used to evaluate whether a plan is good or bad; it takes into account travelling and waiting time, activity duration, etc. More about scoring can be found in [5]. A certain number of plans are then chosen and modified in the replanning step. After a sufficient number of iterations (in our case 200), an equilibrium is reached where plans no longer evolve to produce higher scores.

Figure 3: MATSim cycle.

A crucial step before running the simulator is to provide data representative of the study area of interest. The input data comprise the city network map, public transit schedules, travel demand, etc. The most challenging part is constructing the travel demand, as demographic and other data on people's movement are hard to find; different techniques exist to solve this issue. One approach is to replicate the real population (construct a synthetic population) using sample data and marginal distributions, and then assign activity locations using origin-destination (OD) matrices. The iterative proportional fitting (IPF) algorithm [1] is used to construct the synthetic population from sample data from EU-SILC [2] and demographic data provided by the city. The IPF algorithm is one of the most widely used algorithms for synthetic population reconstruction; it combines both datasets (the sample data and the marginal distributions) to produce weights that show the number of replications of a specific person from the sample data in a specific geographical zone while maintaining the marginal distributions.
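IPF itself is a short alternating-scaling loop. The sketch below fits a two-dimensional weight table to target row (e.g., person type) and column (e.g., zone) marginals; it is a minimal illustration of the algorithm, not the project's implementation.

```python
import numpy as np

def ipf(seed, row_targets, col_targets, iters=100, tol=1e-8):
    """Iterative proportional fitting: alternately rescale the rows and
    columns of a (strictly positive) seed table until both marginal
    distributions are matched."""
    weights = np.asarray(seed, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)
    for _ in range(iters):
        weights *= (row_targets / weights.sum(axis=1))[:, None]  # fit rows
        weights *= col_targets / weights.sum(axis=0)             # fit columns
        if np.allclose(weights.sum(axis=1), row_targets, atol=tol):
            break
    return weights
```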
The simulation outputs a large amount of data that describes people's movements in the form of events. Each event has a type, such as "vehicle enters traffic", "vehicle enters/leaves link" or "person starts activity", a time, and the id of the person that performs it. Other output files also exist, such as histograms of travel/wait times, the usage of different transport modes, etc. All these data are used in the calculation of the KPIs for the ML module.

3.2 Scenarios

We have tested 40 different situations of one policy, represented by the closure of the Moyua square starting between 8 am and 5 pm with durations from one to four hours. The results of the applied policy are shown in Figure 4 and Figure 5.

On both plots, the x-axis represents the start time of the closure, while the y-axis represents the duration of closing the square for transport, in hours. On the first bar plot, the z-axis shows the change in percent of CO2 warm emissions, and on the second bar plot the z-axis shows the change in percent of public transport usage, both compared to the situation when no changes in the city are applied. In terms of reducing the CO2 emissions, the best scenario is to close the square at 9 am for one hour; in that case, the CO2 emissions are lowered by 0.9%. On the other hand, if the square is closed to traffic from 5 pm to 6 pm, the city gets congested in the surrounding areas and the CO2 warm emissions increase by 26%, even though public transport usage increases and car usage decreases, as shown on the second bar plot.

Figure 4: Scenarios: CO2.

Figure 5: Scenarios: public transport.

3.3 KPIs Calculation

The key performance indicators (KPIs) represent the objectives defined by the decision makers that need to be achieved. In collaboration with the city, a set of KPIs was defined as follows:

• Air pollution: CO2, NOx, PM cold/warm emissions.
• Usage of different modes: car, bicycle, public transport, walk.
• Bike safety, bikeability.

The first set of KPIs, related to air pollution, is modelled using the additional MATSim emission package [6]. The emissions are calculated using the HBEFA (Handbook on Emission Factors for Road Transport) database [4] in combination with the simulation output. As air pollution is caused by different contributions of road traffic, the emissions module considers both warm and cold emissions. Warm emissions are emitted while driving and depend on the driving speed, stop duration and vehicle characteristics, while cold emissions are emitted during the warm-up phase and depend on the engine's temperature.

The second set of KPIs, related to the usage of different modes of transport, is produced during the simulation, while bike safety and bikeability are calculated from the simulation results.
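The scenario results in Section 3.2 are reported as percent changes of these KPIs against the do-nothing baseline; a small illustrative helper (our naming, not the system's code) makes that convention explicit.

```python
def kpi_percent_change(scenario: dict, baseline: dict) -> dict:
    """Percent change of each KPI relative to the do-nothing baseline,
    the quantity shown on the z-axes of Figures 4 and 5."""
    return {name: 100.0 * (scenario[name] - baseline[name]) / baseline[name]
            for name in scenario}
```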
4 MACHINE LEARNING FOR POLICY PROPOSAL

The developed system proposes a ML module that helps decision-makers by suggesting mobility policies that satisfy a set of predefined KPIs. As mentioned before, the main source of data to train the ML models is the microscopic traffic simulator. The data used to train the models is shown in Table 1. The KPIs represent the features, while the policies, i.e., the scenarios, are treated as target variables, where the start time and duration of the closure are discretized into 30- and 15-minute intervals, respectively. By doing so, the most suitable scenario can be predicted according to the pre-selected KPI values.

Table 1: Features and target variables

Features                    Target variables
CO2 warm, CO2 cold          start time of closure
NOx warm, NOx cold          duration of closure
PM warm, PM cold
car trips, bike trips
PT trips, walk
bikeability, bike safety

4.1 Methods and Results

Since the target variables are continuous and there are multiple of them, we deal with a multi-output regression problem. This involves predicting several numerical values at the same time, which rules out many algorithms that are designed to predict only a single numerical value. However, several solutions to this problem exist:

• Multi-output regression algorithms
• Wrapper multi-output regression algorithms
  – Direct multi-output regression
  – Chained multi-output regression

One approach is to use regression algorithms that support multiple outputs directly, such as linear regression, k-nearest neighbours, decision trees, random forests etc. The other approach is to divide the multi-output regression problem into multiple sub-problems (wrapper multi-output regression) and then deal with single-output regression problems. In direct multi-output regression, independent models are developed for the prediction of each numerical value, whereas chained multi-output regression consists of dependent models for predicting each numerical value.
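The three strategies can be sketched as follows with scikit-learn (our illustration, not the authors' exact code; the data shapes and model choices are placeholders mirroring the tables below):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.multioutput import MultiOutputRegressor, RegressorChain
from sklearn.svm import LinearSVR

# X: KPI features, y: [start time, duration] targets (dummy data here).
rng = np.random.default_rng(0)
X, y = rng.normal(size=(40, 12)), rng.normal(size=(40, 2))

models = {
    # Inherently multi-output: predicts both targets at once.
    "inherent": RandomForestRegressor(random_state=0),
    # Direct wrapper: one independent model per target.
    "direct": MultiOutputRegressor(LinearSVR()),
    # Chained wrapper: later models also see earlier predictions.
    "chained": RegressorChain(LinearSVR()),
}
cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=0)
for name, model in models.items():
    scores = -cross_val_score(model, X, y, cv=cv,
                              scoring="neg_mean_absolute_error")
    print(f"{name}: MAE {scores.mean():.3f} (SD {scores.std():.3f})")
```

Reporting the mean and SD of the MAE over all folds and repeats, as in this sketch, corresponds to the numbers listed in Tables 2-4.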
To evaluate the models, a cross-validation technique was used, with the mean absolute error (MAE) as the performance metric. The mean and standard deviation of the MAE are calculated across all folds and all repeats. Table 2 shows the results of the inherently multi-output regression algorithms. K-nearest neighbours achieved the best results, with a mean MAE of 3.743 and a standard deviation (SD) of 0.852, which means that the difference between the predicted and the true value is approximately four time intervals, or 2 hours for the start hour and 1 hour for the duration.

Table 2: Inherently multi-output regression algorithms

Model                      MAE (mean)   MAE (SD)
Linear Regression          4.472        2.565
K-Nearest Neighbours       3.743        0.852
Decision Tree Regression   4.367        1.083

The plot in Figure 6 depicts the difference between true (blue dots) and predicted (red dots) values for 30% of the data. The x-axis represents the start hour, while the y-axis represents the duration of the closure. The smallest error (the best-predicted instance) between a true and a predicted value is marked with a black and a yellow dot, respectively. Table 3 and Table 4 show the results for direct and chained multi-output regression, respectively, where random forest and k-nearest neighbours achieved the best results in both cases.

Figure 6: True vs predicted data.

Table 3: Direct multi-output regression

Model                              MAE (mean)   MAE (SD)
Linear Support Vector Regression   5.684        4.709
Random Forest Regression           3.590        0.967
Linear Regression                  4.472        2.565
K-Nearest Neighbors Regression     3.743        0.852
Decision Tree Regression           4.333        1.274

Table 4: Chained multi-output regression

Model                              MAE (mean)   MAE (SD)
Linear Support Vector Regression   5.718        4.364
Random Forest Regression           3.607        0.931
Linear Regression                  4.472        2.565
K-Nearest Neighbors Regression     3.690        0.829
Decision Tree Regression           4.325        1.118

By applying different ML models, we examined which algorithm gives the best results in predicting a mobility policy that satisfies a set of predefined city changes. On the other hand, when discussing the simulation results in Section 3.2, we concluded that closing the Moyua square in the afternoon (from 4 pm to 7 pm) decreases the CO2 by 1%. Also, if the square is closed at 11 am, car usage decreases by 0.12%, reducing the CO2 emissions by 0.3%. Both situations can be implemented to achieve a more sustainable city, but which one is better depends on the goals that the city wants to accomplish. If the aim is to decrease the CO2 in the afternoon peak hours, the first situation is better. The latter situation provides better results if the city is interested in reducing car usage and increasing the usage of public transport. Selecting the best option when multiple criteria need to be satisfied is therefore hard when only human knowledge is involved. Additionally, the number of scenarios included in this work is limited, as not every possible situation can be tested due to computational and time costs.

5 CONCLUSION

In this paper, we presented an approach for proposing mobility policies in an automatic way. First, an overview of the system was given and the simulation tool was described. 40 variations of one policy, closing the Moyua square in the centre of Bilbao for transport, were simulated and evaluated. The variations refer to the start hour and the duration of the closure. On the simulation data, the desired KPIs were calculated, which, together with other simulation output data, were used as input to the ML models. Since the ML models output multiple variables (start hour and duration of the closure), the problem becomes multi-output regression, which limits the usage of many ML algorithms that are developed to deal with a single target. Therefore, two approaches were presented: inherently multi-output and wrapper multi-output regression algorithms. Applying both sets of algorithms, the best results were achieved by random forest regression using the chained method from the second approach.

6 FUTURE WORK

In order to provide the decision-makers with more options when implementing different strategies, in future work different areas around the Moyua square will be closed. Closing multiple streets around the square might help reduce the air pollution in the centre and in the city in general. It can also contribute to reducing congestion and improving other relevant KPIs during peak hours. Therefore, additional target variables, such as the city areas to be closed and the length of the streets inside them, will be added. Since more simulations need to be executed, three servers will be used to reduce the computational time. By doing this, the dataset will be expanded and more options will be available to the decision-makers for implementing the best scenario that meets the people's needs. Moreover, a mobility expert validation will be included in testing the ML module, which will contribute to a more detailed analysis of the ML results.

ACKNOWLEDGMENTS

This work is part of a project that has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 870338. The authors also acknowledge the financial support from the Slovenian Research Agency (research core funding No. P2-0209).

REFERENCES

[1] Abdoul-Ahad Choupani and Amir Reza Mamdoohi. 2016. Population synthesis using iterative proportional fitting (IPF): a review and future research. Transportation Research Procedia, 17, 223–233. International Conference on Transportation Planning and Implementation Methodologies for Developing Countries (12th TPMDC) Selected Proceedings, IIT Bombay, Mumbai, India, 10–12 December 2014. doi: 10.1016/j.trpro.2016.11.078.
[2] Eurostat. 2020. EU statistics on income and living conditions microdata 2004–2019, release 2 in 2020. (2020). doi: 10.2907/EUSILC2004-2019V.1.
[3] 2016. Introducing MATSim. The Multi-Agent Transport Simulation MATSim. Ubiquity Press, (Aug. 2016), 3–8.
[4] Mario Keller, Stefan Hausberger, Claus Matzer, Philipp Wüthrich, and Benedikt Notter. 2017. HBEFA version 3.3. Background documentation, Berne, 12.
[5] Kai Nagel, Benjamin Kickhöfer, Andreas Horni, and David Charypar. 2016. A closer look at scoring. (Aug. 2016). doi: 10.5334/baw.3.
[6] 2016. Emission modeling. The Multi-Agent Transport Simulation MATSim. Ubiquity Press, (Aug. 2016), 247–252. doi: 10.5334/baw.36.
[7] 2017. Population Facts No. 2017/4, October 2017: The impact of population momentum on future population growth. https://population.un.org/wpp/Publications/Files/PopFacts_2017-4_Population-Momentum.pdf.
IMF Quality Assurance of Mammograms Using Deep Convolutional Neural Networks and Transfer Learning

Gašper Slapničar (gasper.slapnicar@ijs.si), Jožef Stefan Institute and Jožef Stefan International Postgraduate School, Jamova cesta 39, Ljubljana, Slovenia
Peter Us (peter@u-s.si), UpDev d.o.o., Kranj, Slovenia
Erna Alukić (erna.alukic@zf.uni-lj.si), Faculty of Health Sciences, University of Ljubljana, Ljubljana, Slovenia
Nejc Mekiš (nejc.mekis@zf.uni-lj.si), Faculty of Health Sciences, University of Ljubljana, Ljubljana, Slovenia
Miha Mlakar (miha@kaiber.si), kAIber d.o.o., Ljubljana, Slovenia
Janez Žibert (janez.zibert@zf.uni-lj.si), Faculty of Health Sciences, University of Ljubljana, Ljubljana, Slovenia

Figure 1: Pipeline of our proposed system. MLO mammograms are taken as input, fed into a chosen CNN architecture, which then outputs a quality grade in an ordinal regression task.

ABSTRACT

Quality assurance (QA) of mammograms is of vital importance, since they are the de-facto method used by doctors for detection of breast cancer and other tissue abnormalities. Despite this, there is a distinct lack of both experts and tools for this task. We thus investigated a deep-learning-based approach using convolutional neural networks for prediction of the inframammary fold (IMF) quality grade, which cannot be measured and determined quantitatively with rules (e.g., if some point is x cm from the edge, then the grade is y). We showed in a 5-fold cross-validation experiment that a relatively simple model can achieve respectable performance in terms of root mean squared error (RMSE), area under the ROC curve (AUC) and accuracy, predicting the IMF grade with 3 possible values. Finally, we also showed that the model in fact derives features from the relevant ROI also looked at by the experts, hinting at the real-world usefulness of such a QA model.

KEYWORDS

mammography, quality assurance, ordinal regression, neural networks, deep learning

1 INTRODUCTION

Mammography is the process of using low-energy X-rays to examine human breast tissue for diagnosis and screening, with the typical goal being early detection of breast cancer through detection of anomalies in the tissue [3]. The procedure consists of compression of the breast tissue using a dedicated mammography device with the aim of reducing and evening out the tissue thickness that X-rays must penetrate, in turn reducing the required radiation dose. There are two common views in which a mammogram is recorded, namely craniocaudal (CC) and mediolateral oblique (MLO). The former captures the breast tissue in a top-down direction along the pectoral muscle plane, while the latter captures it at an angle sideways.

The importance of regular mammographic screening cannot be overstated, as it is the de-facto method used by doctors for early detection of breast cancer and other tissue anomalies (e.g., tumors). However, the procedure itself is rather involved and can be tedious for the patient. Consequently, it is of utmost importance to ensure high quality of the taken images, as it is highly undesirable for a patient to have to revisit and repeat the procedure. To this end, a number of guidelines are available and followed by radiologists with the aim of minimizing the amount of low-quality images taken.
Some quality metrics can be quantified and measured precisely, while others are more subtle and often left to the expertise of the professionals. One such elusive metric is the inframammary fold (IMF). The IMF is the inferior border of the breast and the crease between the breast and abdominal tissue. It serves as an important anatomical landmark on an MLO mammogram, providing assurance to the radiographer that all of the posterior breast tissue has been included.

In Slovenia, a single radiographer is responsible for the weekly grading of randomly sampled mammograms from that week, as part of the DORA oncology program. This is inefficient, and the grade itself can be subjective, especially for the metrics that are not clearly defined, such as the IMF. There is thus a need for automated tools that would help radiologists with quality assurance (QA), optimizing the process while also potentially serving as a training tool for improving the quality of the mammograms being collected.

In this paper we highlight the importance and lack of QA methods for mammograms, and develop two deep convolutional-neural-network (CNN) computer vision models, aiming to recognize and successfully predict the IMF quality metric, which is known to be especially tricky and subjective. We evaluate and compare the performance of our proposed models on a custom dataset collected in Slovenia.

The rest of this paper is organized as follows: we first discuss related work about QA in mammography, together with existing computer vision methods that are important for QA, in
Section 2; in Section 3 we describe our dataset; Section 4 details our methodology and experimental results; and in Section 5 we summarize and discuss the implications and future directions for our work.

2 RELATED WORK

To ensure that the key goals of mammography are achieved, quality assurance must be adopted in order for the mammograms to be suitable for diagnosis. In the past decades, several standards have been developed nationally and internationally to this end. A review study by Reis et al. [3] presents an overview of these, showing the importance of both technical and clinical aspects, especially with the development of digital mammography.

In terms of QA, there are specific keypoints or region segmentations that are often required for individual grades. One such is the segmentation of the pectoral muscle in the MLO view, which was traditionally segmented using pixel thresholding and region growing algorithms. Recently, however, with the rise of deep learning, this task was successfully resolved. Soleimani et al. [6] proposed a two-stage algorithm that predicts a precise pectoral muscle boundary using a CNN. Evaluating on three datasets, they achieved average dice similarity and accuracy of 97% and 99%, respectively.

Other researchers focused directly on the final task - prediction of anomalies related to cancer. For instance, Abel et al. [1] proposed a CNN architecture for the detection of abnormal axillary lymph nodes, which is a specific abnormality in the tissue. They reported accuracies of 96% for the detection of suspicious lymph nodes. Shen et al. [5] have importantly shown that such end-to-end networks not only perform well, but can be transferred between different datasets, for instance CBIS-DDSM and INBreast, hinting at good generalization capabilities.

Despite all the existing work, we noticed a distinct lack of research dealing with machine-learning QA prediction, meaning models that output grades of mammograms rather than trying to predict some other subsequent outcome. The quality itself is, however, vital both for physicians and for other models taking mammograms as inputs, as good-quality mammograms are useful for detection, while poor quality not only makes diagnosis difficult, but requires a repeat measurement, which is "expensive" for everyone involved.

3 DATA DESCRIPTION AND PREPROCESSING

The dataset used in our work was collected in the previously mentioned DORA program between 2011 and 2013 and graded by the single expert using a modified PGMI (Perfect, Good, Moderate, Inadequate) grading system with just three grades (Good, Moderate, Inadequate/Poor). All the mammograms were anonymized and held by the Faculty of Health Sciences. In total there were 4928 mammograms, 2424 in the CC view and 2424 in the MLO view, each view in turn having a left (L) and right (R) image, each with a corresponding ground-truth label. Example images can be seen as input in Figure 1.

Initially, these images were saved in the widely used Digital Imaging and Communications in Medicine (DICOM) format, which contains the image itself along with a lot of corresponding metadata. These initial images are grayscale and come in varying resolutions, which is problematic for CNN inputs, which are expected to be of a constant shape. We therefore first extracted the images and standardized the resolution to 256 x 256 pixels, re-saving them in the lossless .png format. Some information loss cannot be avoided, however, since we substantially decrease the resolution in this step.

Each image is by default also equipped with letters in the top-right or top-left corner, denoting the view and side (e.g., CC-L or MLO-R). Since our computer vision models might use this information to learn some data-specific pattern between the view and quality, we semi-manually removed these letters by zeroing pixels in an empirically determined region, leaving only the breast tissue in each image.

Finally, some of the images have important keypoints or areas that are very difficult to see, especially with the naked eye. We thus investigated adaptive histogram equalization methods, more specifically contrast-limited AHE (CLAHE), which was reported in related work to help substantially with the visibility of important regions in mammograms [2]. The effects can be seen in Figure 2.

Figure 2: Effects of CLAHE image preprocessing on mammograms in both views. Top row are originals, bottom row are preprocessed.
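The CLAHE step can be applied with a few lines of OpenCV (a minimal sketch of the preprocessing described above; the file path and the parameter values are illustrative, not the ones used in the paper):

```python
import cv2

# Load a standardized 256 x 256 grayscale mammogram (hypothetical path).
img = cv2.imread("mammogram.png", cv2.IMREAD_GRAYSCALE)
# Contrast-limited adaptive histogram equalization on local tiles.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(img)
cv2.imwrite("mammogram_clahe.png", enhanced)
```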
For our class label we used the one given by the expert for the IMF. It is a numeric value from 1 to 3, where 1 = good, 2 = moderate and 3 = poor. It is problematic since it is known to be subjective, but it is still the best we could obtain. It is also quite difficult to visualize clearly for a non-expert, but we show the relevant region of interest (ROI) looked at by the experts in Figure 1. Ideally, the ground-truth grade would be a voted value obtained from several experts, but that was not feasible. Since the IMF is only evaluated on the MLO view, we used just those mammograms in further analysis.

4 METHODOLOGY AND RESULTS

The learning problem itself seems like a typical classification; however, after giving it some thought, we realized it is better to set it up as an ordinal regression problem, as we are predicting a range of grades. We thus mapped our class label values from the discrete 1, 2 and 3 to the [0.0, 1.0] interval, where grade 1 = 0.0, grade 2 = 0.5 and grade 3 = 1.0. We thus obtain a numeric value from the network which gives us not only the class information but also the distance between the prediction and the ground truth (e.g., a prediction of 0.96 is much better than a prediction of 0.76, given a ground truth of 1.0).

Once our data was finalized in terms of inputs and outputs, we focused our attention on a model. Related work dealing with visual tasks in general, as well as with mammograms specifically, shows a convincing dominance of CNNs in the past decade, so we decided to investigate such architectures. Our aim was to start with a simple architecture consisting of three 2D convolutional layers with 16, 16 and 8 kernels (each of size 9 x 9), intermediate max-pooling layers with kernel size 2 and stride 2, and one fully connected layer with 1000 neurons on top. We used batch normalization and dropout as commonly used mechanisms to prevent overfitting. ReLU was used as the activation function and the network was trained for 100 epochs.

We then wanted to compare such a simple architecture with a more complex CNN. We decided to attempt a transfer learning approach based on the known VGG19 model trained on ImageNet. We replaced the final layer with two fully connected layers to instead predict our IMF grade, while keeping the bulk of the model intact with the existing weights. Hyperparameters were mostly left at default values, except for the learning rate, which had a linear decay implemented in our simple architecture.
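A sketch of the described simple architecture in tf.keras follows (our reconstruction from the description above; details the text does not specify, such as the dropout rate, the loss and the sigmoid output for the [0.0, 1.0] ordinal target, are our assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(256, 256, 1)),      # standardized MLO mammogram
    layers.Conv2D(16, 9, activation="relu"),
    layers.BatchNormalization(),
    layers.MaxPooling2D(pool_size=2, strides=2),
    layers.Conv2D(16, 9, activation="relu"),
    layers.BatchNormalization(),
    layers.MaxPooling2D(pool_size=2, strides=2),
    layers.Conv2D(8, 9, activation="relu"),
    layers.BatchNormalization(),
    layers.MaxPooling2D(pool_size=2, strides=2),
    layers.Flatten(),
    layers.Dense(1000, activation="relu"),
    layers.Dropout(0.5),                     # assumed rate
    layers.Dense(1, activation="sigmoid"),   # ordinal grade in [0, 1]
])
model.compile(optimizer="adam", loss="mse")  # regression on 0.0/0.5/1.0
```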
Initially, the full data was split into training (80%), validation (10%) and test (10%) sets, as is traditional. However, a random train-validation-test split can be volatile and is undesirable when drawing any conclusions about the robustness of the model. We thus instead used the 5-fold cross-validation (CV) evaluation setup, where k-1 groups are taken for training and the remaining group is left for testing in each of the 5 iterations. Additionally, it was ensured that the distribution of labels is kept in each group, meaning we did a stratified split. This was important in our experiment, since the class label distribution is relatively skewed (good / 1 / 0.0 = 65%, moderate / 2 / 0.5 = 30%, poor / 3 / 1.0 = 5%) and we are especially interested in the bad examples, which are in a vast minority.

For the evaluation metrics we used the root mean squared error (RMSE) and the area under the ROC curve (AUC), while also looking at the classification accuracy (transforming from ordinal regression back to classification), as it is the most intuitive metric. In terms of AUC, we always compared all possible pairs of grades, meaning 1v2, 1v3 and 2v3. The most important is a good separation between poor images and everything else, since those are the most problematic, while moderate images could be close to either good or poor. Numeric results are given in Table 1 and the ROC curves for all three cases (of the better performing model) are given in Figures 3, 4 and 5.

Table 1: Numeric results from the 5-fold CV using our simple model and transfer learning with VGG19.

Simple model
Fold Nr.   RMSE   AUC 1v3   AUC 1v2   AUC 2v3   Accuracy
1          0.26   0.90      0.79      0.78      0.93
2          0.25   0.98      0.80      0.92      0.95
3          0.24   0.98      0.79      0.95      0.91
4          0.23   0.97      0.78      0.89      0.96
5          0.24   0.94      0.78      0.84      0.95
Avg.       0.24   0.95      0.79      0.88      0.94

Transfer VGG19 model
Fold Nr.   RMSE   AUC 1v3   AUC 1v2   AUC 2v3   Accuracy
1          0.29   0.78      0.73      0.60      0.79
2          0.27   0.89      0.73      0.77      0.75
3          0.28   0.85      0.70      0.71      0.66
4          0.28   0.94      0.65      0.92      0.70
5          0.30   0.85      0.69      0.73      0.77
Avg.       0.28   0.86      0.70      0.75      0.73

Figure 3: ROC curve of the better performing (simple) model for the 1v2 class combination.

Figure 4: ROC curve of the better performing (simple) model for the 1v3 class combination.

Figure 5: ROC curve of the better performing (simple) model for the 2v3 class combination.
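The stratified 5-fold protocol described above can be sketched with scikit-learn as follows (our illustration; the class proportions are taken from the text, while the feature placeholder stands in for the image tensors):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Discrete grades 1/2/3 used for stratification (65/30/5 split as above),
# and the same labels mapped to 0.0/0.5/1.0 for the ordinal regression.
grades = np.array([1] * 65 + [2] * 30 + [3] * 5)
y = (grades - 1) / 2.0
X = np.zeros((len(grades), 1))  # placeholder for the image tensors

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, grades), start=1):
    # Train on X[train_idx], y[train_idx]; evaluate on the test fold,
    # whose grade ratio stays roughly 65/30/5 thanks to stratification.
    print(fold, np.bincount(grades[test_idx])[1:])
```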
5 DISCUSSION AND CONCLUSION

Looking at the results, we can initially observe the overall better performance of our simple model compared to the pre-trained VGG19 transferred to our domain. All the metrics are more stable across folds while also achieving better overall values. Since we were especially interested in the separation of the good and poor mammograms (grade 1 vs. grade 3), we can also see that this separation is indeed consistently successful and seems relatively robust. As expected, the separation of moderate (grade 2) mammograms from the others is more challenging, but can still be done reasonably well.

Deep learning models are often criticized for being black-box and not offering explainability of their decisions. We also wanted to do a quick investigation of this by using the Grad-CAM approach [4], which is a popular technique for producing "visual explanations" for decisions from a large class of CNN-based models, in the form of a heatmap showing where on an image the model focused the most. An example is shown in Figure 6, where we can see that the model commonly focused on the relevant IMF ROI that is also focused on by the experts. However, this focus was not exclusive, meaning it did not focus on just that region, and it was not the same on all images, but it was still rather consistent, which is a good indicator that the model actually learned the relevant features for IMF QA.

Figure 6: Grad-CAM heatmap showing the areas on the mammogram that were focused on by the model to derive features.

To summarize, we investigated the possibility of CNN-based IMF QA for mammograms, looking at a simple CNN model and a transferred, slightly modified VGG19 model. The simple CNN architecture achieved respectable results in terms of several metrics and, importantly, also focused on the correct part of the image without any guidance, hinting that it learned the relevant features also looked at by the experts. Extensions to the prediction of other QA grades might allow for a system that could help with continuous QA, as well as expert training.

ACKNOWLEDGMENTS

We would like to acknowledge the DORA program for providing us with the dataset. Additionally, we would like to thank colleagues from the Faculty of Health Sciences for helping with the understanding and interpretation of the data. Finally, a special thanks goes to kAIber d.o.o. and UpDev d.o.o. for participating in the data analysis and modelling.

REFERENCES

[1] Frederik Abel, Anna Landsmann, Patryk Hejduk, Carlotta Ruppert, Karol Borkowski, Alexander Ciritsis, Cristina Rossi, and Andreas Boss. 2022. Detecting abnormal axillary lymph nodes on mammograms using a deep convolutional neural network. Diagnostics, 12, 6, 1347.
[2] Etta D Pisano, Shuquan Zong, Bradley M Hemminger, Marla DeLuca, R Eugene Johnston, Keith Muller, M Patricia Braeuning, and Stephen M Pizer. 1998. Contrast limited adaptive histogram equalization image processing to improve the detection of simulated spiculations in dense mammograms. Journal of Digital Imaging, 11, 4, 193–200.
[3] Cláudia Reis, Ana Pascoal, Taxiarchis Sakellaris, and Manthos Koutalonis. 2013. Quality assurance and quality control in mammography: a review of available guidance worldwide. Insights into Imaging, 4, 5, 539–553.
[4] Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. 2019. Grad-CAM: visual explanations from deep networks via gradient-based localization. International Journal of Computer Vision, 128, 2, (Oct. 2019), 336–359. doi: 10.1007/s11263-019-01228-7.
[5] Li Shen, Laurie R Margolies, Joseph H Rothstein, Eugene Fluder, Russell McBride, and Weiva Sieh. 2019. Deep learning to improve breast cancer detection on screening mammography. Scientific Reports, 9, 1, 1–12.
[6] Hossein Soleimani and Oleg V. Michailovich. 2020. On segmentation of pectoral muscle in digital mammograms by means of deep learning. IEEE Access, 8, 204173–204182. doi: 10.1109/ACCESS.2020.3036662.

Vehicle Axle Distance Detection From Time-series Signals Using Machine Learning

David Susič (david.susic@ijs.si), Blaž Erzar (blaz.erzar@gmail.com), Nika Čelan (nika.celan8@gmail.com), Gašper Leskovec (leskovecg@gmail.com), Žiga Kolar (ziga.kolar@ijs.si), Jožef Stefan Institute, Jamova cesta 39, Ljubljana, Slovenia
Martin Konečnik (martin.konecnik@cestel.si), Domen Prestor (domen.prestor@cestel.si), Matjaž Skobir (matjaz.skobir@cestel.si), Cestel cestni inženiring d.o.o., Špruha 32, Trzin, Slovenia
Matjaž Gams (matjaz.gams@ijs.si), Jožef Stefan Institute, Jamova cesta 39, Ljubljana, Slovenia

ABSTRACT

In this study, we compare a signal decomposition and a convolutional autoencoder approach for determining vehicle axle distances. Our dataset consists of 62076 instances of vehicles crossing a bridge. Each vehicle is detected by eight identical sensors placed under the bridge that record time-series vibration data.
We compare our results to those computed using an expert's model, which we consider to be the ground truth. The signal decomposition approach achieves accuracies of up to 0.89, 0.98, and 1, with the calculated distances matching the expert model within 2%, 5%, and 10%, respectively. The convolutional autoencoder, on the other hand, achieves accuracies of up to 0.97, 0.99, and 1 with the same error margins compared to the expert model.

KEYWORDS

vehicle detection, axle distance, neural network, peak detection, machine learning

1 INTRODUCTION

In order to accurately weigh the vehicles crossing a bridge and determine whether they weigh too much and damage the road, the vehicle speed, the number of vehicle axles, and their in-between distances must first be determined.

In recent years, many types of sensors have been used for vehicle detection. These include acoustic sensors, inductive loop sensors, strain sensors, magnetic sensors, and imaging sensors. Researchers around the world have developed various methods for using sensor data for vehicle axle detection, weight detection, and classification. Marszalek et al. [5] measured vehicle axle distances based on multifrequency impedance measurement of a slim inductive-loop sensor. Using test vehicles, they were able to confirm that their method can successfully determine the distances. Chatterjee et al. [1] used data from sensors on the bridge and a wavelet-based analysis to determine the axle distances. In the work of Khalili et al. [3], piezoelectric elements were used in a system to detect the weight of vehicles in motion. They used the weigh-in-motion system to determine both the axle distances and vehicle weights with sufficient accuracy. Ma et al. [4] developed a deep learning system for vehicle recognition based on strain sensor data. They were able to classify 11 different vehicle types with a very high average precision.

In this work, we use data collected from a single bridge to test two machine learning approaches for vehicle axle distance detection. The first approach is based on signal decomposition and the second approach is based on a convolutional neural network autoencoder.

We begin in Section 2 with a description of the dataset used in this study. In Section 3, we explain our approaches and illustrate them with examples. The results are presented and discussed in Section 4. The paper concludes with Section 5.

2 DATASET

Our dataset consists of sensor data from vehicles crossing the bridge. The sensors are placed under the bridge in the configuration shown in Figure 1. The sensors are identical and record the vibrations of the crossing vehicles at a sampling rate of 512 Hz. In this study, only data from vehicles travelling in lane 1 were used (orange sensors 1–8 in Figure 1). The vibration data from the sensors in the first and last columns (1 and 2) are also used to calculate the vehicle speed, which is obtained by superimposing the signals using a cross-correlation method. Our dataset consists of 62076 instances, where each instance contains the data for one vehicle. In addition, each instance also contains the axle distances calculated by the expert model and the times at which each vehicle axle crossed the sensor.

Figure 1: Placement of the sensors on the bridge.
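The cross-correlation speed estimate can be sketched as follows (our illustration, not the project's code; `sensor_a`/`sensor_b` stand for signals from the first and last sensor columns and `spacing_m` for their known distance along the bridge):

```python
import numpy as np

def vehicle_speed(sensor_a, sensor_b, spacing_m, fs=512):
    """Estimate speed from the lag that best aligns the two signals
    (assumes the vehicle produces a non-zero lag between the sensors)."""
    a = sensor_a - sensor_a.mean()
    b = sensor_b - sensor_b.mean()
    corr = np.correlate(b, a, mode="full")
    lag = np.argmax(corr) - (len(a) - 1)  # delay in samples
    dt = lag / fs                         # delay in seconds
    return spacing_m / dt                 # speed in metres per second
```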
An example of the sensor data for a vehicle with 5 axles is shown in Figure 2. The signal peaks correspond to the vehicle axles crossing the sensor, while the amplitude corresponds to the weight of the vehicle on that axle. Although the peaks in the signal correspond to the vehicle axles, the spacing between the peaks generally does not represent the actual axle spacing. Due to interference between the signals of the individual axles, the peaks shift relative to the positions they would have if the signals of the individual axles were isolated. A small effect of this can be seen in the peak triplet in the right part of the signal in Figure 2. In some cases, where two adjacent vehicle axles are very close to each other, the peaks may overlap and become indistinguishable from each other, resulting in a single peak.

Figure 2: Example sensor data for a vehicle with 5 axles. The green markings correspond to the crossing points of the vehicle axles as calculated by the expert model.

3 METHODOLOGY

The objective of this study is to evaluate the performance of the signal decomposition and convolutional autoencoder approaches in computing vehicle axle distances from sensor data. In both approaches, the signal timing of the axles is first determined and then their in-between distances are computed based on the vehicle speed. We compare our results with those of the expert model, which has a measured accuracy of 98% in practice. Our results are considered correct if all calculated axle distances match those of the expert model within the specified margin of error. In addition, we have the option to skip the predictions whose confidence level is below a certain threshold.

3.1 Signal Decomposition

The first step of the signal decomposition approach was to determine the most appropriate signal for each instance. For this purpose, peak detection was performed on all eight signals by calculating the first and second derivatives. A 62076 x 8 matrix was then created, with the eight columns indicating the number of detected peaks in each of the signals. A gradient boosting regression model was then trained on this matrix, the output of which was an array with the correct number of peaks. In addition, eight gradient boosting classifiers were trained, one for each of the signals. The output of each classifier was an array giving the probabilities that the number of detected peaks was correct for that signal.

For each test instance, the regression model first predicted the number of axles, rounding the result to the nearest integer. All eight signals were then processed, starting with the signal from sensor one. For each signal, a check was made to see whether the number of detected peaks matched the number predicted by the regression model. If it did, we checked whether the probability that the number of detected peaks was correct, as predicted by the classifier for that signal, was above a confidence threshold. The iteration was stopped if both criteria were met, and the signal was selected as the most appropriate for that instance. This means that in most cases not all signals were checked. Although in principle there could be a signal that would be even more suitable, we found experimentally that in more than 90% of the cases signal 1 was the best, followed by signals 2, 5, and 6, in that order. If none of the eight signals met the criteria, the instance was skipped. In our experiments, we used confidence values between 0 and 0.997.

After the best signal for each instance was determined, the signals were decomposed into so-called base waves. A base wave is a function designed to have the form of an isolated wave, and can be constructed with three parameters: the x-location, the scaling in the x-direction, and the scaling in the y-direction. The signal decomposition can be defined as an optimization problem where we want to find the best parameters of the base waves. The objective function we want to minimize is the mean squared error between the original signal and the sum of the base waves.
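The decomposition can be sketched as a least-squares fit (our illustration; the paper does not specify the exact base-wave shape, so a Gaussian bell with the three parameters named above is assumed here):

```python
import numpy as np
from scipy.optimize import least_squares

def base_wave(t, loc, width, height):
    """An isolated bell-shaped wave: x-location, x-scaling, y-scaling."""
    return height * np.exp(-0.5 * ((t - loc) / width) ** 2)

def decompose(signal, n_peaks, fs=512):
    t = np.arange(len(signal)) / fs
    # Initial guesses: peak locations spread evenly over the signal.
    p0 = np.concatenate([np.linspace(t[0], t[-1], n_peaks),  # locations
                         np.full(n_peaks, 0.05),             # widths
                         np.full(n_peaks, signal.max())])    # heights

    def residual(p):
        locs, widths, heights = np.split(p, 3)
        model = sum(base_wave(t, l, w, h)
                    for l, w, h in zip(locs, widths, heights))
        return model - signal  # minimized in the least-squares sense

    fit = least_squares(residual, p0)
    return np.sort(np.split(fit.x, 3)[0])  # axle times = base-wave peaks
```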
Once the signal was optimally decomposed, the peaks of the base waves were used to calculate the axle distances. The base wave peaks now correspond to the actual axle times and represent isolated waves, so their peaks are not shifted by interference. This can also be seen in Figure 3: the green and black vertical lines do not exactly coincide, which is most obvious in the triplet on the right. Peak detection and model training were performed using Python 3.7 and the libraries scikit-learn 0.24.2 [7] and NumPy 1.18.5 [2].

Figure 3: Example of signal decomposition. The black vertical lines represent the peaks as detected during peak detection, while the green lines represent the peaks after decomposition.

3.2 Convolutional Autoencoder

The second approach is a convolutional neural network set up as an autoencoder. The schematic of the model is shown in Figure 5. The first layer is the input layer. It consists of 4000 nodes, since this was the number of samples of the longest signal, and takes the raw signal as input. Signals with a length of less than 4000 were padded with zeros. For all instances, signal 1 was chosen as the input signal, since it worked best in most cases. The encoding part of the autoencoder consists of three convolutional layers with kernel sizes 5, 2 and 3 and 8, 16 and 32 filters, respectively. Each convolutional layer is followed by batch normalization and max-pooling of size 2. The decoder has the opposite structure to the encoder, with the max-pooling layers replaced by up-sampling layers. The output layer has the same size as the input layer. The loss function used for model training was binary cross-entropy.

Figure 5: Scheme of the convolutional autoencoder structure.
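A sketch of this architecture in tf.keras follows (our reconstruction from the description above; the "same" padding and the final sigmoid layer are assumptions where the text does not specify them):

```python
import tensorflow as tf
from tensorflow.keras import layers

inp = tf.keras.Input(shape=(4000, 1))  # zero-padded raw signal
x = inp
# Encoder: kernel sizes 5, 2, 3 with 8, 16, 32 filters.
for filters, kernel in [(8, 5), (16, 2), (32, 3)]:
    x = layers.Conv1D(filters, kernel, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling1D(2)(x)
# Decoder: mirrored structure with up-sampling instead of pooling.
for filters, kernel in [(32, 3), (16, 2), (8, 5)]:
    x = layers.UpSampling1D(2)(x)
    x = layers.Conv1D(filters, kernel, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
# Per-sample peak probabilities, same length as the input.
out = layers.Conv1D(1, 1, activation="sigmoid")(x)

model = tf.keras.Model(inp, out)
model.compile(optimizer="adam", loss="binary_crossentropy")
```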
The output for each training instance was a binary array, with ones at the sample locations containing a peak and zeros everywhere else. Thus, for the unseen (test) instances, the model outputs the probability of each input signal sample containing a peak. An example output for a test instance is shown in Figure 4. It can be seen that the probabilities for the peaks are almost always less than one, but the number of probabilities that are not zero (or not very close to zero) is equal to the number of actual peaks. It can also be seen that the model has learned to shift the peaks where necessary (triplet on the right).

Figure 4: Example output of the convolutional autoencoder.

After decoding, we selected lower and upper probability thresholds, T1 and T2 (red and blue dashed lines in Figure 4). If all peaks had a probability of at least T1 and the highest probability was at least T2, the axle distances were calculated; otherwise the instance was skipped. In our experiments, T1 was fixed at 0.01, while we tried values between 0 and 0.7 for T2. The convolutional autoencoder was implemented using Python 3.7 and the TensorFlow 2.9.1 library [6].

4 RESULTS

Both approaches were tested with 5-fold cross-validation, and the folds were the same in both experiments. The results are shown in Tables 1 and 2. They are given for percentages of skipped instances between 0 and 50%. The "± x" values in the brackets of the accuracy columns represent the percentages within which the calculated axle distances must match those given by the expert model for a prediction to be considered correct. In Table 1, the confidence column corresponds to the confidence threshold of the prediction model for the number of peaks, while the T2 column in Table 2 corresponds to the minimum probability of the peak with the highest probability. If these criteria are not met, an instance is skipped. The corresponding amounts of skipped instances are given in the skipped columns.

We see that the signal decomposition approach achieves accuracies of up to 0.89, 0.98, and 1, with the calculated distances matching the expert model within 2%, 5%, and 10%, respectively. The convolutional autoencoder, on the other hand, achieves accuracies of up to 0.97, 0.99, and 1 compared to the expert model, with the same error margins. It can also be seen that for both approaches the accuracies start to converge when about 15% of the instances are skipped, and do not improve significantly even when the percentage of skipped instances is 50. The convolutional autoencoder generally has a higher accuracy than the signal decomposition approach, except in the cases where the margin of error is 10%, in which case the performances of both approaches are similar.
Table 1: Results for the signal decomposition approach.

Confidence   Skipped   Accuracy (± 2%)   Accuracy (± 5%)   Accuracy (± 10%)
0.0          0.01      0.79              0.88              0.91
0.75         0.05      0.82              0.92              0.95
0.865        0.1       0.84              0.95              0.97
0.915        0.15      0.86              0.96              0.98
0.95         0.2       0.87              0.96              0.99
0.974        0.25      0.88              0.98              0.99
0.977        0.3       0.88              0.98              0.99
0.982        0.35      0.88              0.98              0.99
0.988        0.4       0.88              0.98              0.99
0.996        0.46      0.89              0.98              1.0
0.997        0.5       0.89              0.98              1.0

Table 2: Results for the convolutional autoencoder approach.

T2      Skipped   Accuracy (± 2%)   Accuracy (± 5%)   Accuracy (± 10%)
0.0     0.0       0.85              0.88              0.89
0.021   0.06      0.91              0.94              0.95
0.12    0.1       0.94              0.97              0.98
0.431   0.15      0.95              0.98              0.99
0.488   0.2       0.96              0.99              0.99
0.53    0.25      0.96              0.99              0.99
0.566   0.3       0.96              0.99              0.99
0.601   0.35      0.96              0.99              0.99
0.636   0.4       0.97              0.99              1.0
0.665   0.45      0.97              0.99              1.0
0.7     0.5       0.97              0.99              1.0

5 CONCLUSION

In this work, we tested a signal decomposition and a convolutional autoencoder approach for vehicle axle distance detection using data from eight sensors mounted under a bridge. We used a dataset of 62076 vehicles travelling in the same lane. We compared our results to those computed by an expert's model, which we considered to be the ground truth. Using the signal decomposition approach, we achieved accuracies of up to 0.89, 0.98, and 1 for the cases where each vehicle axle distance matched the expert model within 2%, 5%, and 10%, respectively. For the convolutional autoencoder, the accuracies obtained were 0.97, 0.99, and 1 for the same error margins compared to the expert model. The models will be improved in future work to include the detection of vehicle axle weights.

ACKNOWLEDGMENTS

This study received funding from the company Cestel. The authors acknowledge the funding from the Slovenian Research Agency (ARRS), Grant (PR-10495) and basic core funding P2-0209.

REFERENCES

[1] Pranesh Chatterjee, Eugene OBrien, Yingyan Li, and Arturo González. 2006. Wavelet domain analysis for identification of vehicle axles from bridge measurements. Computers & Structures, 84, 28, 1792–1801. doi: 10.1016/j.compstruc.2006.04.013.
[2] Charles R. Harris, Jarrod K. Millman, Stefan J. van der Walt, Ralf Gommers, Pauli Virtanen, and David Cournapeau. 2020. Array programming with NumPy. Nature, 585, 357–362. doi: 10.1038/s41586-020-2649-2.
[3] Mohamadreza Khalili, Gopal Vishwakarma, Sara Ahmed, and Athanassios Thomas Papagiannakis. 2022. Development of a low-power weigh-in-motion system using cylindrical piezoelectric elements. International Journal of Transportation Science and Technology, 11, 3, 496–508. doi: 10.1016/j.ijtst.2021.06.004.
[4] Rujin Ma, Zhen Zhang, Yiqing Dong, and Yue Pan. 2020. Deep learning based vehicle detection and classification methodology using strain sensors under bridge deck. Sensors, 20, 18. https://www.mdpi.com/1424-8220/20/18/5051.
[5] Zbigniew Marszalek, Waclaw Gawedzki, and Krzysztof Duda. 2021. A reliable moving vehicle axle-to-axle distance measurement system based on multi-frequency impedance measurement of a slim inductive-loop sensor. Measurement, 169, 108525. doi: 10.1016/j.measurement.2020.108525.
[6] Martín Abadi et al. 2015. TensorFlow: large-scale machine learning on heterogeneous systems. Software available from tensorflow.org. (2015). https://www.tensorflow.org/.
[7] F. Pedregosa et al. 2011. Scikit-learn: machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830. http://www.jmlr.org/papers/volume12/pedregosa11a/pedregosa11a.pdf.

Študija učinkovitosti algoritma za razporejanje terenskega dela
A Study of the Performance of a Fieldwork Scheduling Algorithm

Tea Tušar (tea.tusar@ijs.si), Jožef Stefan Institute and Jožef Stefan International Postgraduate School, Jamova cesta 39, Ljubljana, Slovenia
Nace Sever (nace.sever@gmail.com), Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, Ljubljana, Slovenia
Aljoša Vodopija (aljosa.vodopija@ijs.si), Jožef Stefan Institute and Jožef Stefan International Postgraduate School, Jamova cesta 39, Ljubljana, Slovenia
Bogdan Filipič (bogdan.filipic@ijs.si), Jožef Stefan Institute and Jožef Stefan International Postgraduate School, Jamova cesta 39, Ljubljana, Slovenia

ABSTRACT

Fieldwork scheduling is a demanding optimization problem. To solve it, we developed a three-level algorithm. At the first level, an evolutionary algorithm distributes tasks to workers; at the second level, a heuristic algorithm distributes the tasks of each worker over days; and at the third level, a branch-and-bound algorithm solves the problem in the form of mixed-integer linear programming (MILP), where the starting times of the tasks need to be scheduled for each worker and each day separately. In this paper, we study the efficiency of the algorithm at the third level. Because it cannot schedule the tasks fast enough for practical use, we try to increase its efficiency by simplifying the MILP. Experimental results show that the simplification improves the performance of the algorithm at the third level, while the effect on the overall algorithm is in principle favorable, but depends on the problem.

KEYWORDS

scheduling problem, evolutionary algorithm, heuristic algorithm, branch-and-bound algorithm, mixed-integer linear programming, efficiency

1 INTRODUCTION

Fieldwork scheduling is an optimization problem that requires assigning a worker and a starting time to each field task so that all scheduling constraints are satisfied and the cost of the complete schedule is as low as possible.
There are numerous variants of this problem, which differ both in the constraints and in the way the cost of a schedule is computed. Consequently, different approaches to solving it also exist [6]. The study [3] compares two formulations of the problem, namely as a vehicle routing problem (VRP) and as a mixed-integer linear programming (MILP) problem. Its experimental results indicate that the MILP form is more suitable for formulating fieldwork task scheduling than the VRP form, so our approach also uses the MILP form.

However, the efficiency of solving such combinatorial problems depends strongly on their size: even relatively small problems often cannot be solved in a reasonable time. In our approach we therefore follow similar ideas from related work (see, e.g., [1]) and divide the problem into smaller, more manageable subproblems. We thus solve the fieldwork scheduling problem with a three-level optimization algorithm: at the first level, an evolutionary algorithm distributes the tasks among the workers; at the second level, a heuristic distributes the tasks of each worker over the days; and at the third level, a branch-and-bound algorithm solves the problem in the MILP form, i.e., it assigns starting times to the tasks for each worker and each day separately.

Such a three-level algorithm can satisfactorily solve even somewhat larger problems (e.g., with 20 workers, 20 days and several hundred tasks) within an hour, but this is too long for practical use, so we want to speed the algorithm up. The bottleneck is solving the MILP, since the evolutionary algorithm and the heuristic are very fast; the largest speed-up of the whole algorithm can therefore be achieved by speeding up the third level.
In the remainder of the paper, Section 2 first presents our variant of the fieldwork scheduling problem, and Section 3 briefly describes the three-level algorithm for solving it. Section 4 analyzes the efficiency of the algorithm at the third level, while Section 5 proposes a simplification of the MILP and examines its effect, first on the algorithm at the third level and finally on the whole three-level algorithm. The paper closes with conclusions in Section 6.

2 THE FIELDWORK SCHEDULING PROBLEM

We describe the fieldwork scheduling problem with a scheduling scenario, the problem variables, the constraints and the optimization criterion (detailed formal definitions cannot be given here due to space limitations). We consider the most general variant of the problem, in which we want to schedule the majority of the tasks, since it also covers the special case where, due to a last-minute change, only a few tasks need to be rescheduled.

2.1 Scheduling Scenario

The scheduling time period is divided into days, within which time is treated continuously. For each day we know the start and end of the regular working hours and the duration of possible overtime (either at the beginning or at the end of the day). We are also given a set of locations, the travel times for each pair of locations, and a set of competences that are shared by tasks and workers. The scheduling scenario also contains data about the workers, namely the competences, the allowed amount of overtime, and the starting and ending location of each worker. The data about the tasks comprise, for each task, its duration, its desired and mandatory time window, its priority, the required competences and possibly the desired workers. Lunch breaks are special tasks for which no location is defined (a lunch break always takes place at the same location as the preceding task and cannot overlap with other tasks).

Additionally, the scheduling scenario can contain prepared schedules of individual tasks, which are of two types. Mandatory schedules must not be changed, but they still have to be taken into account, since they impose constraints on the scheduling of the remaining tasks. Desired schedules, on the other hand, can be changed, but this affects the cost of the final schedule.
2.2 Variables

The variables of the optimization problem completely determine the schedule, since they state for each task whether it is scheduled or not (not all tasks need to be scheduled) and, if it is, which worker will perform it and when it will start.

2.3 Constraints

A schedule representing a solution to the problem is feasible only if it fulfills all of the following constraints:

• A worker can perform only one task at a time (when scheduling tasks in time, the travel times between locations must also be respected).
• A worker can perform tasks only within the working hours and has a limited amount of overtime.
• A worker must have the competences required to perform a task.
• Tasks must be scheduled within their mandatory time windows.
• Tasks with a mandatory schedule must not be rescheduled.

2.4 Optimization Criterion

The optimization criterion, i.e., the cost of the schedule that we want to minimize, is defined as a weighted sum of the following partial criteria (the first three items contradict each other, so it makes sense to consider only one of them at a time):

• All workers should be loaded as evenly as possible.
• As few workers as possible should be active each day.
• The active workers should be loaded as evenly as possible.
• The total duration of travel between locations should be as short as possible.
• As many tasks as possible should be performed.
• Tasks should be performed as early as possible.
• Workers should have as little inactive time as possible.
• There should be as little overtime as possible.
• Tasks with a higher priority should start before tasks with a lower priority.
• Tasks that are due earlier should start before tasks that are due later.
• Tasks should be performed as close as possible to their desired time window.
• A task with desired workers should be performed by one of the desired workers.
• A task with a desired schedule should be performed as close as possible to that schedule.

The weights of the individual partial criteria are very important, since they determine the criteria's mutual relations and drastically influence the obtained solutions. We set them using expert knowledge and experiments on numerous different scenarios.

3 THE THREE-LEVEL OPTIMIZATION ALGORITHM

In the following, we briefly present all three levels of the optimization algorithm.

3.1 First Level: Distributing Tasks Among Workers

At this level, an evolutionary algorithm [5] assigns to each task the worker who will perform it. The evolutionary algorithm creates the initial population of N_p solutions randomly, but in such a way that all solutions comply with the worker constraints (the first three constraints in Section 2.3). The algorithm then performs the following steps for at most N_g generations. In each generation, it first selects N_p parents with tournament selection. It then crosses and mutates pairs of parents (for mutation, we use different strategies tailored to the partial criteria (see Section 2.4), chosen so that the frequency of their use matches the criteria's weights). The evolutionary algorithm evaluates the population obtained in this way by running the second and third levels of the algorithm for each solution. It then overwrites the old population with the new one (keeping the best old solution) and continues with the same steps.

3.2 Second Level: Distributing a Worker's Tasks over Days

The heuristic at the second level distributes each worker's tasks over the days. All tasks, ordered increasingly by the number of days on which they can be performed, are assigned the most favorable day in turn. The favorability of a day is determined by voting: the different partial criteria (see Section 2.4) vote for the days that are most favorable for them. The votes are weighted by the weights of the partial criteria, and the task is assigned the day with the most votes.

3.3 Third Level: Determining the Starting Times of the Tasks of One Worker on One Day

At the third level, a branch-and-bound algorithm assigns a starting interval to the tasks of one worker on one day. The problem is thus written in the MILP form by considering only those constraints and partial criteria that are still meaningful at this level (e.g., at this level we no longer deal with competences, the even loading of workers and similar partial criteria, since these are taken care of at the first two levels). A much-simplified sketch of such a subproblem is given below.
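The following sketch (our illustration; the task data, the constant travel time and the objective are made up, and the real formulation includes many more constraints and partial criteria) schedules the starting times of three tasks of one worker on one day with OR-Tools and the SCIP backend, which are also used in our experiments:

```python
from ortools.linear_solver import pywraplp

durations = [60, 30, 90]   # task durations in minutes (illustrative)
travel = 10                # constant travel time (simplification)
day_end = 600              # length of the working day in minutes
M = day_end + travel       # big-M for the ordering constraints

solver = pywraplp.Solver.CreateSolver("SCIP")
start = [solver.NumVar(0, day_end - d, f"start_{i}")
         for i, d in enumerate(durations)]
for i in range(len(durations)):
    for j in range(i + 1, len(durations)):
        # order = 1 iff task i is scheduled before task j; either i
        # finishes (plus travel) before j starts, or vice versa.
        order = solver.BoolVar(f"order_{i}_{j}")
        solver.Add(start[i] + durations[i] + travel
                   <= start[j] + M * (1 - order))
        solver.Add(start[j] + durations[j] + travel
                   <= start[i] + M * order)
# Illustrative objective: start all tasks as early as possible.
solver.Minimize(sum(start))
if solver.Solve() == pywraplp.Solver.OPTIMAL:
    print([v.solution_value() for v in start])
```

The pairwise ordering variables are the reason the formulation grows quadratically with the number of tasks, which is quantified next.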
As with the problem presentation, the complete MILP formulation cannot be given here due to space limitations. For understanding the remainder of the paper, it is most important to know that in this formulation, a problem with n tasks has 3n^2 + O(n) variables and 5n^2 + O(n) constraints, as shown in Figure 1.

Figure 1: The number of variables and constraints in the third-level MILP formulation depending on the number of tasks.

4 EFFICIENCY EXPERIMENTS

As evident from Figure 1, the number of variables of the MILP is very large even for problems with a small number of tasks, which makes the optimization algorithm's job harder. We examine the time it needs to find the optimal solution in an experiment on a set of test problems whose properties resemble problems from practice.

This set contains 180 test problems (20 for each problem size from two to ten tasks), in which the execution times of the tasks of one worker on one day have to be determined. Some problems contain only ordinary tasks, while others can also include a lunch break, one task with a mandatory schedule, or both. The duration of the lunch break is always half an hour, while the durations of the other tasks are drawn randomly from a distribution that tries to mimic problems from practice: most tasks are shorter than 90 minutes, while a few last up to four hours. The priority of each task is chosen randomly between 1 and 9. Likewise, the task locations and the worker's starting location are chosen randomly among the locations of some larger Slovenian cities. Most tasks have an unlimited time window, while for some the window is shortened at the beginning or at the end of the day. The duration of the working hours is chosen randomly between 6 and 10 hours, and the worker can always perform up to two hours of overtime.

All test problems are solved with the branch-and-bound algorithm from the SCIP solver [2] through the OR-Tools library [4]. We record the time the algorithm needs to find the optimal solution. The experiments run on a personal computer with 16 GB of memory and a processor frequency of 3.60 GHz. The results of the experiment are shown in Figure 2.

Figure 2: The average time needed to solve the problem to optimality depending on its size.
We can see that the algorithm typically needs exponentially more time as each task is added. If the whole three-level algorithm is to be useful in practice, we can afford only a few seconds for solving the MILP problem, which means that for our purposes the algorithm is inefficient already for problems with seven or more tasks.

We also examine how well the algorithm performs if we limit the time it has available for finding solutions. We run the experiments with the following time limits: 0.5 s, 1 s, 2 s, 4 s, 8 s, 16 s and 32 s. We observe how many tasks the algorithm has scheduled and compare this number with the optimal number of scheduled tasks (obtained in the previous experiment, where the algorithm was not time-limited). Although the goal of the algorithm is not merely to schedule as many tasks as possible, the number of scheduled tasks is a good indicator of the quality of its performance.

Figure 3 shows that, given too little time, the algorithm fails on problems with many tasks (it rejects most tasks even though it could schedule them). For example, when the algorithm has only 0.5 s available, it performs adequately only on problems with up to four tasks; on larger problems its success drops, and with eight or more tasks it schedules on average only one task. The algorithm performs somewhat better when it has more time, but only at 32 s does the number of scheduled tasks on problems with nine and ten tasks approach the optimal number of scheduled tasks.

Figure 3: Average number of scheduled tasks for different time limits and problem sizes (the dashed black line shows the optimal number of scheduled tasks).
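The time-limit experiment itself then reduces to a simple harness; a sketch is shown below, assuming a hypothetical helper solve_day(problem, time_limit_ms) that wraps a solve such as the one above and returns the number of tasks it managed to schedule.

```python
# Hypothetical experiment harness: `problems` maps a problem size n to its
# 20 test instances, and solve_day() returns the number of scheduled tasks.
TIME_LIMITS_MS = [500, 1000, 2000, 4000, 8000, 16000, 32000, None]  # None = unlimited

def run_time_limit_experiment(problems, solve_day):
    averages = {}  # (n, limit) -> average number of scheduled tasks
    for n, instances in problems.items():
        for limit in TIME_LIMITS_MS:
            counts = [solve_day(p, time_limit_ms=limit) for p in instances]
            averages[(n, limit)] = sum(counts) / len(counts)
    return averages
```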
5 PROBLEM SIMPLIFICATION

Since we are not satisfied with the algorithm's performance, we try to simplify the problem. Expressing the partial criteria for priority and due time requires 2n² + O(n) variables and 4n² + O(n) constraints, which is a lot, especially since these two partial criteria are not very important. We therefore test how the algorithm performs if we omit them (they can be taken care of, to some extent, at the upper two levels of the optimization algorithm). The number of variables and constraints still grows quadratically with the number of tasks, but the coefficients of the quadratic term are reduced from 3 and 5, respectively, to 1.

For the simplified problem we carry out a test similar to the one in Section 4, except that we test only with the time limits 0.5 s, 2 s, 8 s and 32 s. In Figure 4 we compare the number of scheduled tasks for the original and the simplified problem. We can see that on larger simplified problems the algorithm performs much better.

Figure 4: Comparison of the algorithm's performance on the original (solid lines) and the simplified (dashed lines) problem formulation for different time limits and problem sizes (the dashed black line shows the optimal number of scheduled tasks for the original problem formulation).

The simplification thus makes the third-level algorithm perform satisfactorily for practical needs as well. However, this does not necessarily mean that the simplification improves the performance of the whole three-level algorithm. We verify this with an experiment on two test problems, P1 and P2, where we once give a larger weight to the partial criterion of travel duration and once to even worker load. In this way we obtain four different test problems. P1 comprises 220 tasks, ten workers and seven days, while P2 comprises 114 tasks (it also contains lunch breaks), five workers and three days. For each test problem we run four variants of the three-level algorithm that differ only at the third level: it uses either the original or the simplified problem, and its execution is limited to either 1 s or 2 s.

Figure 5 shows the results of these experiments. On problem P1 (the top two plots in the figure) we can clearly see that simplifying the problem at the third level benefits the efficiency of the whole algorithm. We cannot claim the same for problem P2, where the original and the simplified variants perform very similarly, but we do see much better performance when the execution is limited to 2 s.

Figure 5: Optimization results for four variants of the algorithm on two problems (P1 top and P2 bottom) with two different partial criteria (travel duration left and even worker load right). Smaller values are better.

6 CONCLUSIONS

In this paper we analyzed the efficiency of an algorithm for scheduling field work. We focused only on the most time-consuming part of the three-level algorithm: solving the MILP problem at the third level. With two experiments we showed that the branch-and-bound algorithm is not efficient enough for solving practical problems, so we simplified the MILP problem. This does not change the criteria of the overall problem, but it enables the third-level algorithm to efficiently solve problems with ten tasks as well (we do not expect more in practice). We also compared how the simplification affects the performance of the whole algorithm and found that, although there are problems for which the simplification is not beneficial, it generally gives good results and we will use it in practice as well.

ACKNOWLEDGMENTS

This work was carried out within the project Inteligentno in okolju prijazno razporejanje terenskega dela – MF-Scheduler (Intelligent and Environmentally Friendly Field Work Scheduling), commissioned by Comland d.o.o. and co-financed by the Ministry of Economic Development and Technology of the Republic of Slovenia and the European Regional Development Fund of the European Union, and within the research program P2-0209, financed by the Slovenian Research Agency from the state budget.

REFERENCES
[1] S. Bertels and T. Fahle. 2006. A hybrid setup for a hybrid scenario: Combining heuristics for the home health care problem. Computers & Operations Research, 33, 10, 2866–2890. doi: 10.1016/j.cor.2005.01.015.
[2] K. Bestuzheva et al. 2021. The SCIP Optimization Suite 8.0. Technical Report. Optimization Online, (Dec. 2021). http://www.optimization-online.org/DB_HTML/2021/12/8728.html.
[3] J. A. Castillo-Salazar, D. Landa-Silva and R. Qu. 2016. Workforce scheduling and routing problems: Literature survey and computational study. Annals of Operations Research, 239, 1, 39–67. doi: 10.1007/s10479-014-1687-2.
[4] Google Developers. 2021. About OR-Tools. Retrieved 19 Aug. 2022 from https://developers.google.com/optimization/introduction/overview.
[5] A. E. Eiben and J. E. Smith. 2015. Introduction to Evolutionary Computing. (2nd ed.). Springer. doi: 10.1007/978-3-662-44874-8.
[6] D. C. Paraskevopoulos, G. Laporte, P. P. Repoussis and C. D. Tarantilis. 2017. Resource constrained routing and scheduling: Review and research prospects. European Journal of Operational Research, 263, 3, 737–754. doi: 10.1016/j.ejor.2017.05.035.
Interaktivno eksperimentiranje z besednimi vložitvami v platformi ClowdFlows
Interactive Experimentation with Word Embeddings in the ClowdFlows platform

Martin Žnidaršič, Senja Pollak, Vid Podpečan
martin.znidarsic@ijs.si, senja.pollak@ijs.si, vid.podpecan@ijs.si
Institut "Jožef Stefan", Jamova cesta 39, 1000 Ljubljana, Slovenija

ABSTRACT
The paper presents the ClowdFlows web platform for machine learning and data analysis using interactive workflows. In particular, we highlight selected features that facilitate its use by non-programmers as well as selected elements of the platform that enable text analysis using state-of-the-art word embedding approaches. We also report on a hands-on evaluation of the usability of the platform and its word embedding components in a selected group of end users from the fields of humanities and social sciences.

KEYWORDS
natural language processing, word embeddings, web application, workflows

1 INTRODUCTION
Fields related to artificial intelligence methods, such as data mining, machine learning and natural language processing, have seen a surge in practical use in recent years. The latest methodological advances are usually first available in the form of programming libraries or web services, later in solution-development platforms with a convenient user interface, and usually later still in dedicated tools that use the methodology internally and make it available, with little or no control over how it operates, also to users without a computer science background. For such users, independent use of these methods is hampered, among other things, by the prior knowledge required to apply them meaningfully, and sometimes also by the software installation and configuration procedures. The prototype research tool ClowdFlows, which we are developing at the Department of Knowledge Technologies of the Jožef Stefan Institute, addresses these two obstacles and shows potential for practical use. Within the EMBEDDIA project [14, 13, 16] we have extended the capabilities of this tool primarily in the area of natural language analysis, so in this paper we focus on methods and end users from this field. More precisely, we present an example of training and using word embedding models, and the experience of new users from the humanities and social sciences.
Section 2 presents the basic related work. The ClowdFlows platform is described in Section 3. Section 4 presents an example of the use of word embeddings and the user experience. Conclusions are given in Section 5.

2 BACKGROUND AND RELATED WORK
2.1 Platforms for visual programming and solution sharing
The ClowdFlows software tool presented and used in this paper is similar to some other tools for managing data mining workflows. Slovenian users are probably most familiar with the Orange tool [2]; the Weka [18] and RapidMiner [8, 5] tools are similar. All of these tools support visual programming with software components and the management of programs built in this way. Solutions for sharing workflows are less widespread. This is offered, for instance, by the myExperiment portal [15] or the website of the OpenML initiative [17]. The usefulness of these solutions, however, is limited mainly to well-supported public sharing of solutions; to run or edit the workflows, the user still has to install the dedicated software in which they were designed.
ClowdFlows, on the other hand, supports the creation as well as the sharing and execution of workflows.

2.2 Word embeddings
Word embeddings, machine-learned using neural networks, are representations of words in a space where every word is described by a vector with many dimensions (typically from a few dozen to a few hundred). Words that are close in the vector space (which can be measured with the cosine distance) are also semantically similar. Embeddings also support computing relations that go beyond simple word similarity, e.g., through analogies. For example, the relation Madrid:Spain is similar to the relation Paris:France [10]. In static embeddings, such as the word2vec [9] and fastText [1] models, each word in the corpus is represented by a single vector. In the fastText method, each word is represented as the sum of the embeddings of the character n-grams it contains. In practice this means that the method also takes the morphological similarity of words into account when modeling semantic closeness, which makes it particularly useful for computing word embeddings in morphologically rich languages such as Slovene. In contrast to static embeddings, in contextual embeddings, such as the ELMo [12] and BERT [3] models, every occurrence of a word is described by its own vector. This matters above all for polysemous words, but also in cases where we analyze differences between words in different contexts. For many languages, models pretrained on large language corpora are available [4, 3] and can be adapted to individual domains and tasks.
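Neighborhood and analogy queries of this kind can also be reproduced outside ClowdFlows, for instance with the gensim library; below is a minimal sketch in which the model file name is a placeholder and a lowercase English vocabulary is assumed.

```python
from gensim.models import KeyedVectors

# Load pretrained static embeddings in the textual word2vec format
# (the file name is a placeholder).
wv = KeyedVectors.load_word2vec_format("embeddings.vec")

# Words close in the vector space (by cosine similarity) are semantically similar.
print(wv.most_similar("paris", topn=5))

# Analogy via vector arithmetic: madrid - spain + france ≈ paris,
# i.e., the relation Madrid:Spain resembles Paris:France [10].
print(wv.most_similar(positive=["madrid", "france"], negative=["spain"], topn=1))
```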
Figure 1: The main view in ClowdFlows.

3 CLOWDFLOWS
ClowdFlows [6, 7] is a web platform for data analysis and machine learning with a graphical user interface that runs in the browser without requiring any local software installation, and it also offers simple public sharing of the solutions that are built. It is an open-source research tool whose latest stable version, ClowdFlows 3, is available at: https://cf3.ijs.si/.
The graphical way of composing workflows and the use of publicly shared solutions without installing additional software are features that ease its use also for users who have no programming background but do have interesting data and research problems for which the methods available in ClowdFlows would come in handy. For researchers, the simple sharing and the ease of repeating or extending existing experiments are also important.
The components in ClowdFlows 3 include a range of widgets for working with word embeddings. They include pretrained static and contextual models for several languages, as well as some tools built on top of them, such as classifiers for news sentiment analysis [11] and hate speech detection [11].
After logging in to ClowdFlows, we can take a short course on the basics, create a new workflow or browse the publicly available solutions. The main view is intended for creating, inspecting and running workflows. It is shown in Figure 1. Most of this view is taken up by the canvas onto which we can drag (or place with a double click) the desired widget from the list of available widgets on the left-hand side of the view. Meaningfully connected widgets constitute a workflow, which can be run with the Play control button.
Connections between widgets are established by clicking the output of one widget and the input of another. Inputs are shown as light blue rectangles on the left side of a widget, and outputs as such rectangles on the right. A connection can be removed by right-clicking it and choosing the Remove option. Workflows are saved automatically, but they can also be saved explicitly by pressing the save control, which also lets us name the saved work. Saved workflows can be inspected, copied, deleted, exported or publicly shared in the view that appears when choosing Your workflows. Publicly published workflows receive an immutable URL that can be shared and lets every ClowdFlows user create their own copy of the shared work.

4 USING WORD EMBEDDINGS
4.1 Training an embedding model in ClowdFlows
To obtain a representation of text in the form of word embeddings, we can use prepared models or machine-train such models ourselves. Such machine learning is usually computationally very demanding and needs large amounts of data for meaningful results. In practice, pretrained models are therefore often used. ClowdFlows offers several pretrained models, for example ELMo, Word2Vec and various models of the BERT and fastText approaches, also for Slovene.
Training one's own models makes sense when dealing with special texts or tasks for which we want to use them. The high computational demands, the large amounts of data and the correspondingly longer processing times are, namely, not compatible with the interactive use that is characteristic of ClowdFlows. Among users in the digital humanities and social sciences, we identified a need to train embeddings on small, specific corpora, such as poetry collections, specialized news articles and the like. Such corpora are often substantially smaller than the typical corpora used for training word embeddings.
Given the users' needs and the platform's capabilities, we decided to implement the model-training widget train fastText, since the fastText algorithm is among the most efficient and least computationally demanding. The widget's implementation in ClowdFlows also contains hints on how to adapt the default parameters for training on small corpora. For acceptably fast interactive work, we nevertheless recommend that the input corpus not exceed two million words or roughly 10 MB of raw text.
The train fastText widget uses the fastText algorithm to train a new vector model on the input corpus. Such a model can then be passed on to other widgets. The input to train fastText is a text corpus, such as the output of the Load Corpus from CSV widget. The corpus can be tokenized, lemmatized, or used without such preprocessing.
train fastText lets the user set the following parameters:
bucket - the number of buckets (the features of word and character n-grams are hashed into a fixed number of buckets);
epoch - the number of training epochs;
lr - the learning rate;
dimension - the size of the word vectors;
window - the size of the context window;
model - the type of unsupervised fastText model (cbow or skipgram); and
min_count - the minimal number of occurrences of a word for the word to still be considered.
Where appropriate, a parameter's description includes a hint on whether it is advisable to increase or decrease its value in the case of small training data.
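These parameters correspond directly to the unsupervised training API of the fastText library itself; a minimal sketch of such a training run follows (the corpus path and the parameter values are illustrative, not the widget's actual defaults).

```python
import fasttext

# Train a static embedding model on a plain-text corpus file.
# The values below are illustrative; as the widget's hints suggest for
# small corpora, the number of epochs is increased and min_count lowered.
model = fasttext.train_unsupervised(
    "corpus.txt",
    model="skipgram",  # unsupervised model type: cbow or skipgram
    dim=100,           # size of the word vectors
    ws=5,              # size of the context window
    epoch=25,          # number of training epochs
    lr=0.05,           # learning rate
    minCount=2,        # minimal number of occurrences of a word
    bucket=200000,     # number of buckets for word/character n-gram features
)

print(model.get_nearest_neighbors("ljubljana", k=5))  # neighboring words
```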
4.2 User experience
We evaluated the usability of the ClowdFlows platform and its most important components for natural language analysis, from the perspective of the target end users, in a one-day workshop held (remotely) on 27 January 2022. The workshop was aimed at one of our primary target groups: researchers from various fields of the humanities and social sciences who are (presumably) not skilled in programming.
For the workshop we prepared an example workflow for analyzing texts with word embeddings. It is shown in Figure 2. The workflow starts with two possible ways of entering the input data, continues with the optional use of a tokenizer and a lemmatizer (the latter also accepts a language tag as input), followed by the training of a fastText model. The trained model is then used in the workflow in two ways: in the fastText neighboring words widget we inspect the neighborhood (neighboring words) of selected words, and in the Evaluate word expressions with fastText widget we experiment with expressions (addition, subtraction) over the vector representations of words. In both cases, the results are displayed with the Object viewer widget.

Figure 2: The workflow used at the workshop with the target users. Available at: https://cf3.ijs.si/workflow/283.

The workshop began with a joint introductory presentation of the ClowdFlows platform and the example workflow from Figure 2, which lasted 20 minutes and in which we demonstrated selected examples using the text of the novel Deseti brat by Josip Jurčič.
This was followed by eight 20-minute sessions in which each user created their own instance of the workflow, loaded their own corpus and tried out selected ClowdFlows components. Each session was dedicated to one user and their data, while the other users could attend as observers. We helped the users in their work if they had trouble using the platform or preparing their input data. Attendance at the workshop was by invitation. The invited participants are researchers from the fields of literary studies, sociology, social work and related fields. They prepared their own corpora from their fields, such as thematic corpora on migration, a corpus of works by Slovenian writers, a corpus of French poetry, LGBT topics, news reporting on social work, and the like. Some participants were invited within the interdisciplinary projects SOVRAG and CANDAS. None of the participants had any prior experience with ClowdFlows. Due to the great interest, we increased the number of sessions from the planned 8 to 10.
The corpora used were very diverse, and the participants were interested in various aspects of text processing. In most cases, additional widgets were added to the initial workflow to solve a particular problem or satisfy specific interests. For example, the participants looked for similarities and differences in word neighborhoods based on corpora from different periods or by different authors. They were also interested in basic characteristics of such corpora, such as the most frequently used words, which also involved other basic operations such as word filtering. Two specific technical issues were discovered during the workshop: (I) errors occurred when entering text with special characters that was not encoded in a UTF code table, and (II) some widgets that contain calls to web services reported exceeding the time limit.
An anonymous questionnaire was prepared for the participants, and a link to it was sent out after the workshop. Most participants found the presented workflow very useful. A large majority (80%) had never tried using word embeddings before.
The ClowdFlows user interface was mostly reported as simple to use (simple: 60%, very simple: 30%), and only one participant found it complicated. These results should, however, be considered in the context of the fact that the responses were collected shortly after using ClowdFlows, with help available. Without the introduction and help, the answers might not have been so positive, but we have not tested this yet. Most participants felt that they would use ClowdFlows again if a prepared workflow for their problem were provided to them.

5 CONCLUSION
We have presented the ClowdFlows web platform, selected elements that enable advanced use of approaches for training word embeddings, and the experience of some of the target users of these tools.
One of the goals of the ClowdFlows platform is to bring the use of the latest data analysis and machine learning methods also to users who are not skilled in programming. The experience of our workshop with some of the potential users showed that this definitely makes sense: starting from prepared workflows, the users (experts in other fields) were able to perform analyses on their own data and even look for and propose further analysis procedures that are meaningful and useful in their work. In the future, in addition to methodological extensions and technical improvements of the platform, we will therefore also devote more attention to developing example solutions for the target users, primarily for researchers from fields not related to computer science.

ACKNOWLEDGMENTS
This paper is a result of the research projects Računalniško podprta večjezična analiza novičarskega diskurza s kontekstualnimi besednimi vložitvami (no. J6-2581) and Sovražni govor v sodobnih konceptualizacijah nacionalizma, rasizma, spola in migracij (no. J5-3102) and of the research program Tehnologije znanja (no. P2-0103), co-financed by the Slovenian Research Agency from the state budget, as well as of the European project EMBEDDIA (no. 825153), funded by the EU within the Horizon 2020 research and innovation framework program.
REFERENCES
[1] Piotr Bojanowski, Edouard Grave, Armand Joulin and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135–146.
[2] Janez Demšar et al. 2013. Orange: data mining toolbox in Python. Journal of Machine Learning Research, 14, 1, 2349–2353.
[3] Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. 2019. BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4171–4186.
[4] Edouard Grave, Piotr Bojanowski, Prakhar Gupta, Armand Joulin and Tomas Mikolov. 2018. Learning word vectors for 157 languages. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018).
[5] Markus Hofmann and Ralf Klinkenberg. 2016. RapidMiner: Data Mining Use Cases and Business Analytics Applications. CRC Press.
[6] Janez Kranjc. 2017. Web Workflows for Data Mining in the Cloud. PhD thesis. Jožef Stefan International Postgraduate School.
[7] Janez Kranjc, Vid Podpečan and Nada Lavrač. 2012. ClowdFlows: a cloud based scientific workflow platform. In Machine Learning and Knowledge Discovery in Databases. Lecture Notes in Computer Science. Vol. 7524. Peter A. Flach, Tijl De Bie and Nello Cristianini, editors. Springer Berlin Heidelberg, 816–819.
[8] Ingo Mierswa, Michael Wurst, Ralf Klinkenberg, Martin Scholz and Timm Euler. 2006. YALE: rapid prototyping for complex data mining tasks. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, USA, 935–940.
[9] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, 3111–3119.
[10] Tomas Mikolov, Wen-tau Yih and Geoffrey Zweig. 2013. Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the ACL: Human Language Technologies. ACL, 746–751.
[11] Andraž Pelicon, Ravi Shekhar, Matej Martinc, Blaž Škrlj, Matthew Purver and Senja Pollak. 2021. Zero-shot cross-lingual content filtering: offensive language and hate speech detection. In Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation. Association for Computational Linguistics, Online, (Apr. 2021), 30–34. https://aclanthology.org/2021.hackashop-1.5.
[12] Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 2227–2237.
[13] Senja Pollak and Andraž Pelicon. 2022. EMBEDDIA project: cross-lingual embeddings for less-represented languages in European news media. In Proceedings of the 23rd Annual Conference of the European Association for Machine Translation. European Association for Machine Translation, Ghent, Belgium, (Jun. 2022), 293–294.
[14] Senja Pollak et al. 2021. EMBEDDIA tools, datasets and challenges: resources and hackathon contributions. In Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation. Association for Computational Linguistics, Online, (Apr. 2021), 99–109. https://aclanthology.org/2021.hackashop-1.14.
[15] David De Roure, Carole Goble and Robert Stevens. 2009. The design and realisation of the myExperiment Virtual Research Environment for social sharing of workflows. Future Generation Computer Systems, 25, (Feb. 2009), 561–567.
[16] Matej Ulčar, Aleš Žagar, Carlos S. Armendariz, Andraž Repar, Senja Pollak, Matthew Purver and Marko Robnik-Šikonja. 2021. Evaluation of contextual embeddings on less-resourced languages. (2021). https://arxiv.org/abs/2107.10614.
[17] Joaquin Vanschoren, Jan N Van Rijn, Bernd Bischl and Luis Torgo. 2014. OpenML: networked science in machine learning. ACM SIGKDD Explorations Newsletter, 15, 2, 49–60.
[18] Ian H Witten and Eibe Frank. 2005. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann.

Indeks avtorjev / Author index

Alukić Erna .......... 35
Andova Andrejaana .......... 7
Brence Jure .......... 11
Čelan Nika .......... 19, 39
Chaushevska Marija .......... 11
Džeroski Sašo .......... 11
Erzar Blaž .......... 19, 39
Filipič Bogdan .......... 7, 43
Gams Matjaž .......... 19, 31, 39
Gjoreski Hristijan .......... 15
Hrastič Aleksander .......... 19
Kizhevska Emilija .......... 15
Kolar Žiga .......... 19, 39
Konečnik Martin .......... 19, 39
Krömer Pavel .......... 7
Leskovec Gašper .......... 19, 39
Logar Katja .......... 23
Luštrek Mitja .......... 15
Mekiš Nejc .......... 35
Mlakar Miha .......... 35
Osojnik Aljaž .......... 27
Podpečan Vid .......... 47
Pollak Senja .......... 47
Prestor Domen .......... 19, 39
Robnik-Šikonja Marko .......... 23
Sever Nace .......... 43
Shulajkovska Miljana .......... 31
Skobir Matjaž .......... 19, 39
Slapničar Gašper .......... 35
Smerkol Maj .......... 31
Susič David .......... 19, 39
Todorovski Ljupčo .......... 11
Tušar Tea .......... 7, 43
Uher Vojtěch .......... 7
Us Peter .......... 35
Vodopija Aljoša .......... 7, 43
Ženko Bernard .......... 27
Žibert Janez .......... 35
Žnidaršič Martin .......... 27, 47