Zbornik 24. mednarodne multikonference
INFORMACIJSKA DRUŽBA – IS 2021
Zvezek A

Proceedings of the 24th International Multiconference
INFORMATION SOCIETY – IS 2021
Volume A

Slovenska konferenca o umetni inteligenci
Slovenian Conference on Artificial Intelligence

Uredniki / Editors: Mitja Luštrek, Matjaž Gams, Rok Piltaver
8. oktober 2021 / 8 October 2021, Ljubljana, Slovenia
http://is.ijs.si

Uredniki:
Mitja Luštrek, Odsek za inteligentne sisteme, Institut »Jožef Stefan«, Ljubljana
Matjaž Gams, Odsek za inteligentne sisteme, Institut »Jožef Stefan«, Ljubljana
Rok Piltaver, Outfit7 in Odsek za inteligentne sisteme, Institut »Jožef Stefan«, Ljubljana

Založnik: Institut »Jožef Stefan«, Ljubljana
Priprava zbornika: Mitja Lasič, Vesna Lasič, Lana Zemljak
Oblikovanje naslovnice: Vesna Lasič
Dostop do e-publikacije: http://library.ijs.si/Stacks/Proceedings/InformationSociety

Ljubljana, oktober 2021

Informacijska družba, ISSN 2630-371X
Kataložni zapis o publikaciji (CIP) pripravili v Narodni in univerzitetni knjižnici v Ljubljani
COBISS.SI-ID 85847043
ISBN 978-961-264-215-0 (PDF)

PREDGOVOR MULTIKONFERENCI INFORMACIJSKA DRUŽBA 2021

Štiriindvajseta multikonferenca Informacijska družba je preživela probleme zaradi korone v 2020. Odziv se povečuje: v 2021 imamo enajst konferenc, pravo upanje pa je za 2022, ko naj bi dovolj velika precepljenost končno omogočila normalno delovanje. Tudi v 2021 gre zahvala za skoraj normalno delovanje konference tistim predsednikom konferenc, ki so kljub prvi pandemiji modernega sveta pogumno obdržali visok strokovni nivo.
Stagnacija določenih aktivnosti v 2020 in 2021 pa skoraj v ničemer ni omejila neverjetne rasti IKT-ja, informacijske družbe, umetne inteligence in znanosti nasploh – nasprotno, rast znanja, računalništva in umetne inteligence se nadaljuje z že kar običajno nesluteno hitrostjo. Po drugi strani se je pospešil razpad družbenih vrednot ter zaupanja v znanost in razvoj. Se pa zavedanje večine ljudi, da je treba podpreti stroko, čedalje bolj krepi, kar je bistvena sprememba glede na 2020.

Letos smo v multikonferenco povezali enajst odličnih neodvisnih konferenc. Zajema okoli 170 večinoma spletnih predstavitev, povzetkov in referatov v okviru samostojnih konferenc in delavnic ter 400 obiskovalcev. Prireditev so spremljale okrogle mize in razprave ter posebni dogodki, kot je svečana podelitev nagrad – seveda večinoma preko spleta. Izbrani prispevki bodo izšli tudi v posebni številki revije Informatica (http://www.informatica.si/), ki se ponaša s 45-letno tradicijo odlične znanstvene revije.

Multikonferenco Informacijska družba 2021 sestavljajo naslednje samostojne konference:
• Slovenska konferenca o umetni inteligenci
• Odkrivanje znanja in podatkovna skladišča
• Kognitivna znanost
• Ljudje in okolje
• 50-letnica poučevanja računalništva v slovenskih srednjih šolah
• Delavnica projekta Batman
• Delavnica projekta Insieme Interreg
• Delavnica projekta Urbanite
• Študentska konferenca o računalniškem raziskovanju 2021
• Mednarodna konferenca o prenosu tehnologij
• Vzgoja in izobraževanje v informacijski družbi

Soorganizatorji in podporniki multikonference so različne raziskovalne institucije in združenja, med njimi ACM Slovenija, SLAIS, DKZ in druga slovenska nacionalna akademija, Inženirska akademija Slovenije (IAS). V imenu organizatorjev konference se zahvaljujemo združenjem in institucijam, še posebej pa udeležencem za njihove dragocene prispevke in priložnost, da z nami delijo svoje izkušnje o informacijski družbi.
Zahvaljujemo se tudi recenzentom za njihovo pomoč pri recenziranju.

S podelitvijo nagrad, še posebej z nagrado Michie-Turing, se avtonomna stroka s področja opredeli do najbolj izstopajočih dosežkov. Nagrado Michie-Turing za izjemen življenjski prispevek k razvoju in promociji informacijske družbe je prejel prof. dr. Jernej Kozak. Priznanje za dosežek leta pripada ekipi Odseka za inteligentne sisteme Instituta »Jožef Stefan« za osvojeno drugo mesto na tekmovanju XPrize Pandemic Response Challenge za iskanje najboljših ukrepov proti koroni. »Informacijsko limono« za najmanj primerno informacijsko potezo je prejela trditev, da je aplikacija za sledenje stikom problematična za zasebnost, »informacijsko jagodo« kot najboljšo potezo pa COVID-19 Sledilnik, tj. sistem za zbiranje podatkov o koroni. Čestitke nagrajencem!

Mojca Ciglarič, predsednica programskega odbora
Matjaž Gams, predsednik organizacijskega odbora

FOREWORD – INFORMATION SOCIETY 2021

The 24th Information Society Multiconference survived the COVID-19 problems. In 2021, there are eleven conferences with a growing trend and real hopes that 2022 will be better thanks to successful vaccination. The multiconference survived due to the conference chairs who bravely decided to continue with their conferences despite the first pandemic of the modern era. The COVID-19 pandemic did not slow the growth of ICT, the information society, artificial intelligence and science overall; quite the contrary – the progress of computers, knowledge and artificial intelligence continued at a fascinating rate. However, COVID-19 did accelerate the erosion of societal norms and of trust in science and progress. On the other hand, the awareness of the majority that science and development are the only prospects for a prosperous future is growing substantially.
The Multiconference runs in parallel sessions with 170 presentations of scientific papers at eleven conferences, many round tables, workshops and award ceremonies, and 400 attendees. Selected papers will be published in the Informatica journal, with its 45-year tradition of excellent research publishing.

The Information Society 2021 Multiconference consists of the following conferences:
• Slovenian Conference on Artificial Intelligence
• Data Mining and Data Warehouses
• Cognitive Science
• People and Environment
• 50 Years of High-school Computer Education in Slovenia
• Batman Project Workshop
• Insieme Interreg Project Workshop
• URBANITE Project Workshop
• Student Computer Science Research Conference 2021
• International Conference on Transfer of Technologies
• Education in Information Society

The multiconference is co-organized and supported by several major research institutions and societies, among them ACM Slovenia (the Slovenian chapter of the ACM), SLAIS, DKZ and the second national academy, the Slovenian Engineering Academy. In the name of the conference organizers, we thank all the societies and institutions, and particularly all the participants for their valuable contributions and their interest in this event, and the reviewers for their thorough reviews.

The award for lifelong outstanding contributions is presented in memory of Donald Michie and Alan Turing. The Michie-Turing award was given to Prof. Dr. Jernej Kozak for his lifelong outstanding contribution to the development and promotion of the information society in our country. In addition, the yearly recognition for current achievements was awarded to the team from the Department of Intelligent Systems, Jožef Stefan Institute, for second place at the XPrize Pandemic Response Challenge for proposing the best counter-measures against COVID-19. The information lemon goes to the claim that the mobile application for tracking COVID-19 contacts would harm information privacy.
The information strawberry, as the best information service of the last year, went to COVID-19 Sledilnik, a program that regularly reports all data related to COVID-19 in Slovenia. Congratulations!

Mojca Ciglarič, Programme Committee Chair
Matjaž Gams, Organizing Committee Chair

KONFERENČNI ODBORI / CONFERENCE COMMITTEES

International Programme Committee:
Vladimir Bajic, South Africa; Heiner Benking, Germany; Se Woo Cheon, South Korea; Howie Firth, UK; Olga Fomichova, Russia; Vladimir Fomichov, Russia; Vesna Hljuz Dobric, Croatia; Alfred Inselberg, Israel; Jay Liebowitz, USA; Huan Liu, Singapore; Henz Martin, Germany; Marcin Paprzycki, USA; Claude Sammut, Australia; Jiri Wiedermann, Czech Republic; Xindong Wu, USA; Yiming Ye, USA; Ning Zhong, USA; Wray Buntine, Australia; Bezalel Gavish, USA; Gal A. Kaminka, Israel; Mike Bain, Australia; Michela Milano, Italy; Derong Liu, Chicago, USA; Toby Walsh, Australia; Sergio Campos-Cordobes, Spain; Shabnam Farahmand, Finland; Sergio Crovella, Italy

Organizing Committee:
Matjaž Gams, chair; Mitja Luštrek; Lana Zemljak; Vesna Koricki; Mitja Lasič; Blaž Mahnič; Klara Vulikić

Programme Committee:
Mojca Ciglarič, chair; Bogdan Filipič; Dunja Mladenič; Niko Zimic; Bojan Orel; Andrej Gams; Franc Novak; Rok Piltaver; Franc Solina; Matjaž Gams; Vladislav Rajkovič; Toma Strle; Viljan Mahnič; Mitja Luštrek; Grega Repovš; Tine Kolenik; Cene Bavec; Marko Grobelnik; Ivan Rozman; Franci Pivec; Tomaž Kalin; Nikola Guid; Niko Schlamberger; Uroš Rajkovič; Jozsef Györkös; Marjan Heričko; Stanko Strmčnik; Borut Batagelj; Tadej Bajd; Borka Jerman Blažič Džonova; Jurij Šilc; Tomaž Ogrin; Jaroslav Berce; Gorazd Kandus; Jurij Tasič; Aleš Ude; Mojca Bernik; Urban Kordeš; Denis Trček; Bojan Blažica; Marko Bohanec; Marjan Krisper; Andrej Ule; Matjaž Kljun; Ivan Bratko; Andrej Kuščer; Boštjan Vilfan; Robert Blatnik; Andrej Brodnik; Jadran Lenarčič; Baldomir Zajc; Erik Dovgan; Dušan Caf; Borut Likar; Blaž Zupan; Špela Stres; Saša Divjak; Janez Malačič; Boris Žemva; Anton Gradišek; Tomaž Erjavec; Olga Markič; Leon Žlajpah

KAZALO / TABLE OF
CONTENTS

Slovenska konferenca o umetni inteligenci / Slovenian Conference on Artificial Intelligence
PREDGOVOR / FOREWORD
PROGRAMSKI ODBORI / PROGRAMME COMMITTEES
Estimating Client's Job-search Process Duration / Andonovic Viktor, Boškoski Pavle, Boshkoska Biljana Mileva
Some Experimental Results in Evolutionary Multitasking / Andova Andrejaana, Filipič Bogdan
Intent Recognition and Drinking Detection For Assisting Kitchen-based Activities / De Masi Carlo M., Stankoski Simon, Cergolj Vincent, Luštrek Mitja
Anomaly Detection in Magnetic Resonance-based Electrical Properties Tomography of in silico Brains / Golob Ožbej, Arduino Alessandro, Bottauscio Oriano, Zilberti Luca, Sadikov Aleksander
Library for Feature Calculation in the Context-Recognition Domain / Janko Vito, Boštic Matjaž, Lukan Junoš, Slapničar Gašper
Določanje slikovnega prostora na umetniških slikah / Komarova Nadezhda, Anželj Gregor, Batagelj Borut, Bovcon Narvika, Solina Franc
Automated Hate Speech Target Identification / Pelicon Andraž, Škrlj Blaž, Kralj Novak Petra
SiDeGame: An Online Benchmark Environment for Multi-Agent Reinforcement Learning / Puc Jernej, Sadikov Aleksander
Question Ranking for Food Frequency Questionnaires / Reščič Nina, Luštrek Mitja
Daily Covid-19 Deaths Prediction in Slovenija / Susič David
Iris recognition based on SIFT and SURF feature detection / Trpin Alenka, Ženko Bernard
Analyzing the Diversity of Constrained Multiobjective Optimization Test Suites / Vodopija Aljoša, Tušar Tea, Filipič Bogdan
Corpus KAS 2.0: Cleaner and with New Datasets / Žagar Aleš, Kavaš Matic, Robnik-Šikonja Marko
Indeks avtorjev / Author index

Slovenska konferenca o umetni inteligenci / Slovenian Conference on Artificial Intelligence
Uredniki / Editors: Mitja Luštrek, Matjaž Gams, Rok Piltaver

PREDGOVOR

Po zaslugi pandemije COVID-19 še vedno živimo v bolj zanimivih časih, kot bi si želeli, vendar umetne inteligence to ne moti in napreduje s podobnim tempom kot pretekla leta. Računalniški vid in obdelava naravnega jezika sta še vedno vroči področji, pred nedavnim pa nam je OpenAI postregel s parom navdušujočih kombinacij obojega.
Prva je DALL-E, globoka nevronska mreža, izpeljana iz OpenAI-jeve slavne mreže za generiranje besedila GPT-3, ki je sposobna »razumeti« opis slike in nato takšno sliko generirati. Pri tem je kos slikam, na kakršne prej ni naletela – generirati zna denimo prav čedno sliko redkve daikon v baletnem krilcu, ki sprehaja psa. Druga, CLIP, deluje obratno in generira besedilne opise slik.

Še en viden dosežek zadnjega časa prihaja s področja biologije in medicine, ki sta zelo plodni področji za uporabo umetne inteligence. Algoritem AlphaFold 2, ki – podobno kot večina pomembnih dosežkov umetne inteligence zadnjih let – temelji na globokih nevronskih mrežah, je dosegel dramatičen napredek pri določanju strukture beljakovin, kar je težaven problem, pomemben za razvoj zdravil.

Posebej odmeven nedaven dosežek umetne inteligence iz domačih logov je metoda za priporočanje optimalnih ukrepov zoper COVID-19, ki jo je razvila ekipa Odseka za inteligentne sisteme na Institutu Jožef Stefan. Pri tej sodbi avtorji predgovora sicer nismo povsem nepristranski, saj sva k dosežku dva prispevala, a drugo mesto na tekmovanju XPrize Pandemic Response Challenge s polmilijonskim nagradnim skladom naši trditvi daje verodostojnost. Za uspeh tokrat ni bila potrebna globoka nevronska mreža – metoda kombinira epidemiološki model SEIR, klasično strojno učenje in večkriterijsko optimizacijo z evolucijskim algoritmom. Na Slovenski konferenci o umetni inteligenci je predstavljen le delček tega dela, več o njem pa je moč izvedeti na Delavnici projekta Insieme Interreg, ki prav tako poteka v okviru Informacijske družbe.

Posebej veliko število drugih delavnic in konferenc na Informacijski družbi letos je sicer dobro za multikonferenco kot celoto, našo konferenco pa je bržkone prikrajšalo za kak prispevek. K tej težavi moramo dodati še naveličanost raziskovalne srenje nad nezmožnostjo žive udeležbe na konferencah, tako da smo se morali na koncu zadovoljiti s 13 prispevki.
Večino je kot po navadi prispeval Institut Jožef Stefan, dobro je zastopana tudi Fakulteta za računalništvo in informatiko Univerze v Ljubljani, druge ustanove pa žal ne. Kljub temu smo poskrbeli, da so prispevki kakovostni, in smo jih zavrnili več kot pretekla leta. Bomo pa prihodnja leta napeli moči, da privabimo več prispevkov iz širšega nabora ustanov.

FOREWORD

Thanks to the COVID-19 pandemic, we still live in more interesting times than we would like, but artificial intelligence is not much bothered by this and is progressing as rapidly as in recent years. Computer vision and natural language processing are still hot topics, and OpenAI recently provided a pair of exciting combinations of the two. The first is DALL-E, a deep neural network derived from OpenAI's famous language generation network GPT-3. It can "understand" a description of an image and then generate such an image. It can handle images never encountered before – for instance, it can generate a nice image of a daikon radish in a tutu walking a dog. The second is CLIP, which works in reverse and generates descriptions of images.

Another prominent recent achievement comes from biology and medicine, which are fruitful ground for applications of artificial intelligence. The AlphaFold 2 algorithm, which – like most major achievements of artificial intelligence in recent years – is based on deep neural networks, achieved a breakthrough in determining protein structure. This is a hard problem important for drug discovery.

A prominent recent Slovenian achievement of artificial intelligence is a method for recommending optimal interventions against COVID-19, which was developed by a team from the Department of Intelligent Systems at Jožef Stefan Institute. The authors of this foreword are not entirely unbiased when we say this, because two of us contributed to the achievement, but second place at the XPrize Pandemic Response Challenge, with its half-million prize purse, lends credence to our claim.
This success did not require a deep neural network – the method combines a SEIR epidemiological model, classical machine learning and multi-objective optimisation with an evolutionary algorithm. The Slovenian Conference on Artificial Intelligence presents only a small part of this work; more can be learned at the Insieme Interreg project workshop. The particularly large number of other workshops and conferences at Information Society this year is good for the multiconference as a whole, but probably deprived our conference of a few papers. Another problem is that the research community is getting tired of the inability to attend conferences live, which is why we ended up with only 13 papers. Most of them, as usual, come from Jožef Stefan Institute. The Faculty of Computer and Information Science of the University of Ljubljana is also well represented, while other institutions less so. Despite this, we made sure that the papers are of high quality, and we turned away more than usual. Our goal for the coming years is, of course, to secure more papers from a wider range of institutions.
PROGRAMSKI ODBOR / PROGRAMME COMMITTEE

Mitja Luštrek; Matjaž Gams; Rok Piltaver; Cene Bavec; Jaro Berce; Marko Bohanec; Marko Bonač; Ivan Bratko; Bojan Cestnik; Aleš Dobnikar; Bogdan Filipič; Borka Jerman Blažič; Marjan Krisper; Marjan Mernik; Biljana Mileva Boshkoska; Vladislav Rajkovič; Niko Schlamberger; Tomaž Seljak; Miha Smolnikar; Peter Stanovnik; Damjan Strnad; Vasja Vehovar; Martin Žnidaršič

Estimating Client's Job-search Process Duration

Viktor Andonovic¹ (viktor.andonovikj@ijs.si), Pavle Boškoski² (pavle.boskoski@ijs.si), Biljana Mileva Boshkoska²,³ (biljana.mileva@ijs.si)
¹ Knowledge Technologies, Jožef Stefan International Postgraduate School, Ljubljana, Slovenia
² Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia
³ Faculty of Information Studies, Novo Mesto, Slovenia

ABSTRACT

Modelling the labour market, analysing ways to reduce unemployment, and creating decision support tools are becoming increasingly popular topics with the rise in digital data and computational power. The paper analyses a machine learning (ML) approach for estimating the time until a job-seeker finds a job, i.e. leaves the Public Employment Service (PES), after initially entering it. The PES dataset that we use is complex, and there is almost no correlation between most of its features, which makes it challenging to model. We used statistical analysis and visualisations to understand the problem better and to form a basis for further modelling. As a result, we developed several ML models, including a basic multivariate linear regression used for performance comparison with other, more specifically designed models.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia. © 2021 Copyright held by the owner/author(s).

1 INTRODUCTION

The research field of creating tools to support the decision-making process for employment services has attracted significant interest lately. One can track such efforts for more than 20 years [1]. Different variants of tools and systems have been developed and implemented with varying success in different countries. PES is willing to move away from the traditional role of servicing the job-seekers and take a more systematic approach by implementing data-driven solutions in their toolbox. Here, the goal is to create a model that uses available data describing the job-seekers that have entered the PES and outputs the approximate time (in days) needed for the individual to leave the PES as an employed person. These factors can be assessed either by introducing experts' knowledge or by extracting the corresponding dynamics directly from the available data. What was (or is) available determines how the models are built and their effectiveness.

The biggest issue when dealing with any modelling, for that matter, is the quality of data. Typically, models of the labour flow are built on top of statistical surveys [2]. These data sets comprise a series of snapshots of an individual's labour force status observed at discrete time points. Such discrete sampling might be of too low a frequency to truly capture the changing dynamics.

Several methods for approaching similar labour market modelling problems have been implemented in other countries. Finland's statistical profiling tool, introduced in 2007, consists of a simple logit model [3]. It predicts the probability of long-term unemployment and categorises job seekers into two groups, at risk or at high risk of long-term unemployment. In 2012, Ireland implemented a PEX (probability of exit) model using data collected on job-seekers who entered the PES as unemployed during 13 weeks [4]. The PEX tool is a probit model for measuring the job-seeker's probability of exiting unemployment within one year.

As a result of our work, we have developed an ML model that can be used in a PES as part of their decision toolbox. It can serve as a filtering method that prioritises job-seekers and recognises those who do not necessarily need PES resources and services, as they will get employed soon regardless of the interventions by the organisation.

2 DATA

The data used in the paper is provided by a public organisation engaged in the HECAT project [4], which aims at investigating, demonstrating and piloting a profiling tool to support labour market decision making by unemployed citizens and case workers in PES.

2.1 Data description

The dataset consists of 74,086 instances, each representing a client enrolled in the PES, described with 16 sociological, demographic and time-related characteristics, known as features or attributes. The data were obtained during one year. The dataset is complex in that its attributes come in different forms (categorical, numerical, date and time), and most of them need to undergo some transformation to be suitable as input for different ML models. The general structure of the client's attributes is described by dividing the attributes into several prominent groups: socioeconomic variables (gender, age, nationality), information on job readiness (education, health limitations, care responsibilities), opportunities (regional labour market development), and all available labour market history information, such as prior work experience.
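The attribute groups above can be pictured as a simple client record. A minimal sketch in Python follows; the field names are hypothetical (the actual dataset uses coded columns described in a separate CSV file), so this only illustrates the structure, not the real schema:

```python
from dataclasses import dataclass

@dataclass
class ClientRecord:
    """One PES client, with hypothetical names for the coded attributes."""
    # socioeconomic variables
    gender: int                  # coded category
    age: int
    nationality: int             # coded category
    # information on job readiness
    education: int               # coded category
    health_limitations: bool
    care_responsibilities: bool
    # opportunities
    regional_market_index: float # regional labour market development
    # labour market history
    prior_experience_years: float
    entry_date: str              # date of entering the PES
    duration_days: int           # target: days until leaving the PES

client = ClientRecord(gender=1, age=34, nationality=1, education=5,
                      health_limitations=False, care_responsibilities=True,
                      regional_market_index=0.7, prior_experience_years=8.5,
                      entry_date="2020-03-02", duration_days=142)
print(client.duration_days)
```

The target, `duration_days`, is the count variable the models in Section 3 predict.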
Most of the categorical features are given as numbers, where each number represents a unique category, described in a separate CSV file. The target variable is in numeric form; it is a count of the days a person stays in the process before exiting the PES. Some of the features in the dataset contain anomalous values (such as a negative number for a client's age), which are a mistake or a result of noise in the data. This indicates the necessity of performing data cleaning and preprocessing before feeding the dataset to various ML models. Figure 1 gives an overview of the attributes of the dataset.

Figure 1: General information on the dataset features

2.2 Data understanding

The target variable, 'duration', is a numerical count variable. To gain a better understanding of it, its probability distribution was plotted; Figure 2 shows the probability distribution of 'duration'. This distribution directly influences the selection of the predictive model. Looking at Figure 2, it can be assumed that the target variable follows the Poisson distribution. We also plotted the distributions of the features; Figure 3 illustrates a grid of distributions of each feature of the dataset.

Figure 2: Probability distribution of the target variable

Figure 3: Grid of distributions of the dataset features

2.3 Data preprocessing

It is estimated that in most data mining and knowledge discovery pipelines, 75 to 85% of the time is dedicated to preprocessing the data [5]. Cleaning and transforming samples are the cornerstone of a reliable and robust pattern recognition system. The first step of data preprocessing was data cleaning. The dataset included values for some attributes that were an obvious result of noise or mistakes. For example, some instances had negative values for the target variable, which is impossible given the nature of that attribute, a count-based variable.

Most classical ML algorithms require the input data to be in numerical form. We used one-hot encoding for the categorical features with at most 20 different categories. High-cardinality features were encoded using the binary encoding technique. Frequently used techniques like label encoding do not work for high-cardinality features, because they introduce an artificial numerical distance between instances, while one-hot encoding leads to overfitting in this case [6].

The 'Entry Date' feature was used to extract the day and month of entry separately. As those are cyclical features, we transformed them to better represent the cyclical phenomenon, for instance to avoid the artificially large difference between month 1 and month 12. A good way to handle this is to calculate the sin() and cos() components, so that a cyclical feature is represented as (x, y) coordinates on a circle.

Normalisation was applied to scale the attributes so that their mean value is zero and their spread is scaled by their own standard deviation. This gives each attribute equal opportunity: no attribute gains more weight merely because of the range of its values. Several normalisation techniques are commonly used, but the most popular one is the standard scaler, defined as:

z = (x − μ) / s    (2.1)

where x is the actual value, μ is the mean, and s is the standard deviation.

All the calculations and transformations were performed in the Python programming language, using modules such as pandas, NumPy and scikit-learn.
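The preprocessing steps described above (dropping impossible values, cyclical encoding of the entry month, and the standard scaler of eq. (2.1)) can be sketched in plain Python. This is a minimal illustration under assumed field names, not the authors' actual pipeline:

```python
import math

def clean(records):
    """Drop records with impossible (negative) age or duration."""
    return [r for r in records if r["age"] >= 0 and r["duration"] >= 0]

def encode_month(month):
    """Map a cyclical month (1-12) to (x, y) on the unit circle,
    so that month 12 and month 1 end up close together."""
    angle = 2 * math.pi * (month - 1) / 12
    return math.cos(angle), math.sin(angle)

def standard_scale(values):
    """Standard scaler of eq. (2.1): z = (x - mu) / s."""
    mu = sum(values) / len(values)
    s = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    return [(v - mu) / s for v in values]

records = [{"age": 34, "duration": 142}, {"age": -3, "duration": 50}]
print(len(clean(records)))                    # the record with negative age is dropped

# December and January land close together on the circle,
# unlike their raw numeric codes 12 and 1
print(math.dist(encode_month(12), encode_month(1)) < 1.0)
```

In the actual pipeline the same steps would be done with pandas and scikit-learn, but the arithmetic is the same.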
3 METHODOLOGY

Since the target variable is numerical, the task is treated as a regression problem. Regression analysis describes methods whose goal is to estimate the relationship between a dependent (target) variable and one or more independent variables. In formal terms, the goal is to specify the following general model:

Y_i = f(X_i, β) + e_i    (3.1)

where i denotes the i-th observed input-output pair, the vector X_i represents the input (independent) variables, β is the set of model parameters, f(·) is the function, and e_i is the modelling error. The goal is to find the proper function f and its parameters β so that the error term is as close to zero as possible.

In its simplest form, the function f(·) can represent a linear model. For example, the univariate linear version of (3.1) would be:

Y_i = β_0 + β_1 X_i + e_i    (3.2)

Generally, the function f can describe much more complex dynamics. The multivariate linear regression model is used as a base model: it helps assess the performance of other, more specific and complex models simply by comparing them to it. The aim is to develop models that significantly outperform the base model. In order to construct a model that generalises well to the data, a decision tree is used as the base learning algorithm for the ensembles.

3.1 Ensemble learning

The idea of ensemble learning is based on the theoretical foundation that the generalisation ability of an ensemble is usually much stronger than that of a single learner. Ensemble learning is mainly implemented as two subprocedures: training weak component learners and selectively combining the member learners into a stronger learner [7]. Two ensemble models based on different techniques were developed: the Random Forest Regressor [8] and a boosting algorithm, the CatBoost Regressor. Bagging is used to reduce the variance of a decision tree classifier. The idea is to create several subsets of the training sample, chosen randomly with replacement; each subset is used to train a corresponding decision tree. The result is the average of all the predictions from the different trees, which is more robust than a single decision tree classifier.

Based on the shape of the probability distribution given in Figure 2, we assume that the target variable comes from a Poisson distribution. Therefore, we design our model to maximise the log-likelihood of the Poisson distribution [9]. The probability mass function of the Poisson distribution is:

P(k) = λ^k e^(−λ) / k!    (3.3)

where P(k) is the probability of seeing k events during a unit of time, given event rate λ. Let X, y be our dataset for the Poisson regression task. The log-likelihood function that needs to be maximised is:

l(y) = Σ_{i=1..n} log( λ(X_i)^{y_i} e^(−λ(X_i)) / y_i! )    (3.4)

After the expression is simplified, the final equation for the Poisson loss has the following form:

L_Poisson = Σ_{i=1..n} [ λ(X_i) − y_i log λ(X_i) ]    (3.5)

The CatBoost Regressor is optimised with regard to this objective function.

4 EVALUATION

The model performance on the test set is evaluated with the Root Mean Squared Error (RMSE). RMSE is frequently used in regression problems; it measures the difference between the values predicted by a model or estimator and the actual values of the instances:

RMSE = sqrt( Σ_{i=1..n} (y_i − pv_i)² / n )    (3.6)

where y_i is the original value of the instance, and pv_i is the value predicted by the model. The hyper-parameters of the models were tuned using RandomizedSearchCV, which optimises the hyper-parameters by cross-validated search over given parameter settings; a fixed number of parameter settings is sampled from the specified distributions.

Figure 4: Comparison of the model performance (RMSE in days): Linear Regression 65.28, Random Forest 51.66, CatBoost (Poisson objective) 44.13

The results in Figure 4 show that both Random Forest and CatBoost significantly outperform the base linear regression model, and that optimising the mean Poisson deviance as the loss function significantly improves the performance of the boosting model. The final score of the CatBoost Regressor, optimised with regard to the mean Poisson deviance and evaluated with RMSE, is 44.13 days.
Therefore, we design our model to maximise the CatBoost significantly outperform the base linear regression log-likelihood for Poisson distribution [9]. The probability mass model. Also, optimising the mean Poisson deviance as a loss function of the Poisson distribution is given with the following function results in significant improvement in the performance expression: of the boosting model. The final score that the CatBoost Regressor optimised with regards to mean Poisson deviance 𝑃(𝑘) = *!"(,)# (3.3) evaluated on RMSE is 44.13 days. .! 9 Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia V. Andonovic et al. 5 CONCLUSION Achieving desirable results using machine learning models requires a significant amount of quality data and a deep understanding of the problem. Feature engineering is one of the key concepts here, which, if it is appropriately done, enables the generation of new features that give helpful, previously unknown insights about the data. The paper proposes an approach that emphasises the engineering of optimisation function concerning the probability distribution of the target variable, which results in developing a specific model for approaching the problem. Including the Poisson objective function in the boosting model resulted in significant improvement in its performance. There is still space for improvement in the results. Using modern end-to- end deep learning architectures have the potential to provide better results than the proposed models, which leaves space for future work on this topic. Having a tool that can roughly estimate the time a new client stays in the job-search process by having the standard data formation about himself is beneficial for the PES. The creation of decision-making tools for organisations dealing with employment services supports the process of reducing unemployment in the countries, which is a massive benefit for the global economy. 
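The Poisson loss (3.5) and the RMSE metric (3.6) above can be written directly in a few lines of Python. This is a minimal stdlib sketch with made-up example numbers, not the paper's actual CatBoost pipeline; a real model (e.g. CatBoost with its Poisson objective) would supply the λ(X_i) predictions:

```python
import math

def poisson_loss(lam, y):
    """Poisson loss of eq. (3.5): sum(lambda_i - y_i * log(lambda_i))."""
    return sum(l - yi * math.log(l) for l, yi in zip(lam, y))

def rmse(y, pv):
    """Root Mean Squared Error of eq. (3.6) between actual y and predicted pv."""
    n = len(y)
    return math.sqrt(sum((yi - pi) ** 2 for yi, pi in zip(y, pv)) / n)

# Hypothetical example: actual job-search durations (days) and model predictions.
y = [10, 35, 90, 120]
pv = [14, 30, 100, 110]
print(round(rmse(y, pv), 2))          # RMSE in days
print(round(poisson_loss(pv, y), 2))  # Poisson loss; lower is better
```

The loss treats each prediction as the rate λ of a Poisson distribution, so it penalises both over- and under-prediction relative to the likelihood of the observed count, unlike the purely symmetric squared error.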
ACKNOWLEDGMENTS

The first author acknowledges Ad Futura, the Public Scholarship, Development, Disability and Maintenance Fund of the Republic of Slovenia. The second author acknowledges funding from the Slovenian Research Agency via the program Complex Networks P1-0383. The last two authors acknowledge the funding received from the European Union's Horizon 2020 research and innovation programme project HECAT under grant agreement No. 870702.

REFERENCES
[1] P. Boshkoski and B. Mileva-Boshkoska, "Report on commonly used algorithms and their performance," Horizon 2020, Deliverable D3.1, 2020.
[2] J. Grundy, "Statistical profiling of the unemployed," Studies in Political Economy, 2015.
[3] T. Riipinen, "Risk profiling of long-term unemployment in Finland," Dialogue Conference, Brussels, 2011.
[4] P. J. O'Connel, E. Kelly and J. Walsh, "National profiling of the unemployed in Ireland," ESRI Research Series, vol. 10, 2009.
[5] "HECAT – Disruptive Technologies Supporting Labour Market Decision Making," 2020. [Online]. Available: http://hecat.eu.
[6] F. Johannes, D. Gamberger and N. Lavrac, "Machine Learning and Data Mining," Cognitive Technologies, 2012.
[7] M. Brammer, Principles of Data Mining, 2007.
[8] F. Huang, G. Xie and R. Xiao, "Research on Ensemble Learning," International Conference on Artificial Intelligence and Computational Intelligence, 2009.
[9] A. Saha, S. Basu and A. Datta, "Random Forest for Dependent Data," arXiv, 2020.
[10] A. Zakariya Y, "Diagnostic in Poisson Regression Models," Electronic Journal of Applied Statistical Analysis, 2012.
Some Experimental Results in Evolutionary Multitasking

Andrejaana Andova, Jožef Stefan Institute and Jožef Stefan International Postgraduate School, Jamova cesta 39, Ljubljana, Slovenia, andrejaana.andova@ijs.si
Bogdan Filipič, Jožef Stefan Institute and Jožef Stefan International Postgraduate School, Jamova cesta 39, Ljubljana, Slovenia, bogdan.filipic@ijs.si

ABSTRACT

Transfer learning and multitask learning have shown that, in machine learning, common information in two problems can be used to build more effective models. Inspired by this finding, attempts in evolutionary computation have also been made to solve multiple optimization problems simultaneously. This new approach is called evolutionary multitasking (EMT). In this work, we show how EMT extends ordinary evolutionary algorithms and present the results that we obtained in solving multiple optimization problems simultaneously. We also compare them with the results of algorithms that solve one optimization problem at a time. Finally, we provide visualizations and explanations of why and when EMT is beneficial.

KEYWORDS

evolutionary algorithms, numerical optimization, multifactorial optimization, evolutionary multitasking

1 INTRODUCTION

In optimization the task is to find one or more solutions that best solve a given problem. To determine which of the possible solutions gives the best result, we use the objective function. This can be the cost of fabrication, the efficiency of a process, the quality of a product, etc. The mathematical formulation of such problems is given as follows:

Minimize/Maximize f(x)
subject to g_j(x) ≥ 0, j = 1, 2, .., J;
h_k(x) = 0, k = 1, 2, .., K;
x_i^(L) ≤ x_i ≤ x_i^(U), i = 1, 2, .., n.   (1)

Here, a solution x = [x_1, x_2, .., x_n]^T is a vector of n decision variables. The objective f(x) can be either maximized or minimized, but since many optimization algorithms are designed to solve minimization problems, we usually convert maximization objectives to minimization ones by multiplying the objective functions by −1. h_k(x) are equality constraints, g_j(x) inequality constraints, and x_i^(L) and x_i^(U) are boundary constraints [3]. In this paper, we consider problems that include only boundary constraints.

When the optimization problem can not be solved using mathematical methods, the usual alternative is to use randomized optimization algorithms such as evolutionary algorithms (EAs). These algorithms are characterized by a population of solutions that change with generations and to which techniques resembling natural selection and genetic variation are applied. These techniques ensure that the fittest individuals (solutions) from the population are passed to the next generation. The algorithm begins by initializing a population of solutions. Then, a selection operator is used to select the fittest individuals as parents. After that, a reproduction operator is utilized to create offspring from the parents. The next step is to select a subset of individuals from the combined set of parents and children and replace the old population with the selected subset. The new population is then used for the next generation. The cycle of selection, reproduction, and replacement is repeated until a stopping criterion is satisfied. The stopping criterion can be defined in various ways, for example, by the maximum number of generations.

Until recently, most EAs focused on solving only one optimization problem at a time. To exploit the parallelism of population-based search, Gupta et al. introduced a new category of optimization approach called multifactorial optimization or evolutionary multitasking (EMT) [8]. The goal of EMT is to develop EAs that are able to simultaneously solve multiple optimization problems without sacrificing the quality of the obtained solutions and the algorithm efficiency.

A practical motivation for the development of EMT algorithms is rapidly growing cloud computing. In cloud computing, multiple users can simultaneously send optimization problems to the server. These problems may either have similar characteristics or they may belong to completely different domains. Previously, the servers solved these problems sequentially, but with the introduction of EMT, they can solve the problems in parallel.

After the introduction of EMT by Gupta et al., many other works followed that also introduced methodologies specialized in solving multiple optimization problems simultaneously [1, 4, 5, 6, 9, 10].

In this paper, we present our experimental results in solving multiple optimization problems simultaneously and discuss the results from the point of view of EMT performance. We do this by applying the EMT methodology as proposed by Gupta et al. to test optimization problems and analyzing the results.

The paper is further organized as follows. In Section 2, we introduce the basic concepts of EMT. In Section 3, we first present our results in EMT with visualizations that explain why and when EMT performs well, and then report the results in evolutionary many-task optimization. Finally, in Section 4, we give a conclusion and present the ideas for future work.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia
© 2021 Copyright held by the owner/author(s).

2 EVOLUTIONARY MULTITASKING

Evolutionary multitasking is characterized by the simultaneous existence of multiple decision spaces corresponding to different problems, which may or may not be independent, each with a unique decision space landscape. In order for EMT to have cross-domain optimization properties, Gupta et al. proposed to use a uniform genetic code in which each decision variable is encoded with a random number from [0, 1]. Decoding such a representation in continuous problems is done by using the following equation for each decision variable:

u_i = u_i^(L) + (u_i^(U) − u_i^(L)) · v_i,   (2)

where u_i is the decision variable in the original space, and v_i is the decision variable in the encoded space. The dimensionality of the solution vector is equal to max_j {D_j}, where D_j represents the dimensionality of a single optimization problem. This type of encoding allows problems to share decision variables at the beginning of the genetic code, which contributes to the transfer of useful genetic material from one problem to another.

Since EMT attempts to solve multiple problems simultaneously using a single population, it is necessary to formulate a new technique for comparing population members. To this end, a set of additional properties is defined for each individual x_i in the population as follows.

• Skill factor: The skill factor τ_i of x_i is the one problem, among all problems in EMT, for which the individual is specialized. The skill factor can be assigned in a complex way, by selecting the best individuals for each task, or by randomly assigning each individual one task for which it is specialized. In our case, we use the latter, simpler method for assigning the skill factor.
• Scalar fitness: The scalar fitness is the fitness of an individual for the problem in which it is specialized.

To compare two solutions, we use the scalar fitness and the skill factor. The scalar fitness shows how good a solution is for a given problem, and the skill factor shows for which problem the solution performs best. A solution x_a is better than x_b if and only if both have the same skill factor and x_a has a higher scalar fitness than x_b. If the solutions have different skill factors, they are incomparable.

2.1 Assortative Mating

To produce offspring, the authors of EMT [8] used assortative mating as a reproduction mechanism. In assortative mating, two randomly selected parents can undergo crossover if they have the same skill factor. If, on the other hand, their skill factors differ, crossover occurs only with a given random mating probability rmp; otherwise, mutation takes place. A value of rmp close to 0 means that only culturally identical individuals are allowed to perform crossover, while a value close to 1 allows completely random mating.

2.2 Selective Imitation

Evaluating each individual for each problem is computationally expensive. For this reason, each child is evaluated only on one problem, namely the skill factor of one of its parents. In this way, the total number of function evaluations is reduced, while the solution is still evaluated on the problem on which it most likely performs well. The procedure is called selective imitation.

2.3 Landscape Analysis

In multitask machine learning, it is well known that useful information cannot always be found for two problems. Therefore, to enable further success in the field of evolutionary multitasking, it is important to develop a meaningful theoretical explanation of when and why implicit genetic transfer can lead to improved performance. In particular, it is important to develop a measure of the inter-task complementarity used during the process of multitasking. To this end, a synergy metric that captures and quantifies how similar two problems are has been proposed [7]. The main idea behind the synergy metric is to use the dot product between the gradient of a given solution in one problem and the vector pointing to the global optimum of another problem. If the dot product of a given solution is larger than 0, the solution of the first problem is pushing the candidate solution in the direction of the global optimum of the second problem. If the dot product is smaller than 0, the solution is pushed in the opposite direction.

3 EXPERIMENTS AND RESULTS

EMT is a novel concept in evolutionary optimization, and thus a limited number of experiments have been carried out so far. We present some experiments performed and results obtained using EMT in both multi- and many-task optimization.

3.1 Multitask Optimization

In the multitask optimization experiments, we took two frequently used optimization problems, the 50-dimensional (50D) Sphere and Ackley functions, and solved them using EMT and a genetic algorithm (GA). To be able to compare the results, we used the same population size and the same number of function evaluations per problem. The rmp parameter in EMT was set to 0.3, and for GA we used the default parameter values as defined in pymoo [2]. We monitored the difference between EMT and GA over time: if the difference is positive, EMT performs better than GA, while if it is negative, GA performs better than EMT. Because the fitness values vary between different problems, we normalized the difference between EMT and GA in each problem by dividing the values with the highest absolute difference.

In the first experiment, the optima of the two problems were placed at the opposite ends of the search space. Because of this, the problems have very little common information, and the synergy function mostly takes negative values. This is visualized for a 2D Sphere function in Figure 1 and for a 2D Ackley function in Figure 2. The normalized difference between EMT and GA in optimizing the 50D Sphere and Ackley functions is presented in Figure 3. From the results, we can see that GA performs better on these problems.

Figure 1: Synergy metric on the Sphere function solved together with the Ackley function when the optima are far away.

Figure 2: Synergy metric on the Ackley function solved together with the Sphere function when the optima are far away.

Figure 3: Normalized difference between multitask and single-task optimization on 50D Sphere and Ackley functions when the optima are far away.

In Figure 4, we present the results from the second experiment, where the optima of the 50D Sphere and Ackley functions were placed closer together. Here, we can see that the optimization of the Sphere function does not show significant improvement when performed together with the optimization of the Ackley function, but on the Ackley function EMT converges to the optimal solution much faster.

Figure 4: Normalized difference between multitask and single-task optimization on 50D Sphere and Ackley functions when the optima are close.

Figure 5: Synergy metric on the Sphere function solved together with the Ackley function when the optima are close.

An explanation for this is illustrated in 2D in Figures 5 and 6. Here we can see that the synergy in the Sphere space is mostly equal to 0, except for some small parts where it rises to +10 and falls to −10. Because both the positive and the negative parts of the synergy values of the Sphere problem are small, we can notice no difference in convergence on the Sphere problem. In contrast, more than half of the space of the Ackley function has a positive synergy metric, indicating that this part of the space points the solutions in the right direction toward the global optimum.
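The synergy values discussed here can be sketched numerically. This is a minimal illustration, not the paper's experimental code: it assumes a 2D Sphere task f(x) = Σ x_i² with its gradient known in closed form, takes the descent direction (negative gradient, matching the sign convention described for the metric), and uses hypothetical optimum locations for the second task:

```python
# Minimal sketch of the synergy idea [7]: the dot product between the
# direction in which problem 1 pushes a solution (descent direction) and the
# unit vector pointing from that solution to problem 2's global optimum.
# Positive values mean optimizing problem 1 also moves toward problem 2's optimum.
import math

def sphere_grad(x):
    # Gradient of the Sphere function f(x) = sum(x_i ** 2).
    return [2.0 * xi for xi in x]

def synergy(point, grad_fn, optimum2):
    push = [-g for g in grad_fn(point)]          # descent direction on problem 1
    to_opt = [o - p for o, p in zip(optimum2, point)]
    norm = math.sqrt(sum(t * t for t in to_opt)) or 1.0
    to_opt = [t / norm for t in to_opt]          # unit vector toward problem 2's optimum
    return sum(a * b for a, b in zip(push, to_opt))

# Hypothetical setup: Sphere optimum at the origin; the second task's optimum
# placed either far away or close by.
x = [10.0, 10.0]
print(synergy(x, sphere_grad, [100.0, 100.0]))  # far optimum: negative synergy
print(synergy(x, sphere_grad, [1.0, 1.0]))      # near optimum: positive synergy
```

This reproduces the qualitative pattern reported in the experiments: when the two optima are far apart the dot product is negative, and when they are close it turns positive.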
On the other hand, most of the decision space of the Ackley function has constant fitness values, which complicates the GA search for the global optimum. For this reason, the information transferred from the Sphere problem to the Ackley problem is useful, and thus we can see faster convergence when solving the two problems together using EMT.

Figure 6: Synergy metric on the Ackley function solved together with the Sphere function when the optima are close.

3.2 Many-Task Optimization

When solving more than three tasks simultaneously, we are dealing with many-task optimization. In Figure 7, we present the results obtained by randomly shifting (within a small, 10% range of the total space) the global optimum of both the Ackley and the Sphere function 25 times, resulting in 50 different 50D optimization problems. During the optimization process, we used the same algorithm parameter values for EMT and GA as reported in Section 3.1. In the results, we can notice similar patterns as when solving just two problems. This shows that increasing the number of problems we are trying to solve does not cause difficulties to EMT. If the problems are similar, we can solve many problems simultaneously without losing efficiency.

Figure 7: Normalized difference between multitask and single-task optimization on 50 problems originating from 50D Sphere and Ackley functions whose optima are shifted close to each other.

Figure 8 shows the results obtained when solving six well-known optimization problems at the same time: Ackley, Sphere, Rastrigin, Rosenbrock, Schwefel, and Griewank, all 50D. From the results, we can notice that although the optimization procedure converges faster for most of the functions, for the Sphere and the Schwefel function the convergence speed of the optimization process drops. The same pattern can be noticed in Figure 9, where the optimum of each function is shifted 8 times, resulting in 6 · 8 = 48 problems altogether.

Figure 8: Normalized difference between multitask and single-task optimization on six well-known 50D optimization problems when the optima are shifted close to each other.

Figure 9: Normalized difference between multitask and single-task optimization on 48 problems originating from six well-known 50D optimization problems whose optima are shifted close to each other.

4 CONCLUSION AND FUTURE WORK

We presented our experimental results on solving multiple optimization problems simultaneously using a novel method called evolutionary multitasking. We solved as few as two and as many as 50 optimization problems at the same time. From the experimental results, we can conclude that there are some groups of problems for which EMT can improve the speed of convergence of the optimization process. However, if the problems are too different, the performance of the optimization drops. To explain why EMT works well on some problem pairs and why on some others it does not, we provided visualizations of the synergy metric.

We have so far tested EMT on simple benchmark functions that are usually used for single-objective optimization. In future work, we plan to test it also on real-world scenarios with more complex functions and constraints. Furthermore, so far we have used the synergy metric to explain why some pairs of problems are solved successfully together. Unfortunately, with this metric we can not strictly determine when solving two problems will be successful. Thus, one possible future direction is to develop machine learning methods that predict when multitasking a set of problems would be successful. This may be useful for cloud systems that could form several groups of similar problems and then solve them in a multitask manner.

5 ACKNOWLEDGMENTS

We acknowledge financial support from the Slovenian Research Agency (young researcher program and research core funding No. P2-0209).

REFERENCES
[1] Kavitesh Kumar Bali, Abhishek Gupta, Liang Feng, Yew Soon Ong, and Tan Puay Siew. 2017. Linearized domain adaptation in evolutionary multitasking. In 2017 IEEE Congress on Evolutionary Computation (CEC). IEEE, 1295–1302.
[2] Julian Blank and Kalyanmoy Deb. 2020. Pymoo: Multi-objective optimization in Python. IEEE Access, 8, 89497–89509.
[3] Kalyanmoy Deb. 2001. Multi-Objective Optimization using Evolutionary Algorithms. John Wiley & Sons, Chichester.
[4] Liang Feng, Lei Zhou, Jinghui Zhong, Abhishek Gupta, Yew-Soon Ong, Kay-Chen Tan, and Alex Kai Qin. 2018. Evolutionary multitasking via explicit autoencoding. IEEE Transactions on Cybernetics, 49, 9, 3457–3470.
[5] Maoguo Gong, Zedong Tang, Hao Li, and Jun Zhang. 2019. Evolutionary multitasking with dynamic resource allocating strategy. IEEE Transactions on Evolutionary Computation, 23, 5, 858–869.
[6] Abhishek Gupta, Jacek Mańdziuk, and Yew-Soon Ong. 2015. Evolutionary multitasking in bi-level optimization. Complex & Intelligent Systems, 1, 1-4, 83–95.
[7] Abhishek Gupta, Yew-Soon Ong, Bingshui Da, Liang Feng, and Stephanus Daniel Handoko. 2016. Landscape synergy in evolutionary multitasking. In 2016 IEEE Congress on Evolutionary Computation (CEC). IEEE, 3076–3083.
[8] Abhishek Gupta, Yew-Soon Ong, and Liang Feng. 2015. Multifactorial evolution: Toward evolutionary multitasking. IEEE Transactions on Evolutionary Computation, 20, 3, 343–357.
[9] Abhishek Gupta, Yew-Soon Ong, Liang Feng, and Kay Chen Tan. 2016. Multiobjective multifactorial optimization in evolutionary multitasking. IEEE Transactions on Cybernetics, 47, 7, 1652–1665.
[10] Yu-Wei Wen and Chuan-Kang Ting. 2017. Parting ways and reallocating resources in evolutionary multitasking. In 2017 IEEE Congress on Evolutionary Computation (CEC). IEEE, 2404–2411.

Intent Recognition and Drinking Detection For Assisting Kitchen-based Activities

Carlo M.
De Masi Simon Stankoski carlo.maria.demasi@ijs.si simon.stankoski@ijs.si Department of Intelligent Systems Department of Intelligent Systems Jožef Stefan Institute Jožef Stefan Institute Ljubljana, Slovenia Ljubljana, Slovenia Vincent Cergolj Mitja Luštrek vc2756@student.uni- lj.si mitja.lustrek@ijs.si Univerza v Ljubljani, Fakulteta za elektrotehniko Department of Intelligent Systems Department of Intelligent Systems Jožef Stefan Institute Jožef Stefan Institute Ljubljana, Slovenia Ljubljana, Slovenia ABSTRACT This paper is organized as follows. Section 2 discusses the related work. Section 3 presents the system architectures. Section We combine different computer-vision (pose estimation, object 4 describes the recognition modules of the system. Section 5 detection, image classification) and wearable based activity recog-shows the results of the recognition modules. Finally, Section 6 nition methods to analyze the user’s behaviour, and produce a concludes the paper. series of context-based detections (detect locations, recognize activities) in order to provide real-time assistance to people with mild cognitive impairment (MCI) in the accomplishment of every 2 RELATED WORK day, kitchen-related activities. 2.1 Drinking Detection From Wearables KEYWORDS Recent advances in the accuracy and accessibility of wearable sensing technology (e.g., commercial inertial sensors, fitness computer vision, activity recognition, object detection, pose esti-bands, and smartwatches) has allowed researchers and practi- mation tioners to utilize different types of wearable sensors to assess fluid intake in both laboratory and free-living conditions. 1 INTRODUCTION The necessity for fluid intake monitoring emerges as a result of people’s lack of awareness of their hydration levels. 
Dehydration Smart home technologies have been extensively adopted for mea- can lead to many severe health problems like organ and cognitive suring and decreasing the impact of Mild Cognitive Impairment impairments. Therefore, a system that can continuously track (MCI) on everyday life [9]. In the scope of the CoachMyLife (CML) the fluid intake and provide feedback to the user if useful. project we have been developing a system employing different In [1], the authors explored the possibility of recognizing drink-machine learning techniques with the aim of assisting persons ing moments from wrist-mounted inertial sensors. They used affected by MCI in performing activities in their apartments, with adaptive segmentation to overcome the problem with variable a particular focus on tasks related to the kitchen. length of the drinking gestures. They used random forest algo- In a previous work, we presented one of the first components rithm, trained with 45 features, and obtained an average precision of this system, i.e. a computer vision pipeline which allows to of 90.3% and an average recall of 91.0%. In [5], the authors em-detect the activity of drinking, by analyzing the video collected ployed a two-step detection procedure, enabling them to detect by an RGB camera through a 3D Convolutional Neural Network drinking moments and estimate the fluid intake. They extracted (3D-CNN) [12]. 28 statistical features, from which only six were selected using In the present paper, we present our work on extending said backward feature selection. Finally, they trained a Conditional pipeline, by discussing (i) a drinking-detection algorithm based Random Field model, resulting in a precision of 81.7% and re- on motion data from a wristband, which can be used to further call of 77.5%. In [4], the authors used a machine-learning based validate the one based on computer vision, and to replace it in model to detect hand-to-mouth gestures. 
Similarly as the previ- situations where the activity is not performed in front of the ous methods, they extracted 10 time-domain features and trained camera; (ii) a method based on pose detection to identify inter- a random forest classifier. They validated their method in a free- actions of the user with their environment, in order to perform living scenario and obtained precision of 84% and recall of 85%. intent recognition, and (iii) a possible new implementation of our Although remarkable results were achieved, the evaluation of the previous computer-vision pipeline for drinking detection that studies is limited and it is not showing the real-life performance. can be deployed on edge devices. Permission to make digital or hard copies of part or all of this work for personal 2.2 Activity Recognition From Videos or classroom use is granted without fee provided that copies are not made or In recent years, the problem of computer-based Human Activity distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this Recognition (HAR) of daily living has been tackled by different work must be honored. For all other uses, contact the owner /author(s). computer-vision methods. Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia HAR can be performed directly on RGB images and videos by © 2021 Copyright held by the owner/author(s). analyzing: (i) the spatial features in each frame, thus obtaining 15 Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia Trovato and Tobin, et al. 
predictions for each frame that can then be extended to the whole 4 INTENT RECOGNITION video by pooling or by a recurrent-based neural networks [2], (ii) One of the main goals of the CML project is to provide users with the temporal features related to motions and variations between real-time, context-based notifications to assist them in perform- frames [6], or (iii) some combination of the two [10]. ing activities. The most recent approaches aimed at simultaneous evaluation This is achieved in two steps. First, by combining computer- of both spatial and temporal features involve the usage of 3D- vision and the wearable device, the system detects real-time CNN, i.e., convolutional models characterized by an additional events, such as the position of the user, their interaction with third temporal dimension [12]. the environment, the displacement of a mug the user is expected An alternative approach, not involving the direct analysis of to drink from, the opening/closing of cabinet and fridge door, the whole frames, consists in exploiting the information provided drinking and eating. by human pose estimation, so that body keypoints coordinates, Then, these events are passed to the intent recognition module, reconstructed in a 2D or 3D space, can be fed to deep-learning which uses them to predict which activity the user is performing, models to provide predictions [3]. and provide assistance if needed. We adopted a Single Shot MultiBox Detector (SSD) [8] model, 3 ADOPTED HARDWARE pre-trained on the 80 classes of the COCO dataset [7] for the 3.1 Wristband detection of the user, and fine-tuned on a custom dataset we collected to locate the position of the mug. Pose estimation, which The drinking-detection procedure is implemented on a wrist-is used to track the movement of the user’s hands and detect band which is equipped with a nRF52840 System On Chip (SoC) interactions with domestic appliances, is achieved by a SimpleNet module. 
The SoC offers a large amount of Flash and RAM, 1 MB and 256 kB, respectively. Additionally, it has protocol support for Bluetooth Low Energy (BLE). The architecture of the nRF52840 is based on a 32-bit ARM® Cortex™-M4 CPU with a floating point unit running at 64 MHz. The wristband's power supply is a battery with a capacity of 500 mAh. The measurements of accelerations and angular velocities are performed by the system-in-package LSM6DSL, manufactured by STM. It is equipped with a 3D digital accelerometer and a 3D digital gyroscope based on MEMS technology that operates at 0.65 mA in high-performance mode and allows low power consumption with constant operation. The most prominent feature of the Inertial Measurement Unit (IMU) is a 4 kB FIFO (First In, First Out) buffer, which stores the data of the accelerometer and gyroscope. This allows for very low-power operation, as the SoC wakes up only when triggered by a "FIFO full" interrupt event.

3.2 Local Deployment of the Computer Vision System
The computer vision pipeline for drinking detection we previously developed for the project worked by retrieving the video stream collected by an IP camera in the user's apartment and analyzing it on a remote server. This approach, however, presented issues related to remote access to the camera, which can sometimes be blocked by the router's firewall functionalities, and raised safety and privacy concerns with the users.
For these reasons, we have been working on deploying the computer vision system on a local device. After some unsuccessful attempts to implement the system on Android devices by using frameworks such as Apache TVM 1 or the Deep Java Library (DJL) 2, we opted for deployment on a Jetson NANO device 3.
Direct deployment of our system on the device was possible, although not immediate, but the resulting performance was suboptimal in terms of the FPS reached by the various detection algorithms (≈2 FPS for object detection). To overcome this, we optimized said algorithms with TensorRT, a library built on NVIDIA's CUDA library for parallel programming, thus improving inference performance for deep learning models (≈22 FPS for object detection).

…model with a ResNet backbone [13].

4.1 Regions of Interest
During the initial setup, the user is asked to identify some regions of interest (ROIs) in the camera image, which can be either single- or double-zoned.
In the first case, the ROI is "activated" when the user's feet are within the selected region (Fig. 1a), whereas double-zone ROIs are used to detect if the user is in the desired area and/or if their hands are in the selected upper area (Fig. 1b).

4.2 Intent Recognition
The events detected by the computer vision pipeline are passed to the intent recognition module, which predicts the activity the user is currently engaged in.
Currently, this prediction is based on a set of pre-determined rules. A number of possible activities is manually inserted, each formed by different steps corresponding to possible events that can be detected by the computer vision system (Fig. 2a). Different activities can share one or more steps, and as the system detects the completion of the various steps, the list of possible ongoing activities gets reduced (Fig. 2b, 2c), until only one activity is identified and followed until its completion (Fig. 2d).
If too long a time interval passes between the completion of two steps, the activity is classified as "interrupted", and the system can show a notification to the user, asking if they require assistance.

4.3 Drinking Detection from Computer Vision on the Jetson NANO
The model we previously adopted to perform activity recognition from videos is particularly computationally expensive, so although it proved to be very effective in the detection of drinking events, it was not possible to implement it on the Jetson NANO. For this reason, we are currently collecting a dataset of short video clips and passing them through a pose estimation model in order to obtain the 2D positions of 18 body parts across a time series of frames, with an associated class label for the frame series. This will then be analyzed through an LSTM-based model to perform HAR.

1 https://tvm.apache.org/
2 https://djl.ai/
3 https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-nano/

Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia

Figure 1: Triggers based on the user's location and their interaction with the environment (panels a and b).
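The ROI triggers of Section 4.1 reduce to point-in-rectangle tests on the keypoints returned by the pose estimator. A minimal sketch of that logic (the coordinates and helper names below are hypothetical, not the actual pipeline code):

```python
from dataclasses import dataclass

@dataclass
class Zone:
    """Axis-aligned rectangle in image coordinates."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1

def single_zone_active(zone, feet_xy):
    # Single-zone ROI: "activated" when the user's feet are inside it.
    return all(zone.contains(x, y) for x, y in feet_xy)

def double_zone_active(lower, upper, feet_xy, hands_xy):
    # Double-zone ROI: the lower zone tracks presence in the area,
    # the upper zone tracks whether a hand reaches into it.
    return (all(lower.contains(x, y) for x, y in feet_xy),
            any(upper.contains(x, y) for x, y in hands_xy))

# Example: user standing in the lower zone, one hand in the upper zone.
lower = Zone(100, 300, 260, 480)
upper = Zone(100, 80, 260, 220)
feet = [(150, 420), (190, 430)]
hands = [(140, 120), (210, 350)]
print(single_zone_active(lower, feet))               # True
print(double_zone_active(lower, upper, feet, hands)) # (True, True)
```

The two boolean outputs of a double-zone ROI are exactly the events ("Next to: …", "Interact: …") consumed by the intent recognition module.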
Figure 2: As the computer vision system detects the completion of various steps, the list of possible ongoing activities gets reduced, until one of them is completed or interrupted. (Panels a–d show the candidate activities BOIL WATER, MAKE COFFEE and EAT SOMETHING, each composed of steps such as "Interact: cabinet", "Next to: sink", "Interact: stoves", "Interact: coffee machine" and "Interact: table".)

4.4 Drinking Detection Using a Wearable Device
Due to the desired minimum power consumption, the drinking detection was implemented directly on the wristband. This is preferable as it eliminates the need to transfer all the raw sensor data to a smartphone or some other central device. Raw sensor data transmission is clearly undesirable due to its high power consumption, and it is not possible if the central device is not nearby.
The first step of drinking detection using the wristband is to enable the IMU in activity/inactivity recognition mode. This allows the IMU to be in a low-power state for most of the day.
When activity is recognized, the IMU enables absolute wrist tilt detection (AWT), which checks if the angle between the horizontal plane and the Y axis of the IMU is larger than 30 degrees. If the condition is met, the IMU is enabled in batching mode, storing accelerometer and gyroscope data in the FIFO buffer. Every time the FIFO buffer is full, the data is transferred to the SoC, where we directly start the machine learning pipeline. This procedure is repeated for three batches of IMU readings. If all three predictions from the machine learning model are non-drinking, we disable the gyroscope, stop the machine learning procedure and wait for the next AWT event. Otherwise, if at least one prediction is positive, the machine learning procedure continues to work for another three new batches of data.
The machine learning method for the detection of drinking gestures is based on time- and frequency-domain features. The raw data is segmented into 5-second windows and 216 features are extracted in total. We used a relatively simple approach due to the memory limitation of the wristband. The deployed model was trained using the drinking dataset described in Section 4.4.1 and additional non-drinking data collected in a real-life scenario [11].

4.4.1 Drinking Dataset. For the aim of this study, we recruited 19 subjects (11 males and 8 females). Each subject was equipped with the wristband described in Section 3.1. We developed a custom application that ran on the wristband and collected three-axis accelerometer and three-axis gyroscope data at a sampling frequency of 50 Hz. The dataset is publicly available 4 and we hope that it will serve researchers in future studies.
We developed a general procedure for the participants to follow during the data collection process. The ground truth was registered manually by the participants pressing a button on the wristband before performing the gesture and after finishing it. The data collection procedure included drinking from six different container types, namely bottle, coffee cup, coffee mug, glass, shot glass and wine glass. For each participant we collected 36 drinking episodes (3 fluid levels × 6 containers × 2 positions). The idea of the different fluid levels was to obtain drinking episodes with short, medium and long durations. We also considered different body positions: the participants first performed the drinking gestures while seated and afterwards repeated the same gestures while standing.

4 https://github.com/simon2706/DrinkingDetectionIJS

5 RESULTS AND DISCUSSION
5.1 Intent Recognition and Local Implementation of Drinking Detection
A pilot phase will begin shortly, during which the intent recognition module will be evaluated.
Regarding the new model for drinking detection, a preliminary test of our new approach, run on a subset of the Berkeley Multimodal Human Action Database (MHAD) 5, reached an accuracy of over 90%, and we will extend the analysis to our case once the dataset collection is over.

5 http://tele-immersion.citris-uc.org/berkeley_mhad

5.2 Wearable Sensing Results
For evaluation, the leave-one-subject-out (LOSO) cross-validation technique was used. In other words, the models were trained on the whole dataset except for one subject, on which we later tested the performance.
For the drinking detection model, we considered several classifiers, including logistic regression (LR), linear discriminant analysis (LDA), k-nearest neighbors (KNN), naive Bayes (NB) and XGBoost.
The obtained results are shown in Table 1. It can be clearly seen that XGBoost outperforms all the other classifiers. However, due to the technical limitations described in Section 3.1, the trained XGBoost model cannot fit below 100 kB. The size of the LR model is only 2 kB, which is optimal for our device. Furthermore, the results obtained with LR are only 0.03 lower compared to those of XGBoost. Therefore, we deployed the model trained with the LR classifier.

Table 1: Comparison of different classifiers for the detection of drinking activity.

Method                        Precision  Recall  F1 score
Logistic regression           0.87       0.77    0.81
Linear discriminant analysis  0.54       0.69    0.55
K-nearest neighbors           0.84       0.69    0.75
Naive Bayes                   0.68       0.85    0.74
XGBoost                       0.89       0.81    0.84

6 CONCLUSIONS
We presented our work on drinking detection using wearables and on intent recognition and drinking detection using computer vision.
A pilot phase, beginning in October 2021, will provide thorough testing of the functionalities described in the paper. Nonetheless, the results obtained from the internal testing of each module of the system are promising for both drinking detection (with both wearables and computer vision) and intent recognition.

REFERENCES
[1] Keum San Chun, Ashley B Sanders, Rebecca Adaimi, Necole Streeper, David E Conroy, and Edison Thomaz. 2019. Towards a generalizable method for detecting fluid intake with wrist-mounted sensors and adaptive segmentation. In Proceedings of the 24th International Conference on Intelligent User Interfaces, 80–85.
[2] Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, et al. 2015. Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2625–2634.
[3] Giovanni Ercolano and Silvia Rossi. 2021. Combining CNN and LSTM for activity of daily living recognition with a 3D matrix skeleton representation. Intelligent Service Robotics, 14, 2, 175–185.
[4] Diana Gomes and Inês Sousa. 2019. Real-time drink trigger detection in free-living conditions using inertial sensors. Sensors, 19, 9, 2145.
[5] Takashi Hamatani, Moustafa Elhamshary, Akira Uchiyama, and Teruo Higashino. 2018. FluidMeter: gauging the human daily fluid intake using smartwatches. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2, 3, 1–25.
[6] Ammar Ladjailia, Imed Bouchrika, Hayet Farida Merouani, Nouzha Harrati, and Zohra Mahfouf. 2020. Human activity recognition via optical flow. Neural Computing and Applications, 32, 21, 16387–16400.
[7] Tsung-Yi Lin, Michael Maire, Serge Belongie, et al. 2014. Microsoft COCO: common objects in context. (2014). arXiv: 1405.0312 [cs.CV].
[8] Wei Liu, Dragomir Anguelov, Dumitru Erhan, et al. 2016. SSD: single shot multibox detector. Lecture Notes in Computer Science, 21–37. ISSN: 1611-3349. DOI: 10.1007/978-3-319-46448-0_2.
[9] Maxime Lussier, Stéphane Adam, Belkacem Chikhaoui, Charles Consel, Mathieu Gagnon, Brigitte Gilbert, Sylvain Giroux, Manon Guay, Carol Hudon, Hélène Imbeault, et al. 2019. Smart home technology: a new approach for performance measurements of activities of daily living and prediction of mild cognitive impairment in older adults. Journal of Alzheimer's Disease, 68, 1, 85–96.
[10] Karen Simonyan and Andrew Zisserman. 2014. Two-stream convolutional networks for action recognition in videos. In Advances in Neural Information Processing Systems, 568–576.
[11] Simon Stankoski, Marko Jordan, Hristijan Gjoreski, and Mitja Luštrek. 2021. Smartwatch-based eating detection: data selection for machine learning from imbalanced data with imperfect labels. Sensors, 21, 5. ISSN: 1424-8220. DOI: 10.3390/s21051902.
[12] Gül Varol, Ivan Laptev, and Cordelia Schmid. 2017. Long-term temporal convolutions for action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40, 6, 1510–1517.
[13] Bin Xiao, Haiping Wu, and Yichen Wei. 2018. Simple baselines for human pose estimation and tracking. In Proceedings of the European Conference on Computer Vision (ECCV), 466–481.
Anomaly Detection in Magnetic Resonance-based Electrical Properties Tomography of in silico Brains

Ožbej Golob, University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia, ozbej.golob@gmail.com
Alessandro Arduino, Istituto Nazionale di Ricerca Metrologica, Torino, Italy, a.arduino@inrim.it
Oriano Bottauscio, Istituto Nazionale di Ricerca Metrologica, Torino, Italy, o.bottauscio@inrim.it
Luca Zilberti, Istituto Nazionale di Ricerca Metrologica, Torino, Italy, l.zilberti@inrim.it
Aleksander Sadikov, University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia, aleksander.sadikov@fri.uni-lj.si

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). © 2021 Copyright held by the owner/author(s).

ABSTRACT
Magnetic resonance-based electrical properties tomography (EPT) is one of the novel quantitative magnetic resonance imaging techniques being tested for use in clinical practice. This paper presents preliminary research on and results of the automated detection of anomalies in EPT images. We used in silico data based on anatomical human brains in these experiments and developed two algorithms for anomaly detection. The first algorithm employs a standard approach with edge detection and segmentation, while the second exploits the quantitative nature of EPT and works directly with the measured electrical properties (electrical conductivity and permittivity). The two algorithms were compared on – as of yet – noiseless data. The algorithm using the standard approach was able to quite reliably detect anomalies roughly the size of a cube with a 14 mm edge, while the EPT-based algorithm was able to detect anomalies roughly the size of a cube with a 12 mm edge.

KEYWORDS
electrical properties tomography (EPT), magnetic resonance imaging (MRI), automatic anomaly detection, artificial intelligence

1 INTRODUCTION
The frequency-dependent electrical properties (EPs) of biological tissues, including electrical conductivity and permittivity, provide important diagnostic information, e.g. for tumour characterisation [9]. EPs can potentially be used as biomarkers of the healthiness of various tissues. Previous studies, not based on magnetic resonance imaging (MRI), have shown that various diseases cause changes of the EPs in the tissue [3].
Electrical properties tomography (EPT) is used for the quantitative reconstruction of the EPs' distribution at radiofrequency (RF) with a spatial resolution of a few millimetres. EPT requires no electrode mounting and, during MRI scanning, no external energy is introduced into the body other than the B1 fields. Applied fields can easily penetrate into most biological tissues, making EPT suitable for imaging of the whole body. The MRI scans for EPT are performed using a standard MRI scanner, and the spatial resolution is determined by the MRI images and the quality of the used B1-mapping technique [9].
The objective of this research was to develop and evaluate algorithms to automatically detect anomalies of different sizes in EPT images. The data consisted of in silico simulated brain scans of phantoms that either contained an anomaly or not. The evaluation was aimed towards answering whether an anomaly can be detected or not, and how large an anomaly can be (reasonably) reliably detected. This represents an initial step towards the potential clinical use of EPT.

2 METHODS
2.1 Data Acquisition
The MRI acquisition of the EPT inputs has been simulated in a noiseless case. Thus, the result of the electromagnetic simulation at RF has been directly converted into the acquired data, with no further post-processing. Precisely, the B1 field generated by a current-driven 16-leg birdcage body-coil (radius 35, height 45) operated both in transmission and in reception with a polarisation switch has been computed in the presence of anatomical human heads with a homemade FEM–BEM code [2]. The simulations have been conducted at 64 MHz (i.e. the Larmor frequency of a 1.5 T scanner).
The acquisitions of 19 human head models from the XCAT library [6] have been simulated. The considered population is statistically representative of different genders and ages. For each head model, 10 different variants are considered:
(1) Two physiological variants with the original distribution of the biological tissues. In one case, the nominal electrical conductivity provided by the IT'IS Foundation database [5] is assigned to each tissue. In the other case, the electrical conductivity of white and grey matter is sampled from a uniform distribution that admits a variation of up to 10% with respect to the nominal value. This will be referred to as the physiological variability of the electrical conductivity.
(2) Eight pathological variants, in which a spherical pathological inclusion is inserted in the white matter tissue. The radius of the inclusion ranges from 5 to 45 and its electrical conductivity is set equal to that of the white matter increased by a factor uniformly sampled from 10% to 50% of the nominal value, because previous experimental results have shown that pathological tissues have higher EP values than healthy tissue [7, 8].
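The two kinds of conductivity perturbation in the variant list above can be illustrated with a toy sampler (a numpy sketch; the nominal values below are placeholders, not the actual IT'IS database entries, and the interpretation of the inclusion increase as 110–150% of the nominal white-matter value follows Section 2.3.2):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder nominal conductivities in S/m (illustrative values only;
# the study assigns values from the IT'IS Foundation database).
NOMINAL = {"white_matter": 0.34, "grey_matter": 0.59}

def physiological_variant(nominal):
    """Variant (1): white/grey matter conductivity sampled uniformly
    within +/-10% of the nominal value."""
    return {tissue: rng.uniform(0.90 * v, 1.10 * v)
            for tissue, v in nominal.items()}

def inclusion_conductivity(nominal_wm):
    """Variant (2): the inclusion takes the white-matter conductivity
    increased by 10-50% of the nominal value (110-150% of nominal)."""
    return nominal_wm * rng.uniform(1.10, 1.50)

variant = physiological_variant(NOMINAL)
sigma_anomaly = inclusion_conductivity(NOMINAL["white_matter"])
```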
The location of the inclusion within the head is selected with a random procedure and only its intersection with the white matter tissue is kept in the final model (see Fig. 1, panels a and d). All the pathological variants take into account the physiological variability in the determination of the white and grey matter electrical conductivity.

2.2 Reconstruction Techniques
In order to retrieve the distribution of the electrical conductivity, the phase-based implementations of Helmholtz-EPT (H-EPT) and convection-reaction EPT (CR-EPT) provided by the open-source library EPTlib [1] have been used. For each head model, the distribution of the transceive phase [3] (the input of phase-based EPT) is obtained by linearly combining the phases of the rotating components of B1 simulated both in transmission and in reception [1].
Since noiseless inputs are considered, the smallest filter has been used both in H-EPT and in CR-EPT. Moreover, CR-EPT has been applied for a volume tomography, with an electrical conductivity of 0.1 forced at the boundaries and an artificial diffusion coefficient equal to 10^-4.
Currently, the proposed anomaly detection algorithms have been tested only on the H-EPT results.

Figure 1: Median electrical conductivity distribution by regions. (a) Segmented healthy MRI image. (b) Median electrical conductivity distribution. (c) Detected regions (bordered red). (d) Segmented pathological MRI image (anomaly in yellow). (e) Median electrical conductivity distribution. (f) Detected regions (bordered red). Please note that not all of the regions are visible, as only a 2D slice is shown while the data is 3D.

2.3 Anomaly Detection
We developed two anomaly detection algorithms: (i) a more classical approach for anomaly detection in MR images and (ii) an EPT-based approach working with the direct quantitative properties estimated by MRI-based EPT.

2.3.1 Classical Approach. The classical approach uses standard techniques for anomaly detection in MR images. This approach could be applied (also) on standard MR images, as it is independent of the MRI technique. The algorithm uses noiseless EPT images, produced with the Helmholtz reconstruction technique, as input data.
The algorithm receives previously segmented white matter from the EPT image (this segmentation was not of interest in this research) and detects the edges in it. The edges are detected using a simple gradient edge detection technique. The gradient is calculated for each voxel based on the directional change of the electrical conductivity of neighbouring voxels. The edges are represented as borders between white matter and other brain tissues, as well as borders between white matter and anomalies. Edge voxels are ignored in order to avoid H-EPT reconstruction errors, which occur at borders between tissues [4].
The algorithm then calculates the median electrical conductivity of all regions as separated by the detected edges. Figure 1 shows the median electrical conductivity distribution by regions in a sample image.
The k-means algorithm is then employed for the classification of regions into healthy and anomaly-containing ones. The algorithm classifies an MR image based on the median electrical conductivity of each region. The anomaly location is associated with the regions detected as containing the anomaly.

2.3.2 EPT Approach. EPT differs from standard MRI techniques by representing EPs as quantitative values. EPs are a reliable biomarker of a healthy brain. Mandija et al. [4] presented the mean electrical conductivity and standard deviation of white and grey matter as a reliable measure of whether the brain contains pathological tissue.
In the input data for our experiments, the electrical conductivity is distributed from 90% to 110% of the nominal value for white matter, and from 110% to 150% for anomalies. However, it must be noted that these are the values used for setting up the phantoms, and that these values are then only approximated when the EPT reconstruction is performed. These reconstructed properties have been used as input for anomaly detection. The algorithm detects anomalies based on the difference between white matter and anomalies. The algorithm uses noiseless EPT images, produced with H-EPT, as input data.
The algorithm, like the classical one, receives as input previously segmented white matter from the whole EPT image. It then detects all voxels that have an electrical conductivity between 110% and 150% of the median electrical conductivity of white matter and marks them as a potential anomaly. These voxels, marked as potentially being an anomaly, are then grouped into regions based on their location. The algorithm ignores all smaller regions (below a set size threshold) that likely represent noise and reconstruction errors. All the remaining regions are classified as the anomaly.

3 RESULTS
Figure 2 shows the predictions of whether an image contains an anomaly or not for both algorithms – the classical approach on the left (a) and the EPT approach on the right (b). Each EPT image corresponds to one bar on the chart and they are arranged by increasing size of the anomaly; the size of the bar represents the size of the anomaly in voxels. The bars are cut off at 2,000 voxels for easier viewing. Only images actually containing the anomaly are shown; for the others the false positive (FP) rate describes the performance of the two algorithms. The green colour represents correct predictions and the red colour the incorrect ones. The yellow colour means that the algorithm correctly predicted the presence of the anomaly, but for the wrong reasons (hence the Intersection over Union (IoU) is zero) – these cannot be counted as correct performance. Some misclassifications are labeled with the most likely cause: either that the anomaly is scattered in several smaller regions (each below the detection threshold size) or, in the case of the EPT approach, that the anomaly is too close to the top border and is "overshadowed" by the cranium. For the unlabelled misclassifications the most likely reason is the small size of the anomaly.

Figure 2: Predictions of the anomaly detection algorithms. (a) Classical approach. (b) EPT approach.

Figure 2 captures rather well the minimal anomaly size at which each algorithm starts performing quite reliably. The classical approach detects anomalies larger than 350 voxels and the EPT approach detects anomalies larger than 170 voxels. Since each voxel represents a cube with a 2 mm edge, these volumes translate roughly to a cube with an edge of 14 mm for the classical approach and a cube with an edge of slightly less than 12 mm for the EPT approach.
Tables 1–4 further clarify the results. The images were split into a training set, used to optimise several internal parameters, and a test set for independent evaluation. The internal parameters of the classical approach specify: (i) the minimum gradient value for a voxel to be recognized as an edge; (ii) the electrical conductivity difference between anomaly and healthy tissue; (iii) the minimum region size. The internal parameters of the EPT approach specify: (i) how many initial slices of white matter are ignored (to avoid reconstruction errors); (ii) the minimum region size. The split, while random in nature, was made based on individual phantom heads – the same head with different simulated anomalies could not be both in the test and the training set. The training set consisted of 130 images (including 26 not containing an anomaly), and the test set consisted of 60 images (including 12 not containing an anomaly).
Table 1 shows the results of the classification evaluation of the classical approach and Table 2 shows the results of the localisation evaluation of the classical approach. The localisation results are reported as mean ± standard deviation. The values of IoU and F1 score for localisation are lower as a result of ignoring anomaly edge voxels. Anomaly edge voxels are ignored because of H-EPT reconstruction errors. This is not an issue for anomaly detection, as the values of precision are still high. The values of IoU and F1 score for localisation would be improved by acknowledging the edges of an anomaly after it is already detected.

Table 1: Classification evaluation of the classical approach.

Measure    Training data  Test data
Precision  0.976          0.971
Recall     0.769          0.708
F1 score   0.860          0.819
Accuracy   0.800          0.750

Table 2: Localisation evaluation of the classical approach.

Measure    Training data  Test data
IoU        0.197 ± 0.116  0.244 ± 0.110
Precision  0.932 ± 0.202  0.988 ± 0.050
Recall     0.204 ± 0.123  0.245 ± 0.110
F1 score   0.313 ± 0.163  0.379 ± 0.143

Analogously, Table 3 shows the results of the classification evaluation of the EPT approach and Table 4 shows the results of the localisation evaluation of the EPT approach. Again, the IoU and F1 score values are reduced as a result of ignoring anomaly edge voxels.

Table 3: Classification evaluation of the EPT approach.

Measure    Training data  Test data
Precision  0.975          1.000
Recall     0.750          0.708
F1 score   0.848          0.829
Accuracy   0.785          0.767

Table 4: Localisation evaluation of the EPT approach.

Measure    Training data  Test data
IoU        0.381 ± 0.140  0.435 ± 0.125
Precision  0.874 ± 0.208  0.900 ± 0.177
Recall     0.396 ± 0.142  0.450 ± 0.126
F1 score   0.535 ± 0.166  0.594 ± 0.142

An example of anomaly localisation is shown in Figure 3. As shown in the image, the EPT approach is generally better at anomaly localisation than the classical approach.

4 DISCUSSION AND CONCLUSIONS
The results indicate potential for the future use of the EPT technique for anomaly detection in clinical practice. The results in terms of the anomaly size are on par with what a trained radiologist is able to detect manually.
EPT, being a quantitative technique, offers the advantage of comparability of the images (e.g. in longitudinal monitoring of the patient) compared to standard qualitative MRI. Furthermore, the direct EPT approach performed better than the classical one via edge detection. It is also less complex, and this can often be a bonus in practical applications.
However, this is a pilot study and further research is required to put these approaches into actual practice. The biggest limitation of the presented study and results is that the images, while being an actual EPT reconstruction, were deliberately noiseless. With the introduction of noise the data would very much resemble actual in vivo cases; however, the obtained results will likely be worse. A lot of further work, mostly on noise reduction and detection in the presence of noise, is likely still required.
Moreover, currently only the data captured using H-EPT is used. This technique causes (large) reconstruction errors at the borders between tissues. The results could potentially be improved by combining H-EPT and CR-EPT [1], as the latter technique does not cause reconstruction errors at borders between tissues.
The anomaly localisation could also be improved by not ignoring edges. The edges would still be removed when anomalies are detected; however, once an anomaly is detected, the edges around the anomaly could be classified as anomaly, thus improving the IoU and the F1 score.
In addition to the mean value of the electrical conductivity, the standard deviation of the electrical conductivity could also be taken into account when detecting edges and anomalies.
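The EPT-based detector of Section 2.3.2 essentially thresholds against the median white-matter conductivity and keeps sufficiently large connected regions. A simplified 3D sketch under stated assumptions (the 110–150% thresholds come from the text; `min_voxels` and the toy volume are illustrative, and scipy is used here for connected-component labelling):

```python
import numpy as np
from scipy import ndimage

def detect_anomaly(sigma, wm_mask, low=1.10, high=1.50, min_voxels=50):
    """Return a boolean mask of voxels classified as anomaly.

    sigma   -- reconstructed electrical conductivity volume
    wm_mask -- boolean mask of the segmented white matter
    """
    median_wm = np.median(sigma[wm_mask])
    # Candidate voxels: conductivity between 110% and 150% of the WM median.
    candidate = wm_mask & (sigma > low * median_wm) & (sigma < high * median_wm)
    # Group candidates into connected regions; drop small regions that
    # likely represent noise and reconstruction errors.
    labels, n_regions = ndimage.label(candidate)
    keep = np.zeros_like(candidate)
    for region in range(1, n_regions + 1):
        voxels = labels == region
        if voxels.sum() >= min_voxels:
            keep |= voxels
    return keep

# Toy volume: uniform "white matter" with a 6x6x6-voxel inclusion at +30%.
vol = np.full((20, 20, 20), 0.34)
vol[5:11, 5:11, 5:11] = 0.34 * 1.3
mask = detect_anomaly(vol, np.ones(vol.shape, dtype=bool))
print(int(mask.sum()))  # 216 voxels flagged
```

Real H-EPT reconstructions only approximate the phantom values, so in practice the thresholds and the minimum region size are the internal parameters tuned on the training set.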
doi: 10.13099/VIP21000- 04- 0. [6] W.P. Segars, B.M.W. Tsui, J. Cai, F.-F. Yin, G.S.K. Fung, and E. Samei. 2018. Application of the 4-D XCAT phantoms in biomedical imaging and beyond. IEEE Transactions on Medical Imaging, 37, 3, 680–692. [7] Andrzej J. Surowiec, Stanislaw S. Stuchly, J. Robin Barr, and Arvind Swarup. 1988. Dielectric properties of breast Figure 3: Anomaly localization. (a) Segmented pathologi- carcinoma and the surrounding tissues. IEEE Transactions cal MRI image. (b) Localization of classical approach (de- on Biomedical Engineering, 35, 4, 257–263. tected anomaly is red). (c) Localization of EPT approach [8] B.A. Wilkinson, Rod Smallwood, A. Keshtar, J. A. Lee, and (detected anomaly is red). F.C. Hamdy. 2002. Electrical impedance spectroscopy and the diagnosis of bladder pathology: a pilot study. The Jour- nal of urology, 168, 4, 1563–1567. Finally, once results achieved on EPT images of phantom brain [9] Xiaotong Zhang, Jiaen Liu, and Bin He. 2014. Magnetic- are satisfactory, implemented approaches could be tested on in resonance-based electrical properties tomography: a review. vivo data. IEEE Reviews in Biomedical Engineering, 7, 87–96. doi: 10. ACKNOWLEDGMENTS 1109/RBME.2013.2297206. The results presented here have been developed in the frame- work of the EMPIR Project 18HLT05 QUIERO. This project has received funding from the EMPIR programme co-financed by the Participating States and from the European Union’s Horizon 2020 research and innovation programme. REFERENCES [1] A. Arduino. 2021. EPTlib: an open-source extensible collec- tion of electric properties tomography techniques. Applied Science, 11, 7, 3237. [2] O. Bottauscio, M. Chiampi, and L. Zilberti. 2014. Massively parallelized boundary element simulation of voxel-based human models exposed to MRI fields. IEEE Transactions on Magnetics, 50, 2, 7025504. [3] Jiaen Liu, Yicun Wang, Ulrich Katscher, and Bin He. 2017. 
Library for Feature Calculation in the Context-Recognition Domain

Vito Janko, Jožef Stefan Institute, Department of Intelligent Systems, Ljubljana, Slovenia, vito.janko@ijs.si
Matjaž Boštic, Jožef Stefan Institute, Department of Intelligent Systems, Ljubljana, Slovenia, bosticmatjaz@gmail.com
Junoš Lukan, Jožef Stefan Institute, Department of Intelligent Systems, and Jožef Stefan International Postgraduate School, Ljubljana, Slovenia, junos.lukan@ijs.si
Gašper Slapničar, Jožef Stefan Institute, Department of Intelligent Systems, and Jožef Stefan International Postgraduate School, Ljubljana, Slovenia, gasper.slapnicar@ijs.si

ABSTRACT
Context recognition is a mature artificial intelligence domain with established methods for a variety of tasks. A typical machine learning pipeline in this domain includes data preprocessing, feature extraction and model training. The second of these steps is typically the most challenging, as sufficient expert knowledge is required to design good features for a particular problem. We present a Python library which offers a simple interface for feature calculation useful in a myriad of different tasks, from activity recognition to physiological signal analysis. It also offers additional useful tools for data preprocessing and machine learning, such as a custom wrapper feature selection method and prediction smoothing using Hidden Markov Models. The usefulness and usage are demonstrated on the 2018 SHL locomotion challenge, where a few simple lines of code allow us to achieve solid predictive performance with an F1 score of up to 93.1, notably surpassing the baseline performance and nearing the results of the winning submission.

KEYWORDS
feature calculation, Python library, context recognition, machine learning

1 INTRODUCTION
Context recognition is a vague term encompassing a variety of tasks where sensors are put on (or around) a person and are then used to determine something about them. For example, sensors in a smartphone can determine if a user is standing, walking, running or even falling. A wristband sensor can read physiological signals like heart rate or sweating to determine stress or blood pressure. These kinds of applications are usually used for self-monitoring in sport activities or for helping users manage various medical conditions.

The context-recognition field is quite mature and its applications often come pre-installed in many commercial devices like wristbands and smartphones. Nonetheless, the development of a new context-recognition system can be tedious and time-consuming. It usually consists of collecting relevant sensor data, parsing it into a suitable format, calculating features based on this data and finally training the model.

In this work we present a Python library focused on streamlining this process. Its main functionality is calculating features from sensor data. It can generate over a hundred different features that have proven themselves in various context-recognition projects we tackled in the past [4, 3, 5]. Loosely, the features can be divided into two categories: those suitable for motion data (e.g. generated by an accelerometer or gyroscope) and those specialized for physiological signals.

Furthermore, the library implements some other functionalities that are often used in context-recognition pipelines: reshaping data into windows, re-sampling the data, selecting the best features after generating them, and a method for smoothing the final predictions of the classifier using a Hidden Markov Model approach.

To demonstrate the usefulness of the library we used its functionalities exclusively (with the exception of a generic Random Forest classifier [11]) on the SHL Challenge dataset [16]. We demonstrate the whole pipeline, from reading in the raw data to a finished context-recognition system that is comparable to the best-performing submissions in the SHL Challenge.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

2 LIBRARY FUNCTIONALITIES
The library is implemented in Python, as this has been the most popular data science language in recent years [6]. It is available in a public repository and installable with the pip install cr-features command. Its main and most valuable functionality lies in feature generation. The 'motion features' are listed in Section 2.1, while the 'physiological features' are described in Section 2.2. The remaining non-feature-related functionalities are explained in Section 2.3.

2.1 Motion Sensor Features
Features listed in the first two subsections are general and can be applied to any sensor data time-series. The last subsection (2.1.3), on the other hand, lists features that have an additional semantic interpretation for acceleration and require data from all three (x, y, z) axes.
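As a rough stand-alone illustration of what such general window-level features look like in code, the sketch below computes a handful of the statistical measures described in this section with NumPy and SciPy. It is an independent example under our own naming, not the cr-features API, and the feature definitions follow the paper's descriptions only loosely.

```python
import numpy as np
from scipy.signal import find_peaks

def basic_window_features(x):
    """Illustrative statistical features for one window of sensor data.

    A sketch, not the cr-features API: definitions loosely follow the
    general statistical features described in Section 2.1.1.
    """
    x = np.asarray(x, dtype=float)
    peaks, _ = find_peaks(x)                      # local maxima (e.g. steps)
    above = x > x.mean()                          # samples above the mean
    crossings = np.count_nonzero(np.diff(above))  # times data crossed its mean
    # longest run of consecutive samples on one side of the mean
    runs, current = [0], above[0]
    for a in above:
        if a == current:
            runs[-1] += 1
        else:
            runs.append(1)
            current = a
    return {
        "max": x.max(), "min": x.min(), "std": x.std(),
        "median": float(np.median(x)),
        "mean_diff": float(np.mean(np.diff(x))),
        "peak_count": len(peaks),
        "mean_crossings": crossings,
        "longest_run": max(runs),
    }
```

In a real pipeline such a function would be applied row-wise to a matrix of windows, yielding one feature vector per instance.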
The library defines similar sensor subsets for some other sensors (e.g. gyroscope). Only a subset of features is listed for brevity, while the full list can be found in the documentation [1].

2.1.1 General Statistical Features.
• Basic statistical measures: maximum, minimum, standard deviation, median, mean difference between samples.
• Number of peaks – useful for detecting and counting steps, estimating the energy expenditure and determining the frequency of motion: peak count, number of times data crossed its mean value, longest time data was above or below its mean value.
• Different data aggregations that can indicate the intensity of the activity: (squared) sum of values, sum of absolute values.
• Autocorrelations (i.e. how similar the data is to a shifted version of itself), which indicate periodicity: autocorrelation for raw data, for peak positions, for mean crossings.
• Data shape: skewness (a measure of symmetry, or more precisely, the lack of symmetry), kurtosis (a measure of whether the data is heavy-tailed or light-tailed relative to a normal distribution), interquartile range.

2.1.2 Frequency Features. They are calculated by first computing an estimate of the power spectral density of the signal via a periodogram. We used Welch's method, which is an improvement over the traditional methods in that it reduces noise in the estimated power spectra.

Once the periodogram is obtained, the following features are computed: the magnitude values of the three highest peaks in the periodogram, the three frequencies corresponding to the highest peaks, energy of the signal calculated as the sum of squared FFT component magnitudes, entropy of the signal computed as the information entropy of the normalized FFT component magnitudes, and the distribution of the FFT magnitudes into 10 equal-sized bins ranging from 0 Hz to F_s/2, where F_s is the sampling frequency. Finally, we also computed the previously described skewness and kurtosis for the periodogram.

Most of the described features are useful for finding different periodic patterns, how often they occur and how intense they are.

2.1.3 Accelerometer Features.
• Phone rotation estimation. First, roll and pitch are calculated, then we calculate their characteristics: mean, standard deviation, peaks, autocorrelations.
• Physical interpretations: velocity, kinetic energy.
• Comparing data axes; useful for determining the sensor orientation relative to the direction of motion: correlation between axis data, comparing their means, mean direction of the vector they form.

2.2 Physiological Features
Physiological features are useful for obtaining information about a person's physiological state, typically reflected in their cardiovascular response. We computed several features from signals obtainable from many modern wristbands, as described in the sections below.

2.2.1 Heart Rate and Heart Rate Variability. Cardiovascular measures are widely used to predict both medical problems as well as psychological processes [7]. They range from simple heart rate calculations to more complex heart rate variability indicators. Heart rate variability is a measure of how quickly heart rate itself changes and it is usually calculated on a beat-by-beat basis, considering the inter-beat interval (IBI). It reflects the interaction between sympathetic and parasympathetic regulation of heart beat [10] and is thus an especially useful physiological indicator.

Calculation of features related to cardiovascular activity follows recommendations by Malik, Bigger, Camm, Kleiger, Malliani, Moss and Schwartz [8]. To describe heart rate variability, the Fourier transform of inter-beat intervals is calculated and then several frequency features are derived from the spectrum [5].

2.2.2 Skin Conductivity. Electrical conductivity of the skin varies due to physiological changes in sweat glands, which are controlled by the autonomic nervous system. In a simple model of the resistive properties of skin and sweat glands, whenever the level of sweat in the glands is increased, its conductivity also increases [2]. Sweat glands thus act as variable resistors, and actual sweating, that is, sweat secretion from the glands, is not needed for this change to be measurable.

Changes in skin conductivity are not only triggered by other physiological changes, such as the ones in (skin) temperature, but also reflect psychological processes. Skin conductivity can indicate cognitive activity or emotional responses and can do so with good sensitivity [see 7, for an exhaustive review].

Sweat glands continuously adapt to their environment and their reactions can be slow or fast. Two main modes of fluctuation are therefore distinguished: skin conductance level changes, which are slow variations of the general trend, also called tonic electrodermal measures, and skin conductance responses, quick reactions, also called phasic electrodermal measures [13].

To calculate skin conductivity features, the two components are first separated. This is done using the EDA Explorer library [14], which enables searching for peaks (SCRs) in the signal by specifying their desired characteristics. The signal is first filtered using a Butterworth low-pass filter from SciPy [15]. Next, the peaks are detected by considering their amplitude, onset, and offset time.

Once the SCRs are found, their characteristics are calculated, which can be used as features. These include their number and rate (relative frequency in time) as well as the means and maxima of various characteristics, such as their maximum amplitude, their duration, increase time etc.

The tonic component is calculated using peakutils [9]. It is detected as the signal baseline, fitting a 10th-degree polynomial to the signal. Similarly to the phasic component, statistical features are calculated, such as the difference between this component and the raw signal, and the sum of its derivative.

2.2.3 Skin Temperature. Skin temperature is a fairly simple physiological parameter, both from the point of view of measurement as well as feature calculation. It can still serve as an indicator of affect [7]. Unlike the other physiological parameters, which make use of expert features, only some generic statistical features are calculated for this indicator.

2.3 Other Functionalities
The following functionalities are not directly related to feature generation but are nonetheless often used in conjunction with it – and can thus make the workflow more straightforward.

2.3.1 Resize, Resample. The presented library works with raw data in matrix form: each row representing one window of data, i.e. one instance. If the original data is in the form of a 1D time-series, the convertInputInto2d function can reformat it into the required format. It can work both with windows of a fixed number of data samples as well as windows representing a fixed time interval. Another frequent pre-processing step is down-sampling the data, and it can be done with the resample function.

2.3.2 Wrapper Feature Selection. While many feature selection libraries already exist (e.g. scikit-learn [11]), we implemented another one in this library as it was frequently used in our previous work [4, 3]. It combines the relatively common 'wrapper' approach with reducing the feature count using correlations. It works in three steps:
(1) Calculate the information gain for every feature and rank the features based on it.
(2) Calculate the correlation between each feature pair. If the correlation exceeds the given threshold, discard the one with lower information gain.
(3) Create the classifier using only the highest-ranking feature and measure the accuracy using a validation set. Then add the second feature and measure the accuracy again. If it was the same or higher, keep the feature, otherwise discard it. Repeat for all other remaining features.

2.3.3 Hidden Markov Model Smoothing. The final functionality is a tool to post-process the predictions of the context-recognition system, taking into account the temporal dependencies between the instances.

Take an example in which the classifier predicts the following minute-by-minute sequence: 'subway', 'subway', 'bus', 'subway', 'subway'. It is far more likely that the 'bus' prediction is a misclassification than that the vehicle was switched for just a minute.

Such a sequence can be corrected using a Hidden Markov Model (HMM). This model assumes that there are hidden states corresponding to real activities which emit visible signals – classifications. The parameters of this model can be inferred from the matrix of transition probabilities and the confusion matrix of the predictor.

Once the parameters are estimated, the Viterbi algorithm is used in the background to determine the most likely sequence of hidden states (activities) given the visible emissions (predictions). In many domains [4, 12] this method significantly improves the final prediction accuracy.

While this method is least connected to the feature generation, we have not seen it implemented in a different library and have found it greatly useful.

3 USAGE EXAMPLE
We illustrate the usage of our library with an example: the Sussex-Huawei Locomotion (SHL) Challenge 2018 [16]. This was a worldwide open activity recognition challenge with monetary incentives, organized as part of the HASCA workshop within the UbiComp conference. 17 teams participated with 19 submissions. The goal was to train a recognition pipeline on the provided training data and then use it to classify the withheld test data as well as possible in terms of the F1 score metric.

3.1 SHL Dataset
The challenge used a subset of the full dataset, which was recorded over a period of 7 months by 3 participants engaging in 8 different modes of transportation (still, walk, run, bike, car, bus, train and subway). The phones were worn on 4 body positions, namely in the hand, on the torso, in the hip pocket and in a backpack, and recorded 16 sensor modalities simultaneously. This totalled 2812 hours of labelled data, which is considered one of the largest such datasets openly available [16].

In the actual challenge, the subset used was the data recorded by one of the three participants, which included 82 days of recording, split into the training set (271 hours) and test set (95 hours). Raw data from 7 sensors was provided: accelerometer, gyroscope, magnetometer, linear acceleration, gravity, orientation and air pressure. All were sampled at 100 Hz [16]. Data was split into 1-minute segments using a sliding window without overlap and then randomly shuffled, providing consistent instances. Finally, the training data had 16 310 such instances and the test data had 5698, where each instance contained 6000 samples. This highlights the sheer size of the data and the challenges in processing it in full.

3.2 Methods
We used a traditional ML pipeline for this task: first preprocessing the data, then computing informative features, selecting the best of them and finally using them to train and evaluate a classification model. We added another, not so traditional step: smoothing the predictions using an HMM.

All steps except training and evaluation were done in a few lines using the presented library; the Python code (with some missing steps in comments) is given below. All classification was done using the scikit-learn implementation of Random Forest with default parameters.

import pandas as pd
from CalculatingFeatures import resample, \
    calculateFeatures, selectFeatures, hmm_smoothing

# Data was already windowed
# Data was resampled from 100 Hz to 20 Hz
acc_x = pd.read_csv(path, sep=" ")
acc_x = resample(acc_x, 6000, 1200)
# Repeat for all data types (and axes)

features_train = calculateFeatures(
    acc_x,
    acc_y,
    acc_z,
    featureNames=accelerationNames,
    prefix="acc",
)
# Repeat for all data types and train/test/valid sets
# Merge in one dataframe

selected = selectFeatures(
    features_train, features_validation
)

f1, cf, predictions = evaluate(
    features_train[selected],
    features_test[selected],
    labels_train,
    labels_test,
)

smoothed = hmm_smoothing(labels_train, cf, predictions)
# smoothed is an array representing final output

3.3 Results
We compared the results – in terms of F1 score – of the different stages in the machine learning pipeline against the top three submissions in the competition.

In the first stage we used just the mean and standard deviation as features (calculated for each data modality) to provide a baseline solution. Next, we calculated some features using the presented library. We then selected only a subset of them and again measured the performance. Finally, we used the HMM smoothing, a post-processing step described in Section 2.3.3.

Table 1: A comparison of different versions of the pipeline against the best submissions in the SHL Challenge. The number of features used in our methods is also listed.

Experiment         | # features | F1 score
Baseline           | 38         | 80.3
All features       | 298        | 87.7
Feature selection  | 130        | 87.1
HMM                | 130        | 93.1
Third place        | /          | 87.5
Second place       | /          | 92.4
First place        | /          | 93.9

Results are shown in Table 1. It shows that the features generated by the library substantially improve the performance. The feature selection, on the other hand – while significantly reducing the number of features required – did not increase performance. Of note, the performance did increase on the internal validation set, but this gain did not translate to the test set. The final jump in performance was achieved using the HMM smoothing, and we highly recommend this method in this and similar domains.

Using just the methods in the presented library and no parameter or method tuning, we achieved results comparable with the first-placed submission to the challenge.

4 CONCLUSION
In this paper we demonstrated the basic usage of a Python library capable of calculating features suitable for the context-recognition domain. The most important features that can be calculated are listed in this paper, with the specialized ones thoroughly described. We also showed on a topical example (the SHL Challenge dataset) how only a few lines of code can generate a very capable context-recognition system that can compete with the best entries submitted to this challenge. Such a system can be improved with extensive tuning, but we provide a solid starting point.

It is our hope that by making this library publicly available we can help the workflow of many future context-recognition researchers.

ACKNOWLEDGMENTS
We acknowledge the financial support from the Slovenian Research Agency (research core funding No. P2-0209).

REFERENCES
[1] Matjaž Boštic, Vito Janko, Gašper Slapničar, Jakob Valič and Junoš Lukan. 2021. cr-features: A library for feature calculation in the context-recognition domain. https://repo.ijs.si/matjazbostic/calculatingfeatures. Accessed: 2021-09-20.
[2] Wolfram Boucsein. 2012. Electrodermal Activity. Springer Science & Business Media.
[3] Božidara Cvetković, Robert Szeklicki, Vito Janko, Przemyslaw Lutomski and Mitja Luštrek. 2018. Real-time activity monitoring with a wristband and a smartphone. Information Fusion, 43, 77–93.
[4] Martin Gjoreski, Vito Janko, Gašper Slapničar, Miha Mlakar, Nina Reščič, Jani Bizjak, Vid Drobnič, Matej Marinko, Nejc Mlakar, Mitja Luštrek et al. 2020. Classical and deep learning methods for recognizing human activities and modes of transportation with smartphone sensors. Information Fusion, 62, 47–62.
[5] Martin Gjoreski, Mitja Luštrek, Matjaž Gams and Hristijan Gjoreski. 2017. Monitoring stress with a wrist device using context. Journal of Biomedical Informatics, 73, 159–170. doi: 10.1016/j.jbi.2017.08.006.
[6] Harshil. 2021. Tools of the trade: a short history. https://www.kaggle.com/haakakak/tools-of-the-trade-a-short-history/. Accessed: 2021-09-20.
[7] Sylvia D. Kreibig. 2010. Autonomic nervous system activity in emotion: a review. Biological Psychology, 84, 3, 394–421. doi: 10.1016/j.biopsycho.2010.03.010.
[8] M. Malik, J. T. Bigger, A. J. Camm, R. E. Kleiger, A. Malliani, A. J. Moss and P. J. Schwartz. 1996. Heart rate variability: standards of measurement, physiological interpretation, and clinical use. European Heart Journal, 17, 3, 354–381. doi: 10.1093/oxfordjournals.eurheartj.a014868.
[9] Lucas Hermann Negri and Christophe Vestri. 2017. Lucashn/peakutils: v1.1.0. doi: 10.5281/ZENODO.887917.
[10] M. Pagani, F. Lombardi, S. Guzzetti, O. Rimoldi, R. Furlan, P. Pizzinelli, G. Sandrone, G. Malfatto, S. Dell'Orto and E. Piccaluga. 1986. Power spectral analysis of heart rate and arterial pressure variabilities as a marker of sympatho-vagal interaction in man and conscious dog. Circulation Research, 59, 2, 178–193. doi: 10.1161/01.res.59.2.178.
[11] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot and E. Duchesnay. 2011. Scikit-learn: machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
[12] Clément Picard, Vito Janko, Nina Reščič, Martin Gjoreski and Mitja Luštrek. 2021. Identification of cooking preparation using motion capture data: a submission to the cooking activity recognition challenge. In Human Activity Recognition Challenge. Springer, 103–113.
[13] Society for Psychophysiological Research Ad Hoc Committee on Electrodermal Measures. 2012. Publication recommendations for electrodermal measurements. Psychophysiology, 49, 8, 1017–1034. doi: 10.1111/j.1469-8986.2012.01384.x.
[14] Sara Taylor, Natasha Jaques, Weixuan Chen, Szymon Fedor, Akane Sano and Rosalind Picard. 2015. Automatic identification of artifacts in electrodermal activity data. In 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE. doi: 10.1109/embc.2015.7318762.
[15] Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, ..., Paul van Mulbregt and SciPy 1.0 Contributors. 2020. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nature Methods, 17, 261–272. doi: 10.1038/s41592-019-0686-2.
[16] Lin Wang, Hristijan Gjoreski, Kazuya Murao, Tsuyoshi Okita and Daniel Roggen. 2018. Summary of the Sussex-Huawei locomotion-transportation recognition challenge. In Proceedings of the 2018 ACM International Joint Conference and 2018 International Symposium on Pervasive and Ubiquitous Computing and Wearable Computers. ACM, 1521–1530. doi: 10.1145/3267305.3267519.

Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia. © 2021 Copyright held by the owner/author(s).
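The HMM smoothing idea of Section 2.3.3 can be sketched as a minimal Viterbi decoder. This is our own stand-alone illustration, not the cr-features implementation: we assume, as the paper describes, that transition probabilities are estimated from consecutive training labels and emission probabilities P(predicted | true) from the classifier's confusion matrix; all names are ours.

```python
import numpy as np

def smooth_predictions(train_labels, confusion, preds, n_states):
    """Minimal Viterbi-based smoothing of classifier predictions.

    A sketch of the idea in Section 2.3.3, not the cr-features API.
    """
    eps = 1e-12
    # transition probabilities from consecutive ground-truth labels
    trans = np.full((n_states, n_states), eps)
    for a, b in zip(train_labels[:-1], train_labels[1:]):
        trans[a, b] += 1
    trans /= trans.sum(axis=1, keepdims=True)
    # emissions: row i = distribution of predictions given true class i
    emit = confusion + eps
    emit = emit / emit.sum(axis=1, keepdims=True)
    # Viterbi over the predicted sequence (log domain for stability)
    lt, le = np.log(trans), np.log(emit)
    score = np.log(np.full(n_states, 1.0 / n_states)) + le[:, preds[0]]
    back = []
    for p in preds[1:]:
        cand = score[:, None] + lt        # cand[i, j]: best path ending i -> j
        back.append(cand.argmax(axis=0))
        score = cand.max(axis=0) + le[:, p]
    # backtrack the most likely hidden-state (activity) sequence
    path = [int(score.argmax())]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return path[::-1]
```

With sticky transitions and a reasonably reliable classifier, a lone 'bus' inside a run of 'subway' predictions is relabelled, while genuine longer segments survive.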
26 Določanje slikovnega prostora na umetniških slikah Reconstruction of image space depicted on artistic paintings Nadezhda Komarova Borut Batagelj Gregor Anželj Narvika Bovcon nadezhdakomarova7@gmail.com Franc Solina gregor.anzelj@gmail.com borut.batagelj@f ri.uni- lj.si Gimnazija Bežigrad narvika.bovcon@f ri.uni- lj.si 1000 Ljubljana, Slovenia franc.solina@f ri.uni- lj.si Fakulteta za računalništvo in informatiko Univerza v Ljubljani POVZETEK dostopne na medmrežju, na primer Google Arts and Culture, Wi- kimedia Commons, Getty Open Content Program, ADA (Archive V članku poročamo o analizi slikovnega prostora na umetniških of Digital Art) in druge [4]. Z analizo in vizualizacijo velikih ume-slikah s pomočjo metod računalniškega vida. Naš cilj je bil, da tniških zbirk se je prvi začel ukvarjati Lev Manovich [8]. Leta ugotovimo, ali je možno zgolj na osnovi zaznave obrazov na sli- 2012 je preučeval vizualizacijske metode za družboslovne vede kah določiti prostorsko organizacijo slike. Analiza je potekala in medijske raziskave. Ukvarjal se je z informativno, uporabno na izbranem vzorcu 3356 slik. Najprej smo določili tridimenzi- in estetsko vrednostjo vizualizacij [9]. onalne koordinate zaznanih obrazov na posamezni sliki. Nato Analiza razlik med predstavitvijo prostora s fotografijo in ume- smo tem točkam priredili ravnino. Slikovni prostor smo tako do- tniško sliko je bila narejena leta 2014 [11]. S statistično analizo ločili z enačbo prirejene ravnine oziroma kotom med to ravnino slik tihožitij, ki so jih ustvarili udeleženci eksperimenta, so ugoto-in slikovno ravnino. Bolj kot je ravnina, ki jo določajo obrazi, vili, da so predmeti, na katere so udeleženci usmerjali pozornost, nagnjena od navpične smeri, globlji je prikazani slikovni prostor. naslikani večji kot so na fotografijah. Zato je vprašanje, ali je KLJUČNE BESEDE dosledna uporaba linearne perspektive najbolj primerna metoda za posnemanje sveta [1]. 
Umetnostna zgodovina nam nazorno računalniški vid, slikovni prostor, zaznava obrazov, umetnostna prikaže, da so umetniki za posnemanje sveta uporabljali zelo zgodovina različne pristope. ABSTRACT Pri naši analizi slikovnega prostora smo izhajali iz dveh pred- postavk: In the article, we report on the analysis of the image space de- (1) v raziskavi želimo analizirati veliko število umetniških slik picted on artistic paintings utilizing methods of computer vision. v smislu današnjega trenda Big Data, Our aim was to find out whether one can recover the spatial (2) uporabiti želimo take metode računalniškega vida, ki de- organization of a picture based on detection of faces. The anal- lujejo hitro in čimbolj zanesljivo. ysis was conducted on the sample of 3356 paintings. First, 3D coordinates of faces were determined. Then, a plane was fitted to Med hitre in zanesljive metode računalniškega vida zagotovo the faces on every painting. Images were therefore described in sodi zaznava in identifikacija oseb na osnovi njihovih obrazov. terms of the angle between the fitted plane and the picture plane. Zaradi varnostnih razlogov se je teh problemov na področju bio- The bigger the angle between both planes, the deeper the picture metrije lotilo že zelo veliko znanstvenikov. Danes obstajajo hitre space depicted. in zanesljive metode za zaznavo in analizo obrazov na slikah [10]. Za navdih nam je služil članek Irvinga Zupnicka iz leta 1959 KEYWORDS [14], objavljen še veliko pred uporabo računalnikov v likovni computer vision, image space, face detection, art history umetnosti, ki opisuje kako je na slikah iz različnih umetnostnih obdobjih organiziran slikovni prostor. Zato smo si zastavili vpra- 1 UVOD IN MOTIVACIJA šanje, ali je mogoče s pomočjo metod računalniškega vida rekon- struirati slikovni prostor na umetniških slikah? 
Bolj konkretno, Odločili smo se povezati dve raziskovalni področji, ki sta si na- ali ga je mogoče rekonstruirati na osnovi zaznave obrazov na videz zelo vsaksebi, to je umetnostna zgodovina in umetna in- slikah? Določitve 3D razsežnosti prostora, upodobljenega na sliki, teligenca. Metode računalniškega vida se že redno uporabljajo smo se lotili na osnovi pozicije obrazov na sliki (𝑥 in 𝑦 koorditudi za analizo umetniških slik [12]. Večina teh raziskav je osre- nate) in njihove velikosti, kar nam daje grobo informacijo o tretji dotočena na analizo posameznih ali manjšega števila umetniških dimenziji 𝑧 — to je oddaljenosti obraza od opazovalca. Ta pristop slik. Po drugi strani smo danes v dobi velepodatkov (angl. Big Data seveda temelji na predpostavki, da so na slikah ljudje oziroma ), saj je vedno več informacij dostopnih v digitalni obliki. da so upodobljeni njihovi dovolj veliki obrazi. Resda v zgodovini Tudi velike zbirke reprodukcij umetniških slik so danes prosto likovne umetnosti poznamo veliko tihožitij ali pokrajinskih slik, Permission to make digital or hard copies of part or all of this work for personal na katerih ni obrazov. Toda velika večina umetniških slik iz obdo-or classroom use is granted without fee provided that copies are not made or bja pred izumom fotografije dejansko upodablja ljudi oz. njihove distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this obraze. work must be honored. For all other uses, contact the owner /author(s). Iz javno dostopnih baz umetniških slik smo za našo študijo Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia izbrali testno množico 3356 slik iz različnih umetnostnozgodo- © 2021 Copyright held by the owner/author(s). vinskih obdobij in žanrov. 
27 Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia Nadezhda Komarova, Gregor Anželj, Borut Batagelj, Narvika Bovcon, and Franc Solina 2 SLIKOVNI PROSTOR NA UMETNIŠKIH SLIKAH Slika 3: Vijoličasta ravnina, ki se prilega 3D pozicijam obrazov na Renoirjevem Plesu v Le Moulin de la Galette in in rdeča ravnina 𝑧 = 0 – ploskev slikarskega platna, na kateri smo zaznali obraze. upodabljanje prostora: velikosti oseb niso določene s prostorskim oddaljevanjem, temveč npr. z družbenim statusom. Slika 1: Auguste Renoir, Ples v Le Moulin de la Galette; vi- dijo se zaznani obrazi. Velikost obrazov jasno odraža glo- 3 ZAZNAVA OBRAZOV bino slikarskega prostora. Predpostavili smo, da so resnični obrazi pri vseh osebah pribli- žno enako veliki. Zato so bili večji obrazi obravnavani kot bližji površini slike in manjši kot bolj oddaljeni od površine slike oz. od opazovalca. Zaznani so bili z orodjem RetinaFace, ki izvede dvodimenzio- nalno poravnavo in tridimenzionalno rekonstrukcijo obraza [2]. Zasnovan je na osnovi globoke nevronske mreže. Detektor vrne podatke o obrazih v dvodimenzionalnem pro- storu površine slike, torej imajo središča obraznih okvirjev in točke oči, nosu ter ust samo 𝑥 in 𝑦 koordinate. Toda za rekonstrukcijo tridimenzionalnega prostora slike potrebujemo tudi globine obrazov oz. koordinato 𝑧 . Tridimenzionalni prostor, kot ga prikazuje umetniška slika, se razlikuje od fotografskega predvsem zato, ker slikarji redko dosledno upoštevajo linearno perspektivo. Na fotografijah je perspektiva po drugi strani bolj konsistentno dolo- čena. Zato je na njih mogoče z enačbo (1) [6] določiti oddaljenost predmeta od kamere: 𝑓 · ℎ · ℎ 𝑟 Slika 2: Poslikava v grobnici Unsu. Vsi obrazi se enake 𝑑 = (1) ℎ · ℎ 𝑖 𝑠 velikosti, ves slikarski prostor je zgoščen kar v ravnini po- Z enačbo (1) izračunamo oddaljenost 𝑑 objekta v milimetrih, če slikave. 
je 𝑓 goriščna razdalja fotoaparata, ℎ resnična višina objekta v 𝑟 Vsakemu obrazu na slikah smo priredili tridimenzionalne koor- milimetrih, ℎ višina slike v pikslih, ℎ višina objekta na sliki v 𝑖 dinate, ki pa niso bile zanesljive v absolutnih vrednostih, temveč pikslih in ℎ višina senzorja fotoaparata v milimetrih. Z njo so 𝑠 odražajo zgolj relativne razdalje. Nato smo tem obraznim točkam bile določene tudi oddaljenosti obrazov na slikah v vzorcu, pri priredili ravnine s smislu vsote najmanjših kvadratov razdalj med čemer so bile uporabljene vrednosti goriščne razdalje in višine točkami in iskano ravnino. Pri tistih slikah, ki prikazujejo obraze, senzorja, kvocient katerih opiše, kako vidijo človeške oči. Četudi ki so v spodnjem delu slike opazovalcu blizu, in se višje na sliki je bilo po tem postopku nemogoče določiti natančne tridimen-postopno oddaljujejo (Slika 1), so dobljene ravnine bolj nagnjene zionalne koordinate obrazov na sliki, so bile določene relativne v globino kot pri tistih, kjer so vsi obrazi približno na enaki razda-oddaljenosti med obrazi in površino slike. Za namen te raziskave lji od opazovalca (Slika 2). V takih primerih je dobljena ravnina tudi niti ni pomembno, če zaznamo vse obraze na sliki. skorajda vzporedna s površino slike. Rafaelova Atenska šola in staroegipčanska poslikava v grobnici Unsu imata zelo različni 4 GEOMETRIJSKA INTERPRETACIJA prostorski ureditvi. Na prvi sliki se obrazi zmanjšujejo z odda- PROSTORA ljevanjem ljudi. Ravnina, prirejena točkam na tej sliki, je zato Parametre nagnjena v globino (Slika 3). 𝐴, 𝐵 in 𝐶 enačbe ravnine 𝑧 = 𝐴𝑥 + 𝐵𝑦 + 𝐶 smo določili z minimizacijo funkcije Po drugi strani tudi poslikava na Sliki 2 prikazuje množico ljudi, vendar so vsi enake višine in njihovi obrazi so enako veliki. 𝑚 Õ Ravnina, prirejena obrazom na egipčanski sliki, je zato vzporedna 𝐸 (𝐴, 𝐵, 𝐶 ) = (𝐴𝑥 + 𝐵𝑦 + 𝐶 − 𝑧 )2, (2) 𝑖 𝑖 𝑖 ravnini 𝑧 = 0. 
Za egipčansko slikarstvo je značilno konceptualno 𝑖 =1 28 Določanje slikovnega prostora na umetniških slikah Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia kjer 𝑚 pomeni število točk in 𝑥 , 𝑦 ter 𝑧 koordinate točk. 𝑖 𝑖 𝑖 Funkcija (2) doseže minimum, ko je ∇𝐸 = (0, 0, 0) [3]. Za gradi- 𝜕𝐸 𝜕𝐸 𝜕𝐸 𝜕𝐸 𝜕𝐸 ent te funkcije velja ∇𝐸 = ( 𝜕𝐸 , , ), kjer so , in 𝜕𝐴 𝜕𝐵 𝜕𝐶 𝜕𝐴 𝜕𝐵 𝜕𝐶 naslednji. 𝑚 𝜕𝐸 Õ = 2 𝑥 (𝐴𝑥 + 𝐵𝑦 + 𝐶 − 𝑧 ) (3) 𝑖 𝑖 𝑖 𝑖 𝜕𝐴 𝑖 =1 𝑚 𝜕𝐸 Õ = 2 𝑦 (𝐴𝑥 + 𝐵𝑦 + 𝐶 − 𝑧 ) (4) 𝑖 𝑖 𝑖 𝑖 𝜕𝐵 𝑖 =1 𝑚 𝜕𝐸 Õ = 2 (𝐴𝑥 + 𝐵𝑦 + 𝐶 − 𝑧 ) (5) 𝑖 𝑖 𝑖 𝜕𝐶 𝑖 =1 Slika 4: Razporeditev razredov pri gručenju na osnovi rav- Tako množici 3D točk priredimo ravnino z minimizacijo razdalj nin. Gruče so razpršene in izrazite razmejitve med njimi med temi točkami in njihovimi slikami na ploskvi v smeri 𝑧 . ni. Koeficienti A, B in C so zato rešitve sistema linearnih enačb (6), (7) in (8). 𝑚 𝑚 𝑚 𝑚 Õ Õ Õ Õ 2 𝐴 𝑥 + 𝐵 𝑥 𝑦 + 𝐶 𝑥 = 𝑥 𝑧 (6) 𝑖 𝑖 𝑖 𝑖 𝑖 𝑖 𝑖 =1 𝑖 =1 𝑖 =1 𝑖 =1 𝑚 𝑚 𝑚 𝑚 Õ Õ Õ Õ 2 𝐴 𝑥 𝑦 + 𝐵 𝑦 + 𝐶 𝑦 = 𝑦 𝑧 (7) 𝑖 𝑖 𝑖 𝑖 𝑖 𝑖 𝑖 =1 𝑖 =1 𝑖 =1 𝑖 =1 𝑚 𝑚 𝑚 Õ Õ Õ 𝐴 𝑥 + 𝐵 𝑦 + 𝐶 = 𝑧 (8) 𝑖 𝑖 𝑖 𝑖 =1 𝑖 =1 𝑖 =1 5 REZULTATI Slika 5: Zastopanost posameznih kotov za slike v testni Slike smo izbrali iz prostodostopne zbirke WikiArt (https://www. množici. wikiart.org), kjer so umetnine med drugim razdeljene po žanrih. ustrezajo slikam, kjer se upodobljene osebe enotno oddaljujejo Izbrana so bila slikarska dela (potrebno je bilo izločiti npr. kipar-oz. približujejo. Če je bil kot med ravnino, ki je bila prirejena ska), kjer je bilo upodobljenih več ljudi. Iz zbirke WikiArt so bila obrazom na sliki, in ravnino 𝑧 = 0 izračunan kot natančno 0 zato izbrana dela iz žanrov pastorale (77 slik), allegorical painting stopinj, je to pomenilo, da na sliki ni bilo zaznanega nobenega (1225 slik), history painting (1377 slik) in literary painting (667 obraza, samo en obraz ali pa so imeli vsi obrazi enake globine. Na slik), in sicer skupaj 3356 slik. 
In addition to the genre, we also had information about the art-historical period to which each picture belongs. We were interested in how, on the basis of these data alone, the test set of pictures can be meaningfully divided with a clustering method, and whether this division is relevant from the viewpoint of art history. As the clustering criteria we used the equations of the fitted planes and the angle between the fitted plane and the picture plane z = 0. The RetinaFace detector describes the latter with three parameters, the rotations about the x, y and z axes (in the positive and negative directions); for each picture, the rotations with the largest absolute values in each direction were selected. The clustering was performed with the BIRCH algorithm as implemented in the scikit-learn library. BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) is a clustering algorithm specifically adapted to working with larger data samples [7]. In Figure 4 the extreme values are excluded; the figure shows the distribution of the pictures after clustering based on the planes.

On the interval from 0 to 5 degrees (Figure 5) the most frequent style was Baroque; on the remaining intervals it was Romanticism. No single style strongly dominated any interval, however, as the percentage of pictures belonging to the most frequent style of an interval was between 20 and 30%. To determine the correlation between the time a picture was created and the angle between the two planes for that picture, Spearman's rank correlation coefficient was used. It is a non-parametric measure of the association between two variables, i.e. of how well their relationship can be described by a monotonic function [13]. The coefficient was 0.183, which represents a weak positive correlation; the p-value was close to 0, which means that the correlation between the year a picture was created and the angle reflecting its painted depth is statistically significant. The plot in Figure 6 shows that, over the period from roughly 1700 until today, the average angle between the planes rises mildly from decade to decade.
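Spearman's coefficient used above is simply the Pearson correlation of rank-transformed variables. A minimal, tie-free sketch follows (our own illustration; in practice scipy.stats.spearmanr handles ties and also returns the p-value):

```python
import numpy as np

def spearman_rho(a, b):
    """Spearman rank correlation for samples without ties:
    rank both variables, then take the Pearson correlation of the ranks."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return np.corrcoef(rank_a, rank_b)[0, 1]

# Any monotonically increasing relationship gives rho = 1,
# even when it is far from linear:
years = [1700, 1750, 1800, 1900]
angles = [2.0, 2.1, 7.5, 30.0]
```

This is why the coefficient measures how well the relationship between the year of creation and the depth angle can be described by a monotonic function, rather than by a straight line.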
We then compared which art styles the pictures in the individual classes belong to. This was possible because, in addition to the genre, each picture in the collection was labelled with its year of creation and its art style (Baroque, Romanticism, etc.). We limited the number of classes to ten. Because of the markedly different spatial arrangement of some pictures, these were separated into classes of their own (2, 5, 6, 7 and 8); these classes contain only one picture each and are not visible in Figure 4. The histogram in Figure 5 shows how the different intervals of angles are represented in the studied sample. The largest share of pictures is seen to be those in which the angle between the two planes is between 15 and 20 degrees, which seems relatively small. Larger angles between the planes mostly correspond to pictures in which the depicted persons uniformly recede or approach.

6 DISCUSSION

The main hypothesis of our research was whether we can determine, in some simple way, the nature of the pictorial space, that is, how pronounced the depth dimension is in a given artistic painting. Pictorial space is related both to the art-historical period to which a picture belongs and to its genre. This opens up the possibility of automatically classifying a large number of pictures, whether with statistical methods or, even more fittingly, with machine-learning methods. We decided to determine the pictorial space indirectly, by means of face detection. When an individual face was detected with the RetinaFace tool, this determined a face bounding box at particular x and y coordinates in the picture plane.

Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia · Nadezhda Komarova, Gregor Anželj, Borut Batagelj, Narvika Bovcon, and Franc Solina

REFERENCES

[1] Katarina Bebar. "Upodabljanje prostora po načelih linearne perspektive s pomočjo obogatene resničnosti". In: Likovne besede 114 (2020), pp. 14–21.
[2] Jiankang Deng et al. "RetinaFace: Single-Shot Multi-Level Face Localisation in the Wild". In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2020, pp. 5202–5211. doi: 10.1109/CVPR42600.2020.00525.
Figure 6: Angles between the planes as a function of the time of a picture's creation. The red points represent the average angle for each decade.

The size of the face bounding box additionally gave us information about the relative distance z of the face from the picture plane. The reliability of face detection on artistic paintings was probably somewhat lower, since RetinaFace was trained on photographs of faces and not on artistic depictions [2]. A future study could use the additional information that the RetinaFace face-detection tool provides: the orientation of the face; the positions of the eyes, nose and mouth; and the sex and age of the person as estimated from the face. Future analyses of paintings could also take into account the colour composition and other image features that can be robustly determined with computer-vision methods [12]. We ourselves have worked, for example, on detecting the lines of perspective projection in photographs [5, 1].

[3] David Eberly. "Least Squares Fitting of Data". In: Magic Software, Inc. (Sep. 2001). url: http://www.sci.utah.edu/~balling/FEtools/doc_files/LeastSquaresFitting.pdf.
[4] Image resources: Free image resources. Sotheby's Institute of Art. url: https://sia.libguides.com/images/freeimageresources (accessed 1 Mar. 2021).
[5] Jure Kovač, Peter Peer and Franc Solina. "Automatic natural and man-made scene differentiation using perspective geometrical properties of the scenes". In: Proceedings of the 15th International Conference on Systems, Signals and Image Processing. Bratislava, 2008, pp. 507–510.
[6] Yun Liang. How to measure the real size of an object from an unknown picture? Jan. 2015. url: https://www.researchgate.net/post/How-to-measure-the-real-size-of-an-object-from-an-unknown-picture.
[7] Cory Maklin. "BIRCH Clustering Algorithm Example In Python". In: Towards Data Science (Jul. 2019). url: https://towardsdatascience.com/machine-learning-birch-clustering-algorithm-clearly-explained-fb9838cbeed9.
Although our test of the method grouped the artworks into classes by the similarity of their spatial arrangement, no strict boundaries between the art styles of the pictures emerged. The correlation between the time of a work's creation and the angle between the planes, however, was informative. In the selected sample the various art-historical styles were not represented entirely evenly; there were, for instance, many works from Romanticism. For each historical period, certain mutual relations among these characteristics are most likely distinctive. The established art-historical approach to analysing paintings is the simultaneous observation of two or more works, from which the researcher, on the basis of prior knowledge, extracts characteristic traits, differences and the like [8]. Machine learning would become effective at this point: on the one hand it makes it possible to analyse large amounts of data and to discover concurrent relations among different features, and on the other hand it guarantees the objectivity of mathematical approaches. In further work it would therefore be useful to exploit, besides the faces, other information in the pictures as well.

[8] Lev Manovich. "Data Science and Digital Art History". In: International Journal for Digital Art History 1 (Jun. 2015). doi: 10.11588/dah.2015.1.21631.
[9] Lev Manovich. Museum without walls, art history without names: visualization methods for Humanities and Media Studies. Oxford Handbooks Online, 2013. doi: 10.1093/oxfordhb/9780199757640.013.005.
[10] Mohd Nayeem. "Exploring Other Face Detection Approaches (Part 1): RetinaFace". In: Analytics Vidhya (Jul. 2020). url: https://medium.com/analytics-vidhya/exploring-other-face-detection-approaches-part-1-retinaface-9b00f453fd15.
[11] Robert Pepperell and Manuela Braunagel. "Do Artists Use Linear Perspective to Depict Visual Space?" In: Perception 43 (Aug. 2014), pp. 395–416. doi: 10.1068/p7692.
It must be borne in mind, however, that any division of artworks cannot be absolute: art history is made up of individual artists, each of whom creates in a personal style that may follow the general trends of a period to some degree, but never completely. Individual artists can, moreover, change their artistic style over the course of their careers.

7 CONCLUSION

In this paper we have presented a new approach to the automatic analysis of artistic paintings using computer-vision methods. We have demonstrated that face detection makes it possible to address more complex questions about paintings as well, in our case the organisation of their pictorial space. Although the results of this study are perhaps not so clearly expressed and did not reproduce the findings of art historians, the use of computers in art history, as in the humanities in general, is only now truly beginning.

[12] David G. Stork. "Computer Vision and Computer Graphics Analysis of Paintings and Drawings: An Introduction to the Literature". In: Computer Analysis of Images and Patterns. Ed. Xiaoyi Jiang and Nicolai Petkov. Berlin, Heidelberg: Springer, 2009, pp. 9–24. doi: 10.1007/978-3-642-03767-2_2.
[13] Eric W. Weisstein. "Spearman Rank Correlation Coefficient". In: MathWorld, a Wolfram Web Resource (n.d.). url: https://mathworld.wolfram.com/SpearmanRankCorrelationCoefficient.html.
[14] Irving L. Zupnick. "Concept of Space and Spatial Organization in Art". In: The Journal of Aesthetics and Art Criticism (Dec. 1959), pp. 215–221. doi: 10.2307/427268.
Computer-based analytical methods will make it possible to answer questions that art historians have until now not even dared to pose.

Automated Hate Speech Target Identification

Andraž Pelicon, Blaž Škrlj, Petra Kralj Novak (all authors contributed equally to this research)
andraz.pelicon@ijs.si · blaz.skrlj@ijs.si · petra.kralj.novak@ijs.si
Jožef Stefan Institute, Jamova cesta 39, Ljubljana, Slovenia

ABSTRACT

We present a new human-labelled Slovenian Twitter dataset annotated for hate speech targets, and attempt automated hate speech target classification with different machine-learning approaches. This work represents, to our knowledge, one of the first attempts to solve a Slovene text classification task with an autoML approach. Our results show that the classification task is a difficult one, both in terms of annotator agreement and in terms of classifier performance. The best-performing classifier is SloBERTa-based, followed by autoBOT-neurosymbolic-full.

KEYWORDS

hate speech targets, autoML, text features

To our knowledge, this is one of the first attempts to solve a Slovene text classification task with an autoML approach. Finally, we trained a model based on the SloBERTa pre-trained language model [11], a state-of-the-art transformer-based language model pre-trained on a Slovenian corpus, as well as a set of baselines. Our results show that the context-aware SloBERTa model significantly outperforms all the other models. This result, together with the low inter-annotator scores, confirms our initial assumption that hate speech target identification is a complex semantic task, requiring an understanding of the text that goes beyond simple pattern matching.
The SloBERTa model reaches annotator agreement in terms of classification accuracy, indicating fair performance of the model.

1 INTRODUCTION

Hate speech and offensive content have become pervasive in social media and a serious concern for government organizations, online communities and social media platforms [13]. As the amount of user-generated content steadily increases, the research community has been focusing on developing computational methods to moderate hate speech on online platforms [6, 1, 8]. While several of the proposed methods achieve good performance in distinguishing hateful from respectful content, several important challenges remain, some of them related to the data itself. Studies report both low numbers of hate speech instances in the labelled datasets and relatively low agreement scores between annotators [9]. The low agreement between annotators indicates that recognizing hate speech is a hard task even for humans, suggesting that the task requires a broader semantic interpretation of the text and its context, beyond simple pattern matching of linguistic features. To test this assumption, we have gathered a new Slovenian dataset containing tweets annotated for hate speech targets.¹

2 DATA

We collected almost three years' worth of all Slovenian Twitter data in the period from December 1, 2017 to October 1, 2020: 11,135,654 tweets in total. The period includes several government changes, elections and the first Covid-19-related lockdown. We used the TweetCat tool [5], which was developed for harvesting Twitter data of less frequent languages.

2.1 Annotation Schema

Our annotation schema is adapted from OLID [13] and FRENK [4]. It is a two-step annotation procedure. After reading a tweet, without any context, the annotator first selects the type of speech. We differentiate between the following speech types:

0 acceptable (non-hate-speech type): speech that does not contain uncivil language;
This dataset builds on the dataset used for detecting hate speech communities [3] and topics [2] on Slovenian Twitter. The dataset is available in the clarin.si dataset repository under the handle https://www.clarin.si/repository/xmlui/handle/11356/1398. [¹ Slovenian Twitter dataset 2018-2020 1.0: http://hdl.handle.net/11356/1423] Next, we addressed the hate speech target classification task with the autoML approach autoBOT [10]. The key idea of autoBOT is that, instead of evolving at the learner level, evolution is conducted at the representation level. The approach consists of an evolutionary algorithm that jointly optimizes various sparse representations of a given text (including word, subword, POS-tag, keyword-based, knowledge-graph-based and relational features) and two types of document embeddings (non-sparse representations).

1 inappropriate (hate speech type): contains terms that are obscene or vulgar, but the text is not directed at any person specifically;
2 offensive (hate speech type): includes offensive generalization, contempt, dehumanization and indirect offensive remarks;
3 violent (hate speech type): the author threatens, indulges, desires or calls for physical violence against a target; it also includes calling for, denying or glorifying war crimes and crimes against humanity.

If the annotator chooses either the offensive or the violent hate speech type, they also select one of the twelve possible targets of hate speech:

• Racism (intolerance based on nationality, ethnicity, language, towards foreigners; and based on race, skin color)
• Migrants (intolerance of refugees or migrants, offensive generalization, calls for their exclusion, restriction of rights, non-acceptance, denial of assistance . . . )
• Islamophobia (intolerance towards Muslims)
• Antisemitism (intolerance of Jews; also includes conspiracy theories, Holocaust denial or glorification, offensive stereotypes . . . )
• Religion (other than the above)
• Homophobia (intolerance based on sexual orientation and/or identity, calls for restrictions on the rights of LGBTQ persons)
• Sexism (offensive gender-based generalization, misogynistic insults, unjustified gender discrimination)
• Ideology (intolerance based on political affiliation, political belief or ideology . . . e.g. “communists”, “leftists”, “home defenders”, “socialists”, “activists for . . . ”)
• Media (journalists and the media; also includes allegations of unprofessional reporting, false news and bias)
• Politics (intolerance towards individual politicians, the authorities, the system, political parties)
• Individual (intolerance towards any other individual due to individual characteristics, e.g. a commentator, neighbor or acquaintance)
• Other (intolerance towards members of other groups due to belonging to this group; the annotator writes in a blank column which group it is)

Figure 1: Number of annotated examples for hate speech type and target. The class distribution is severely unbalanced.

2.2 Sampling for Training and Evaluation
The training set is sampled from data collected before February 2020. The sampling was intentionally biased to contain as much hate speech as possible, in order to obtain enough organic examples to train the model successfully. A simple model was used to flag potential hate speech content and, additionally, filtering by users and by tweet length (number of characters) was applied. 50,000 tweets were selected for annotation.

The evaluation set is sampled from data collected between February 2020 and August 2020. Contrary to the training set, the evaluation set is an unbiased random sample. Since the evaluation set is from a later period than the training set, the possibility of data linkage is minimized. Furthermore, the estimates of model performance made on the evaluation set are realistic, or even pessimistic, since the model is tested on a real-world distribution of data in which hate speech is less prevalent than in the biased training set. The evaluation set is also characterized by a new topic, COVID-19; this ensures that our model is robust to small contextual shifts that may be present in the test data. For the evaluation set, 10,000 tweets were selected to be annotated.

The training dataset for hate speech type includes 34,204 examples and the evaluation dataset includes 6,430 examples. Many of the examples are repeated (two annotations of the same tweet), yet conflicting (due to annotator disagreement). The training and evaluation sets for hate speech type and target are summarized in Table 1. The overall annotator agreement for hate speech target on the training set is 63.1%, and the nominal Krippendorff alpha is 0.537.² The annotator agreement for hate speech target on the evaluation set is 62.8%, and the nominal Krippendorff alpha is 0.503. These scores indicate that the dataset is of high quality compared to other datasets annotated for hate speech, yet the relatively low agreement indicates that the annotation task is difficult and ambiguous even for humans. [² Some annotators skipped some examples.]

3 EXPERIMENTS

We compare different machine learning algorithms on the hate speech target identification task.
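The inter-annotator statistics above pair raw percent agreement with nominal Krippendorff's alpha. For orientation, a simplified pure-Python sketch of the nominal alpha follows (our own illustration via the coincidence-matrix formulation; dedicated packages such as krippendorff are preferable in practice):

```python
from collections import Counter
from itertools import permutations

def nominal_alpha(units):
    """Nominal Krippendorff's alpha; `units` is a list of items,
    each item being the list of labels its annotators assigned."""
    o = Counter()  # coincidence matrix o[(c, k)]
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # items with a single rating carry no information
        for c, k in permutations(labels, 2):
            o[(c, k)] += 1.0 / (m - 1)
    n_c = Counter()  # marginal totals per label
    for (c, _), v in o.items():
        n_c[c] += v
    n = sum(n_c.values())
    d_o = sum(v for (c, k), v in o.items() if c != k)  # observed disagreement
    d_e = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k) / (n - 1)
    return 1.0 - d_o / d_e if d_e else 1.0
```

Perfect agreement yields alpha = 1, while agreement at chance level yields alpha around 0, which is why an alpha above 0.5 alongside ~63% raw agreement still signals a usable, if difficult, annotation task.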
The compared algorithms belong to one of the following three categories: classical, representation optimization and deep learning. The results are presented in Table 1.

2.3 Annotation Procedure

Each tweet was annotated twice: in 90% of the cases by two different annotators (to estimate inter-annotator agreement) and in 10% of the cases twice by the same annotator (to assess self-agreement). Special attention was devoted to evening out the overlap between annotators, so as to obtain agreement estimates on equally sized sets. Ten annotators were engaged for our annotation campaign. They were given annotation guidelines, a training session and a test on a small set to evaluate their understanding of the task and their commitment before starting the annotation procedure. The annotation process lasted four months, and it required about 1,200 person-hours for the ten annotators to complete the task. In the training set, intentionally biased in favour of hate speech, about 1% of the tweets were labelled as violent, 34% as offensive (towards either individuals or groups), 4% as inappropriate (mostly containing swear words), and the remaining 61% as acceptable. In the evaluation set, a random selection of 10,000 Slovenian tweets, only 69 tweets were labelled as violent by at least one annotator, which is about 0.3%.

3.1 autoBOT – an autoML for texts

With the increasing amounts of available computing power, automation of machine learning has become an active research endeavor. Commonly, this branch of research focuses on automatic model selection and configuration; recently, however, it has also addressed the task of obtaining a suitable representation when less-structured inputs such as texts are considered. This work represents, to our knowledge, one of the first attempts to solve a Slovene text classification task with an existing autoML approach. The in-house developed method, called autoBOT [10], has already shown promising results on multiple shared tasks and in extensive empirical evaluation.
Albeit it commonly scores on average worse than large, multi-million-parameter neural networks, it remains interpretable and does not need any specialized hardware. Thus, this system serves as an easy-to-obtain baseline which commonly performs better than ad hoc approaches such as word-based features coupled with a Support Vector Machine (SVM). The tool has multiple configurations which determine the feature space that is evolved during the search for an optimal configuration of both the representation of a given document and the most suitable learner. We left all settings at their defaults, varying only the representation type, which was either symbolic, neuro-symbolic-lite, neuro-symbolic-full or neural. Detailed descriptions of these feature spaces are available online.³ The main difference between these variants is that the neuro-symbolic ones simultaneously consider both symbolic and sub-symbolic feature spaces (e.g. tokens and embeddings of the documents), whilst the symbolic and neural variants consider only one type.

The SloBERTa-based predictor performed the best; however, it is also the one with the highest number of tunable parameters (more than 100M). The next series of learners are based on autoBOT's evolution and perform reasonably well. Interestingly, the autoBOT variants which exploit only symbolic features perform better than the second neural-network-based baseline, which was not pre-trained specifically for Slovene – the mpnet. The remaining baselines perform worse, albeit having a similar number of final parameters to the final autoBOT-based models (tens of thousands at most). The autoBOT-neural variant, which implements the two main doc2vec variants, performs better than the naïve doc2vec implementation, though not notably better.
The neural variant is based on the two non-contextual doc2vec variants and commonly does not perform particularly well on its own.

To better understand the key properties of the data set which carry information relevant for the addressed predictive task, we additionally explored autoBOT-symbolic's "report" functionality, which offers insight into the importance of the individual feature subspaces. Each subspace, and each feature in a subspace, has a weight associated with it: the larger the weight, the more relevant the given feature type was for the learner. A visualization of these importances is shown in Table 2. It can be observed that character-based features were the most relevant for this task. This result is in alignment with many previous results on tweet classification, where, for example, punctuation-level features can be surprisingly effective. Furthermore, relational token features were also relevant. This feature type can be understood as skip-grams with dynamic distances between the two tokens, which indicates that short phrases might have been of relevance.

3.2 Deep Learning

We trained a model based on the SloBERTa pre-trained language model [11]. SloBERTa is a transformer-based language model that shares the same architecture and training regime as the CamemBERT model [7] and is pre-trained on Slovenian corpora. For fine-tuning the SloBERTa language model, we first split the original training set into training and validation folds in a 90%:10% ratio. We used the suggested hyperparameters for this model: the Adam optimizer with a learning rate of 2e−5 and learning-rate warmup over the first 10% of the training instances, and a weight decay of 0.01 for regularization. The model was trained for a maximum of 3 epochs with a batch size of 32, and the best model was selected based on the validation-set score.
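The warmup mentioned above (learning rate ramped up linearly over the first 10% of training instances) can be sketched as follows. This is our own illustration; the exact post-warmup decay is not stated in the text, so this sketch simply holds the rate constant afterwards.

```python
def lr_at_step(step, total_steps, base_lr=2e-5, warmup_frac=0.1):
    """Linear learning-rate warmup over the first warmup_frac of steps,
    then a constant rate (post-warmup schedules vary in practice)."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr
```

For example, with 100 optimizer steps in total, the rate grows from 2e−6 at step 0 to the full 2e−5 by step 9, and stays there for the remaining steps.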
We performed the training of the models using the HuggingFace Transformers library [12]. We tokenized the textual input for the neural models with the language model's tokenizer. For performing matrix operations efficiently, all inputs were adjusted to the same length: after tokenization, the maximum length was set to 256 tokens; longer sequences were truncated, while shorter sequences were zero-padded. The fine-tuned model is available in the HuggingFace repository.⁴

Interestingly, keyword-based features were not relevant for the learner. Further, autoBOT, being effectively a fine-tuned linear learner, also offers direct insight into fine-grained performances; examples of the top five features per type are shown in Table 2.

3.3 Other Baseline Approaches

The two mentioned approaches have demonstrated state-of-the-art performance; however, to establish their performance on this new task, we also implemented the following baselines. First, a simple majority classifier, to establish the worst-case performance. Next, a doc2vec-based representation learner was coupled with a linear SVM (doc2vec). The svm-word baseline is a sparse TF-IDF representation of the documents coupled with a linear SVM.

5 CONCLUSION

In this work we present a new dataset of Slovenian tweets annotated for hate speech targets. To develop effective computational models for this task, we use two approaches: an autoML approach combining symbolic and neural representations, and the contextually-aware language model SloBERTa. The results show that the context-aware SloBERTa model significantly outperforms all the other trained models. This result, together with the low inter-annotator scores, confirms our initial assumption that hate speech target identification is a complex semantic task that requires an understanding of the text that goes beyond simple pattern matching.
However, the seemingly simpler models may still offer distinct advantages over the more complex neural models. First, the autoML models tested in this work are easily interpretable, offering insights into the textual features which contribute to the classification; the neural language models, on the other hand, generally work as black boxes, and the extent of their interpretability is still an open research question. Second, the autoML models are significantly more straightforward to deploy, as they tend to be much less computationally demanding in terms of both RAM and CPU usage. Neural language models are able to solve harder tasks, but their increased number of parameters usually makes them a considerable challenge to deploy in a scalable fashion.

Similarly, the svm-char baseline couples a linear SVM with representations based on characters. Two further alternatives use logistic regression (lr-word, lr-char). As another strong baseline, we used a multilingual language model called MPNet to obtain contextual representations, coupled with an SVM classifier. The baseline doc2vec model was trained for 32 epochs with eight threads; the min_count parameter was set to 2, the window size to 5 and the vector size to 512. For the SVM- and logistic regression (LR)-based learners, a grid search over the following regularization values was traversed: {0.1, 0.5, 1, 5, 10, 20, 50, 100, 500}.

4 RESULTS

The classification results for the discussed learning algorithms are given in Table 1. The results are sorted by learner complexity.

ACKNOWLEDGEMENTS

We would like to thank the Slovenian Research Agency for the financing of the second researcher (young researcher grant) and for the financial support from research core funding no. P2-103.

³ autoBOT feature spaces: https://skblaz.github.io/autobot/features.html
⁴ Hate speech target classification model: https://huggingface.co/IMSyPP/hate_speech_targets_slo
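For orientation, the lr-char baseline with the regularization grid listed in Section 3.3 can be sketched with scikit-learn as follows. This is an illustrative reconstruction under our own assumptions; the paper does not specify the exact preprocessing or n-gram ranges.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Character n-gram TF-IDF features fed into logistic regression,
# with the inverse-regularization strength C chosen by grid search.
pipe = Pipeline([
    ("tfidf", TfidfVectorizer(analyzer="char", ngram_range=(1, 3))),
    ("clf", LogisticRegression(max_iter=1000)),
])
grid = GridSearchCV(
    pipe,
    {"clf__C": [0.1, 0.5, 1, 5, 10, 20, 50, 100, 500]},
    cv=2,
)
```

Calling `grid.fit(texts, labels)` and then `grid.predict(new_texts)` reproduces the general shape of such a baseline; swapping LogisticRegression for a linear SVM gives the svm-char variant, and `analyzer="word"` the word-level ones.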
Table 1: Overview of the classification results. The SloBERTa model significantly outperforms all the other models and reaches inter-annotator agreement.

    Classification model                                  Accuracy   Macro Rec   Macro Prec   Macro F1
    majority                                              40.79%      8.33%       3.40%        4.83%
    doc2vec                                               43.25%     20.65%      20.67%       19.76%
    autoBOT-neural (9h)                                   45.79%     15.37%      20.00%       16.10%
    svm-word                                              50.39%     21.40%      25.75%       22.02%
    lr-word                                               50.39%     21.40%      25.75%       22.02%
    lr-char                                               51.21%     25.14%      28.17%       26.10%
    svm-char                                              51.90%     23.47%      27.59%       24.20%
    autoBOT-neurosymbolic-lite (4h)                       54.26%     27.34%      35.06%       28.90%
    paraphrase-multilingual-mpnet-base-v2 + linear SVM    55.40%     40.24%      44.29%       41.20%
    autoBOT-symbolic (9h)                                 55.99%     29.68%      37.86%       31.32%
    autoBOT-neurosymbolic-full (4h)                       56.28%     32.29%      37.83%       33.07%
    SloBERTa                                              63.81%     53.03%      45.63%       48.28%

Table 2: Most relevant features per feature subspace. The subspaces are ordered by their importance, and the numeric value next to each feature is that feature's importance for the final learner; the features are sorted per type. Note the word_features and their alignment with what a human would associate with hate speech.

    char_features:             "ta s" 3.56, "ni d" 2.73, "lič" 2.69, "ola" 2.58, "ne m" 2.5
    relational_features_token: pa–3–je 2.23, pa–2–se 2.12, v–2–pa 1.78, ne–1–pa 1.75, v–2–se 1.71
    pos_features:              nnp nn nnp 1.77, nnp jj nn 1.75, nnp jj 1.57, cc 1.46, nn nn rb 1.45
    word_features:             idioti 1.09, riti 0.95, tole 0.95, sem 0.94, fdv 0.93
    relational_features_char:  e–3–d 1.74, i–3–s 1.56, n–3–z 1.48, h–5–v 1.43, z–4–t 1.4
    topic_features:            topic_12 0.14, topic_2 0.02, topic_0 0.0, topic_1 0.0, topic_3 0.0
    keyword_features:          007amnesia 0.0, 15sto 0.0, 24kitchen 0.0, 2pira 0.0, 2sto7 0.0

The work was also supported by the European Union's Horizon 2020 research and innovation programme project EMBEDDIA (grant no. 825153) and the European Union's Rights, Equality and Citizenship Programme (2014–2020) project IMSyPP (grant no. 875263).⁵
REFERENCES

[1] P. Badjatiya, S. Gupta, M. Gupta, and V. Varma. 2017. Deep learning for hate speech detection in tweets. In Proceedings of the 26th International Conference on World Wide Web Companion, 759–760.
[2] B. Evkoski, I. Mozetič, N. Ljubešić, and P. Kralj Novak. 2021. Community evolution in retweet networks. arXiv preprint arXiv:2105.06214.
[3] B. Evkoski, A. Pelicon, I. Mozetič, N. Ljubešić, and P. Kralj Novak. 2021. Retweet communities reveal the main sources of hate speech. arXiv: 2105.14898 [cs.SI].
[4] N. Ljubešić, D. Fišer, and T. Erjavec. 2019. The FRENK datasets of socially unacceptable discourse in Slovene and English. arXiv: 1906.02045 [cs.CL].
[5] N. Ljubešić, D. Fišer, and T. Erjavec. 2014. TweetCaT: a tool for building Twitter corpora of smaller languages. In Proceedings of the Ninth International Conference on Language Resources and Evaluation. European Language Resources Association (ELRA), Reykjavik, Iceland, May 2014.
[7] L. Martin, B. Muller, P. J. Ortiz Suárez, Y. Dupont, L. Romary, É. de la Clergerie, D. Seddah, and B. Sagot. 2020. CamemBERT: a tasty French language model. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, July 2020, 7203–7219.
[8] A. Pelicon, R. Shekhar, B. Škrlj, M. Purver, and S. Pollak. 2021. Investigating cross-lingual training for offensive language detection. PeerJ Computer Science, 7, e559.
[9] B. Ross, M. Rist, G. Carbonell, B. Cabrera, N. Kurowsky, and M. Wojatzki. 2017. Measuring the reliability of hate speech annotations: the case of the European refugee crisis. arXiv preprint arXiv:1701.08118.
[10] B. Škrlj, M. Martinc, N. Lavrač, and S. Pollak. 2021. autoBOT: evolving neuro-symbolic representations for explainable low resource text classification. Machine Learning, 110, 5, 989–1028. issn: 1573-0565. doi: 10.1007/s10994-021-05968-x.
[11] M. Ulčar and M. Robnik-Šikonja. 2021. Slovenian RoBERTa contextual embeddings model: SloBERTa 2.0. Slovenian language resource repository CLARIN.SI. http://hdl.handle.net/11356/1397.
http:// In Proceedings of the Ninth International Conference on hdl.handle.net/11356/1397. Language Resources and Evaluation. European Language [12] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Resources Association (ELRA), Reykjavik, Iceland, (May Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, 2014). S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, [6] S. Malmasi and M. Zampieri. 2018. Challenges in discrimi- T. Le Scao, S. Gugger, M. Drame, Q. Lhoest, and A. M. nating profanity from hate speech. Journal of Experimental Rush. 2019. HuggingFace’s Transformers: State-of-the-art & Theoretical Artificial Intelligence, 30, 2, 187–202. Natural Language Processing. ArXiv, abs/1910.03771. [13] M. Zampieri, S. Malmasi, P. Nakov, S. Rosenthal, N. Farra, 5 The content of this publication represents the views of the authors only and is their and R. Kumar. 2019. Predicting the Type and Target of sole responsibility. The European Commission does not accept any responsibility for use that may be made of the information it contains. Offensive Posts in Social Media. In Proceedings of NAACL. 34 SiDeGame: An Online Benchmark Environment for Multi-Agent Reinforcement Learning Jernej Puc Aleksander Sadikov jernej.puc@fs.uni- lj.si aleksander.sadikov@fri.uni- lj.si University of Ljubljana University of Ljubljana Faculty of Mechanical Engineering Faculty of Computer and Information Science Ljubljana, Slovenia Ljubljana, Slovenia ABSTRACT “capture the flag” in first-person view, while using similar input and output schemes to those of human players. However, the Modern video games present a challenging benchmark for ar- project is based on an inaccessible implementation of a funda- tificial intelligence research. Various technical limitations can mentally shallow game mode, which makes it untenable as a often lead to playing interfaces that are heavily biased in terms benchmark for reinforcement learning. 
ABSTRACT
Modern video games present a challenging benchmark for artificial intelligence research. Various technical limitations can often lead to playing interfaces that are heavily biased in terms of ease of learning for either humans or computers, and it is difficult to strike the right balance. In this paper, a new benchmark environment is presented, which emphasises the role of strategic elements by enabling more equivalent interfaces, is suitable for reinforcement learning experiments on widely distributed systems, and supports imitation learning, as is demonstrated. The environment is realised as a team-based competitive game and its source code is openly available in a public repository.

KEYWORDS
simulation environment, multi-agent system, deep neural networks, imitation learning, reinforcement learning

1 INTRODUCTION
Reinforcement learning is a powerful concept that can be used to take on highly complex challenges. In its advancement, video games have emerged as suitable benchmarks: they define clear goals, allow agents to be compared between themselves and with humans, and, in comparison to preceding milestones [7], they begin to incorporate complexities of the real world.

Success has been achieved even in notably difficult tasks, such as the modern games of StarCraft II [8] and Dota 2 [1]. However, these being modern games, their authors were forced to compromise: the intricate and graphically intensive input spaces had to be simplified and transformed, while combinatorially overwhelming action spaces were functionally changed, until superhuman performances could also be attributed to advantages of different playing conditions.

The search for examples that could compare in strategic depth and cultivate a competitive player base, while enabling consistent interfaces and being open to researchers, leaves few options but to create one anew. This has led us to create SiDeGame, the "simplified defusal game" (abbrev. SDG), which incorporates key rules of an established video game title in a computationally and perceptively simpler simulation environment, accessible at: https://github.com/JernejPuc/sidegame-py

2 RELATED WORK
The importance of an even playing field has been emphasised by the authors of the For The Win (FTW) agents [4], which play a form of "capture the flag" in first-person view while using input and output schemes similar to those of human players. However, the project is based on an inaccessible implementation of a fundamentally shallow game mode, which makes it untenable as a benchmark for reinforcement learning. Nonetheless, it shows a type of game that can suit the given requirements.

The first-person shooter (FPS) genre has many interesting representatives, some of which have already been repurposed as reinforcement learning environments [4, 5]. Unsuitably, they tend to revolve around simpler content, such as single-player or deathmatch scenarios, and are not straightforward for researchers to customise. Indeed, accessibility and modifications generally require developer support and cooperation [6].

Confronted with this barrier, recent work on Counter-Strike: Global Offensive (CSGO) [6] resigned itself to the limits of imitation learning, which could be facilitated by external recording of public matches. Although CSGO's standard competitive mode is fittingly strategic, it instead focused on the mentioned deathmatch, and withheld information from agents by ignoring sound and having them use cropped and downscaled image inputs with common information omitted or rendered unrecognisable.

This paper also considers imitation learning, in an attempt to establish a baseline and starting point for eventual reinforcement learning, akin to the approach of AlphaGo [7] and AlphaStar [8]. The deep neural network architecture used in these experiments accepts audio inputs similarly to instances from the literature [3], which convert sounds into their frequency-domain representations using the discrete Fourier transform.

3 THE SDG ENVIRONMENT
SiDeGame relies on the game rules of CSGO to provide a foundation of notable depth. Crucially, the observation space is simplified by viewing the environment from a top-down perspective and in low resolution, to allow modern deep neural networks to process it directly. Consequently, not all aspects of the game could be reasonably adapted and the action space could not be fully preserved, yet the playing experience remains egocentric and is largely consistent with true first-person control schemes.

3.1 Description
By the rules carried over from CSGO, two teams of 5 players each compete asymmetrically in attack and defence: the goal of one team is to detonate a bomb at one of two preset locations, while the goal of the other is to prevent them from doing so. After a certain number of rounds, the teams switch sides, and the first to pass a threshold of rounds won is declared the winner.

In the course of a round, players must navigate a map, an artificial environment with carefully placed tactical elements of various degrees of passage and cover. Besides weaponry, a player can utilise auxiliary equipment, the availability of both of which depends on prior survival and economic rewards.

Figure 1: Screenshots of various views encountered in SiDeGame.

Additionally interesting for AI research are aspects of the game that encourage or demand active coordination, such as shared economy, unassigned roles, and imperfect information on teammates' status and surroundings.
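The round-and-match structure described above can be sketched as a small state machine. This is only an illustration of the flow (side switch after a fixed half, win at a fixed threshold); the constants below are placeholders, not values taken from SiDeGame or CSGO.

```python
# Illustrative sketch of the round-based match flow described in 3.1.
# HALF_LENGTH and WIN_THRESHOLD are assumed placeholder values.
HALF_LENGTH = 15      # rounds played before the teams switch sides
WIN_THRESHOLD = 16    # rounds a team must win to take the match

def play_match(round_results):
    """round_results: iterable of 'attack'/'defence', naming the side
    that won each round. Returns the winning team ('A' or 'B'),
    or None if neither team reached the threshold."""
    score = {"A": 0, "B": 0}
    sides = {"attack": "A", "defence": "B"}   # team A starts on attack
    for i, winner_side in enumerate(round_results):
        if i == HALF_LENGTH:                  # teams switch sides
            sides = {"attack": "B", "defence": "A"}
        team = sides[winner_side]
        score[team] += 1
        if score[team] >= WIN_THRESHOLD:
            return team
    return None
```

For example, if the attacking side wins every round, team A collects the first half and team B collects rounds after the switch, so the match remains undecided until one of them passes the threshold.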
3.2 Observations
The majority of information is provided through the image display, several screenshots of which can be seen in Figure 1. Images are generated at a low base resolution of 256x144 pixels, constraining the visual elements to be small and carefully placed, while remaining easily distinguishable. The human interface simply upscales the display with nearest-neighbour interpolation, ensuring equivalence of available information.

The main view is based on a projection of the radar image of a classic CSGO map, Cache, which has only minor vertically overlapping components and thus proved easiest to adapt. Alternative views include the inventory wheel, map plan, and communication wheels. The latter are used to construct short messages of grounded signs, which are appended to the chat log in the sidebar and allow explicit coordination within the team.

Since the projection is egocentric, the prominent role of sound is retained: other agents out of line of sight may still give off some information regarding their relative position, equipment, and preparedness. To support the advantages of awareness of sound, spatial audio is implemented by convolving sound signals with HRIR filters [2], while amplitude and frequency attenuation characteristics were empirically formulated. SiDeGame supports conversion of sounds into spectral vectors, which were used directly in the experiments of this work, but can also be accumulated and later processed in the form of a spectrogram.

If there is a delay between action inference and its effect in the environment, an input analogous to proprioception can also be considered. It can be trivially simulated by tracking the effective mouse and keyboard states, i.e. which keys are pressed and how the cursor is moving at a given time.

3.3 Actions
The game expects 19 binary inputs, corresponding to distinct key presses, one ternary value for scrolling the chat log, and two real values for controlling cursor movement. In general, combinations of these can legitimately be executed simultaneously, providing no benefit to the use of compound actions.

It should be noted that some of the keys, pertaining to alternative views or otherwise functional when kept held down, expect unperturbed presses lasting several seconds. For stochastic policies, where actions during training are sampled, this duration could be long enough for even minute probabilities to be eventually expressed, causing unintended consequences and leading to practically unplayable conditions. Training regimes should, for example, reduce the regularity of sampling, bound sampling within acceptable thresholds, or use more sophisticated contextual rules to confirm the agent's intent.

3.4 Execution
Multi-agent interaction is built upon separate server and client processes regularly exchanging state and event information via packet communication using the UDP protocol. Simulations are intended to run in real time, but can have their tick rate and time scale adjusted on both authoritative and local ends.

With the exception of pixel-wise iteration for tracing lines of sight, and disregarding the dependencies of imported extensions, the environment is fully implemented in the Python programming language. Despite clear inefficiencies, this development choice streamlines integration with machine learning solutions, which predominantly relate to the Python ecosystem, and eases code readability and customisation. Server and client processes are spawned as single Python processes that are restricted to the CPU, enabling mass parallelisation and preserving GPU resources for learning processes.

For AI agents, development targeted 30 updates per second, which had been deemed acceptable to human opponents, although higher tick rates can be achieved at both the original (144p) and reasonably upscaled (e.g. 720p) resolutions. This could also be used to speed up the simulation, subject to the computational stability and potential overhead of a specific configuration.

3.5 Online Play
In the context of agent evaluation and comparison, the capability of online play, where actors, both human and artificial, can compete remotely and without having to share their program, is an essential component, as outcomes of adversarial games cannot be compared in isolation.

Feasible physical distance between actors in a match is experientially limited by temporal delays that arise from communication steps in the client loop. Inclusion of select networking concepts, such as client-side state prediction and reconciliation, foreign entity interpolation, and server-side lag compensation, should maintain playable conditions to a large extent even among international participants.

In extrapolation, online play could also support widely distributed multi-agent reinforcement learning experiments in the form of large-scale population-based training [4, 8]. These are subject to training and inference data transfer constraints, which can be alleviated by slowing down the simulation and having the data pass fewer bottlenecks.
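The state-and-event packets exchanged over UDP (Section 3.4) can be illustrated with a fixed-layout binary encoding. The field layout below is an invented example, not SiDeGame's actual wire format; it only shows the general pattern of packing a snapshot into a compact byte string of the kind that could also be gathered for replays.

```python
import struct

# Hypothetical fixed-layout state packet: tick, player id, x, y.
# "<" requests little-endian byte order with no padding.
STATE_FMT = "<IHff"

def encode_state(tick, player_id, x, y):
    """Pack one state snapshot into a compact byte string."""
    return struct.pack(STATE_FMT, tick, player_id, x, y)

def decode_state(payload):
    """Unpack a byte string back into the snapshot fields."""
    return struct.unpack(STATE_FMT, payload)

packet = encode_state(120, 3, 12.5, -4.25)
tick, player_id, x, y = decode_state(packet)
```

Such byte strings are small enough (14 bytes here) to fit many per UDP datagram, and they can be appended verbatim to a binary log, which is essentially what a replay file amounts to.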
Moreover, visual encod- simulations ing splits off into focused encoding by cropping the input image as specified by the cutout coordinates (orange). Figure 2: Online multiplayer reinforcement learning: a) The global controller process oversees all of the models many of them and is relatively dense, hinting at the inevitability in a population of agents, ensuring they are not simultane-that not all bits of visual information can be equally accounted ously being updated by any two process groups. b) Process for at any given time. Generally, this could be addressed with groups consist of a local controller and locally distributed sufficiently high model capacity and appropriate use of attention- inference, optimisation, and actor processes. c) All actor based layers. In this work, however, the visual pathway was instances may interact through remote environments sim- explicitly split into primary and focused visual encoding, based ulated by authoritative servers. on the intuition of human visual perception, where only a small part of our field of view is perceived in sharp detail. Instead of ingesting full-scale image data, focused visual en- 3.6 Replay System coding processes cutouts of much smaller size, so that singular entities can be unambiguously observed. The cutout coordinates The packets of information that a client exchanges with the are obtained from a spatial probability distribution along with fu-server in the course of a session are made to be sufficient to ture mouse and key states as outputs of the network. If they were, faithfully reproduce the player’s perspective. Byte strings can be instead, determined internally, the cropping operation would gathered, annotated, and saved as binary files, which can then be need to be differentiable, which could prove hard to satisfy. 
replayed in real-time or manually stepped to inspect and extract the player’s observations and actions, statistics, or other aspects 4.2 Imitation Learning of the underlying game state. Replays are an important resource Imitation learning aims to align the agent’s behaviour to that of for review and analysis of competitive games, but were primarily a number of demonstrators, e. g. experienced humans. Among its included in SiDeGame for the purposes of imitation learning. basic methods is behavioural cloning, which relies on a dataset 4 SUPERVISED LEARNING BASELINE 𝐷 = {{𝑜 } }} of pairs of observations 1, 𝑎1 , . . . , {𝑜 , 𝑎 𝑜 and target 𝑁 𝑁 actions 𝑎. The agent with parameterisation 𝜃 is tasked to predict Within the limits of available computational resources and in for each observation 𝑜 such an action ˆ 𝑎 to satisfy the following 𝑖 𝑖 view of the scale of exemplary projects [1, 8], the estimated level optimisation problem: of parallelisation, required for meaningful results of reinforce- 𝑁 ment learning experiments in an acceptable time frame, could ∗ 1 Õ 𝜃 = arg min 𝐿 (𝑎 , ˆ 𝑎 ), (1) not be reached. Instead, a baseline and a starting point for rein- 𝑖 𝑖 𝑁 𝜃 𝑖 = 1 forcement learning was attempted to be achieved with imitation learning, a form of supervised learning from demonstrations. where the loss function 𝐿, evaluating similarity between predicted and imitated actions, is dependant on the form of the action space. 4.1 Agent Model Architecture In this experiment, all outputs of the model were made discrete and the loss function formulated as an average of cross-entropy The agent’s policy was modelled as a parameterised deep neural terms for 𝑇 sub-actions of 𝐶 categories: network according to the architecture depicted in Figure 3. 
The model is composed of common elements: residual convolu- 𝑇 𝐶𝑡 Õ Õ 𝑡 ,𝑐 𝑡 ,𝑐 tional blocks, recurrent cells, and fully-connected layers, forming 𝐿 (𝑎 , ˆ 𝑎 ) = − 𝑎 log ˆ 𝑎 (2) 𝑖 𝑖 𝑖 𝑖 recognisable sub-networks, such as the recurrent core, which 𝑡 = 1 𝑐 = 1 provides the agent with memory and delay compensation, input After the gradients are numerically computed with regards to encoding pathways, and distinct output heads. the depth of truncated backpropagation through time, parame- The irregularity of visual encoding stems from the considera- ter updates are applied using one of the standard optimisation tion that, while visual elements are simple, the display includes algorithms. 37 Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia Jernej Puc and Aleksander Sadikov 4.3 Demonstrations It seemed to respond to the presence and movement of other entities in its vicinity, was able to navigate across the map towards A collection of replays was recorded from a short session between a tactical objective without hindering collisions and seemingly 10 demonstrators of negligible experience with SiDeGame, but hide behind cover, but failed to demonstrate offensive behaviour. with varying degrees of familiarity with related video games. Seven hours or 770,000 samples of total play were obtained at 5 CONCLUSIONS & FUTURE WORK 30 frames per second, which is unideally low, especially since samples and episodes are highly correlated. Attributing the shortcomings of recent works in deep reinforce- Main sub-actions were extracted from mouse and keyboard ment learning to inconsistencies between human and AI inter- states, while focused cutout coordinates would require logistical faces, a new benchmark environment has been created in the and sensory measures that were infeasible to procure. Instead, form of a lightweight multi-agent game with various tools for the coordinates were manually labelled by viewing replays at 75% training and evaluation of agents. 
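The behavioural cloning loss of Eqs. (1)-(2) can be transcribed directly in numpy. The toy targets and predictions below are invented for illustration: one-hot target sub-actions and softmax-style predicted distributions over two sub-actions with 2 and 3 categories.

```python
import numpy as np

def bc_loss(target, predicted):
    """Cross-entropy behavioural-cloning loss of Eq. (2): summed over
    T sub-actions, each a categorical distribution over C_t categories.
    `target` and `predicted` are lists of per-sub-action probability
    vectors (targets one-hot, predictions strictly positive)."""
    return -sum(float(np.sum(t * np.log(p)))
                for t, p in zip(target, predicted))

# Toy example: T = 2 sub-actions with 2 and 3 categories respectively.
target    = [np.array([1.0, 0.0]), np.array([0.0, 1.0, 0.0])]
predicted = [np.array([0.8, 0.2]), np.array([0.1, 0.7, 0.2])]
loss = bc_loss(target, predicted)   # equals -(log 0.8 + log 0.7)
```

With one-hot targets, each sub-action contributes the negative log-probability the model assigned to the demonstrated choice, so the loss in Eq. (1) is minimised by matching the demonstrators' action distribution.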
Amid data extraction, observation-action pairs had their actions shifted by 6 steps, conditioning the model to predict actions after a temporal delay close to the human response time.

4.4 Results
The neural network, consisting of approx. 2.9M parameters, and the training procedure were implemented using the PyTorch package. For training, a machine with 4 Nvidia 1080Ti GPUs was available. Each GPU corresponded to an optimisation process, which received an approximately equal share of training sequences and progressed them chronologically in batches of 12 sequences and epochs of 30 steps. After every epoch, the gradients with regard to the loss were computed with truncated backpropagation through time separately on each GPU, synchronously averaged between them, and used to separately update their copy of the model parameters using the AdamW optimisation algorithm with a cosine 1-cycle learning rate schedule.

The main training process ran for 300,000 steps over 6 days. The large variance in the loss in Figure 4 can be attributed to differences between game phases and subtler characteristics of demonstrators, which were found to be distinct from degrees of capability and activity.

Figure 4: Loss progression over the course of training. Left: average loss value, enveloped by minimum and maximum evaluations. Right: averages of the constituent cross-entropy terms (focal coordinates, keys, and vertical and horizontal mouse movement).

Figure 4 shows that, by the end of the training schedule, only imitation of focal coordinates leaves room for improvement, while the other terms in the loss function have already overfitted. Due to the relatively small size of the network, overfitting had been underestimated, although the outcome could have been inevitable with the given amount of data.

In practice, the trained agent's behaviour was greatly sensitive to even imperceptibly slight changes in starting conditions. Its switching between alternative views was debilitatingly chaotic and had to be suppressed to allow expression of other behaviours. It seemed to respond to the presence and movement of other entities in its vicinity, was able to navigate across the map towards a tactical objective without hindering collisions and seemingly hide behind cover, but failed to demonstrate offensive behaviour.

5 CONCLUSIONS & FUTURE WORK
Attributing the shortcomings of recent works in deep reinforcement learning to inconsistencies between human and AI interfaces, a new benchmark environment has been created in the form of a lightweight multi-agent game with various tools for training and evaluation of agents. In addition to addressing these concerns, the simulation environment is based on a renowned tactical video game, providing interesting challenges for AI research, particularly in the domains of sound and explicit communication.

In approaching the game with imitation learning, the trained agent failed to develop practically meaningful behaviours when trained on arguably few demonstrations, and was found lacking as a starting point for reinforcement learning experiments. Nevertheless, the presented agent model architecture is general enough to be applicable to other common tasks with standard computer peripherals and lends itself to further experimentation. Online characteristics of the created environment hint at its potential for large-scale reinforcement learning experiments, with its accessibility and adaptability allowing the AI community to explore this and other directions. At the same time, certain components of the environment that are not specific to AI research could also prove useful to a wider community, outside the scope of its primary intent.

REFERENCES
[1] Christopher Berner et al. 2019. Dota 2 with large scale deep reinforcement learning. CoRR, abs/1912.06680. arXiv: 1912.06680. http://arxiv.org/abs/1912.06680.
[2] Fabian Brinkmann et al. 2017. A high resolution and full-spherical head-related transfer function database for different head-above-torso orientations. J. Audio Eng. Soc, 65, 10, 841-848. DOI: 10.17743/jaes.2017.0033. http://www.aes.org/e-lib/browse.cfm?elib=19357.
[3] Shashank Hegde, Anssi Kanervisto, and Aleksei Petrenko. 2021. Agents that listen: high-throughput reinforcement learning with multiple sensory systems. CoRR, abs/2107.02195. arXiv: 2107.02195. https://arxiv.org/abs/2107.02195.
[4] Max Jaderberg et al. 2018. Human-level performance in first-person multiplayer games with population-based deep reinforcement learning. CoRR, abs/1807.01281. arXiv: 1807.01281. http://arxiv.org/abs/1807.01281.
[5] Michal Kempka et al. 2016. ViZDoom: a Doom-based AI research platform for visual reinforcement learning. CoRR, abs/1605.02097. arXiv: 1605.02097. http://arxiv.org/abs/1605.02097.
[6] Tim Pearce and Jun Zhu. 2021. Counter-Strike deathmatch with large-scale behavioural cloning. CoRR, abs/2104.04258. arXiv: 2104.04258. https://arxiv.org/abs/2104.04258.
[7] David Silver et al. 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529, 7587, 484-489. ISSN: 1476-4687. DOI: 10.1038/nature16961. https://doi.org/10.1038/nature16961.
[8] Oriol Vinyals et al. 2019. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575, 7782, 350-354. ISSN: 1476-4687. DOI: 10.1038/s41586-019-1724-z. https://doi.org/10.1038/s41586-019-1724-z.

Question Ranking for Food Frequency Questionnaires

Nina Reščič (nina.rescic@ijs.si), Department of Intelligent Systems, Jožef Stefan Institute; Jožef Stefan International Postgraduate School, Ljubljana, Slovenia
Mitja Luštrek (mitja.lustrek@ijs.si), Department of Intelligent Systems, Jožef Stefan Institute, Ljubljana, Slovenia

ABSTRACT
Food Frequency Questionnaires (FFQs) are probably the most
In the WellCo project, that quality of predictions will improve with each additional we developed the Extended Short Form Food Frequency Question- answer and we are not limited with the constraint that certain naire (ESFFFQ), integrated into a mobile application, in order to number of questions should be answered. We addressed the prob- monitor the quality of users’ nutrition. The developed question- lem as a single-target problem for classification and regression. naire returns diet quality scores for eight targets — fruit intake, vegetable intake Additionally, we tested the algorithms on different representa- , fish intake, salt intake, sugar intake, fat intake, fi- bre intake tions of features for both type of problem. The findings of this and protein intake. This paper explores the single-target paper could be used for setting the baseline for our future re- problem of question ranking. We compared the question ranking search. of the machine learning algorithms on three different types of features for classification and regression problems. Our findings 2 METHODOLOGY showed that the addressing problem as a regression problem performs better than treating it as a classification problem and 2.1 Problem outline the best performance was achieved by using a Linear Regression In our previous research [6, 7] we tried to find subsets of questions on features, where answers were transformed to frequencies of that would allow us to ask the users about their dietary habits consumption of certain food groups. with as few questions as possible and still get sufficient information to evaluate their nutrition. For this we used the Extended KEYWORDS Short Form Food Frequency Questionnaire (ESFFFQ) [5]. The nutrition monitoring, FFQs, question ranking questionnaire returns diet quality scores scores for fruit intake, vegetable intake, fish intake, salt intake, sugar intake,fat intake, fibre intake and protein intake. 
We calculate the nutrient intake 1 INTRODUCTION amounts and from there we further calculate the diet quality Adopting and maintaining a healthy lifestyle has become ex- scores. tremely important and healthy nutrition habits represent a major The questionnaire was included in a mobile application, where part in achieving this goal. Self-assessment tools are playing a big the system asked the users about their diet with one or two role in nutrition monitoring and many applications are including questions per day. The answers were saved into a database and Food Frequency Questionnaires (FFQs) as a monitoring tool, due every fortnight the quality scores were recalculated. As it could to they in-expensiveness, simplicity and reasonably good assess- happen that the users did not answer all the questions by the ment [8, 3]. An FFQ is a questionnaire that asks the respondents time the recalculation was done, it was of great importance to ask about the frequency of consumption of different food items (e.g., the questions in the right order. In the terminology of machine "How many times a week do you eat fish?"). In the EU-funded learning this would be a feature ranking problem. We explored project WellCo we developed and validated an Extended Short the problem as a set of single-target problems — separately for Form Frequency questionnaire (ESFFFQ) [5] that was included individual outcome scores. As three of the diet quality scores in a health coaching application for seniors. (fruit, vegetable and fish intake) are only dependent on one or two Cade et al. [2] suggest that for assessment of dietary data short questions, the problem of feature ranking is trivial. Therefore we FFQs could be sufficient and that marginal gain in information is explored the problem for the remaining five targets — fat intake, decreasing with extensive FFQs. Block et al. [1] concluded that sugar intake, fibre intake, protein intake and salt intake. 
longer and reduced return comparable values of micronutrients intake. Taking this idea a step forward, we explored the possi- 2.2 Dataset bilities to get the most information even if one does not answer We got the answers to ESFFFQ from 92 adults as a part of the the whole questionnaire. In our previous work we explored how WellCo project and additionally from 1039 adults included in to find the smallest set of questions that still provides enough SIMenu, the Slovenian EUMenu research project [4]. The ques-information by applying different feature selection techniques tions included in the ESFFFQ were a subset of the questions in the [6, 7]. FFQ in SIMenu. Furthermore, the answers (consumption frequen- cies) were equivalent in both questionnaires, and consequently Permission to make digital or hard copies of part or all of this work for personal extracting the answers from SIMenu and adding them to the or classroom use is granted without fee provided that copies are not made or answers from the ESFFFQ was a very straightforward task. distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner /author(s). 2.3 Feature ranking Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia To do the experiments, we first randomly split the data into © 2021 Copyright held by the owner/author(s). validation and training sets in ratio 1:3. To train the models and 39 Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia Reščič et al. rank the features we then used 4-fold cross-validation on the frequencies or amounts, we get better results on the validations training set and used the average feature importance from all 4 set than with RF. folds as the final feature ranking. 
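The fold-averaged ranking scheme just described can be sketched with numpy alone. Here an ordinary least-squares fit stands in for the paper's Linear/Logistic Regression models, and the data is synthetic; only the procedure (cross-validation folds, averaged absolute coefficients, descending rank) mirrors the text.

```python
import numpy as np

def rank_features(X, y, n_folds=4):
    """Rank features by absolute linear-model coefficients, averaged
    over cross-validation folds (plain least squares stands in for the
    scikit-learn models used in the paper)."""
    indices = np.arange(len(X))
    importance = np.zeros(X.shape[1])
    for fold in np.array_split(indices, n_folds):
        train = np.setdiff1d(indices, fold)
        coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        importance += np.abs(coef)
    importance /= n_folds
    return np.argsort(importance)[::-1]   # best-ranked feature first

rng = np.random.default_rng(42)
X = rng.standard_normal((200, 5))                    # 5 synthetic "questions"
y = 3.0 * X[:, 2] + 0.1 * rng.standard_normal(200)   # question 2 dominates
ranking = rank_features(X, y)
```

Since the target depends almost entirely on the third column, that question should come out on top of the ranking, and the remaining questions would then be added in the resulting order, as in the experiments below.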
The ranked features were used to predict quality scores (classification problem) and nutrient amounts (regression problem), by adding the questions as they were ranked. In this paper we present the results for two commonly used machine learning algorithms: Logistic/Linear Regression and Random Forest Classifier/Regressor. To rank the features we used the absolute values of the coefficients in the Linear/Logistic Regression and the feature_importances_ attribute as implemented in the Random Forest Classifier/Regressor in the sklearn library.

Additionally, we compared different feature representations: features where answers are represented with nominal discrete equidistant values (once per week is represented as the integer 2), features where answers were transformed into frequencies of consumption (once per week is represented as approx. 0.14 per day) and features where answers were transformed into amounts of nutrients (once per week is represented as grams/day). In the last representation, the features differed between the targets sugar, fat, salt, fibre and protein.

We ran the experiments for five diet categories (fat intake, sugar intake, fibre intake, protein intake and salt intake) for both the classification and the regression problem. In both cases we started with the best ranked question, trained the model and compared results on the train and validation sets. Then we added the second best ranked question, trained the models and compared the results. We added the questions one by one until the last one.

3 RESULTS

3.1 Classification problem
For classification we tried to predict the quality scores for each of the five nutrition categories. There were three scores: 2 (good), 1 (medium) and 0 (bad). The distribution of the scores for all the categories is shown in Table 1.

Table 1: Distribution of target values for classification

Score  Fat  Sugar  Fibre  Protein  Salt
2      51%  74%    26%    79%      32%
1      31%  14%    22%    13%      47%
0      18%  12%    52%    8%       21%

We compared the Random Forest Classifier and Logistic Regression for three different types of features: discrete equidistant answers, answers transformed to frequencies and answers transformed to amounts.

Fat. For Random Forest (RF) there was not a big difference between the three representations of the features. With all three, the highest accuracy on the validation set (79%) is achieved with 5 questions; afterwards the accuracy starts falling and stays in the interval between 75% and 79%. This clearly indicates overfitting, which is confirmed by the fact that the accuracy for RF on the training set was 100% from the fifth question on. A similar situation happened for all the remaining targets and will not be repeated in the following subsections. On the training set Logistic Regression (LR) had worse results than the RF, and it also performed the worst of all algorithms when run on the discrete features. However, when the features are transformed into frequencies or amounts, we get better results on the validation set than with RF.

Figure 1: Results on validation set for fat intake

Sugar. For sugar intake the story is very similar. RF performed fairly well for the first few questions and then the accuracy began to fall. The best performing algorithm was the LR on the features where the answers were transformed into frequencies (Figure 2).

Figure 2: Results on validation set for sugar intake

Fibre. For fibre intake the RF algorithms performed better for a very long time (Figure 3), and RF reached the best accuracy after 6 questions. The LR performed worse, and it did similarly badly on the training set as well.

Figure 3: Results on validation set for fibre intake

Protein. For protein intake (Figure 4) the results are similar to those for fibre intake. However, in the case of protein intake the majority class is 79% and most of the algorithms almost never exceeded this value.

Figure 4: Results on validation set for protein intake

Salt. For salt intake the best model is the LR on the answers transformed to amounts. As seen in Figure 5, it exceeded the RF algorithms by almost 20% from the eleventh added question on, and predicted the quality scores with more than 90% accuracy with only 14 questions, which is half of the questionnaire.

Figure 5: Results on validation set for salt intake

3.2 Regression problem
While the quality score provides a valid first indication of whether one's diet is good or not, generally more interesting information is how good (or how bad) it really is. Therefore it is reasonable to look at the same problem as a regression problem, where we try to predict the actual amount (in grams) of consumed nutrients. Again we explored the performance of the Random Forest Regressor (RF) and Linear Regression (LR) on the three previously described feature sets.

Table 2: Nutrient intake in grams/day corresponding to quality scores

Score  Fat [g]  Sugar [g]  Fibre [g]  Protein [g]  Salt [g]
2      ≤ 74     ≤ 55       ≥ 30       ≥ 55         ≤ 6
1      else     else       else       else         else
0      ≥ 111    ≥ 82       ≤ 25       ≤ 45         ≥ 9

Fat. The best performing algorithm for fat intake was the LR on the answers transformed to frequencies. The overfitting of the RF is even more visible than with the classification problem, as the errors for these models did not fall under 20 grams even if all the questions were used, while the error of the LR on the feature sets where the answers are transformed to frequencies or amounts was smaller than 5 grams from eleven included questions on (Figure 6).

Figure 6: Results on validation set for fat intake

Sugar. Similarly to fat intake, LR with the 'frequency features' performed best (Figure 7). The LR on the 'amounts features' also performed well for more than 15 questions, but predicted the worst for the first eleven included questions.

Figure 7: Results on validation set for sugar intake

Fibre. Classification for fibre intake was very bad; however, when considering it as a regression problem, the LR on the 'frequency features' predicted the amounts with an error smaller than 2 grams when more than eleven questions were used (Figure 8). Considering Table 2, this means that predicting how bad/good the fibre intake is was done better than predicting whether it is bad or good.

Figure 8: Results on validation set for fibre intake

Protein. For protein intake all algorithms had a similar performance up to ten included questions; however, the LR on the 'frequency features' started to perform better and better with each added question and predicted the amount of protein consumption with an error of 5 grams (Figure 9).

Figure 9: Results on validation set for protein intake

Salt. Similarly to protein intake, all algorithms performed with a comparable error up to nine included questions, and after that the LR using the features transformed to frequencies started to perform way better and predicted salt intake with an error smaller than 1 gram with eleven included questions (Figure 10).

Figure 10: Results on validation set for salt intake
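The three feature representations used throughout these experiments can be illustrated on a single answer. The answer categories and the grams-per-serving value below are hypothetical examples, not taken from the ESFFFQ:

```python
# Three representations of one FFQ answer, as compared in the experiments above
# (the category labels and the portion size are hypothetical illustration values).
DISCRETE = {"never": 0, "once per month": 1, "once per week": 2, "once per day": 3}
PER_DAY = {"never": 0.0, "once per month": 1 / 30, "once per week": 1 / 7,
           "once per day": 1.0}

def to_amount(answer, grams_per_serving):
    """Amount representation: consumption frequency times a target-specific
    nutrient content per serving, giving grams/day."""
    return PER_DAY[answer] * grams_per_serving

answer = "once per week"
print(DISCRETE[answer])                   # discrete equidistant value: 2
print(round(PER_DAY[answer], 2))          # frequency: approx. 0.14 per day
print(round(to_amount(answer, 35.0), 2))  # e.g. 35 g of sugar per serving -> 5.0 g/day
```

Note that the frequency representation is the same for every target, while the amount representation depends on the nutrient content per serving and is therefore target-specific, which matters for the multi-target discussion below.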
3.3 Discussion
We compared the performance of feature ranking for two different machine learning algorithms on three different types of features for both the classification and regression problems. While the classification problem might give a general idea about one's dietary habits, it is inclined towards overfitting even for very simple models, such as Logistic Regression, while more complex algorithms, the Random Forest Classifier in our case, are even more subject to this deficiency. By predicting amounts instead of quality scores, one gets information about how good/bad the dietary habits are instead of just whether they are good or bad.

Transforming features from discrete equidistant values to frequencies or amounts of nutrients proved to be a very good approach. The transformation gave better results for both the classification and the regression problem, for both Random Forest Regressor/Classifier and Logistic/Linear Regression. While the performance of both algorithms on features transformed to frequencies and features transformed to amounts for the classification problem was comparable, and Linear Regression on features transformed to amounts gave markedly better results for salt intake, Linear Regression on features transformed to frequencies outperformed all other combinations of features and algorithms for the regression problem for all of the targets. The reason for this is that linear regression on amounts is a very good match in the sense that the target variable (total amount) is the sum of all features (partial amounts).

Transforming the features to frequencies instead of to amounts has another advantage: features transformed to amounts are specific to each target, while features transformed to frequencies are equal for all targets. This is an important finding for possible future research where one would address ranking of questions as a multi-target problem. Additionally, the regression problem using Linear Regression on features transformed to frequencies could serve as a baseline for future experiments.

4 CONCLUSION AND FUTURE WORK
Ranking the questions of FFQs when it can be expected that not all of the questions will be answered is an important step when building models for predicting the quality of one's diet. In this paper we compared two feature ranking algorithms on three different types of features, for the classification and regression problem, for five targets. The findings of this paper show that considering the problem as a regression problem on features transformed to frequencies and using a simple machine learning algorithm (Linear Regression) gives the best results for all five targets and provides a baseline for future experiments.

There are several possibilities for future work. As hinted in the previous section, the question of multi-target question ranking is one of the first that appears — one might want to monitor several nutrition quality scores but still would want to avoid answering too many questions. The next, probably more important and interesting, research problem is how to use the answers already provided to our advantage — so instead of statically ranking the questions we would rather explore how we could improve the prediction performance by dynamically ranking and asking the questions.

ACKNOWLEDGMENTS
The WellCo project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 769765.
The authors acknowledge the financial support from the Slovenian Research Agency (research core funding No. P2-0209).
The WideHealth project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 95227.

REFERENCES
[1] Block G, Hartman AM, and Naughton D. 1990. A reduced dietary questionnaire: development and validation. Epidemiology, 1, 58–64. doi: 10.1097/00001648-199001000-00013.
[2] Cade J., Thompson R., Burley V., and Warm D. 2002. Development, validation and utilisation of food-frequency questionnaires – a review. Public Health Nutrition, 5, 4, 567–587. doi: 10.1079/PHN2001318.
[3] Shim JS, Oh K, and Kim HC. 2014. Dietary assessment methods in epidemiologic studies. Epidemiol Health, 36. doi: 10.4178/epih/e2014009.
[4] Gregorič M., Blaznik U., Delfar N., Zaletel M., Lavtar D., Koroušić-Seljak B., Golja P., Zdešar Kotnik K., Pravst I., Fidler Mis N., Kostanjevec S., Pajnkihar M., Poklar Vatovec T., and Hočevar-Grom A. 2019. Slovenian national food consumption survey in adolescents, adults and elderly: external scientific report. EFSA Supporting Publications, 16, 11, 1729E. doi: 10.2903/sp.efsa.2019.EN-1729.
[5] Reščič N., Valenčič E., Mlinarič E., Koroušić Seljak B., and Luštrek M. 2019. Mobile nutrition monitoring for well-being. In UbiComp/ISWC '19 Adjunct. Association for Computing Machinery, London, United Kingdom, 1194–1197. doi: 10.1145/3341162.3347076.
[6] Reščič N., Eftimov T., Koroušić Seljak B., and Luštrek M. 2020. Optimising an FFQ using a machine learning pipeline to teach an efficient nutrient intake predictive model. Nutrients, 12, 12. doi: 10.3390/nu12123789.
[7] Reščič N., Eftimov T., and Koroušić Seljak B. 2020. Comparison of feature selection algorithms for minimization of target specific FFQs. In 2020 IEEE International Conference on Big Data (Big Data), 3592–3595. doi: 10.1109/BigData50022.2020.9378246.
[8] Thompson T. and Byers T. 1994. Dietary assessment resource manual. The Journal of Nutrition, 124, (December 1994), 2245S–2317S. doi: 10.1093/jn/124.suppl_11.2245s.

Daily Covid-19 Deaths Prediction For Slovenia
David Susič
"Jožef Stefan" Institute
Ljubljana, Slovenia
david.susic@ijs.si

ABSTRACT
In this paper, models for predicting daily Covid-19 deaths for Slovenia are analysed. Two different approaches are considered. In the first approach, the models were trained on the first-wave dataset of state intervention plans, cases and country-specific static data for 10 other European countries. The models with the best performance in this case were the k-Nearest Neighbors regressor and the Random Forest regressor. In the second approach, a time-series analysis was performed. The models used in this case were the Seasonal Autoregressive Integrated Moving Average Exogenous model and a Feed-forward Neural Network. For comparison, all 4 models were tested on the second wave for Slovenia, and the model with the best performance was the Feed-forward Neural Network, with a mean absolute error of 1.34 deaths.

KEYWORDS
Covid-19, deaths, predictions, machine learning

1 INTRODUCTION
The aim of this analysis is to find out whether we can predict Covid-19 deaths for Slovenia based on the characteristics of the epidemic in other European countries, and whether we can predict deaths based on a time series analysis of historical data (e.g. predicting the second wave based on the first-wave information). The main advantage of the first approach is that we do not need historical case and death data for the country for which we are making a prediction (in this case Slovenia), while the second approach is generally more accurate but relies on historical death data. The aim is also to find out which of the two approaches provides more accurate predictions. It is important to note that although this is a study for Slovenia, the results can be interpreted as a general assessment of the effectiveness of the described methods for predicting Covid-19 deaths and can be applied to any country for which the data are available.

The data used in this analysis are described in Section 2. Section 3 provides a description of the approaches and the models. Section 4 contains a discussion of the determination of the optimal parameters of the selected models. The results are given in Section 5. The conclusion, along with ideas for possible improvements, is given in Section 6.

2 DATA DESCRIPTION AND PREPARATION
The data used in this paper consist of daily Covid-19 related features at the country level. They contain 12 different Covid-19 related government interventions (school closing, workplace closing, cancel public events, restrictions on gatherings, close public transport, stay at home requirements, restrictions on internal movement, international travel controls, public information campaigns, testing policy, contact tracing, and facial coverings), Covid-19 related cases and deaths, and some static data, in particular the country's population, population density, median age, percentage of people over 65, percentage of people over 70, GDP per capita, cardiovascular death rate, diabetes prevalence, percentage of female and male smokers, hospital beds per thousand people, and life expectancy. To suppress anomalies in registered cases on Sundays and holidays, a 7-day moving average was used for both cases and deaths. The dataset covers the European countries of Slovenia, Italy, Hungary, Austria, Croatia, France, Germany, Poland, Slovak Republic, Bosnia and Herzegovina, and the Netherlands, from January 22, 2020 to December 11, 2020. All of the countries chosen for this study are geographically next to one another and are thus expected to have a similar course of the epidemic. The data on government interventions, cases and deaths are derived from the "COVID-19 Government Response Tracker" database, collected by the Blavatnik School of Government at Oxford University [4]. The intervention values range between 0 and 4 and represent their strictness, for example, whether only some or all schools are closed. The static data are collected from a variety of sources (United Nations, World Bank, Global Burden of Disease, Blavatnik School of Government, etc.) [3]. The original data are publicly available online. The processed data used for the purpose of this study can be found online at https://repo.ijs.si/davidsusic/covid-seminar-data.

3 METHODS AND MODELS
Two different approaches were considered for the analysis. For the first part of the analysis, referred to as the country-specific approach, the models were trained on the data of government intervention plans, cases, deaths and country-specific static data for the 10 other European countries, with the aim of predicting deaths for Slovenia. In this case, the predictions were made for each day, disregarding the time order. For the second part of the analysis, a time series prediction was performed, using only the daily deaths for Slovenia as data.

3.1 Country-Specific Approach
In the country-specific approach, the selection of the base model was very important, as models that perform worse than the base model are not worthy of interpretation. The baseline was defined as

N_deaths(t) = N_cases(t − 14) · M,    (1)

where M = 0.023 is the mortality rate factor of those infected, calculated as a weighted average of the mortality rates of the countries included in this study [2], and t denotes a specific day. This simple model implies that the number of deaths on a given day t is equal to the number of new infections on the day t − 14, multiplied by the mortality rate factor. The regressor models that were tested are: Random Forest (RF), k-Nearest Neighbors (KNN), Stochastic Gradient Descent, Ridge, Lasso, and Epsilon-Support Vector. Descriptions of all of the models can be found in the Python scikit-learn documentation [5].
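The baseline of Eq. (1), together with the 7-day smoothing applied to the case counts, can be sketched as follows. This is a minimal illustration: the case series is hypothetical, and only M = 0.023 is taken from the paper.

```python
# A sketch of the baseline in Eq. (1): deaths(t) = cases(t - 14) * M.
M = 0.023  # weighted-average mortality-rate factor from the paper

def smooth7(series, t):
    """7-day moving average ending at day t (suppresses Sunday/holiday anomalies)."""
    window = series[max(0, t - 6):t + 1]
    return sum(window) / len(window)

def baseline_deaths(cases, t, lag=14):
    """Predicted deaths on day t: smoothed new cases on day t - lag, times M."""
    return smooth7(cases, t - lag) * M

cases = [1000.0] * 30  # hypothetical constant 1000 new cases per day
print(baseline_deaths(cases, 20))  # 1000 * 0.023 = 23.0
```

Any candidate regressor that cannot beat this two-line rule on the held-out data is, as the paper puts it, not worthy of interpretation.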
The two models that performed significantly better than the baseline were the KNN regressor and the RF regressor. The other regression models performed the same as or worse than the baseline model and were thus not used in the further analysis. All models were tested in 10-fold cross-validation with the performance measures mean absolute error (MAE), mean squared error (MSE) and R² score on the data subset that does not include Slovenia. The measures are defined as:

MAE(y, ŷ) = (1/n) · Σ_{i=0..n−1} |y_i − ŷ_i|,    (2a)

MSE(y, ŷ) = (1/n) · Σ_{i=0..n−1} (y_i − ŷ_i)²,    (2b)

R²(y, ŷ) = 1 − Σ_{i=1..n} (y_i − ŷ_i)² / Σ_{i=1..n} (y_i − ȳ)²,    (2c)

where ŷ_i is the predicted value of the i-th sample, y_i is the corresponding true value, n is the sample size and ȳ is the average true value, ȳ = (1/n) · Σ_{i=1..n} y_i.

For each sample, additional features of the government interventions and cases were added for the previous days. The number of previous days was defined using the lookback parameter. Models were tested for lookback values between −28 and 0 days. The comparison is shown in Figure 1. It can be seen that the performance decreases in the range where the lookback is shorter than 14 days, but does not increase in the range where the lookback exceeds this value. The main reason for this is probably the fact that most deaths occur within the first 14 days of infection. A lookback of 14 days was used for the further analysis as it was found to be the most appropriate.

Figure 1: 10-fold cross-validation performance measures of the models for different lookback parameters. The measures and their units are: MAE [deaths/100k] (top), MSE [deaths²/100k²] (middle) and R² score (bottom).

3.2 Time-Series Approach
In the second approach, a time series analysis was performed. In this case, only the daily deaths for Slovenia were used as data. The models used in this case were the Seasonal Autoregressive Integrated Moving Average Exogenous model (SARIMAX(p,d,q)(P,D,Q,m)) [6] and a Feed-forward Neural Network (FFNN) [1].

The former is a combination of several different algorithms. The first is the autoregressive AR(p) model, which is a linear model that relies only on the past p values to predict current values. The next is the moving average MA(q) model, which uses the residuals of the past q values to fit the model accordingly. The I(d) represents the order of integration: the number of times we need to difference the time series to ensure stationarity. The X stands for an exogenous variable, i.e., it suggests adding a separate external variable to help predict the target variable. Finally, the S stands for seasonal, meaning that we expect our data to have a seasonal aspect. The parameters P, D, and Q are the seasonal versions of the parameters p, d, and q, and the parameter m represents the length of the cycle.

The FFNN structure included 10 input perceptrons, one for each death value in the last 10 days, a hidden layer of 64 perceptrons, and 1 output perceptron.

Since the future data of a time series contain information about the past, a forward chaining approach was used for the n-fold cross-validation. This means that there is no random shuffling of the data: the test set must always be the final portion of the data, i.e. the final part of the date range. The concept of forward chaining is shown in Figure 2. The results of the 10-fold cross-validation of the predictions for 21 days are shown in Table 1.

Table 1: 10-fold cross-validation performance measures of the predictions for 21 days for the SARIMAX and FFNN algorithms.

          MAE [deaths]  MSE [deaths²]  R² score
SARIMAX   1.13          4.81           0.71
FFNN      0.53          1.15           0.88
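The forward-chaining scheme can be sketched as a split generator. This is an illustrative reconstruction: the series length and fold count below are example values, and the exact split sizes used by the author are not specified beyond the 21-day prediction horizon.

```python
# Forward chaining for time-series n-fold cross-validation: each fold trains on
# an initial segment and tests on the block that immediately follows it, so the
# test set is always the final portion of the data seen so far (no shuffling).
def forward_chaining_splits(n_samples, n_folds, horizon):
    """Yield (train_indices, test_indices) pairs; horizon is the prediction window."""
    for fold in range(1, n_folds + 1):
        train_end = n_samples - (n_folds - fold + 1) * horizon
        if train_end <= 0:
            continue  # not enough history for this fold
        train = list(range(0, train_end))
        test = list(range(train_end, train_end + horizon))
        yield train, test

# e.g. 100 days, 3 folds, 21-day predictions (the paper uses a 21-day horizon)
for train, test in forward_chaining_splits(100, 3, 21):
    print(len(train), test[0], test[-1])
```

Later folds therefore always see strictly more history than earlier ones, which is what prevents future information from leaking into training.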
Figure 2: Forward chaining approach to time-series n-fold cross-validation.

4 MODELS' PARAMETERS SELECTION
The next step was to determine the optimal parameters of the selected models. For this purpose, the regressor models were trained on the same dataset used in the 10-fold cross-validation and tested on the data for Slovenia. For this particular case, different model parameters were tested to see which performed best. The MAE [deaths/100k] as a function of the parameter K for the KNN and as a function of the number of trees for RF are shown in Figures 3 and 4, respectively.

Figure 3: MAE of the KNN regressor as a function of K.

Figure 4: MAE of the RF regressor as a function of the number of trees.

For the KNN regressor, the MAE has a minimum at K = 55, while for RF the fitting function shows that the appropriate number of trees is 100, since the model does not improve with additional trees beyond this point. It is important to note that since RF is random in the sense that it randomly selects a subset of features at each splitting decision, the results and hence the performance measures are also somewhat random. However, they do follow a certain trend that becomes apparent when a polyfit is applied. To reduce the randomness of the results, the average of 3 separate predictions was calculated for each number of trees.

To determine the best parameters of the SARIMAX model, the auto_arima algorithm from the Python pmdarima library was used [7]. The algorithm analyzes the given data and determines the best model and its parameters for that data. In this case, the selected model was SARIMAX(2, 1, 4)(4, 1, 1, 12). In the case of the FFNN, the parameter selection was omitted; the same model structure was always used.

5 RESULTS
With the optimal parameters selected, the graphs of the predictions can be plotted. The predictions of the country-specific approach are shown in Figure 5.

Figure 5: Deaths for Slovenia from 22.1.2020 to 11.12.2020. Models' predictions, compared to true values.

All models predicted the number of deaths for the first epidemic wave fairly accurately. As a result of the unrepresentative reporting of Covid-19 cases for the second wave, the base model predicts a much lower number of daily deaths. We can also see that the KNN regressor predicts the same value from a certain day forward. The reason for this is most probably that the algorithm always finds the same k = 55 neighbors and thus always predicts the same value. To avoid this, a larger dataset would be required. The MAE for RF, KNN and the baseline are shown in Table 2.

Table 2: MAE comparison of the country-specific models for the interval from 22.1.2020 to 11.12.2020.

              RF    KNN   baseline
MAE [deaths]  5.41  5.39  5.48

The predictions for the time interval between 21.11.2020 and 11.12.2020 for the time-series approach are shown in Figure 6. The MAEs for FFNN and SARIMAX, shown in Table 3, are substantially lower than the MAEs of the country-specific models. However, the accuracy decreases as the prediction time interval increases.

Table 3: MAE comparison of the time-series models for the interval from 21.11.2020 to 11.12.2020.

              FFNN  SARIMAX
MAE [deaths]  1.24  2.27

Table 4: MAE comparison of the models for the interval from 1.11.2020 to 11.12.2020.

              FFNN  SARIMAX  RF Reg.  KNN Reg.
MAE [deaths]  1.34  1.67     6.46     8.85

It can be seen that in this case the time-series approach is more accurate than the country-specific one.
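The error measures defined in Eqs. (2a)–(2c) and used in Tables 1–4 are straightforward to implement. A minimal sketch with toy values (not the paper's data):

```python
# The error measures from Eqs. (2a)-(2c), used for all model comparisons above.
def mae(y, y_hat):
    """Mean absolute error, Eq. (2a)."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def mse(y, y_hat):
    """Mean squared error, Eq. (2b)."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

def r2(y, y_hat):
    """Coefficient of determination, Eq. (2c)."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]  # toy daily death counts
y_pred = [1.0, 2.0, 2.0, 5.0]
print(mae(y_true, y_pred))  # 0.5
print(mse(y_true, y_pred))  # 0.5
print(r2(y_true, y_pred))   # 0.6
```

Note that MAE is in the units of the target (deaths) while MSE is in squared units, which is why the tables report them separately.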
However, for longer time intervals, the country-specific approach is better because it does not rely on past data. It is important to note that the country-specific models' errors are actually lower when making predictions from the start of the epidemic. The reason for this is that for the first 6 months the numbers of deaths were very low, as can be seen in Figure 5.

Figure 6: Slovenia deaths from 21.11.2020 to 11.12.2020. Time-series models' predictions, compared to true values.

To determine the overall best model for such predictions, all 4 models were tested on the second epidemic wave. The predictions are visualized in Figure 7 and the MAEs [deaths] are listed in Table 4. The best performing model overall is the FFNN, with an MAE of 1.34 deaths. The reason for the best performance of this model is probably that it had a relatively high number of input parameters: the input layer consisted of 10 perceptrons, i.e. each FFNN prediction was based on the values of the last 10 days.

6 CONCLUSION
In this paper, two different approaches to predicting Covid-19 deaths for Slovenia were tested. Both approaches turned out to be reliable. The main implication of the presented study is that for short time intervals the time series approach is much more accurate than the country-specific approach. The advantage of the country-specific approach is that it can predict the number of deaths for a given day based on the number of cases, countermeasures and country-specific static data, without necessarily having information about the past. On the other hand, for the prediction of the second wave, where we already know the course of the epidemic in the first wave, the time series approach is better, at least for the prediction for Slovenia. In future studies, predictions for the third and fourth waves will be analysed.

REFERENCES
[1] Francois Chollet et al. 2015. Keras.
https : / / github . com / fchollet/keras. [2] Ensheng Dong et al. 2020. An interactive web-based dash- board to track covid-19 in real time. The Lancet Infectious 50 Diseases, 20, 5. doi: 10.1016/S1473-3099(20)30120-1. http: //doi.acm.org/10.1016/S1473- 3099(20)30120- 1. [3] Thomas Hale et al. 2020. A cross-country database of covid- 40 19 testing. Scientific Data, 7, 345. doi: 10.1038/s41597-020- 00688- 8. http://doi.acm.org/10.1038/s41597- 020- 00688- 8. [4] Thomas Hale et al. 2021. A global panel database of pan- Deaths FFNN 30 demic policies (oxford covid-19 government response tracker). SARIMAX Nature Human Behaviour, 5, 3529–538. doi: 10.1038/s41562- RF Reg. 021- 01079- 8. http://doi.acm.org/10.1038/s41562- 021- 01079- 20 KNN Reg. 8. Truth [5] Fabian Pedregosa et al. 2012. Scikit-learn: machine learn- ing in python. Journal of Machine Learning Research, 12, ( January 2012). [6] Skipper Seabold and Josef Perktold. 2010. Statsmodels: econo- metric and statistical modeling with python. Proceedings of Nov-01 Nov-07 Nov-13 Nov-19 Nov-25 Dec-01 Dec-07 the 9th Python in Science Conference, 2010, (January 2010). [7] Taylor G. Smith et al. 2017. pmdarima: arima estimators for Python. [Online; accessed 9.1.2021]. (2017). http : / / www. Figure 7: Slovenia deaths from 1.11.2020 to 11.12.2020. alkaline- ml.com/pmdarima. Models’ predictions, compared to true values. 46 Iris Recognition Based on SIFT and SURF Feature Detection Alenka Trpin Bernard Ženko Faculty of Information Studies Department of Knowledge Technologies Ljubljanska cesta 31A Jožef Stefan Institute 8000 Novo mesto, Slovenia 1000 Ljubljana, Slovenia alenka.trpin@fis.unm.si bernard.zenko@ijs.si ABSTRACT (3) Feature extraction, where a feature vector is generated using different filters, and (4) comparison, based on different distances Human iris recognition is generally considered to be one of the (Hamming distance in specific cases) between pairs of most effective approaches for biometric identification. 
Iris recognition based on SIFT and SURF feature detection
Trpin and Ženko

ABSTRACT
Identification is required in numerous areas such as security (e.g., airports and other buildings), identity verification (e.g., banking, electoral registration), and the criminal justice system. This paper presents an approach for iris image classification that is based on two popular algorithms for image feature construction: the Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF). Both algorithms were used in combination with the bag of visual words approach to create descriptive image features that can be used by supervised machine learning methods, and a set of standard machine learning methods (k-Nearest Neighbor, random forest, support vector machines and neural networks) was evaluated on a publicly available iris dataset.

KEYWORDS
Iris recognition, image classification, SIFT features, SURF features

1 INTRODUCTION
Biometrics is the science of determining a person's identity and is an important approach for forensic and security identity management. Face, fingerprints, voice and iris are the most commonly used biometric identifiers for personal identification. They provide characteristics in terms of personal appearance. The biometric system first scans the biometric characteristic and then, typically based on a library of scans or a classification model, identifies the person [5].

A typical iris recognition system consists of four key modules: (1) image pre-processing, where the system detects the boundary of the pupil and the outer iris, and (2) normalization, where the inner and outer circle parameters obtained from iris localization are given as input and a transformation from polar to Cartesian coordinates is applied, which maps the circle (iris) into a rectangle. The later modules operate on the transformed iris images and the corresponding masks [10]. The comparison step is nowadays frequently implemented with a machine-learned classification model.

This work first uses the Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF) algorithms to extract image keypoints or descriptors, and then the bag of visual words to generate image features that can be used by standard supervised machine learning methods. We evaluate our method on a publicly available iris image dataset.

2 RELATED WORK
Iris recognition is frequently used for gender recognition and personal biometric authentication [6, 8, 9]. Ali et al. applied contrast-limited adaptive histogram equalization to the normalized image; they used SURF and investigated the necessity of iris image enhancement on the CASIA-Iris-Interval dataset [1]. Păvăloi and Ignat present experiments with a new approach for iris image classification based on matching SIFT on iris occlusion images; they used the UPOL iris dataset to test their methods [6]. Bansal and Sharma use a statistical feature extraction technique based on the correlation between adjacent pixels, combined with a 2-D wavelet tree feature extraction technique, to extract significant features from iris images; support vector machines (SVM) were used to classify iris images into male or female classes [2]. Salve et al. used an artificial neural network and SVM as classifiers for iris patterns. Before applying the classifier, the region of interest, i.e., the iris region, is segmented using a Canny edge detector and a Hough transform, so that the eyelid and eyelash effects are kept to a minimum. A Daugman rubber-sheet model is used to normalise the iris to improve computational efficiency and obtain appropriate dimensionality. Furthermore, the discriminative feature sequence is obtained by feature extraction from the segmented iris image using a 1D Log-Gabor wavelet [14]. Adamović et al. applied an approach that classifies biometric templates as numerical features in the CASIA iris image collection. These templates are generated by converting a normalised iris image into a one-dimensional fixed-length code set, which is then subjected to stylometric feature extraction. The extracted features are further used in combination with SVM and random forest (RF) classifiers [15].

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia
© 2021 Copyright held by the owner/author(s).

3 METHODOLOGY
Our iris recognition approach combines the image feature generation algorithms SIFT and SURF, the bag of visual words model, and standard supervised machine learning classification methods. In the following subsections we briefly describe each of these components and then explain how they are combined.

3.1 SIFT
The SIFT algorithm detects a set of local features in an image.
These features represent local areas of the image, and the algorithm also computes their description in the form of a vector. The algorithm proceeds in several stages. The first stage of computation is scale-space extrema detection, which searches over all scales and image locations; it employs the so-called difference-of-Gaussians function to identify potential interest points that are invariant to scale and orientation. The second stage localizes each candidate: keypoints are extracted by detecting scale-space extrema. The main idea behind scale-space extrema detection is to identify stable features which are invariant to changes in scale and viewpoint. At this point the keypoint descriptors are extracted [4, 6]. In essence, SIFT describes each image with a set of keypoints, and each keypoint is described with a vector of dimension 128. It is worth mentioning that SIFT can detect different numbers of keypoints in different images.

3.2 SURF
The SURF algorithm is based on similar ideas as SIFT, but its implementation is different. It can be used for similar tasks as SIFT, but it is faster and produces highly accurate results when provided with appropriate reference images. Instead of the difference-of-Gaussians function, SURF uses approximate Laplacian-of-Gaussian images and a box filter. Determinants of the Hessian matrix are then used to detect the keypoints. A neighbourhood around each keypoint is selected and divided into sub-regions, and for each sub-region the wavelet responses are taken and combined to form the SURF feature descriptor [1, 4]. In the end, each image is again represented with a set of keypoints, which are described with vectors.

3.4 Classification Methods
The image classification phase of image analysis can in principle be performed with any machine learning method for classification. We have decided to evaluate a diverse set of standard methods, which we briefly describe in the following paragraphs.

The kNN is a supervised method that can be used for classification and regression. It is a simple algorithm where the classification of new instances is based on the majority class of the k closest training examples. The closeness is measured with a distance measure, usually the Euclidean, Minkowski or Manhattan distance [9].

RF is a supervised learning algorithm based on the ensemble principle of using decision trees as the basic classifier and creating a learning model by combining multiple decision trees. The main idea of the RF classifier is to create multiple decision trees using bootstrap sampling and to introduce randomness into the individual tree-building process. The class label of a new example is determined by majority voting of all trees in the ensemble [11].

Neural networks (NN) consist of several layers of simple units (neurons), which are simple functions with weight and bias parameters. Each neuron in one layer is connected to all neurons in the next layer. The network is trained with back-propagation, which uses gradient descent on a loss function (e.g., the cross-entropy loss) to update the weights. NNs can have different structures, but typically have an input layer, one or more hidden layers and an output layer; each of these layers contains one or more neurons [9, 12, 13]. In this work, we used the Adam solver because it is fast and gives good results; it is an optimisation algorithm that uses running averages of the gradients and other moments of the gradients [13]. For the activation function, we use the logistic or sigmoid activation function, which determines how nodes in a network layer convert a weighted sum of input data into output data. The logistic or sigmoid activation function accepts any real value as input, and the output values are from 0 to 1 [12].
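To make the kNN description above concrete, here is a minimal majority-vote sketch in NumPy (toy data; an illustration only, not the authors' implementation):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training examples."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distances to x
    nearest = np.argsort(dists)[:k]              # indices of the k closest points
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]

# Toy 2-D data: class 0 near the origin, class 1 near (5, 5)
X = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([0.5, 0.5])))  # → 0
print(knn_predict(X, y, np.array([5.5, 5.5])))  # → 1
```

The same idea scales directly to the 500-dimensional BoVW vectors used in the experiments; only the distance computation changes in cost.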
3.3 Bag of Visual Words
The bag of visual words (BoVW) approach can be used for transforming or tokenizing keypoint-based image features, such as SIFT or SURF, into a fixed number of features, which is typically required by supervised machine learning methods. It first generates a visual word vocabulary from a (training) set of images and then describes each image with these visual words. The visual word vector of an image contains the presence or absence information for each visual word in the image. In the case of SIFT or SURF keypoints, for example, the visual word vector contains the numbers of keypoints in an image that are similar to a given visual word. The process for extracting BoVW features from images involves the following steps: automatically detect regions or points of interest; compute local descriptors over those points (in our case, this means employing the SIFT or SURF algorithm); quantize the descriptors into words to form the visual vocabulary, for example with a clustering algorithm; and find the occurrences in the image of each specific visual word in the vocabulary (generate a vector of visual word frequencies) [15]. It is worth mentioning that a specific BoVW model is based on a given training dataset and only includes visual words that appear in the training images.

Support vector machines (SVM) are a discriminant technique, which means that the classification function takes a data point and assigns it to one of the different classes of the classification task. SVM transforms the original data with a kernel function into a higher-dimensional space and then tries to find a hyperplane that optimally separates the two classes. This hyperplane is defined by support vectors, and the distances to the support vectors are maximised. SVM is a very effective method for high-dimensional problems [2, 14].

3.5 Our Method
Our approach for iris image classification is based on the bag of visual words model, and we use either the SIFT or the SURF algorithm for image keypoint detection. In the training phase we perform the following steps.
1. For each image i, the SIFT or SURF algorithm is run, which detects K_i keypoints (each keypoint has D = 128 dimensions).
2. We collect the keypoints from all n training images, that is, sum_{i=1}^{n} K_i keypoints.
3. We cluster the above set of keypoints with the k-means clustering algorithm. Based on preliminary experiments we decided to use k = 500. The clusters, or their centroids, represent the visual words for our problem of iris recognition.
4. Now, we use the clustering model to assign each keypoint in an image to its nearest centroid (visual word) and sum up the occurrences of these visual words for each image. We end up with image descriptions, where each image is described with a vector of length k.
5. The dataset derived in the previous step can now be used to train a classification model with an arbitrary machine learning method. In our experiments, we have used four methods: k-Nearest Neighbor, Random Forest, Support Vector Machines and Neural Networks.

In our experiments we have used available Python implementations of the included algorithms (scikit-learn for machine learning) with their default parameters, except the following:
- k-means: k = 500,
- kNN: k = 15, Euclidean metric,
- RF: number of estimators = 100,
- SVM: linear kernel function,
- NN: "adam" solver function, 8 hidden layers with 8 neurons each, "logistic" activation function.

The classification accuracy was evaluated with 5-fold stratified cross-validation. The results are presented separately for the small and big Ubiris.v1 datasets in Tables 1 and 2, respectively.

Table 1: Classification accuracy on the small dataset with standard deviation

classifier / keypoint method | SIFT        | SURF
kNN                          | 0.37 ± 0.0  | 0.46 ± 0.0
RF                           | 0.43 ± 0.06 | 0.63 ± 0.0
SVM                          | 0.67 ± 0.0  | 0.86 ± 0.0
NN                           | 0.63 ± 0.0  | 0.77 ± 0.0
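The training steps above can be sketched with scikit-learn (random stand-in descriptors instead of real SIFT/SURF output, and k reduced from 500 to keep the toy example small; this is a sketch under those assumptions, not the authors' code):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Stand-in for SIFT/SURF output: each "image" yields a variable number
# of 128-dimensional descriptors (hypothetical random data).
def fake_descriptors(n_keypoints):
    return rng.normal(size=(n_keypoints, 128))

train_images = [fake_descriptors(int(rng.integers(20, 40))) for _ in range(12)]
labels = np.array([i % 3 for i in range(12)])  # three toy classes

# Steps 2-3: pool all training keypoints and cluster them into k visual words.
k = 16  # the paper uses k = 500; reduced here for the toy example
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(np.vstack(train_images))

# Step 4: describe each image as a histogram of visual-word occurrences.
def bovw_histogram(descriptors):
    words = kmeans.predict(descriptors)
    return np.bincount(words, minlength=k)

X_train = np.array([bovw_histogram(d) for d in train_images])

# Step 5: train any standard classifier on the fixed-length features.
clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, labels)
print(clf.predict([bovw_histogram(fake_descriptors(25))]))
```

The classification phase reuses the same fitted k-means model: a new image's keypoints are mapped to their nearest centroids, histogrammed, and passed to the trained classifier.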
In the classification phase, when we need to classify a new image, we need to perform three steps.

1. Run the SIFT or SURF algorithm on the new image to detect keypoints (analogous to step 1 in training).
2. Use the clustering model to assign each keypoint to its nearest centroid and sum up their occurrences to derive the visual words vector (analogous to step 4 in training).
3. Classify the image with the trained classification model.

We have performed experiments with two keypoint detection algorithms (SIFT and SURF) and four classification algorithms (kNN, RF, SVM and NN); the results are presented in the next section.

4 RESULTS
For evaluating our approach, we have used the Ubiris.v1 dataset (http://iris.di.ubi.pt/ubiris1.html). It contains 1865 images of 200 x 150 resolution in 24-bit colour. They are grouped in two subsets: the first contains 1205 images in 241 classes and the second one contains 660 images in 132 classes. Images in the first subset have minimal noise factors, especially those related to reflections, luminosity, and contrast, because they were captured inside a dark room. The second subset of images was collected in a less controlled setting to introduce natural luminosity variation. This resulted in more heterogeneous images with reflection, contrast, luminosity and focus problems. Images collected at this stage simulate the ones captured by a vision system without or with minimal active participation from the subjects [7]. These two subsets of images do not have the same classes. For our experiments we used the examples belonging to a subset of all classes: for the small subset we selected 7 classes (the first seven) and for the big subset 127 classes (the first 127). In the resulting datasets the examples were evenly distributed among the selected classes.

The baseline accuracy for the small dataset is 0.14 (i.e., 1/number of classes = 1/7), and in Table 1 we can see that all instantiations of our method give better results than chance. The NN and SVM classifiers perform much better than RF and especially kNN. Comparing the keypoint detectors, we can see that SURF gives consistently better results than SIFT, although the difference is not very large.

The results on the big dataset are, as expected, worse. The default accuracy in this case is 0.0079 (i.e., 1/127), and again all instantiations of our method give better results than chance. Again, SVM and NN perform best, but for some reason, NN performs very poorly in combination with SURF keypoints. RF in this case performs only slightly worse than SVM, while kNN is much worse. Also, on this data we can see that SURF keypoints give somewhat better results than SIFT; the only exception is NN, where SURF fails.

Table 2: Classification accuracy on the big dataset with standard deviation

classifier / keypoint method | SIFT         | SURF
kNN                          | 0.02 ± 0.025 | 0.06 ± 0.039
RF                           | 0.10 ± 0.018 | 0.11 ± 0.014
SVM                          | 0.08 ± 0.039 | 0.13 ± 0.014
NN                           | 0.17 ± 0.01  | 0.25 ± 0.005

In summary, we can conclude that for iris recognition the more complex learning algorithms (SVM, NN) outperform the simpler ones (kNN and even RF), and that the SURF algorithm slightly outperforms SIFT. However, we can also conclude that iris recognition is a hard problem, which would probably benefit from the application of state-of-the-art deep learning approaches.

To investigate whether any of the observed differences is statistically significant, we applied the Friedman and Nemenyi tests as recommended in [8]. The results in the form of an average rank diagram with the estimated critical distance are presented in Figure 1 for the big dataset and Figure 2 for the small dataset.
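The critical distance used in this analysis (CD = 4.695605 for eight methods at the 0.05 level) can be reproduced from the Nemenyi formula CD = q_alpha * sqrt(k(k+1)/(6N)). Here k = 8 is the number of compared method instantiations, and N = 5 is assumed to be the number of cross-validation folds over which the ranks were averaged (an assumption, but one consistent with the reported value):

```python
import math

q_alpha = 3.031  # critical value for eight classifiers at the 0.05 level
k = 8            # compared instantiations (4 classifiers x 2 keypoint detectors)
N = 5            # rank measurements per method (assumed: the 5 CV folds)

cd = q_alpha * math.sqrt(k * (k + 1) / (6 * N))
print(round(cd, 6))  # → 4.695605
```

Two methods whose average ranks differ by more than this CD are considered significantly different by the Nemenyi test.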
The critical value for eight classifiers and a confidence level of 0.05 is 3.031, and the critical distance is CD = 4.695605. Based on the size of the CD we can only claim that the top-ranked methods are significantly better than the low-ranked ones. For example, NN-SURF, NN-SIFT and SVM-SURF are better than kNN-SIFT. On the other hand, the differences among neighboring methods on the diagram are not significant.

Figure 1: Average rank diagram with the estimated critical distance for the evaluated methods (big dataset)

Figure 2: Average rank diagram with the estimated critical distance for the evaluated methods (small dataset)

5 CONCLUSION
The paper presents an evaluation of a typical bag of visual words approach on a specific dataset for human iris recognition. The results show that iris recognition is a relatively hard task, and in order to improve the accuracy we would need a dataset with more examples of each class.

REFERENCES
[1] Ali, H.S., Ismail, A.I., Farag, F.A. 2016. Speeded up robust features for efficient iris recognition. SIViP 10, 1385–1391 (2016).
[2] Atul Bansal, Ravinder Agarwal and R. K. Sharma. 2012. SVM based gender classification using iris images. 2012 Fourth International Conference on Computational Intelligence and Communication Networks, 425–429.
[3] David G. Lowe. 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60, 2, 91–110.
[4] Ebrahim Karami, Siva Prasad, Mohamed Shehata. 2017. Image matching using SIFT, SURF, BRIEF and ORB: Performance comparison for distorted images. Newfoundland Electrical and Computer Engineering Conference.
[5] Hájek J., Drahanský M. 2019. Recognition based on eye biometrics: Iris and retina. In: Obaidat M., Traore I., Woungang I. (eds) Biometric-Based Physical and Cybersecurity Systems. Springer, Cham.
[6] Ioan Păvăloi and Anca Ignat. 2019. Iris image classification using SIFT features. 23rd International Conference on Knowledge-Based and Intelligent Information & Engineering Systems, Elsevier, 159 (2019), 241–250.
[7] Hugo Pedro Proença and Luís A. Alexandre. 2005. UBIRIS: A noisy iris image database. 13th International Conference on Image Analysis and Processing (ICIAP 2005), Springer, (Sept. 2005), 970–977.
[8] Janez Demšar. 2006. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res., 7, 1–30.
[9] Jiawei Han, Micheline Kamber and Jian Pei. 2012. Data Mining: Concepts and Techniques (3rd ed.). Morgan Kaufmann.
[10] John Daugman. 2004. How iris recognition works. IEEE Trans. Circuits Syst. Video Technol., 14, 1, 21–30.
[11] Leo Breiman. 2001. Random forests. Machine Learning, 45, 1, 5–32.
[12] Saša Adamović, Vladislav Mišković, Nemanja Maček, Milan Milosavljević, Marko Šarac, Muzafer Saračević, Milan Gnjatović. 2020. An efficient novel approach for iris recognition based on stylometric features and machine learning techniques. Future Generation Computer Systems, 107 (2020), 144–157.
[13] Shervin Minaee and Amirali Abdolrashidi. 2019. DeepIris: Iris recognition using a deep learning approach.
[14] Sushilkumar S. Salve and S. P. Narote. 2016. Iris recognition using SVM and ANN. 2016 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), 474–478.
[15] Wadhah Ayadi, Wajdi Elhamzi, Imen Charfi, Mohamed Atri. 2019. A hybrid feature extraction approach for brain MRI classification based on bag-of-words. Biomedical Signal Processing and Control, 48, 144–152.
In future work we plan to evaluate additional feature extractors, like Oriented FAST and Rotated BRIEF (ORB) or Local Binary Patterns (LBP), and, given their success in image recognition in general, also convolutional neural network approaches. With the latter, we will be especially interested in evaluating and comparing the performance vs. computational cost trade-off.

Analyzing the Diversity of Constrained Multiobjective Optimization Test Suites

Aljoša Vodopija, Tea Tušar, Bogdan Filipič
aljosa.vodopija@ijs.si, tea.tusar@ijs.si, bogdan.filipic@ijs.si
Jožef Stefan Institute and Jožef Stefan International Postgraduate School
Jamova cesta 39, Ljubljana, Slovenia

ABSTRACT
A well-designed test suite for benchmarking novel optimizers for constrained multiobjective optimization problems (CMOPs) should be diverse enough to detect both the optimizers' strengths and shortcomings. However, until recently there was a lack of methods for characterizing CMOPs, and measuring the diversity of a suite of problems was virtually impossible. This study utilizes the landscape features proposed in our previous work to characterize frequently used test suites for benchmarking optimizers in solving CMOPs. In addition, we apply the t-distributed Stochastic Neighbor Embedding (t-SNE) dimensionality reduction approach to reveal the diversity of these test suites. The experimental results indicate which ones express sufficient diversity.

KEYWORDS
constrained multiobjective optimization, benchmarking, landscape feature, t-SNE

1 INTRODUCTION
Real-world optimization problems frequently involve multiple objectives and constraints. These problems are called constrained multiobjective optimization problems (CMOPs) and have been gaining a lot of attention in the last years [13]. As with other theoretically-oriented optimization studies, a crucial step in testing novel algorithms in constrained multiobjective optimization is the preparation of a benchmark test.

One of the key elements of a benchmark test is the selection of suitable test CMOPs [1]. A well-designed benchmark suite should include "a wide variety of problems with different characteristics" [1]. This way the benchmark problems are diverse enough to "highlight the strengths as well as weaknesses of different algorithms" [1].

In this study, we employ the landscape features proposed in [13] to express and discuss the diversity of frequently used test suites of CMOPs. This is achieved by first computing the landscape features and then employing t-distributed Stochastic Neighbor Embedding (t-SNE), a dimensionality reduction technique, to embed the 29-D CMOP feature space into the 2-D space. Note that due to space limitations, only selected results are shown in this paper. The complete results can be found online (see footnote 1).

The rest of this paper is organized as follows. Section 2 provides the theoretical background. In Section 3, we present the landscape features and the t-SNE algorithm. Section 4 is dedicated to the experimental setup, while the results are discussed in Section 5. Finally, Section 6 summarizes the study and provides an idea for future work.

2 THEORETICAL BACKGROUND
A CMOP can be formulated as:

    minimize    f_m(x),  m = 1, ..., M
    subject to  g_i(x) <= 0,  i = 1, ..., I                    (1)

where x = (x_1, ..., x_D) is a search vector, f_m : S -> R are objective functions, g_i : S -> R are constraint functions, S ⊆ R^D is a search space of dimension D, and M and I are the numbers of objectives and constraints, respectively.

If a solution x satisfies all the constraints, g_i(x) <= 0 for i = 1, ..., I, then it is a feasible solution. For each of the constraints g_i we can define the constraint violation as v_i(x) = max(0, g_i(x)). In addition, an overall constraint violation is defined as

    v(x) = sum_{i=1}^{I} v_i(x).                               (2)

A solution x is feasible iff v(x) = 0.
However, until recently there existed only a few limited techniques for exploring CMOPs [13]. For this reason, the test suites of CMOPs were insufficiently understood and measuring their diversity was virtually impossible. To overcome this situation, in our previous work [13] we experimented with various exploratory landscape analysis (ELA) techniques and proposed 29 landscape features to characterize CMOPs, including their violation landscapes, a concept similar to the fitness landscape where fitness is replaced by the overall constraint violation.

A feasible solution x ∈ S is said to dominate a solution y ∈ S if f_m(x) <= f_m(y) for all 1 <= m <= M, and f_m(x) < f_m(y) for at least one 1 <= m <= M. In addition, x* ∈ S is a Pareto-optimal solution if there exists no x ∈ S that dominates x*. All feasible solutions represent the feasible region, F = {x ∈ S | v(x) = 0}. Besides, all nondominated feasible solutions form the Pareto-optimal set, S_o. The image of the Pareto-optimal set is the Pareto front, P_o = {f(x) | x ∈ S_o}. A connected component (a maximal connected subset with respect to the inclusion order) of the feasible region is called a feasible component, ℱ ⊆ F.

In [13], we introduced analogous terms from the perspective of the overall constraint violation. A local minimum-violation solution is a solution x* for which there exists a δ > 0 such that v(x*) <= v(x) for all x ∈ {x | d(x, x*) <= δ}. If there is no other solution x ∈ S for which v(x*) > v(x), then x* is a (global) minimum-violation solution.
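The overall constraint violation (2) and the dominance relation can be written directly in code; a minimal sketch on hypothetical constraint and objective values (minimization assumed, as in formulation (1)):

```python
import numpy as np

def overall_violation(g_values):
    """v(x) = sum_i max(0, g_i(x)) for a vector of constraint values g_i(x)."""
    return np.maximum(0.0, np.asarray(g_values)).sum()

def is_feasible(g_values):
    """A solution is feasible iff its overall constraint violation is zero."""
    return overall_violation(g_values) == 0.0

def dominates(f_x, f_y):
    """True if objective vector f_x Pareto-dominates f_y (minimization)."""
    f_x, f_y = np.asarray(f_x), np.asarray(f_y)
    return bool(np.all(f_x <= f_y) and np.any(f_x < f_y))

print(overall_violation([-1.0, 0.5, 2.0]))  # only positive g_i contribute → 2.5
print(is_feasible([-1.0, -0.2]))            # → True
print(dominates([1.0, 2.0], [1.0, 3.0]))    # → True
```

Note that satisfied constraints (negative g_i) contribute nothing to v(x), so v(x) = 0 exactly characterizes the feasible region F.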
We denoted the set of all local minimum-violation solutions by F_l and called a connected component M ⊆ F_l a local minimum-violation component.

In order to express the modality of a violation landscape, we defined a local search procedure to be a mapping from the search space to the set of local minimum-violation solutions, μ : S -> F_l, such that μ(x) = x for all x ∈ F_l. A basin of attraction of a local minimum-violation component M and local search μ is then the subset of S in which μ converges towards a solution from M, i.e., B(M) = {x ∈ S | μ(x) ∈ M}. The violation landscape is unimodal if there is only one basin in S and multimodal otherwise.

1 https://vodopijaaljosa.github.io/cmop-web/

3 METHODOLOGY

3.1 ELA Features
The landscape features used in this study were introduced in our previous work [13] and can be categorized into four groups: space-filling design, information content, random walk and adaptive walk features. They are summarized in Table 1.

The space-filling design features are used to quantify the feasible components and the relationship between the objectives and constraints, and to measure the feasibility ratio and the proportion of boundary Pareto-optimal solutions. Next, the information content features are mainly used to express the smoothness and ruggedness of violation landscapes. They are derived by analyzing the entropy of sequences of overall violation values as obtained from a random sampling of the search space. Then, the random walk features considered in this study are used to quantify the number of boundary crossings from feasible to infeasible regions; they categorize the degree of segmentation of the feasible region. Finally, the features from the last group are derived from adaptive walks through the search space and are used to describe various aspects of basins of attraction in the violation landscapes.

Table 1: The ELA features used to characterize CMOPs, categorized into four groups: space-filling design, information content, random walk, and adaptive walk [13].

Space-filling design features
  N_F          Number of feasible components
  F_min        Smallest feasible component
  F_med        Median feasible component
  F_max        Largest feasible component
  O(F_max)     Proportion of Pareto-optimal solutions in F_max
  F_opt        Size of the "optimal" feasible component
  ρ_F          Feasibility ratio
  ρ_min        Minimum correlation
  ρ_max        Maximum correlation
  ρ_∂So        Proportion of boundary Pareto-optimal solutions

Information content features
  H_max        Maximum information content
  ε_s          Settling sensitivity
  M_0          Initial partial information

Random walk features
  (ρ_∂F)_min   Minimal ratio of feasible boundary crossings
  (ρ_∂F)_med   Median ratio of feasible boundary crossings
  (ρ_∂F)_max   Maximal ratio of feasible boundary crossings

Adaptive walk features
  N_B          Number of basins
  B_min        Smallest basin
  B_med        Median basin
  B_max        Largest basin
  (B_F)_min    Smallest feasible basin
  (B_F)_med    Median feasible basin
  (B_F)_max    Largest feasible basin
  ∪B_F         Proportion of feasible basins
  v_med(B)     Median constraint violation over all basins
  v_max(B)     Maximum constraint violation of all basins
  v(B_max)     Constraint violation of B_max
  O(B_max)     Proportion of Pareto-optimal solutions in B_max
  B_opt        Size of the "optimal" basin

3.2 Dimensionality Reduction with t-SNE
The t-SNE algorithm is a popular nonlinear dimensionality reduction technique designed to represent high-dimensional data in a low-dimensional space, typically the 2-D plane [12]. First, it converts similarities between data points to distributions. Then,
it tries to find a low-dimensional embedding of the points that minimizes the divergence between the two distributions that measure neighbor similarity: one in the original space and the other in the projected space. This means that t-SNE tries to preserve the local relationships between neighboring points, while the global structure is generally lost.

Finding the best embedding is an optimization problem with a non-convex fitness function. To solve it, t-SNE uses a gradient descent method with a random starting point, which means that different runs can yield different results. The output of t-SNE also depends on other parameters, such as the perplexity (similar to the number of nearest neighbors in other graph-based dimensionality reduction techniques), the early exaggeration (the separation of clusters in the embedded space) and the learning rate (also called ε). The gradients can be computed exactly or estimated using the Barnes-Hut approximation, which substantially accelerates the method without degrading its performance [11].

4 EXPERIMENTAL SETUP
We studied eight suites of CMOPs which are most frequently used in the literature. These are CTP [2], CF [14], C-DTLZ [5], NCTP [7], DC-DTLZ [8], LIR-CMOP [3], DAS-CMOP [4], and MW [9]. In addition, we also included a novel suite named RCM [6]. In contrast to the other suites, which consist of artificial test problems, RCM contains 50 instances of real-world CMOPs based on physical models. Note that we actually used only 11 RCM problems, since only continuous and low-dimensional problems were suitable for our analysis. We considered three dimensions of the search space: 2, 3 and 5. Large-scale CMOPs were not taken into account, since the methodology described in Section 3 is not sufficiently scalable; this limits our results to low-dimensional CMOPs. Table 2 shows the basic characteristics of the studied test suites.

Table 2: Characteristics of test suites: number of problems, dimension of the search space D, number of objectives M, and number of constraints I. The characteristics of selected RCM problems are shown in parentheses.

Test suite     | #problems | D          | M    | I
CTP [2]        | 8         | *          | 2    | 2, 3
CF [14]        | 10        | *          | 2, 3 | 1, 2
C-DTLZ [5]     | 6         | *          | *    | 1, *
NCTP [7]       | 18        | *          | 2    | 1, 2
DC-DTLZ [8]    | 6         | *          | *    | 1, *
DAS-CMOP [4]   | 9         | *          | 2, 3 | 7, 11
LIR-CMOP [3]   | 14        | *          | 2, 3 | 2, 3
MW [9]         | 14        | *          | 2, * | 1–4
RCM [6]        | 50 (11)   | 2–34 (2–5) | 2–5  | 1–29 (1–8)
*Scalable parameter.

For dimensionality reduction, we used the t-SNE implementation from the scikit-learn Python package [10] with default parameter values. That is, we used the Euclidean distance metric, random initialization of the embedding, a perplexity of 30, an early exaggeration of 12, a learning rate of 200, a maximum of 1000 iterations, and a maximum of 300 iterations without progress before aborting. The gradient was computed by the Barnes-Hut approximation with an angular size of 0.5.

5 RESULTS AND DISCUSSION
The results obtained by t-SNE are shown in Figures 1 and 2.

On the other hand, DC-DTLZ, LIR-CMOP, and MW are biased towards highly multimodal violation landscapes or those with small basins of attraction (Figures 2e, 2g, and 2h). Nevertheless, MW is one of the most diverse suites considering other characteristics (Figure 2h). The C-DTLZ and DAS-CMOP suites are mainly located in the green and orange regions and fail to sufficiently represent the characteristics of the red and blue regions. Finally, the results show that CF and RCM are well spread through the whole embedded feature space (Figures 2b and 2i). As we can see, they have at least one representative CMOP instance in each region. Therefore, CF and RCM are the most diverse test suites according to the employed landscape features.

6 CONCLUSIONS
In this paper, we analyzed the diversity of the frequently used test suites for benchmarking optimizers in solving CMOPs. For this purpose, we considered 29 landscape features for CMOPs that were proposed in our previous work. In addition, the t-SNE algorithm was used to reduce the dimensionality of the feature space and reveal the diversity of the considered test suites.

The experimental results show that the most diverse test suites of CMOPs according to the applied landscape features are CF and RCM. Indeed, they include the widest variety of CMOPs with different characteristics. In addition, MW also proved to be a diverse suite, except for unimodal CMOPs. Nevertheless, we suggest considering CMOPs from various test suites for benchmarking optimizers in constrained multiobjective optimization.

One of the main limitations of our study is that only low-dimensional CMOPs were used in the analysis. Therefore, we were unable to adequately address the issue of scalability. For this reason, a crucial task that needs to be addressed in the future is the extension of this work to large-scale CMOPs.

Figure 1: Embedding of the feature space as obtained by t-SNE. The four regions are depicted in green, red, blue, and orange. The points that are not contained in any region are considered to be outliers.

Specifically, the figures show the 2-D embedding of the 29-D feature space consisting of the landscape features presented in Table 1.
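The t-SNE setup reported in Section 4 corresponds roughly to the following scikit-learn call (random stand-in data instead of the real 29-D feature vectors; the maximum-iteration settings, 1000 and 300 without progress, are the library defaults and their argument names vary across scikit-learn versions, so they are not passed explicitly):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 29))  # stand-in for the 29-D landscape-feature vectors

# Parameters as reported in Section 4: Euclidean metric, random init,
# perplexity 30, early exaggeration 12, learning rate 200; gradients are
# estimated with the Barnes-Hut approximation at an angular size of 0.5.
tsne = TSNE(
    n_components=2,
    metric="euclidean",
    init="random",
    perplexity=30,
    early_exaggeration=12.0,
    learning_rate=200.0,
    method="barnes_hut",
    angle=0.5,
    random_state=0,
)
embedding = tsne.fit_transform(X)
print(embedding.shape)  # → (60, 2)
```

Because the objective is non-convex and the initialization is random, different `random_state` values can produce visibly different (but locally similar) embeddings, as the paper notes.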
Each subfigure in Figure 2 corresponds to one of the test suites. For example, Figure 2a exposes the embedding of the CTP suite in blue, while the gray points correspond to the rest of the test suites. Points with the shape of a plus (+) correspond to CMOPs with two variables, points with the shape of a triangle (▲) to CMOPs with three variables, and points with the shape of a pentagon to CMOPs with five variables.

An additional analysis shows that the embedding of the feature space can be, based on the corresponding characteristics, split into four regions: green, red, blue and orange (Figure 1). The green region corresponds to CMOPs with severe violation multimodality, small basins of attraction, and rugged violation landscapes. The red region corresponds to CMOPs with moderate violation multimodality, rugged violation landscapes, and small feasibility ratios. The blue region corresponds to relatively low violation multimodality, rugged violation landscapes, small feasibility ratios, and positive correlations between objectives and constraints. Finally, the orange region corresponds to unimodal CMOPs with large feasible components, smooth violation landscapes, and large feasible regions.

As we can see from Figure 2a, almost all CTP problems are located in the orange region. Therefore, many relevant characteristics are poorly represented by CTP, e.g., violation multimodality, small feasibility ratios, etc. Similarly, NCTP fails to sufficiently represent severe multimodality, since it contains no problems from the green region (Figure 2d).

ACKNOWLEDGMENTS
We acknowledge financial support from the Slovenian Research Agency (young researcher program and research core funding no. P2-0209). This work is also part of a project that has received funding from the European Union's Horizon 2020 research and innovation program under Grant Agreement no. 692286.

REFERENCES
[1] T. Bartz-Beielstein, C. Doerr, J. Bossek, S. Chandrasekaran, T. Eftimov, A. Fischbach, P. Kerschke, M. López-Ibáñez, K. M. Malan, J. H. Moore, B. Naujoks, P. Orzechowski, V. Volz, M. Wagner, and T. Weise. 2020. Benchmarking in optimization: Best practice and open issues. arXiv:2007.03488v2.
[2] K. Deb, A. Pratap, and T. Meyarivan. 2001. Constrained test problems for multi-objective evolutionary optimization. In Evolutionary Multi-Criterion Optimization (EMO 2001), 284–298.
[3] Z. Fan, W. Li, X. Cai, H. Huang, Y. Fang, Y. You, J. Mo, C. Wei, and E. Goodman. 2019.
An improved epsilon constraint- feasibility ratios, and positive correlations between objectives handling method in MOEA/D for CMOPs with large in- and constraints. Finally, the yellow region corresponds to uni- feasible regions. Soft Comput., 23, 23, 12491–12510. doi: modal CMOPs with large feasible components, smooth violation 10.1007/s00500- 019- 03794- x. landscapes, and large feasible regions. [4] Z. Fan, W. Li, X. Cai, H. Li, C. Wei, Q. Zhang, K. Deb, and As we can see from Figure 2a, almost all CTP problems are E. Goodman. 2019. Difficulty adjustable and scalable con-located in the orange region. Therefore, many relevant character- strained multiobjective test problem toolkit. Evol. Comput., istics are poorly represented by CTP, e.g., violation multimodality, 28, 3, 339–378. doi: 10.1162/evco- a- 00259. small feasibility ratios, etc. Similarly, NCTP fails to sufficiently 53 Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia Vodopija, et al. (a) CTP (b) CF (c) C-DTLZ (d) NCTP (e) DC-DTLZ (f) DAS-CMOP (g) LIR-CMOP (h) MW (i) RCM Figure 2: Embedding of the feature space as obtained by t-SNE. Each subfigure exposes the embedding of a selected suite. [5] H. Jain and K. Deb. 2014. An evolutionary many-objective [9] Z. Ma and Y. Wang. 2019. Evolutionary constrained mul- optimization algorithm using reference-point based non- tiobjective optimization: Test suite construction and per- dominated sorting approach, Part II: Handling constraints formance comparisons. IEEE Trans. Evol. Comput., 23, 6, and extending to an adaptive approach. IEEE Trans. Evol. 972–986. doi: 10.1109/TEVC.2019.2896967. Comput., 18, 4, 602–622. doi: 10.1109/TEVC.2013.2281534. [10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. [6] A. Kumar, G. Wu, M. Z. Ali, Q. Luo, R. Mallipeddi, P. N. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, Suganthan, and S. Das. 2020. A Benchmark-Suite of Real- V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. 
World Constrained Multi-Objective Optimization Prob- Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: lems and some Baseline Results. Technical report. Indian machine learning in Python. J. Mach. Learn. Res., 12, 2825– Institute of Technology, Banaras Hindu University Cam- 2830. pus, India. [11] L. van der Maaten. 2014. Accelerating t-SNE using tree- [7] J. P. Li, Y. Wang, S. Yang, and Z. Cai. 2016. A comparative based algorithms. J. Mach. Learn. Res., 15, 1, 3221–3245. study of constraint-handling techniques in evolutionary [12] L. van der Maaten and G. Hinton. 2008. Visualizing data constrained multiobjective optimization. In IEEE Congress using t-SNE. J. Mach. Learn. Res., 9, 2579–2605. on Evolutionary Computation (CEC 2016), 4175–4182. doi: [13] A. Vodopija, T. Tušar, and B. Filipič. Characterization 10.1109/CEC.2016.7744320. of constrained continuous multiobjective optimization [8] K. Li, R. Chen, G. Fu, and X. Yao. 2019. Two-archive evo- problems: A feature space perspective. arXiv:2109.04564, lutionary algorithm for constrained multiobjective opti- (2021). mization. IEEE Trans. Evol. Comput., 23, 2, 303–315. doi: [14] Q. Zhang, A. Zhou, S. Zhao, P. N. Suganthan, W. Liu, and 10.1109/TEVC.2018.2855411. S. Tiwari. 2008. Multiobjective optimization test instances for the CEC 2009 special session and competition. Techni- cal report CES-487. The School of Computer Science and Electronic Engieering, University of Essex, UK. 54 Corpus KAS 2.0: Cleaner and with New Datasets Aleš Žagar, Matic Kavaš, Marko Robnik-Šikonja University of Ljubljana, Faculty of Computer and Information Science Ljubljana, Slovenia {ales.zagar,matic.kavas,marko.robnik}@fri.uni-lj.si ABSTRACT wrongly marked to contain both abstracts or switched Slovene Corpus of Academic Slovene (KAS) contains Slovene BSc/BA, and English abstracts. Several entries did not contain the abstract; MSc/MA, and PhD theses from 2000 - 2018. 
We present a cleaner instead, there was front or back matter like copyright statement, version of the corpus with added text segmentation and updated table of contents, list of abbreviations etc. POS-tagging. The updated corpus of abstracts contains fewer Our analysis has shown that the corpora can be improved in artefacts. Using machine learning classifiers, we filled in miss- many aspects. Besides addressing the above-mentioned weak- ing research field information in the metadata. We used the full nesses, the main improvements in the updated KAS 2.0 and KAS- texts and corresponding abstracts to create several new datasets: Abs 2.0 corpora are chapter segmentation and improved meta- monolingual and cross-lingual datasets for long text summariza- data with machine learning methods (described in Sections 2 and tion of academic texts and a dataset of aligned sentences from 3). A further motivation for our work is the opportunity to ex-abstracts in English and Slovene, suitable for machine transla- tract valuable new datasets for text summarization (monolingual tion. We release the corpora, datasets, and developed source code and cross-lingual) and a sentence-aligned machine translation under a permissible licence. dataset created from matching Slovene and English abstracts (see Section 4). We present conclusions and ideas for further KEYWORDS improvements in Section 5. KAS corpus, academic writing, machine translation, text summa- rization, CERIF classification 2 UPDATES: KAS 2.0 AND KAS-ABS 2.0 1 INTRODUCTION We first describe methods for extracting text and abstracts from The Corpus of Academic Slovene (KAS 1.0)1 is a corpus of Slove-PDF, followed by the differences between the versions 1.0 and nian academic writing gathered from the digital libraries of Slove-2.0 of corpora. nian higher education institutions via the Slovenian Open Science portal2 [3]. 
It consists of diploma, master, and doctoral theses from Slovenian institutions of higher learning (mostly from the 2.1 Extraction of Text Body University of Ljubljana and the University of Maribor). It contains 82,308 texts with almost 1.7 billion tokens. As many texts in corpora version 1.0 contained several hard to The KAS texts were extracted from the PDF formatted files, fix faults (like gibberish due to extracted tables and figures), we which are not well-suited for the acquisition of high-quality raw decided to extract texts once again from the PDFs. We used the texts. For that reason, the KAS corpus is noisy. Our analysis pdftotext tool, which is a part of the poppler-utils. The software showed that most original texts contain tables, images, and other proved to be accurate and reliable. Its important feature is keeping kinds of figures which are transformed into gibberish when con-the original text layout and excluding the areas where we detected verted from the PDF format. The extracted figure captions also figures, tables, and other graphical elements. do not give any helpful information. Some texts contain front or In the first step, we converted PDF files to images, one page back matter (for example, a table of contents at the beginning at a time and used the OpenCV computer vision library to detect or references at the end), which shall not be present in the main text and non-text areas. We marked the text areas on each page. text body. For each document, we also calibrated the size of the header and The Corpus of KAS abstracts (KAS-Abs 1.0)3 contains 47,273 footer areas and removed them from the text areas together with only Slovene, 49,261 only English, and 11,720 abstracts in both the page numbers. In this process, we removed 2,467 out of the languages. We observed several shortcomings of this corpus. 
A original 91,019 documents due to the documents containing less vast majority of abstracts contain keywords or the word "Ab- than 15 pages or some unchecked exceptions in the code. stract" somewhere in the abstract text. Many texts contain other Next, we searched for the beginning and the end of the main kinds of meta-information, e.g., the name of the author or super- text body. We observed that practically all bodies start with some visor and the title of the thesis. Several corpus entries contain variation of the Slovene word "Uvod" (i.e. introduction). If we English and Slovene abstracts in the same unit, only one of them found the beginning, we searched for the ending in the same way but with different keywords (viri, literatura, povzetek, etc). For 1https://www.clarin.si/repository/xmlui/handle/11356/1244 texts with found beginning and end, the areas were clipped and 2https://www.openscience.si/ 3https://www.clarin.si/repository/xmlui/handle/11356/1420 the extracted texts were normalized. The normalization included handling Slovene characters with the caret (č, š, ž), ligattures Permission to make digital or hard copies of part or all of this work for personal (tt, ff, etc.), removal of remaining figure and table captions, and or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and empty lines. The obtained text was segmented into the structure the full citation on the first page. Copyrights for third-party components of this extracted from the table of contents. We matched headings in the work must be honored. For all other uses, contact the owner/author(s). text with the entries in the table of contents and used page num- Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia © 2020 Copyright held by the owner/author(s). bers as guidelines. We ended with 83,884 successfully extracted documents. 
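The body-clipping step described in Section 2.1 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' released code: the function name `clip_body` and the exact heading patterns are hypothetical, and the input is assumed to be text already produced by `pdftotext -layout` with figure and table areas excluded.

```python
import re
from typing import Optional

# Headings that open and close the main body of a Slovene thesis.
# These regular expressions are illustrative guesses, not the corpus scripts.
BODY_START = re.compile(r"^\s*(?:\d+\s*\.?\s*)?UVOD\b", re.IGNORECASE | re.MULTILINE)
BODY_END = re.compile(
    r"^\s*(?:\d+\s*\.?\s*)?(?:VIRI|LITERATURA|POVZETEK)\b",
    re.IGNORECASE | re.MULTILINE,
)

def clip_body(raw_text: str) -> Optional[str]:
    """Return the text between the introduction heading and the first
    closing heading (references, literature, or summary). None signals
    a rejected document, as boundary-less documents were dropped."""
    start = BODY_START.search(raw_text)
    if start is None:
        return None
    end = BODY_END.search(raw_text, start.end())
    if end is None:
        return None
    body = raw_text[start.start():end.start()]
    # Part of the normalization: collapse blank runs left by removed figures.
    body = re.sub(r"\n{3,}", "\n\n", body)
    return body.strip()
```

For example, `clip_body("1 UVOD\nBesedilo naloge ...\n\n\n\nVIRI\n[1] ...")` keeps only the span from the introduction heading up to (but excluding) the references heading, while a document missing either boundary yields `None`.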
2.2 Extraction of Abstracts
We tried to improve the KAS-abstracts corpus by cleaning the existing documents and extracting the abstracts directly from the PDFs. An initial analysis of the existing texts showed different formattings (71 different organizations publish the works in the KAS corpus). We identified five major patterns of problems and created scripts for resolving them. This produced approximately 40,000 cleaned texts, while 20,000 were still problematic. The direct extraction from the PDFs followed the same procedure as for the main text body (described above). We considered figures, headers, footers, page numbers, keywords, meta-information, abstract placement at the beginning and end of the documents, multiple abstracts of different lengths, etc. This resulted in 71,567 collected Slovene abstracts. A similar procedure was applied to English abstracts and yielded 53,635 abstracts.

2.3 Differences from Version 1.0 to 2.0
Besides cleaner texts, excluded gibberish from figures and tables, and excluded front and back matter, the most important difference between KAS versions 1.0 and 2.0 is that the texts are segmented by structure, i.e., by headings. Unfortunately, some documents present in the original KAS were lost due to the different extraction, and for some documents appearing only in version 2.0, there is no metadata.

KAS-abstracts is greatly improved and no longer contains large quantities of unusable text and various artefacts (e.g., metadata, keywords, or front and back matter). Again, for some abstracts present only in version 2.0, there is no metadata. Still, they are usable for several tasks, including machine translation studies. Table 1 gives a quantitative overview of the obtained body texts and abstracts.

Table 1: Statistics of the obtained body texts and abstracts in version 2.0 of the KAS corpora.

                Sum      Same as in 1.0   Missing from 1.0   With metadata
Slo abstracts   71,567   56,610            2,383             67,533
Eng abstracts   53,635   44,685           16,296             50,674
Body text       83,884   79,320            2,988             79,320

3 SUB-CERIF CLASSIFICATION
CERIF (Common European Research Information Format) is the standard that the EU recommends to member states for recording information about research activity (https://www.dcc.ac.uk/resources/metadata-standards/cerif-common-european-research-information-format). The top level has only five categories (humanities, social sciences, physical sciences, biomedical sciences, and technological sciences). In comparison, the lower level distinguishes 363 categories. As Slovene libraries use the UDC classification, only 17% of the documents in the KAS corpus 1.0 also contain the CERIF and sub-CERIF codes in their metadata. These are mapped from UDC codes by heuristics produced by the Slovene Open Science Portal. Below, we describe how we automatically annotated documents with missing sub-CERIF codes using a machine learning approach.

We built a dataset for the automatic annotation of sub-CERIF codes from the body texts of the documents. A document may have more than one sub-CERIF code, which means that the classes are not mutually exclusive. Thus, we tackle a multi-label classification problem. In the corpus, there are 13,738 documents with high-confidence CERIF codes, which we use in machine learning. Our dataset contains 64 labels out of 363 possible. We used 10% or 1,374 samples as the test set and the remaining 90% as the training set.

As several studies have shown that recent neural embedding approaches are not yet competitive with standard text representations in document-level tasks, we decided to use the standard bag-of-words representation with TF-IDF weighting. In the preprocessing step, we lemmatized the texts using the CLASSLA lemmatizer (https://github.com/clarinsi/classla) and removed stop-words (using the list from https://github.com/stopwords-iso/stopwords-sl) and punctuation.

We compared four classifiers. For logistic regression (LR), k-nearest neighbours (KNN), and support vector machines (SVM), we used Scikit-learn [6], and for the multi-layer perceptron (MLP), we tried the Keras implementation. For the first three, we preliminarily tried several different parameter values but found that they perform best with the default ones. The MLP neural network consists of one hidden layer with 256 units, a sigmoid activation function on the hidden and output layers, the Adam optimizer [5] with an initial learning rate of 0.01, and binary cross-entropy as the loss function. We used early stopping (5 consecutive epochs with no improvement) and reduction of the learning rate on a plateau (halving the learning rate after every 2 epochs with no improvement) as callbacks during the learning process.

In Table 2, we report the pattern accuracy and binary accuracy of the trained classifiers. A model predicts a correct pattern if it assigned all true sub-CERIF codes to a document. For binary accuracy, a model predicts a sub-CERIF code correctly if it assigns a true single sub-CERIF code to the document. For example, let us assume that we have four sub-CERIF codes and an example with the label sequence '1010'. If a model predicts '1010', it receives 100% for both pattern and binary accuracy. If a model predicts '0010', it gets 0% pattern accuracy and 75% binary accuracy, since it misclassified only the first label.

Table 2: Results on the sub-CERIF multi-label classification task. The best result for each metric is in bold.

Algorithm   Binary accuracy   Pattern accuracy
LR          98.48             38.36
KNN         98.52             43.75
SVM         98.68             47.82
MLP         98.66             46.58

Using the pattern accuracy metric, SVM and MLP are significantly better than KNN and LR. LR is the worst performing model, and KNN is in the middle. SVM is the best, and MLP is behind by 1.24 points. We assume that we do not have enough data for MLP to beat SVM. It is difficult to assess the models regarding binary accuracy. In the test set, we have 761 examples with 1 label, 466 with 2 labels, 107 with 3 labels, 26 with 4 labels, 10 with 5, and 4 with 6. A dummy model that predicts all zeros achieves a binary accuracy of 97.51. All our models are better than this baseline, and their ranks correspond with the pattern accuracy.

We conclude that given 64 labels and 10k training instances, our best model (SVM) correctly predicts almost half of the label patterns, which is a useful result.

4 NEW DATASETS
We created two types of new datasets, described below: summarization datasets and machine translation datasets.

4.1 Summarization Datasets
We created two new datasets appropriate for long-text summarization in the monolingual and cross-lingual settings. The monolingual slo2slo dataset contains 69,730 Slovene abstracts and Slovene body texts and is suitable for training Slovene summarization models for long texts. The cross-lingual slo2eng dataset contains 52,351 Slovene body texts and English abstracts. It is suitable for the cross-lingual summarization task.

4.2 Machine Translation Datasets
For the creation of a sentence-aligned machine translation dataset, we used the neural approach proposed by Artetxe & Schwenk [1]. The main difference to other text alignment approaches is the use of margin-based scoring of candidates in contrast to a hard threshold on cosine similarity. We improved the approach by replacing the underlying neural model. Instead of the BiLSTM-based LASER [2] representation, we used the transformer-based LaBSE [4] sentence representation, which has significantly improved average bitext retrieval accuracy. We used the implementation from UKPLab (https://github.com/UKPLab/sentence-transformers/blob/master/examples/applications/parallel-sentence-mining/bitext_mining.py). This approach requires a threshold that omits candidate pairs below a certain value. This value represents a trade-off between the quantity and quality of the aligned pairs. The higher the threshold, the better the quality of the alignments, but more samples are discarded.

In text alignment, sentences do not always exhibit a one-to-one mapping: a source sentence can be split into two or more target sentences and vice versa. To address the problem, we iteratively ran the alignment process until all sentences above the chosen threshold were assigned to each other. In cases of more than one sentence assigned to a single sentence, we merged them and thus created a translation pair.

We manually inspected the alignments consisting of more than one sentence in either the source or target text on a small subset of abstracts. We observed that the merging process produces better results than imposing a restriction allowing only one-to-one mapping. In Table 4, we present an example of the alignment. The first column represents the margin-based score. If an aligned pair contains more than one sentence in the source or target, the score is the average margin-based score between the single sentence and the multiple sentences. The last column indicates whether merging was applied.

We used the ratio variant of margin-based scoring and set the default threshold to 1.1. We manually tested the alignment on our internal dataset. Of 2,015 examples, we successfully aligned 2,002 (99.3%), misaligned 1 (0.1%), and omitted 12 (0.6%). The analysis of the 12 omitted cases showed that some pairs do not match each other or are not accurate translations of each other, e.g., a large part of the original sentence is omitted, phrases are only distantly related, etc. However, approximately half of the 12 cases should have been aligned, which means that our model works very well but conservatively, and may fail for free-translation pairs.

With the default value of the threshold (1.1), we produced 496,102 sentence pairs. We believe the threshold is strict enough to produce a good-quality dataset (especially compared to many other sentence alignments in existing translation datasets). However, if one prefers an even more certain alignment, the value of the threshold can be further increased at the expense of fewer sentences in the dataset. We released three such datasets that reflect a trade-off between the quality and quantity of the data. The sizes of the obtained datasets are given in Table 3.

Table 3: Size of the machine translation datasets based on the margin-based quality threshold.

Dataset                 Threshold   Size
Normal alignment        1.1         496,102
Strict alignment        1.2         474,852
Very strict alignment   1.3         425,534

5 CONCLUSIONS
In this work, we created version 2.0 of Corpus KAS and Corpus KAS-Abstracts. We cleaned the texts and abstracts, introduced text segmentation based on document structure, and improved the metadata. We created two new long text summarization datasets and a dataset of aligned sentences for machine translation. The latest versions of the corpora and datasets are available on CLARIN.SI. The corpora are annotated with the CLASSLA tool and released in txt, JSON, and TEI formats. The source code for producing the new versions of the corpora (https://github.com/korpus-kas) and the created datasets are publicly available (KAS 2.0: https://www.clarin.si/repository/xmlui/handle/11356/1448; KAS-Abs 2.0: https://www.clarin.si/repository/xmlui/handle/11356/1449; summarization datasets: https://www.clarin.si/repository/xmlui/handle/11356/1446; MT datasets: https://www.clarin.si/repository/xmlui/handle/11356/1447).

In future work, the extraction of metadata for entries where it is missing would be beneficial. There could be further improvements in cleaning the texts, and this would increase the number of available documents. When the corpora are extended with data post-2018, the software might need further modifications due to new formats and templates used in academic works. Further experiments on the created MT datasets would clarify the setting of the parameters and show whether current MT systems benefit more from better quality or larger quantity of data.

ACKNOWLEDGMENTS
The research was supported by CLARIN.SI (2021 call), the Slovenian Research Agency (research core funding program P6-0411), the Ministry of Culture of the Republic of Slovenia through the project Development of Slovene in Digital Environment (RSDO), and the European Union's Horizon 2020 research and innovation programme under grant agreement No 825153, project EMBEDDIA (Cross-Lingual Embeddings for Less-Represented Languages in European News Media). We thank Tomaž Erjavec (JSI, Department of Knowledge Technologies) for providing data access and for his assistance in building the TEI format of the corpus.

REFERENCES
[1] Mikel Artetxe and Holger Schwenk. 2019. Margin-based parallel corpus mining with multilingual sentence embeddings. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 3197–3203.

Table 4: Examples from sentence-aligned Slovene-English abstracts.

Score | Slovene source sentence | English target sentence | Mrg
1.670 | Moški pa pogosteje opravljajo opravila, ki se tičejo mehanizacije na kmetiji. | Men, however, often perform tasks related to machinery on the farm. | No
1.612 | Zanimala nas je tudi prisotnost tradicionalnih vzorcev pri delu. | Additionally, I have also focused on the presence of traditional work patterns. | No
1.520 | Želeli smo izvedeti, ali se kmečke ženske počutijo preobremenjene, cenjene in kako preživljajo prosti čas (če ga imajo). | I wanted to know whether rural women feel overwhelmed or valued, and how they spend their free time (if they have it). | No
1.441 | Dotaknili smo se tudi problemov, s katerimi se srečujejo kmečke ženske med javnim in zasebnim življenjem. | Moreover, I have tackled the problems that rural women face when it comes to their public and private life. | No
1.437 | Na koncu teoretičnega dela smo opisali še predloge za izboljšanje položaja kmečkih žensk v družbi. | At the end of the theoretical part, I have denoted further proposals for improving the situation of rural women in today's society. | No
1.388 | V diplomskem delu obravnavamo položaj žensk v kmečkih gospodinjstvih v Sloveniji. | The thesis deals with the situation of women in rural households of Slovenia. | No
1.354 | V empiričnem delu pa smo s pomočjo anketnega vprašalnika, na katerega so kot respondentke odgovarjale kmečke ženske, ugotavljali, kako je delo na kmetiji porazdeljeno med spoloma. | In the empirical part, I have conducted a survey on peasant women to determine the gender division of farm labour. | No
1.271 | V teoretičnem delu predstavljamo pojme, kot so gospodinja, kmečko gospodinjstvo ter kmečka družina, kjer smo opisali tudi tipologijo kmečkih družin. | In the theoretical part, I have presented the following concepts: ˝housewife˝, ˝rural household˝ and ˝rural family˝. In addition, I have described the typology of rural families. | Yes
1.249 | V nadaljevanju smo predstavili tradicionalno dojemanje kmečkih žensk, njihovo obravnavo skozi čas v slovenski literaturi, pojasnili smo procese, ki so vplivali na spremembo položaja kmečkih žensk skozi zgodovino ter se osredotočili na delo kmečkih žensk (delovni dan, delitev dela, vrednotenje dela). | I have explained the processes that have influenced the change in the situation of rural women through history and focused on their work (working day, divison of labour, work evaluation). Furthermore, I have shed light on the traditional perception of peasant women and their treatment over time in Slovene literature. | Yes
1.217 | Ugotovili smo, da so tradicionalni vzorci delitve dela na kmetiji še vedno prisotni, saj smo iz analize anket in literature ugotovili, da ženske opravljajo večino del vezanih na dom in družino, to pa so gospodinjska dela in vzgoja otrok. | Hence, the majority of work related to home and family (housework and child-rearing) is performed by women. By analyzing the conducted survey and examining the literature, I have come to the conclusion that the division of farm labour more or less still follows traditional patterns. | Yes

[2] Mikel Artetxe and Holger Schwenk. 2019. Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond. Transactions of the Association for Computational Linguistics, 7, 597–610.
[3] Tomaž Erjavec, Darja Fišer, and Nikola Ljubešić. 2021. The KAS corpus of Slovenian academic writing. Language Resources and Evaluation, 55, 2, 551–583.
[4] Fangxiaoyu Feng, Yinfei Yang, Daniel Cer, Naveen Arivazhagan, and Wei Wang. 2020. Language-agnostic BERT sentence embedding. arXiv preprint arXiv:2007.01852.
[5] Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. In International Conference on Learning Representations.
[6] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.

Indeks avtorjev / Author index

Andonovic Viktor, 7
Andova Andrejaana, 11
Anželj Gregor, 27
Arduino Alessandro, 19
Batagelj Borut, 27
Boshkoska Biljana Mileva, 7
Boškoski Pavle, 7
Boštic Matjaž, 23
Bottauscio Oriano, 19
Bovcon Narvika, 27
Cergolj Vincent, 15
De Masi Carlo M., 15
Filipič Bogdan, 11, 51
Golob Ožbej, 19
Janko Vito, 23
Kavaš Matic, 55
Komarova Nadezhda, 27
Kralj Novak Petra, 31
Lukan Junoš, 23
Luštrek Mitja, 15, 39
Pelicon Andraž, 31
Puc Jernej, 35
Reščič Nina, 39
Robnik-Šikonja Marko, 55
Sadikov Aleksander, 19, 35
Škrlj Blaž, 31
Slapničar Gašper, 23
Solina Franc, 27
Stankoski Simon, 15
Susič David, 43
Trpin Alenka, 47
Tušar Tea, 51
Vodopija Aljoša, 51
Žagar Aleš, 55
Ženko Bernard, 47
Zilberti Luca, 19

Slovenska konferenca o umetni inteligenci
Slovenian Conference on Artificial Intelligence
Mitja Luštrek, Matjaž Gams, Rok Piltaver
4.4 Drinking Detection Using a Wearable device 5 Results and Discussion 5.1 Intent Recognition and Local Implementation of Drinking Detection 5.2 Wearable Sensing Results 6 Conclusions GolobEtal Abstract 1 Introduction 2 Methods 2.1 Data Acquisition 2.2 Reconstruction Techniques 2.3 Anomaly Detection 3 Results 4 Discussion and Conclusions Acknowledgments JankoEtal Abstract 1 Introduction 2 Library Functionalities 2.1 Motion Sensors Features 2.2 Physiological Features 2.3 Other Functionalities 3 Usage Example 3.1 SHL Dataset 3.2 Methods 3.3 Results 4 Conclusion Acknowledgments KomarovaEtal Abstract 1 Uvod in motivacija 2 Slikovni prostor na umetniških slikah 3 Zaznava obrazov 4 Geometrijska interpretacija prostora 5 Rezultati 6 Razprava 7 Zaključek PeliconEtal Abstract 1 Introduction 2 Data 2.1 Annotation Schema 2.2 Sampling for Training and Evaluation 2.3 Annotation Procedure 3 Experiments 3.1 autoBOT - an autoML for texts 3.2 Deep Learning 3.3 Other Baseline Approaches 4 Results 5 Conclusion Puc+Sadikov Abstract 1 Introduction 2 Related work 3 The SDG Environment 3.1 Description 3.2 Observations 3.3 Actions 3.4 Execution 3.5 Online Play 3.6 Replay System 4 Supervised Learning Baseline 4.1 Agent Model Architecture 4.2 Imitation Learning 4.3 Demonstrations 4.4 Results 5 Conclusions & Future Work Rescic+Lustrek Abstract 1 Introduction 2 Methodology 2.1 Problem outline 2.2 Dataset 2.3 Feature ranking 3 Results 3.1 Classification problem 3.2 Regression problem 3.3 Discussion 4 Conclusion and future work Acknowledgments Susic Abstract 1 Introduction 2 Data description and preparation 3 Methods and models 3.1 Country-Specific Approach 3.2 Time-Series Approach 4 Models' parameters selection 5 Results 6 Conclusion Trpin+Ženko VodopijaEtal Abstract 1 Introduction 2 Theoretical Background 3 Methodology 3.1 ELA Features 3.2 Dimensionality Reduction with t-SNE 4 Experimental Setup 5 Results and Discussion 6 Conclusions Acknowledgments ŽagarEtal Abstract 1 Introduction 2 
Updates: KAS 2.0 and KAS-Abs 2.0 2.1 Extraction of Text Body 2.2 Extraction of Abstracts 2.3 Differences from Version 1.0 to 2.0 3 Sub-CERIF classification 4 New datasets 4.1 Summarization Datasets 4.2 Machine Translation Datasets 5 Conclusions Acknowledgments 12 - Index - A Blank Page Blank Page Blank Page Blank Page Blank Page Blank Page