Zbornik 24. mednarodne multikonference
INFORMACIJSKA DRUŽBA – IS 2021
Zvezek A

Proceedings of the 24th International Multiconference
INFORMATION SOCIETY – IS 2021
Volume A

Slovenska konferenca o umetni inteligenci
Slovenian Conference on Artificial Intelligence

Uredniki / Editors: Mitja Luštrek, Matjaž Gams, Rok Piltaver
8. oktober 2021 / 8 October 2021, Ljubljana, Slovenia
http://is.ijs.si

Uredniki:
Mitja Luštrek, Odsek za inteligentne sisteme, Institut »Jožef Stefan«, Ljubljana
Matjaž Gams, Odsek za inteligentne sisteme, Institut »Jožef Stefan«, Ljubljana
Rok Piltaver, Outfit7 in Odsek za inteligentne sisteme, Institut »Jožef Stefan«, Ljubljana

Založnik: Institut »Jožef Stefan«, Ljubljana
Priprava zbornika: Mitja Lasič, Vesna Lasič, Lana Zemljak
Oblikovanje naslovnice: Vesna Lasič
Dostop do e-publikacije: http://library.ijs.si/Stacks/Proceedings/InformationSociety

Ljubljana, oktober 2021

Informacijska družba, ISSN 2630-371X
Kataložni zapis o publikaciji (CIP) pripravili v Narodni in univerzitetni knjižnici v Ljubljani
COBISS.SI-ID 85847043
ISBN 978-961-264-215-0 (PDF)

PREDGOVOR MULTIKONFERENCI INFORMACIJSKA DRUŽBA 2021

Štiriindvajseta multikonferenca Informacijska družba je preživela probleme zaradi korone v 2020. Odziv se povečuje: v 2021 imamo enajst konferenc, pravo upanje pa je za 2022, ko naj bi dovolj velika precepljenost končno omogočila normalno delovanje. Tudi v 2021 gre zahvala za skoraj normalno delovanje konference tistim predsednikom konferenc, ki so kljub prvi pandemiji modernega sveta pogumno obdržali visok strokovni nivo.
Stagnacija določenih aktivnosti v 2020 in 2021 pa skoraj v ničemer ni omejila neverjetne rasti IKT-ja, informacijske družbe, umetne inteligence in znanosti nasploh – nasprotno, rast znanja, računalništva in umetne inteligence se nadaljuje z že kar običajno nesluteno hitrostjo. Po drugi strani se je pospešil razpad družbenih vrednot ter zaupanja v znanost in razvoj. Se pa zavedanje večine ljudi, da je treba podpreti stroko, čedalje bolj krepi, kar je bistvena sprememba glede na 2020.

Letos smo v multikonferenco povezali enajst odličnih neodvisnih konferenc. Zajema okoli 170 večinoma spletnih predstavitev, povzetkov in referatov v okviru samostojnih konferenc in delavnic ter 400 obiskovalcev. Prireditev so spremljale okrogle mize in razprave ter posebni dogodki, kot je svečana podelitev nagrad – seveda večinoma preko spleta. Izbrani prispevki bodo izšli tudi v posebni številki revije Informatica (http://www.informatica.si/), ki se ponaša s 45-letno tradicijo odlične znanstvene revije.

Multikonferenco Informacijska družba 2021 sestavljajo naslednje samostojne konference:
• Slovenska konferenca o umetni inteligenci
• Odkrivanje znanja in podatkovna skladišča
• Kognitivna znanost
• Ljudje in okolje
• 50-letnica poučevanja računalništva v slovenskih srednjih šolah
• Delavnica projekta Batman
• Delavnica projekta Insieme Interreg
• Delavnica projekta Urbanite
• Študentska konferenca o računalniškem raziskovanju 2021
• Mednarodna konferenca o prenosu tehnologij
• Vzgoja in izobraževanje v informacijski družbi

Soorganizatorji in podporniki multikonference so različne raziskovalne institucije in združenja, med njimi ACM Slovenija, SLAIS, DKZ in druga slovenska nacionalna akademija, Inženirska akademija Slovenije (IAS). V imenu organizatorjev konference se zahvaljujemo združenjem in institucijam, še posebej pa udeležencem za njihove dragocene prispevke in priložnost, da z nami delijo svoje izkušnje o informacijski družbi.
Zahvaljujemo se tudi recenzentom za njihovo pomoč pri recenziranju.

S podelitvijo nagrad, še posebej z nagrado Michie-Turing, se avtonomna stroka s področja opredeli do najbolj izstopajočih dosežkov. Nagrado Michie-Turing za izjemen življenjski prispevek k razvoju in promociji informacijske družbe je prejel prof. dr. Jernej Kozak. Priznanje za dosežek leta pripada ekipi Odseka za inteligentne sisteme Instituta »Jožef Stefan« za osvojeno drugo mesto na tekmovanju XPrize Pandemic Response Challenge za iskanje najboljših ukrepov proti koroni. »Informacijsko limono« za najmanj primerno informacijsko potezo je prejela trditev, da je aplikacija za sledenje stikom problematična za zasebnost, »informacijsko jagodo« kot najboljšo potezo pa COVID-19 Sledilnik, tj. sistem za zbiranje podatkov o koroni. Čestitke nagrajencem!

Mojca Ciglarič, predsednica programskega odbora
Matjaž Gams, predsednik organizacijskega odbora

FOREWORD – INFORMATION SOCIETY 2021

The 24th Information Society Multiconference survived the COVID-19 problems. In 2021, there are eleven conferences with a growing trend and real hopes that 2022 will be better thanks to successful vaccination. The multiconference survived due to the conference chairs who bravely decided to continue with their conferences despite the first pandemic of the modern era. The COVID-19 pandemic did not slow the growth of ICT, the information society, artificial intelligence and science overall; quite the contrary – the progress of computers, knowledge and artificial intelligence continued at a fascinating rate. However, COVID-19 did accelerate the erosion of societal norms and of trust in science and progress. On the other hand, the awareness of the majority that science and development are the only prospects for a prosperous future is growing substantially.
The Multiconference runs in parallel sessions with 170 presentations of scientific papers at eleven conferences, many round tables, workshops and award ceremonies, and 400 attendees. Selected papers will be published in the Informatica journal, with its 45-year tradition of excellent research publishing.

The Information Society 2021 Multiconference consists of the following conferences:
• Slovenian Conference on Artificial Intelligence
• Data Mining and Data Warehouses
• Cognitive Science
• People and Environment
• 50 Years of High-school Computer Education in Slovenia
• Batman Project Workshop
• Insieme Interreg Project Workshop
• URBANITE Project Workshop
• Student Computer Science Research Conference 2021
• International Conference on Transfer of Technologies
• Education in Information Society

The multiconference is co-organized and supported by several major research institutions and societies, among them ACM Slovenia (the Slovenian chapter of the ACM), SLAIS, DKZ and the second national academy, the Slovenian Engineering Academy. In the name of the conference organizers, we thank all the societies and institutions, and particularly all the participants for their valuable contributions and their interest in this event, and the reviewers for their thorough reviews.

The award for lifelong outstanding contributions is presented in memory of Donald Michie and Alan Turing. The Michie-Turing award was given to Prof. Dr. Jernej Kozak for his lifelong outstanding contribution to the development and promotion of the information society in our country. In addition, the yearly recognition for current achievements was awarded to the team from the Department of Intelligent Systems, Jožef Stefan Institute, for second place at the XPrize Pandemic Response Challenge for proposing the best counter-measures against COVID-19. The information lemon goes to the claim that the mobile application for tracking COVID-19 contacts would harm information privacy.
The information strawberry, as the best information service of the last year, went to COVID-19 Sledilnik, a program that regularly reports all data related to COVID-19 in Slovenia. Congratulations!

Mojca Ciglarič, Programme Committee Chair
Matjaž Gams, Organizing Committee Chair

KONFERENČNI ODBORI / CONFERENCE COMMITTEES

International Programme Committee:
Vladimir Bajic, South Africa; Heiner Benking, Germany; Se Woo Cheon, South Korea; Howie Firth, UK; Olga Fomichova, Russia; Vladimir Fomichov, Russia; Vesna Hljuz Dobric, Croatia; Alfred Inselberg, Israel; Jay Liebowitz, USA; Huan Liu, Singapore; Henz Martin, Germany; Marcin Paprzycki, USA; Claude Sammut, Australia; Jiri Wiedermann, Czech Republic; Xindong Wu, USA; Yiming Ye, USA; Ning Zhong, USA; Wray Buntine, Australia; Bezalel Gavish, USA; Gal A. Kaminka, Israel; Mike Bain, Australia; Michela Milano, Italy; Derong Liu, Chicago, USA; Toby Walsh, Australia; Sergio Campos-Cordobes, Spain; Shabnam Farahmand, Finland; Sergio Crovella, Italy

Organizing Committee:
Matjaž Gams, chair; Mitja Luštrek; Lana Zemljak; Vesna Koricki; Mitja Lasič; Blaž Mahnič; Klara Vulikić

Programme Committee:
Mojca Ciglarič, chair; Bogdan Filipič; Dunja Mladenič; Niko Zimic; Bojan Orel; Andrej Gams; Franc Novak; Rok Piltaver; Franc Solina; Matjaž Gams; Vladislav Rajkovič; Toma Strle; Viljan Mahnič; Mitja Luštrek; Grega Repovš; Tine Kolenik; Cene Bavec; Marko Grobelnik; Ivan Rozman; Franci Pivec; Tomaž Kalin; Nikola Guid; Niko Schlamberger; Uroš Rajkovič; Jozsef Györkös; Marjan Heričko; Stanko Strmčnik; Borut Batagelj; Tadej Bajd; Borka Jerman Blažič Džonova; Jurij Šilc; Tomaž Ogrin; Jaroslav Berce; Gorazd Kandus; Jurij Tasič; Aleš Ude; Mojca Bernik; Urban Kordeš; Denis Trček; Bojan Blažica; Marko Bohanec; Marjan Krisper; Andrej Ule; Matjaž Kljun; Ivan Bratko; Andrej Kuščer; Boštjan Vilfan; Robert Blatnik; Andrej Brodnik; Jadran Lenarčič; Baldomir Zajc; Erik Dovgan; Dušan Caf; Borut Likar; Blaž Zupan; Špela Stres; Saša Divjak; Janez Malačič; Boris Žemva; Anton Gradišek; Tomaž Erjavec; Olga Markič; Leon Žlajpah

KAZALO / TABLE OF
CONTENTS

Slovenska konferenca o umetni inteligenci / Slovenian Conference on Artificial Intelligence
PREDGOVOR / FOREWORD
PROGRAMSKI ODBORI / PROGRAMME COMMITTEES
Estimating Client's Job-search Process Duration / Andonovic Viktor, Boškoski Pavle, Boshkoska Biljana Mileva
Some Experimental Results in Evolutionary Multitasking / Andova Andrejaana, Filipič Bogdan
Intent Recognition and Drinking Detection For Assisting Kitchen-based Activities / De Masi Carlo M., Stankoski Simon, Cergolj Vincent, Luštrek Mitja
Anomaly Detection in Magnetic Resonance-based Electrical Properties Tomography of in silico Brains / Golob Ožbej, Arduino Alessandro, Bottauscio Oriano, Zilberti Luca, Sadikov Aleksander
Library for Feature Calculation in the Context-Recognition Domain / Janko Vito, Boštic Matjaž, Lukan Junoš, Slapničar Gašper
Določanje slikovnega prostora na umetniških slikah / Komarova Nadezhda, Anželj Gregor, Batagelj Borut, Bovcon Narvika, Solina Franc
Automated Hate Speech Target Identification / Pelicon Andraž, Škrlj Blaž, Kralj Novak Petra
SiDeGame: An Online Benchmark Environment for Multi-Agent Reinforcement Learning / Puc Jernej, Sadikov Aleksander
Question Ranking for Food Frequency Questionnaires / Reščič Nina, Luštrek Mitja
Daily Covid-19 Deaths Prediction in Slovenija / Susič David
Iris recognition based on SIFT and SURF feature detection / Trpin Alenka, Ženko Bernard
Analyzing the Diversity of Constrained Multiobjective Optimization Test Suites / Vodopija Aljoša, Tušar Tea, Filipič Bogdan
Corpus KAS 2.0: Cleaner and with New Datasets / Žagar Aleš, Kavaš Matic, Robnik-Šikonja Marko
Indeks avtorjev / Author index

Slovenska konferenca o umetni inteligenci / Slovenian Conference on Artificial Intelligence
Uredniki / Editors: Mitja Luštrek, Matjaž Gams, Rok Piltaver

PREDGOVOR

Po zaslugi pandemije COVID-19 še vedno živimo v bolj zanimivih časih, kot bi si želeli, vendar umetne inteligence to ne moti in napreduje s podobnim tempom kot pretekla leta. Računalniški vid in obdelava naravnega jezika sta še vedno vroči področji, pred nedavnim pa nam je OpenAI postregel s parom navdušujočih kombinacij obojega.
Prva je DALL-E, globoka nevronska mreža, izpeljana iz OpenAI-jeve slavne mreže za generiranje besedila GPT-3, ki je sposobna »razumeti« opis slike in nato takšno sliko generirati. Pri tem je kos slikam, na kakršne prej ni naletela – generirati zna denimo prav čedno sliko redkve daikon v baletnem krilcu, ki sprehaja psa. Druga, CLIP, deluje obratno in generira besedilne opise slik.

Še en viden dosežek zadnjega časa prihaja s področja biologije in medicine, ki sta zelo plodni področji za uporabo umetne inteligence. Algoritem AlphaFold 2, ki – podobno kot večina pomembnih dosežkov umetne inteligence zadnjih let – temelji na globokih nevronskih mrežah, je dosegel dramatičen napredek pri določanju strukture beljakovin, kar je težaven problem, pomemben za razvoj zdravil.

Posebej odmeven nedaven dosežek umetne inteligence iz domačih logov je metoda za priporočanje optimalnih ukrepov zoper COVID-19, ki jo je razvila ekipa Odseka za inteligentne sisteme na Institutu Jožef Stefan. Pri tej sodbi avtorji predgovora sicer nismo povsem nepristranski, saj sva k dosežku dva prispevala, a drugo mesto na tekmovanju XPrize Pandemic Response Challenge s polmilijonskim nagradnim skladom naši trditvi daje verodostojnost. Za uspeh tokrat ni bila potrebna globoka nevronska mreža – metoda kombinira epidemiološki model SEIR, klasično strojno učenje in večkriterijsko optimizacijo z evolucijskim algoritmom. Na Slovenski konferenci o umetni inteligenci je predstavljen le delček tega dela, več o njem pa je moč izvedeti na Delavnici projekta Insieme Interreg, ki prav tako poteka v okviru Informacijske družbe.

Posebej veliko število drugih delavnic in konferenc na Informacijski družbi letos je sicer dobro za multikonferenco kot celoto, našo konferenco pa je bržkone prikrajšalo za kak prispevek. K tej težavi moramo dodati še naveličanost raziskovalne srenje nad nezmožnostjo žive udeležbe na konferencah, tako da smo se morali na koncu zadovoljiti s 13 prispevki.
Večino je kot po navadi prispeval Institut Jožef Stefan, dobro je zastopana tudi Fakulteta za računalništvo in informatiko Univerze v Ljubljani, druge ustanove pa žal ne. Kljub temu smo poskrbeli, da so prispevki kakovostni, in smo jih zavrnili več kot pretekla leta. Bomo pa prihodnja leta napeli moči, da privabimo več prispevkov iz širšega nabora ustanov.

FOREWORD

Thanks to the COVID-19 pandemic, we still live in more interesting times than we would like, but artificial intelligence is not much bothered by this and is progressing as rapidly as in recent years. Computer vision and natural language processing are still hot topics, and OpenAI recently provided a pair of exciting combinations of the two. The first is DALL-E, a deep neural network derived from OpenAI's famous language generation network GPT-3. It can "understand" a description of an image and then generate such an image. It can handle images never encountered before – for instance, it can generate a nice image of a daikon radish in a tutu walking a dog. The second is CLIP, which works in reverse and generates descriptions of images.

Another prominent recent achievement comes from biology and medicine, which are fruitful ground for applications of artificial intelligence. The AlphaFold 2 algorithm, which – like most major achievements of artificial intelligence in recent years – is based on deep neural networks, achieved a breakthrough in determining protein structure. This is a hard problem important for drug discovery.

A prominent recent Slovenian achievement of artificial intelligence is a method for recommending optimal interventions against COVID-19, which was developed by a team from the Department of Intelligent Systems at Jožef Stefan Institute. The authors of this foreword are not entirely unbiased when we say this, because two of us contributed to the achievement, but second place at the XPrize Pandemic Response Challenge, with its half-million prize purse, lends credence to our claim.
This success did not require a deep neural network – the method combines a SEIR epidemiological model, classical machine learning and multi-objective optimisation with an evolutionary algorithm. The Slovenian Conference on Artificial Intelligence presents only a small part of this work; more can be learned at the Insieme Interreg project workshop. The particularly large number of other workshops and conferences at Information Society this year is good for the multiconference as a whole, but probably deprived our conference of a few papers. Another problem is that the research community is getting tired of the inability to attend conferences live, which is why we ended up with only 13 papers. Most of them, as usual, come from Jožef Stefan Institute. The Faculty of Computer and Information Science of the University of Ljubljana is also well represented, while other institutions less so. Despite this, we made sure that the papers are of high quality, and we turned away more than usual. Our goal for the coming years is, of course, to secure more papers from a wider range of institutions.
PROGRAMSKI ODBOR / PROGRAMME COMMITTEE

Mitja Luštrek; Matjaž Gams; Rok Piltaver; Cene Bavec; Jaro Berce; Marko Bohanec; Marko Bonač; Ivan Bratko; Bojan Cestnik; Aleš Dobnikar; Bogdan Filipič; Borka Jerman Blažič; Marjan Krisper; Marjan Mernik; Biljana Mileva Boshkoska; Vladislav Rajkovič; Niko Schlamberger; Tomaž Seljak; Miha Smolnikar; Peter Stanovnik; Damjan Strnad; Vasja Vehovar; Martin Žnidaršič

Estimating Client's Job-search Process Duration

Viktor Andonovic¹ (viktor.andonovikj@ijs.si), Pavle Boškoski² (pavle.boskoski@ijs.si), Biljana Mileva Boshkoska²,³ (biljana.mileva@ijs.si)
¹ Knowledge Technologies, Jožef Stefan International Postgraduate School, Ljubljana, Slovenia
² Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia
³ Faculty of Information Studies, Novo Mesto, Slovenia

ABSTRACT

Modelling the labour market, analysing ways to reduce unemployment, and creating decision support tools are becoming increasingly popular topics with the rise in digital data and computational power. The paper analyses a machine learning (ML) approach for estimating the time until a job-seeker finds a job, i.e. leaves the Public Employment Service (PES), after initially entering it. The PES dataset that we use is complex, and there is almost no correlation between most of its features, which makes it challenging to model. We used statistical analysis and visualisations to understand the problem better and to form a basis for further modelling. As a result, we developed several ML models, including a basic multivariate linear regression used for performance comparison with other, more specifically designed models.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia. © 2021 Copyright held by the owner/author(s).

1 INTRODUCTION

The research field of creating tools to support the decision-making process for employment services has attracted significant interest lately. One can track such efforts for more than 20 years [1]. Different variants of tools and systems have been developed and implemented with varying success in different countries. PES is willing to move away from the traditional role of servicing the job-seekers and take a more systematic approach by implementing data-driven solutions in their toolbox. Here, the goal is to create a model that uses available data describing the job-seekers that have entered the PES and outputs the approximate time (in days) needed for the individual to leave the PES as an employed person. These factors can be assessed either by introducing experts' knowledge or by extracting the corresponding dynamics directly from the available data. What was (or is) available determines how the models are built and their effectiveness.

The biggest issue when dealing with any modelling, for that matter, is the quality of data. Typically, models of the labour flow are built on top of statistical surveys [2]. These data sets comprise a series of snapshots of an individual's labour force status observed at discrete time points. Such discrete sampling might be of too low a frequency to truly capture the changing dynamics.

Several methods for approaching similar labour market modelling problems have been implemented in other countries. Finland's statistical profiling tool, introduced in 2007, consists of a simple logit model [3]. It predicts the probability of long-term unemployment and categorises job seekers into two groups, at risk or at high risk of long-term unemployment. In 2012, Ireland implemented a PEX (probability of exit) model using data collected on job-seekers who entered the PES as unemployed during 13 weeks [4]. The PEX tool is a probit model for measuring the job-seeker's probability of exiting unemployment within one year.

As a result of our work, we have developed an ML model that can be used in a PES as part of their decision toolbox. It can serve as a filtering method that prioritises job-seekers and recognises those who do not necessarily need PES resources and services, as they will get employed soon regardless of the interventions by the organisation.

2 DATA

The data used in the paper is provided by a public organisation engaged in the HECAT project [4], which aims at investigating, demonstrating and piloting a profiling tool to support labour market decision making by unemployed citizens and case workers in PES.

2.1 Data description

The dataset consists of 74,086 instances, each representing a client enrolled in the PES, described with 16 sociological, demographic and time-related characteristics, known as features or attributes. The data were obtained during one year. The dataset is complex in that its attributes come in different forms (categorical, numerical, date and time), and most of them need to undergo some transformation to be suitable as input for different ML models. The general structure of the client's attributes is described by dividing the attributes into several prominent groups: socioeconomic variables (gender, age, nationality), information on job readiness (education, health limitations, care responsibilities), opportunities (regional labour market development), and all available labour market history information, such as prior work experience.
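The attribute groups above can be pictured as a simple client record. A minimal sketch in Python follows; the field names are hypothetical (the actual dataset uses coded columns described in a separate CSV file), so this only illustrates the structure, not the real schema:

```python
from dataclasses import dataclass

@dataclass
class ClientRecord:
    """One PES client, with hypothetical names for the coded attributes."""
    # socioeconomic variables
    gender: int                  # coded category
    age: int
    nationality: int             # coded category
    # information on job readiness
    education: int               # coded category
    health_limitations: bool
    care_responsibilities: bool
    # opportunities
    regional_market_index: float # regional labour market development
    # labour market history
    prior_experience_years: float
    entry_date: str              # date of entering the PES
    duration_days: int           # target: days until leaving the PES

client = ClientRecord(gender=1, age=34, nationality=1, education=5,
                      health_limitations=False, care_responsibilities=True,
                      regional_market_index=0.7, prior_experience_years=8.5,
                      entry_date="2020-03-02", duration_days=142)
print(client.duration_days)
```

The target, `duration_days`, is the count variable the models in Section 3 predict.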
Most of the categorical features are given as numbers, where each number represents a unique category, described in a separate CSV file. The target variable is in numeric form; it is a count of the days a person stays in the process before exiting the PES. Some of the features in the dataset contain anomalous values (such as a negative number for a client's age), which are a mistake or a result of noise in the data. This indicates the necessity of performing data cleaning and preprocessing before feeding the dataset to various ML models. Figure 1 gives an overview of the attributes of the dataset.

Figure 1: General information on the dataset features

2.2 Data understanding

The target variable, 'duration', is a numerical count variable. To gain a better understanding of it, its probability distribution was plotted; Figure 2 shows the probability distribution of 'duration'. This distribution directly influences the selection of the predictive model. Looking at Figure 2, it can be assumed that the target variable follows the Poisson distribution. We also plotted the distributions of the features; Figure 3 illustrates a grid of distributions of each feature of the dataset.

Figure 2: Probability distribution of the target variable

Figure 3: Grid of distributions of the dataset features

2.3 Data preprocessing

It is estimated that in most data mining and knowledge discovery pipelines, 75 to 85% of the time is dedicated to preprocessing the data [5]. Cleaning and transforming samples are the cornerstone of a reliable and robust pattern recognition system. The first step of data preprocessing was data cleaning. The dataset included values for some attributes that were an obvious result of noise or mistakes. For example, some instances had negative values for the target variable, which is impossible given the nature of that attribute, a count-based variable.

Most classical ML algorithms require the input data to be in numerical form. We used one-hot encoding for the categorical features with at most 20 different categories. High-cardinality features were encoded using the binary encoding technique. Frequently used techniques like label encoding do not work for high-cardinality features, because they introduce an artificial numerical distance between instances, while one-hot encoding leads to overfitting in this case [6].

The 'Entry Date' feature was used to extract the day and month of entry separately. As those are cyclical features, we transformed them to better represent the cyclical phenomenon, for instance to avoid the artificially large difference between month 1 and month 12. A good way to handle this is to calculate the sin() and cos() components, so that a cyclical feature is represented as (x, y) coordinates on a circle.

Normalisation was applied to scale the attributes so that their mean value is zero and their spread is scaled by their own standard deviation. This gives each attribute equal opportunity: no attribute gains more weight merely because of the range of its values. Several normalisation techniques are commonly used, but the most popular one is the standard scaler, defined as:

z = (x − μ) / s    (2.1)

where x is the actual value, μ is the mean, and s is the standard deviation.

All the calculations and transformations were performed in the Python programming language, using modules such as pandas, NumPy and scikit-learn.
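The preprocessing steps described above (dropping impossible values, cyclical encoding of the entry month, and the standard scaler of eq. (2.1)) can be sketched in plain Python. This is a minimal illustration under assumed field names, not the authors' actual pipeline:

```python
import math

def clean(records):
    """Drop records with impossible (negative) age or duration."""
    return [r for r in records if r["age"] >= 0 and r["duration"] >= 0]

def encode_month(month):
    """Map a cyclical month (1-12) to (x, y) on the unit circle,
    so that month 12 and month 1 end up close together."""
    angle = 2 * math.pi * (month - 1) / 12
    return math.cos(angle), math.sin(angle)

def standard_scale(values):
    """Standard scaler of eq. (2.1): z = (x - mu) / s."""
    mu = sum(values) / len(values)
    s = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    return [(v - mu) / s for v in values]

records = [{"age": 34, "duration": 142}, {"age": -3, "duration": 50}]
print(len(clean(records)))                    # the record with negative age is dropped

# December and January land close together on the circle,
# unlike their raw numeric codes 12 and 1
print(math.dist(encode_month(12), encode_month(1)) < 1.0)
```

In the actual pipeline the same steps would be done with pandas and scikit-learn, but the arithmetic is the same.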
3 METHODOLOGY

Since the target variable is numerical, the task is treated as a regression problem. Regression analysis describes methods whose goal is to estimate the relationship between a dependent (target) variable and one or more independent variables. In formal terms, the goal is to specify the following general model:

Y_i = f(X_i, β) + e_i    (3.1)

where i denotes the i-th observed input-output pair, the vector X_i represents the input (independent) variables, β is the set of model parameters, f(·) is the function, and e_i is the modelling error. The goal is to find the proper function f and its parameters β so that the error term is as close to zero as possible.

In its simplest form, the function f(·) can represent a linear model. For example, the univariate linear version of (3.1) would be:

Y_i = β_0 + β_1 X_i + e_i    (3.2)

Generally, the function f can describe much more complex dynamics. The multivariate linear regression model is used as a base model: it helps assess the performance of other, more specific and complex models simply by comparing them to it. The aim is to develop models that significantly outperform the base model. In order to construct a model that generalises well to the data, a decision tree is used as the base learning algorithm for the ensembles.

3.1 Ensemble learning

The idea of ensemble learning is based on the theoretical foundation that the generalisation ability of an ensemble is usually much stronger than that of a single learner. Ensemble learning is mainly implemented as two subprocedures: training weak component learners and selectively combining the member learners into a stronger learner [7]. Two ensemble models based on different techniques were developed: the Random Forest Regressor [8] and a boosting algorithm, the CatBoost Regressor. Bagging is used to reduce the variance of a decision tree classifier. The idea is to create several subsets of the training sample, chosen randomly with replacement; each subset is used to train a corresponding decision tree. The result is the average of all the predictions from the different trees, which is more robust than a single decision tree classifier.

Based on the shape of the probability distribution given in Figure 2, we assume that the target variable comes from a Poisson distribution. Therefore, we design our model to maximise the log-likelihood of the Poisson distribution [9]. The probability mass function of the Poisson distribution is:

P(k) = λ^k e^(−λ) / k!    (3.3)

where P(k) is the probability of seeing k events during a unit of time, given event rate λ. Let X, y be our dataset for the Poisson regression task. The log-likelihood function that needs to be maximised is:

l(y) = Σ_{i=1..n} log( λ(X_i)^{y_i} e^(−λ(X_i)) / y_i! )    (3.4)

After the expression is simplified, the final equation for the Poisson loss has the following form:

L_Poisson = Σ_{i=1..n} [ λ(X_i) − y_i log λ(X_i) ]    (3.5)

The CatBoost Regressor is optimised with regard to this objective function.

4 EVALUATION

The model performance on the test set is evaluated with the Root Mean Squared Error (RMSE). RMSE is frequently used in regression problems; it measures the difference between the values predicted by a model or estimator and the actual values of the instances:

RMSE = sqrt( Σ_{i=1..n} (y_i − pv_i)² / n )    (3.6)

where y_i is the original value of the instance, and pv_i is the value predicted by the model. The hyper-parameters of the models were tuned using RandomizedSearchCV, which optimises the hyper-parameters by cross-validated search over given parameter settings; a fixed number of parameter settings is sampled from the specified distributions.

Figure 4: Comparison of the model performance (RMSE in days): Linear Regression 65.28, Random Forest 51.66, CatBoost (Poisson objective) 44.13

The results in Figure 4 show that both Random Forest and CatBoost significantly outperform the base linear regression model, and that optimising the mean Poisson deviance as the loss function significantly improves the performance of the boosting model. The final score of the CatBoost Regressor, optimised with regard to the mean Poisson deviance and evaluated with RMSE, is 44.13 days.
Therefore, we design our model to maximise the CatBoost significantly outperform the base linear regression log-likelihood for Poisson distribution [9]. The probability mass model. Also, optimising the mean Poisson deviance as a loss function of the Poisson distribution is given with the following function results in significant improvement in the performance expression: of the boosting model. The final score that the CatBoost Regressor optimised with regards to mean Poisson deviance 𝑃(𝑘) = *!"(,)# (3.3) evaluated on RMSE is 44.13 days. .! 9 Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia V. Andonovic et al. 5 CONCLUSION Achieving desirable results using machine learning models requires a significant amount of quality data and a deep understanding of the problem. Feature engineering is one of the key concepts here, which, if it is appropriately done, enables the generation of new features that give helpful, previously unknown insights about the data. The paper proposes an approach that emphasises the engineering of optimisation function concerning the probability distribution of the target variable, which results in developing a specific model for approaching the problem. Including the Poisson objective function in the boosting model resulted in significant improvement in its performance. There is still space for improvement in the results. Using modern end-to- end deep learning architectures have the potential to provide better results than the proposed models, which leaves space for future work on this topic. Having a tool that can roughly estimate the time a new client stays in the job-search process by having the standard data formation about himself is beneficial for the PES. The creation of decision-making tools for organisations dealing with employment services supports the process of reducing unemployment in the countries, which is a massive benefit for the global economy. 
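The Poisson loss (3.5) and the RMSE metric (3.6) above can be written directly in a few lines of Python. This is a minimal stdlib sketch with made-up example numbers, not the paper's actual CatBoost pipeline; a real model (e.g. CatBoost with its Poisson objective) would supply the λ(X_i) predictions:

```python
import math

def poisson_loss(lam, y):
    """Poisson loss of eq. (3.5): sum(lambda_i - y_i * log(lambda_i))."""
    return sum(l - yi * math.log(l) for l, yi in zip(lam, y))

def rmse(y, pv):
    """Root Mean Squared Error of eq. (3.6) between actual y and predicted pv."""
    n = len(y)
    return math.sqrt(sum((yi - pi) ** 2 for yi, pi in zip(y, pv)) / n)

# Hypothetical example: actual job-search durations (days) and model predictions.
y = [10, 35, 90, 120]
pv = [14, 30, 100, 110]
print(round(rmse(y, pv), 2))          # RMSE in days
print(round(poisson_loss(pv, y), 2))  # Poisson loss; lower is better
```

The loss treats each prediction as the rate λ of a Poisson distribution, so it penalises both over- and under-prediction relative to the likelihood of the observed count, unlike the purely symmetric squared error.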
ACKNOWLEDGMENTS

The first author acknowledges Ad Futura, the Public Scholarship, Development, Disability and Maintenance Fund of the Republic of Slovenia. The second author acknowledges funding from the Slovenian Research Agency via the program Complex Networks P1-0383. The last two authors acknowledge the funding received from the European Union's Horizon 2020 research and innovation programme project HECAT under grant agreement No. 870702.

REFERENCES
[1] P. Boshkoski and B. Mileva-Boshkoska, "Report on commonly used algorithms and their performance," Horizon 2020, Deliverable D3.1, 2020.
[2] J. Grundy, "Statistical profiling of the unemployed," Studies in Political Economy, 2015.
[3] T. Riipinen, "Risk profiling of long-term unemployment in Finland," Dialogue Conference, Brussels, 2011.
[4] P. J. O'Connel, E. Kelly and J. Walsh, "National profiling of the unemployed in Ireland," ESRI Research Series, vol. 10, 2009.
[5] "HECAT – Disruptive Technologies Supporting Labour Market Decision Making," 2020. [Online]. Available: http://hecat.eu.
[6] F. Johannes, D. Gamberger and N. Lavrac, "Machine Learning and Data Mining," Cognitive Technologies, 2012.
[7] M. Brammer, Principles of Data Mining, 2007.
[8] F. Huang, G. Xie and R. Xiao, "Research on Ensemble Learning," International Conference on Artificial Intelligence and Computational Intelligence, 2009.
[9] A. Saha, S. Basu and A. Datta, "Random Forest for Dependent Data," arXiv, 2020.
[10] A. Zakariya Y, "Diagnostic in Poisson Regression Models," Electronic Journal of Applied Statistical Analysis, 2012.
Some Experimental Results in Evolutionary Multitasking

Andrejaana Andova, Jožef Stefan Institute and Jožef Stefan International Postgraduate School, Jamova cesta 39, Ljubljana, Slovenia, andrejaana.andova@ijs.si
Bogdan Filipič, Jožef Stefan Institute and Jožef Stefan International Postgraduate School, Jamova cesta 39, Ljubljana, Slovenia, bogdan.filipic@ijs.si

ABSTRACT

Transfer learning and multitask learning have shown that, in machine learning, common information in two problems can be used to build more effective models. Inspired by this finding, attempts in evolutionary computation have also been made to solve multiple optimization problems simultaneously. This new approach is called evolutionary multitasking (EMT). In this work, we show how EMT extends ordinary evolutionary algorithms and present the results that we obtained in solving multiple optimization problems simultaneously. We also compare them with the results of algorithms that solve one optimization problem at a time. Finally, we provide visualizations and explanations of why and when EMT is beneficial.

KEYWORDS

evolutionary algorithms, numerical optimization, multifactorial optimization, evolutionary multitasking

1 INTRODUCTION

In optimization the task is to find one or more solutions that best solve a given problem. To determine which of the possible solutions gives the best result, we use the objective function. This can be the cost of fabrication, the efficiency of a process, the quality of a product, etc. The mathematical formulation of such problems is given as follows:

Minimize/Maximize f(x)
subject to g_j(x) ≥ 0, j = 1, 2, .., J;
h_k(x) = 0, k = 1, 2, .., K;
x_i^(L) ≤ x_i ≤ x_i^(U), i = 1, 2, .., n.   (1)

Here, a solution x = [x_1, x_2, .., x_n]^T is a vector of n decision variables. The objective f(x) can be either maximized or minimized, but since many optimization algorithms are designed to solve minimization problems, we usually convert maximization objectives to minimization ones by multiplying the objective functions by −1. h_k(x) are equality constraints, g_j(x) inequality constraints, and x_i^(L) and x_i^(U) are boundary constraints [3]. In this paper, we consider problems that include only boundary constraints.

When the optimization problem can not be solved using mathematical methods, the usual alternative is to use randomized optimization algorithms such as evolutionary algorithms (EAs). These algorithms are characterized by a population of solutions that change with generations and to which techniques resembling natural selection and genetic variation are applied. These techniques ensure that the fittest individuals (solutions) from the population are passed to the next generation. The algorithm begins by initializing a population of solutions. Then, a selection operator is used to select the fittest individuals as parents. After that, a reproduction operator is utilized to create offspring from the parents. The next step is to select a subset of individuals from the combined set of parents and children and replace the old population with the selected subset. The new population is then used for the next generation. The cycle of selection, reproduction, and replacement is repeated until a stopping criterion is satisfied. The stopping criterion can be defined in various ways, for example, by the maximum number of generations.

Until recently, most EAs focused on solving only one optimization problem at a time. To exploit the parallelism of population-based search, Gupta et al. introduced a new category of optimization approach called multifactorial optimization or evolutionary multitasking (EMT) [8]. The goal of EMT is to develop EAs that are able to simultaneously solve multiple optimization problems without sacrificing the quality of the obtained solutions and the algorithm efficiency.

A practical motivation for the development of EMT algorithms is rapidly growing cloud computing. In cloud computing, multiple users can simultaneously send optimization problems to the server. These problems may either have similar characteristics or they may belong to completely different domains. Previously, the servers solved these problems sequentially, but with the introduction of EMT, they can solve the problems in parallel.

After the introduction of EMT by Gupta et al., many other works followed that also introduced methodologies specialized in solving multiple optimization problems simultaneously [1, 4, 5, 6, 9, 10].

In this paper, we present our experimental results in solving multiple optimization problems simultaneously and discuss the results from the point of view of EMT performance. We do this by applying the EMT methodology as proposed by Gupta et al. to test optimization problems and analyzing the results.

The paper is further organized as follows. In Section 2, we introduce the basic concepts of EMT. In Section 3, we first present our results in EMT with visualizations that explain why and when EMT performs well, and then report the results in evolutionary many-task optimization. Finally, in Section 4, we give a conclusion and present the ideas for future work.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia
© 2021 Copyright held by the owner/author(s).

2 EVOLUTIONARY MULTITASKING

Evolutionary multitasking is characterized by the simultaneous existence of multiple decision spaces corresponding to different problems, which may or may not be independent, each with a unique decision space landscape. In order for EMT to have cross-domain optimization properties, Gupta et al. proposed to use a uniform genetic code in which each decision variable is encoded with a random number from [0, 1]. Decoding such a representation in continuous problems is done by using the following equation for each decision variable:

u_i = u_i^(L) + (u_i^(U) − u_i^(L)) · v_i,   (2)

where u_i is the decision variable in the original space, and v_i is the decision variable in the encoded space. The dimensionality of the solution vector is equal to max_j {D_j}, where D_j represents the dimensionality of a single optimization problem. This type of encoding allows problems to share decision variables at the beginning of the genetic code, which contributes to the transfer of useful genetic material from one problem to another.

Since EMT attempts to solve multiple problems simultaneously using a single population, it is necessary to formulate a new technique for comparing population members. To this end, a set of additional properties is defined for each individual x_i in the population as follows.

• Skill factor: The skill factor τ_i of x_i is the one problem, among all problems in EMT, for which the individual is specialized. The skill factor can be assigned in a complex way, by selecting the best individuals for each task, or by randomly assigning each individual one task for which it is specialized. In our case, we use the latter, simpler method for assigning the skill factor.
• Scalar fitness: The scalar fitness is the fitness of an individual for the problem in which it is specialized.

To compare two solutions, we use the scalar fitness and the skill factor. The scalar fitness shows how good a solution is for a given problem, and the skill factor shows for which problem the solution performs best. A solution x_a is better than x_b if and only if both have the same skill factor and x_a has a higher scalar fitness than x_b. If the solutions have different skill factors, they are incomparable.

2.1 Assortative Mating

To produce offspring, the authors of EMT [8] used assortative mating as a reproduction mechanism. In assortative mating, two randomly selected parents can undergo crossover if they have the same skill factor. If, on the other hand, their skill factors differ, crossover occurs only with a given random mating probability rmp; otherwise, mutation takes place. A value of rmp close to 0 means that only culturally identical individuals are allowed to perform crossover, while a value close to 1 allows completely random mating.

2.2 Selective Imitation

Evaluating each individual for each problem is computationally expensive. For this reason, each child is evaluated only on one problem, namely the skill factor of one of its parents. In this way, the total number of function evaluations is reduced, while the solution is still evaluated on the problem on which it most likely performs well. The procedure is called selective imitation.

2.3 Landscape Analysis

In multitask machine learning, it is well known that useful information cannot always be found for two problems. Therefore, to enable further success in the field of evolutionary multitasking, it is important to develop a meaningful theoretical explanation of when and why implicit genetic transfer can lead to improved performance. In particular, it is important to develop a measure of the inter-task complementarity used during the process of multitasking. To this end, a synergy metric that captures and quantifies how similar two problems are has been proposed [7]. The main idea behind the synergy metric is to use the dot product between the gradient of a given solution in one problem and the vector pointing to the global optimum of another problem. If the dot product of a given solution is larger than 0, the solution of the first problem is pushing the candidate solution in the direction of the global optimum of the second problem. If the dot product is smaller than 0, the solution is pushed in the opposite direction.

3 EXPERIMENTS AND RESULTS

EMT is a novel concept in evolutionary optimization, and thus a limited number of experiments have been carried out so far. We present some experiments performed and results obtained using EMT in both multi- and many-task optimization.

3.1 Multitask Optimization

In the multitask optimization experiments, we took two frequently used optimization problems, the 50-dimensional (50D) Sphere and Ackley functions, and solved them using EMT and a genetic algorithm (GA). To be able to compare the results, we used the same population size and the same number of function evaluations per problem. The rmp parameter in EMT was set to 0.3, and for GA we used the default parameter values as defined in pymoo [2]. We monitored the difference between EMT and GA over time: if the difference is positive, EMT performs better than GA, while if it is negative, GA performs better than EMT. Because the fitness values vary between different problems, we normalized the difference between EMT and GA in each problem by dividing the values with the highest absolute difference.

In the first experiment, the optima of the two problems were placed at the opposite ends of the search space. Because of this, the problems have very little common information, and the synergy function mostly takes negative values. This is visualized for a 2D Sphere function in Figure 1 and for a 2D Ackley function in Figure 2. The normalized difference between EMT and GA in optimizing the 50D Sphere and Ackley functions is presented in Figure 3. From the results, we can see that GA performs better on these problems.

Figure 1: Synergy metric on the Sphere function solved together with the Ackley function when the optima are far away.

Figure 2: Synergy metric on the Ackley function solved together with the Sphere function when the optima are far away.

Figure 3: Normalized difference between multitask and single-task optimization on 50D Sphere and Ackley functions when the optima are far away.

In Figure 4, we present the results from the second experiment, where the optima of the 50D Sphere and Ackley functions were placed closer together. Here, we can see that the optimization of the Sphere function does not show significant improvement when performed together with the optimization of the Ackley function, but on the Ackley function EMT converges to the optimal solution much faster.

Figure 4: Normalized difference between multitask and single-task optimization on 50D Sphere and Ackley functions when the optima are close.

Figure 5: Synergy metric on the Sphere function solved together with the Ackley function when the optima are close.

An explanation for this is illustrated in 2D in Figures 5 and 6. Here we can see that the synergy in the Sphere space is mostly equal to 0, except for some small parts where it rises to +10 and falls to −10. Because both the positive and the negative parts of the synergy values of the Sphere problem are small, we can notice no difference in convergence on the Sphere problem. In contrast, more than half of the space of the Ackley function has a positive synergy metric, indicating that this part of the space points the solutions in the right direction toward the global optimum.
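The synergy values discussed here can be sketched numerically. This is a minimal illustration, not the paper's experimental code: it assumes a 2D Sphere task f(x) = Σ x_i² with its gradient known in closed form, takes the descent direction (negative gradient, matching the sign convention described for the metric), and uses hypothetical optimum locations for the second task:

```python
# Minimal sketch of the synergy idea [7]: the dot product between the
# direction in which problem 1 pushes a solution (descent direction) and the
# unit vector pointing from that solution to problem 2's global optimum.
# Positive values mean optimizing problem 1 also moves toward problem 2's optimum.
import math

def sphere_grad(x):
    # Gradient of the Sphere function f(x) = sum(x_i ** 2).
    return [2.0 * xi for xi in x]

def synergy(point, grad_fn, optimum2):
    push = [-g for g in grad_fn(point)]          # descent direction on problem 1
    to_opt = [o - p for o, p in zip(optimum2, point)]
    norm = math.sqrt(sum(t * t for t in to_opt)) or 1.0
    to_opt = [t / norm for t in to_opt]          # unit vector toward problem 2's optimum
    return sum(a * b for a, b in zip(push, to_opt))

# Hypothetical setup: Sphere optimum at the origin; the second task's optimum
# placed either far away or close by.
x = [10.0, 10.0]
print(synergy(x, sphere_grad, [100.0, 100.0]))  # far optimum: negative synergy
print(synergy(x, sphere_grad, [1.0, 1.0]))      # near optimum: positive synergy
```

This reproduces the qualitative pattern reported in the experiments: when the two optima are far apart the dot product is negative, and when they are close it turns positive.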
On the other hand, most of the decision space of the Ackley function has constant fitness values, which complicates the GA search for the global optimum. For this reason, the information transferred from the Sphere problem to the Ackley problem is useful, and thus we can see faster convergence when solving the two problems together using EMT.

Figure 6: Synergy metric on the Ackley function solved together with the Sphere function when the optima are close.

3.2 Many-Task Optimization

When solving more than three tasks simultaneously, we are dealing with many-task optimization. In Figure 7, we present the results obtained by randomly shifting (within a small, 10% range of the total space) the global optimum of both the Ackley and the Sphere function 25 times, resulting in 50 different 50D optimization problems. During the optimization process, we used the same algorithm parameter values for EMT and GA as reported in Section 3.1. In the results, we can notice similar patterns as when solving just two problems. This shows that increasing the number of problems we are trying to solve does not cause difficulties to EMT. If the problems are similar, we can solve many problems simultaneously without losing efficiency.

Figure 7: Normalized difference between multitask and single-task optimization on 50 problems originating from 50D Sphere and Ackley functions whose optima are shifted close to each other.

Figure 8 shows the results obtained when solving six well-known optimization problems at the same time: Ackley, Sphere, Rastrigin, Rosenbrock, Schwefel, and Griewank, all 50D. From the results, we can notice that although the optimization procedure converges faster for most of the functions, for the Sphere and the Schwefel function the convergence speed of the optimization process drops. The same pattern can be noticed in Figure 9, where the optimum of each function is shifted 8 times, resulting in 6 · 8 = 48 problems altogether.

Figure 8: Normalized difference between multitask and single-task optimization on six well-known 50D optimization problems when the optima are shifted close to each other.

Figure 9: Normalized difference between multitask and single-task optimization on 48 problems originating from six well-known 50D optimization problems whose optima are shifted close to each other.

4 CONCLUSION AND FUTURE WORK

We presented our experimental results on solving multiple optimization problems simultaneously using a novel method called evolutionary multitasking. We solved as few as two and as many as 50 optimization problems at the same time. From the experimental results, we can conclude that there are some groups of problems for which EMT can improve the speed of convergence of the optimization process. However, if the problems are too different, the performance of the optimization drops. To explain why EMT works well on some problem pairs and why on some others it does not, we provided visualizations of the synergy metric.

We have so far tested EMT on simple benchmark functions that are usually used for single-objective optimization. In future work, we plan to test it also on real-world scenarios with more complex functions and constraints. Furthermore, so far we have used the synergy metric to explain why some pairs of problems are solved successfully together. Unfortunately, with this metric we can not strictly determine when solving two problems will be successful. Thus, one possible future direction is to develop machine learning methods that predict when multitasking a set of problems would be successful. This may be useful for cloud systems that could form several groups of similar problems and then solve them in a multitask manner.

5 ACKNOWLEDGMENTS

We acknowledge financial support from the Slovenian Research Agency (young researcher program and research core funding No. P2-0209).

REFERENCES
[1] Kavitesh Kumar Bali, Abhishek Gupta, Liang Feng, Yew Soon Ong, and Tan Puay Siew. 2017. Linearized domain adaptation in evolutionary multitasking. In 2017 IEEE Congress on Evolutionary Computation (CEC). IEEE, 1295–1302.
[2] Julian Blank and Kalyanmoy Deb. 2020. Pymoo: Multi-objective optimization in Python. IEEE Access, 8, 89497–89509.
[3] Kalyanmoy Deb. 2001. Multi-Objective Optimization using Evolutionary Algorithms. John Wiley & Sons, Chichester.
[4] Liang Feng, Lei Zhou, Jinghui Zhong, Abhishek Gupta, Yew-Soon Ong, Kay-Chen Tan, and Alex Kai Qin. 2018. Evolutionary multitasking via explicit autoencoding. IEEE Transactions on Cybernetics, 49, 9, 3457–3470.
[5] Maoguo Gong, Zedong Tang, Hao Li, and Jun Zhang. 2019. Evolutionary multitasking with dynamic resource allocating strategy. IEEE Transactions on Evolutionary Computation, 23, 5, 858–869.
[6] Abhishek Gupta, Jacek Mańdziuk, and Yew-Soon Ong. 2015. Evolutionary multitasking in bi-level optimization. Complex & Intelligent Systems, 1, 1-4, 83–95.
[7] Abhishek Gupta, Yew-Soon Ong, Bingshui Da, Liang Feng, and Stephanus Daniel Handoko. 2016. Landscape synergy in evolutionary multitasking. In 2016 IEEE Congress on Evolutionary Computation (CEC). IEEE, 3076–3083.
[8] Abhishek Gupta, Yew-Soon Ong, and Liang Feng. 2015. Multifactorial evolution: Toward evolutionary multitasking. IEEE Transactions on Evolutionary Computation, 20, 3, 343–357.
[9] Abhishek Gupta, Yew-Soon Ong, Liang Feng, and Kay Chen Tan. 2016. Multiobjective multifactorial optimization in evolutionary multitasking. IEEE Transactions on Cybernetics, 47, 7, 1652–1665.
[10] Yu-Wei Wen and Chuan-Kang Ting. 2017. Parting ways and reallocating resources in evolutionary multitasking. In 2017 IEEE Congress on Evolutionary Computation (CEC). IEEE, 2404–2411.

Intent Recognition and Drinking Detection For Assisting Kitchen-based Activities

Carlo M.
De Masi Simon Stankoski carlo.maria.demasi@ijs.si simon.stankoski@ijs.si Department of Intelligent Systems Department of Intelligent Systems Jožef Stefan Institute Jožef Stefan Institute Ljubljana, Slovenia Ljubljana, Slovenia Vincent Cergolj Mitja Luštrek vc2756@student.uni- lj.si mitja.lustrek@ijs.si Univerza v Ljubljani, Fakulteta za elektrotehniko Department of Intelligent Systems Department of Intelligent Systems Jožef Stefan Institute Jožef Stefan Institute Ljubljana, Slovenia Ljubljana, Slovenia ABSTRACT This paper is organized as follows. Section 2 discusses the related work. Section 3 presents the system architectures. Section We combine different computer-vision (pose estimation, object 4 describes the recognition modules of the system. Section 5 detection, image classification) and wearable based activity recog-shows the results of the recognition modules. Finally, Section 6 nition methods to analyze the user’s behaviour, and produce a concludes the paper. series of context-based detections (detect locations, recognize activities) in order to provide real-time assistance to people with mild cognitive impairment (MCI) in the accomplishment of every 2 RELATED WORK day, kitchen-related activities. 2.1 Drinking Detection From Wearables KEYWORDS Recent advances in the accuracy and accessibility of wearable sensing technology (e.g., commercial inertial sensors, fitness computer vision, activity recognition, object detection, pose esti-bands, and smartwatches) has allowed researchers and practi- mation tioners to utilize different types of wearable sensors to assess fluid intake in both laboratory and free-living conditions. 1 INTRODUCTION The necessity for fluid intake monitoring emerges as a result of people’s lack of awareness of their hydration levels. 
Dehydration Smart home technologies have been extensively adopted for mea- can lead to many severe health problems like organ and cognitive suring and decreasing the impact of Mild Cognitive Impairment impairments. Therefore, a system that can continuously track (MCI) on everyday life [9]. In the scope of the CoachMyLife (CML) the fluid intake and provide feedback to the user if useful. project we have been developing a system employing different In [1], the authors explored the possibility of recognizing drink-machine learning techniques with the aim of assisting persons ing moments from wrist-mounted inertial sensors. They used affected by MCI in performing activities in their apartments, with adaptive segmentation to overcome the problem with variable a particular focus on tasks related to the kitchen. length of the drinking gestures. They used random forest algo- In a previous work, we presented one of the first components rithm, trained with 45 features, and obtained an average precision of this system, i.e. a computer vision pipeline which allows to of 90.3% and an average recall of 91.0%. In [5], the authors em-detect the activity of drinking, by analyzing the video collected ployed a two-step detection procedure, enabling them to detect by an RGB camera through a 3D Convolutional Neural Network drinking moments and estimate the fluid intake. They extracted (3D-CNN) [12]. 28 statistical features, from which only six were selected using In the present paper, we present our work on extending said backward feature selection. Finally, they trained a Conditional pipeline, by discussing (i) a drinking-detection algorithm based Random Field model, resulting in a precision of 81.7% and re- on motion data from a wristband, which can be used to further call of 77.5%. In [4], the authors used a machine-learning based validate the one based on computer vision, and to replace it in model to detect hand-to-mouth gestures. 
Similarly as the previ- situations where the activity is not performed in front of the ous methods, they extracted 10 time-domain features and trained camera; (ii) a method based on pose detection to identify inter- a random forest classifier. They validated their method in a free- actions of the user with their environment, in order to perform living scenario and obtained precision of 84% and recall of 85%. intent recognition, and (iii) a possible new implementation of our Although remarkable results were achieved, the evaluation of the previous computer-vision pipeline for drinking detection that studies is limited and it is not showing the real-life performance. can be deployed on edge devices. Permission to make digital or hard copies of part or all of this work for personal 2.2 Activity Recognition From Videos or classroom use is granted without fee provided that copies are not made or In recent years, the problem of computer-based Human Activity distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this Recognition (HAR) of daily living has been tackled by different work must be honored. For all other uses, contact the owner /author(s). computer-vision methods. Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia HAR can be performed directly on RGB images and videos by © 2021 Copyright held by the owner/author(s). analyzing: (i) the spatial features in each frame, thus obtaining 15 Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia Trovato and Tobin, et al. 
predictions for each frame that can then be extended to the whole 4 INTENT RECOGNITION video by pooling or by a recurrent-based neural networks [2], (ii) One of the main goals of the CML project is to provide users with the temporal features related to motions and variations between real-time, context-based notifications to assist them in perform- frames [6], or (iii) some combination of the two [10]. ing activities. The most recent approaches aimed at simultaneous evaluation This is achieved in two steps. First, by combining computer- of both spatial and temporal features involve the usage of 3D- vision and the wearable device, the system detects real-time CNN, i.e., convolutional models characterized by an additional events, such as the position of the user, their interaction with third temporal dimension [12]. the environment, the displacement of a mug the user is expected An alternative approach, not involving the direct analysis of to drink from, the opening/closing of cabinet and fridge door, the whole frames, consists in exploiting the information provided drinking and eating. by human pose estimation, so that body keypoints coordinates, Then, these events are passed to the intent recognition module, reconstructed in a 2D or 3D space, can be fed to deep-learning which uses them to predict which activity the user is performing, models to provide predictions [3]. and provide assistance if needed. We adopted a Single Shot MultiBox Detector (SSD) [8] model, 3 ADOPTED HARDWARE pre-trained on the 80 classes of the COCO dataset [7] for the 3.1 Wristband detection of the user, and fine-tuned on a custom dataset we collected to locate the position of the mug. Pose estimation, which The drinking-detection procedure is implemented on a wrist-is used to track the movement of the user’s hands and detect band which is equipped with a nRF52840 System On Chip (SoC) interactions with domestic appliances, is achieved by a SimpleNet module. 
The SoC offers a large amount of Flash and RAM, 1 MB and 256 kB, respectively. Additionally, it has protocol support for Bluetooth Low Energy (BLE). The architecture of the nRF52840 is based on a 32-bit ARM® Cortex™-M4 CPU with a floating point unit running at 64 MHz. The wristband's power supply is a battery with a capacity of 500 mAh. The measurements of accelerations and angular velocities are performed by the system-in-package LSM6DSL, manufactured by STM. It is equipped with a 3D digital accelerometer and a 3D digital gyroscope based on MEMS technology that operates at 0.65 mA in high-performance mode and allows low power consumption with constant operation. The most prominent feature of the Inertial Measurement Unit (IMU) is a 4 kB FIFO (First In, First Out) buffer, which stores the data of the accelerometer and gyroscope. This allows for very low-power operation, as the SoC wakes up only when triggered by a "FIFO full" interrupt event.

3.2 Local Deployment of the Computer Vision System
The computer vision pipeline for drinking detection we previously developed for the project worked by retrieving the video stream collected by an IP camera in the user's apartment and analyzing it on a remote server. This approach, however, presented issues related to remote access to the camera, which can sometimes be blocked by the router's firewall functionalities, and raised safety and privacy concerns with the users.
For these reasons, we have been working on deploying the computer vision system on a local device. After some unsuccessful attempts to implement the system on Android devices by using frameworks such as Apache TVM 1 or the Deep Java Library (DJL) 2, we opted for deployment on a Jetson NANO device 3.
Direct deployment of our system on the device was possible, although not immediate, but the resulting performance was suboptimal in terms of the FPS reached by the various detection algorithms (≈2 FPS for object detection). To overcome this, we optimized said algorithms with TensorRT, a library built on NVIDIA's CUDA library for parallel programming, thus improving inference performance for deep learning models (≈22 FPS for object detection).

…model with a ResNet backbone [13].

4.1 Regions of Interest
During the initial setup, the user is asked to identify some regions of interest (ROIs) in the camera image, which can be either single- or double-zoned.
In the first case, the ROI is "activated" when the user's feet are within the selected region (Fig. 1a), whereas double-zone ROIs are used to detect if the user is in the desired area and/or if their hands are in the selected upper area (Fig. 1b).

4.2 Intent Recognition
The events detected by the computer vision pipeline are passed to the intent recognition module, which predicts the activity the user is currently engaged in.
Currently, this prediction is based on a set of pre-determined rules. A number of possible activities is manually inserted, each formed by different steps corresponding to possible events that can be detected by the computer vision system (Fig. 2a). Different activities can share one or more steps, and as the system detects the completion of the various steps, the list of possible ongoing activities gets reduced (Fig. 2b, 2c), until only one activity is identified and followed until its completion (Fig. 2d).
If too long a time interval passes between the completion of two steps, the activity is classified as "interrupted", and the system can show a notification to the user, asking if they require assistance.

4.3 Drinking Detection from Computer Vision on the Jetson NANO
The model we previously adopted to perform activity recognition from videos is particularly computationally expensive, so although it proved to be very effective in the detection of drinking events, it was not possible to implement it on the Jetson NANO. For this reason, we are currently collecting a dataset of short video clips and passing them through a pose estimation model in order to obtain the 2D positions of 18 body parts across a time series of frames, with an associated class label for the frame series. This will then be analyzed through an LSTM-based model to perform HAR.

1 https://tvm.apache.org/
2 https://djl.ai/
3 https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-nano/

Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia

Figure 1: Triggers based on the user's location and their interaction with the environment (panels a and b).
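The ROI triggers of Section 4.1 reduce to point-in-rectangle tests on the keypoints returned by the pose estimator. A minimal sketch of that logic (the coordinates and helper names below are hypothetical, not the actual pipeline code):

```python
from dataclasses import dataclass

@dataclass
class Zone:
    """Axis-aligned rectangle in image coordinates."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1

def single_zone_active(zone, feet_xy):
    # Single-zone ROI: "activated" when the user's feet are inside it.
    return all(zone.contains(x, y) for x, y in feet_xy)

def double_zone_active(lower, upper, feet_xy, hands_xy):
    # Double-zone ROI: the lower zone tracks presence in the area,
    # the upper zone tracks whether a hand reaches into it.
    return (all(lower.contains(x, y) for x, y in feet_xy),
            any(upper.contains(x, y) for x, y in hands_xy))

# Example: user standing in the lower zone, one hand in the upper zone.
lower = Zone(100, 300, 260, 480)
upper = Zone(100, 80, 260, 220)
feet = [(150, 420), (190, 430)]
hands = [(140, 120), (210, 350)]
print(single_zone_active(lower, feet))               # True
print(double_zone_active(lower, upper, feet, hands)) # (True, True)
```

The two boolean outputs of a double-zone ROI are exactly the events ("Next to: …", "Interact: …") consumed by the intent recognition module.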
Figure 2: As the computer vision system detects the completion of various steps, the list of possible ongoing activities gets reduced, until one of them is completed or interrupted. (Panels a–d show the candidate activities BOIL WATER, MAKE COFFEE and EAT SOMETHING, each composed of steps such as "Interact: cabinet", "Next to: sink", "Interact: stoves", "Interact: coffee machine" and "Interact: table".)

4.4 Drinking Detection Using a Wearable Device
Due to the desired minimum power consumption, the drinking detection was implemented directly on the wristband. This is preferable as it eliminates the need to transfer all the raw sensor data to a smartphone or some other central device. Raw sensor data transmission is clearly undesirable due to its high power consumption, and it is not possible if the central device is not nearby.
The first step of drinking detection using the wristband is to enable the IMU in activity/inactivity recognition mode. This allows the IMU to be in a low-power state for most of the day.
When activity is recognized, the IMU enables absolute wrist tilt detection (AWT), which checks if the angle between the horizontal plane and the Y axis of the IMU is larger than 30 degrees. If the condition is met, the IMU is enabled in batching mode, storing accelerometer and gyroscope data in the FIFO buffer. Every time the FIFO buffer is full, the data is transferred to the SoC, where we directly start the machine learning pipeline. This procedure is repeated for three batches of IMU readings. If all three predictions from the machine learning model are non-drinking, we disable the gyroscope, stop the machine learning procedure and wait for the next AWT event. Otherwise, if at least one prediction is positive, the machine learning procedure continues to work for another three new batches of data.
The machine learning method for the detection of drinking gestures is based on time- and frequency-domain features. The raw data is segmented into 5-second windows and 216 features are extracted in total. We used a relatively simple approach due to the memory limitation of the wristband. The deployed model was trained using the drinking dataset described in Section 4.4.1 and additional non-drinking data collected in a real-life scenario [11].

4.4.1 Drinking Dataset. For the aim of this study, we recruited 19 subjects (11 males and 8 females). Each subject was equipped with the wristband described in Section 3.1. We developed a custom application that ran on the wristband and collected three-axis accelerometer and three-axis gyroscope data at a sampling frequency of 50 Hz. The dataset is publicly available 4 and we hope that it will serve researchers in future studies.
We developed a general procedure for the participants to follow during the data collection process. The ground truth was registered manually by the participants pressing a button on the wristband before performing the gesture and after finishing it. The data collection procedure included drinking from six different container types, namely bottle, coffee cup, coffee mug, glass, shot glass and wine glass. For each participant we collected 36 drinking episodes (3 fluid levels × 6 containers × 2 positions). The idea of the different fluid levels was to obtain drinking episodes with short, medium and long durations. We also considered different body positions: the participants first performed the drinking gestures while seated and afterwards repeated the same gestures while standing.

4 https://github.com/simon2706/DrinkingDetectionIJS

5 RESULTS AND DISCUSSION
5.1 Intent Recognition and Local Implementation of Drinking Detection
A pilot phase will begin shortly, during which the intent recognition module will be evaluated.
Regarding the new model for drinking detection, a preliminary test of our new approach, run on a subset of the Berkeley Multimodal Human Action Database (MHAD) 5, reached an accuracy of over 90%, and we will extend the analysis to our case once the dataset collection is over.

5 http://tele-immersion.citris-uc.org/berkeley_mhad

5.2 Wearable Sensing Results
For evaluation, the leave-one-subject-out (LOSO) cross-validation technique was used. In other words, the models were trained on the whole dataset except for one subject, on which we later tested the performance.
For the drinking detection model, we considered several classifiers, including logistic regression (LR), linear discriminant analysis (LDA), k-nearest neighbors (KNN), naive Bayes (NB) and XGBoost.
The obtained results are shown in Table 1. It can be clearly seen that XGBoost outperforms all the other classifiers. However, due to the technical limitations described in Section 3.1, the trained XGBoost model cannot fit below 100 kB. The size of the LR model is only 2 kB, which is optimal for our device. Furthermore, the results obtained with LR are only 0.03 lower compared to those of XGBoost. Therefore, we deployed the model trained with the LR classifier.

Table 1: Comparison of different classifiers for the detection of drinking activity.

Method                        Precision  Recall  F1 score
Logistic regression           0.87       0.77    0.81
Linear discriminant analysis  0.54       0.69    0.55
K-nearest neighbors           0.84       0.69    0.75
Naive Bayes                   0.68       0.85    0.74
XGBoost                       0.89       0.81    0.84

6 CONCLUSIONS
We presented our work on drinking detection using wearables and on intent recognition and drinking detection using computer vision.
A pilot phase, beginning in October 2021, will provide thorough testing of the functionalities described in the paper. Nonetheless, the results obtained from the internal testing of each module of the system are promising for both drinking detection (with both wearables and computer vision) and intent recognition.

REFERENCES
[1] Keum San Chun, Ashley B Sanders, Rebecca Adaimi, Necole Streeper, David E Conroy, and Edison Thomaz. 2019. Towards a generalizable method for detecting fluid intake with wrist-mounted sensors and adaptive segmentation. In Proceedings of the 24th International Conference on Intelligent User Interfaces, 80–85.
[2] Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, et al. 2015. Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2625–2634.
[3] Giovanni Ercolano and Silvia Rossi. 2021. Combining CNN and LSTM for activity of daily living recognition with a 3D matrix skeleton representation. Intelligent Service Robotics, 14, 2, 175–185.
[4] Diana Gomes and Inês Sousa. 2019. Real-time drink trigger detection in free-living conditions using inertial sensors. Sensors, 19, 9, 2145.
[5] Takashi Hamatani, Moustafa Elhamshary, Akira Uchiyama, and Teruo Higashino. 2018. FluidMeter: gauging the human daily fluid intake using smartwatches. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2, 3, 1–25.
[6] Ammar Ladjailia, Imed Bouchrika, Hayet Farida Merouani, Nouzha Harrati, and Zohra Mahfouf. 2020. Human activity recognition via optical flow. Neural Computing and Applications, 32, 21, 16387–16400.
[7] Tsung-Yi Lin, Michael Maire, Serge Belongie, et al. 2014. Microsoft COCO: common objects in context. (2014). arXiv: 1405.0312 [cs.CV].
[8] Wei Liu, Dragomir Anguelov, Dumitru Erhan, et al. 2016. SSD: single shot multibox detector. Lecture Notes in Computer Science, 21–37. ISSN: 1611-3349. DOI: 10.1007/978-3-319-46448-0_2.
[9] Maxime Lussier, Stéphane Adam, Belkacem Chikhaoui, Charles Consel, Mathieu Gagnon, Brigitte Gilbert, Sylvain Giroux, Manon Guay, Carol Hudon, Hélène Imbeault, et al. 2019. Smart home technology: a new approach for performance measurements of activities of daily living and prediction of mild cognitive impairment in older adults. Journal of Alzheimer's Disease, 68, 1, 85–96.
[10] Karen Simonyan and Andrew Zisserman. 2014. Two-stream convolutional networks for action recognition in videos. In Advances in Neural Information Processing Systems, 568–576.
[11] Simon Stankoski, Marko Jordan, Hristijan Gjoreski, and Mitja Luštrek. 2021. Smartwatch-based eating detection: data selection for machine learning from imbalanced data with imperfect labels. Sensors, 21, 5. ISSN: 1424-8220. DOI: 10.3390/s21051902.
[12] Gül Varol, Ivan Laptev, and Cordelia Schmid. 2017. Long-term temporal convolutions for action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40, 6, 1510–1517.
[13] Bin Xiao, Haiping Wu, and Yichen Wei. 2018. Simple baselines for human pose estimation and tracking. In Proceedings of the European Conference on Computer Vision (ECCV), 466–481.
Anomaly Detection in Magnetic Resonance-based Electrical Properties Tomography of in silico Brains

Ožbej Golob, University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia, ozbej.golob@gmail.com
Alessandro Arduino, Istituto Nazionale di Ricerca Metrologica, Torino, Italy, a.arduino@inrim.it
Oriano Bottauscio, Istituto Nazionale di Ricerca Metrologica, Torino, Italy, o.bottauscio@inrim.it
Luca Zilberti, Istituto Nazionale di Ricerca Metrologica, Torino, Italy, l.zilberti@inrim.it
Aleksander Sadikov, University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia, aleksander.sadikov@fri.uni-lj.si

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). © 2021 Copyright held by the owner/author(s).

ABSTRACT
Magnetic resonance-based electrical properties tomography (EPT) is one of the novel quantitative magnetic resonance imaging techniques being tested for use in clinical practice. This paper presents preliminary research on and results of the automated detection of anomalies in EPT images. We used in silico data based on anatomical human brains in these experiments and developed two algorithms for anomaly detection. The first algorithm employs a standard approach with edge detection and segmentation, while the second exploits the quantitative nature of EPT and works directly with the measured electrical properties (electrical conductivity and permittivity). The two algorithms were compared on – as of yet – noiseless data. The algorithm using the standard approach was able to quite reliably detect anomalies roughly the size of a cube with a 14 mm edge, while the EPT-based algorithm was able to detect anomalies roughly the size of a cube with a 12 mm edge.

KEYWORDS
electrical properties tomography (EPT), magnetic resonance imaging (MRI), automatic anomaly detection, artificial intelligence

1 INTRODUCTION
The frequency-dependent electrical properties (EPs) of biological tissues, including electrical conductivity and permittivity, provide important diagnostic information, e.g. for tumour characterisation [9]. EPs can potentially be used as biomarkers of the healthiness of various tissues. Previous studies, not based on magnetic resonance imaging (MRI), have shown that various diseases cause changes of the EPs in the tissue [3].
Electrical properties tomography (EPT) is used for the quantitative reconstruction of the EPs' distribution at radiofrequency (RF) with a spatial resolution of a few millimetres. EPT requires no electrode mounting and, during MRI scanning, no external energy is introduced into the body other than the B1 fields. Applied fields can easily penetrate into most biological tissues, making EPT suitable for imaging of the whole body. The MRI scans for EPT are performed using a standard MRI scanner, and the spatial resolution is determined by the MRI images and the quality of the used B1-mapping technique [9].
The objective of this research was to develop and evaluate algorithms to automatically detect anomalies of different sizes in EPT images. The data consisted of in silico simulated brain scans of phantoms that either contained an anomaly or not. The evaluation was aimed towards answering whether an anomaly can be detected or not, and how large an anomaly can be (reasonably) reliably detected. This represents an initial step towards the potential clinical use of EPT.

2 METHODS
2.1 Data Acquisition
The MRI acquisition of the EPT inputs has been simulated in a noiseless case. Thus, the result of the electromagnetic simulation at RF has been directly converted into the acquired data, with no further post-processing. Precisely, the B1 field generated by a current-driven 16-leg birdcage body-coil (radius 35, height 45) operated both in transmission and in reception with a polarisation switch has been computed in the presence of anatomical human heads with a homemade FEM–BEM code [2]. The simulations have been conducted at 64 MHz (i.e. the Larmor frequency of a 1.5 T scanner).
The acquisitions of 19 human head models from the XCAT library [6] have been simulated. The considered population is statistically representative of different genders and ages. For each head model, 10 different variants are considered:
(1) Two physiological variants with the original distribution of the biological tissues. In one case, the nominal electrical conductivity provided by the IT'IS Foundation database [5] is assigned to each tissue. In the other case, the electrical conductivity of white and grey matter is sampled from a uniform distribution that admits a variation of up to 10% with respect to the nominal value. This will be referred to as the physiological variability of the electrical conductivity.
(2) Eight pathological variants, in which a spherical pathological inclusion is inserted in the white matter tissue. The radius of the inclusion ranges from 5 to 45 and its electrical conductivity is set equal to that of the white matter increased by a factor uniformly sampled from 10% to 50% of the nominal value, because previous experimental results have shown that pathological tissues have higher EP values than healthy tissue [7, 8].
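The two kinds of conductivity perturbation in the variant list above can be illustrated with a toy sampler (a numpy sketch; the nominal values below are placeholders, not the actual IT'IS database entries, and the interpretation of the inclusion increase as 110–150% of the nominal white-matter value follows Section 2.3.2):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder nominal conductivities in S/m (illustrative values only;
# the study assigns values from the IT'IS Foundation database).
NOMINAL = {"white_matter": 0.34, "grey_matter": 0.59}

def physiological_variant(nominal):
    """Variant (1): white/grey matter conductivity sampled uniformly
    within +/-10% of the nominal value."""
    return {tissue: rng.uniform(0.90 * v, 1.10 * v)
            for tissue, v in nominal.items()}

def inclusion_conductivity(nominal_wm):
    """Variant (2): the inclusion takes the white-matter conductivity
    increased by 10-50% of the nominal value (110-150% of nominal)."""
    return nominal_wm * rng.uniform(1.10, 1.50)

variant = physiological_variant(NOMINAL)
sigma_anomaly = inclusion_conductivity(NOMINAL["white_matter"])
```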
The location of the inclusion within the head is selected with a random procedure and only its intersection with the white matter tissue is kept in the final model (see Fig. 1, panels a and d). All the pathological variants take into account the physiological variability in the determination of the white and grey matter electrical conductivity.

2.2 Reconstruction Techniques
In order to retrieve the distribution of the electrical conductivity, the phase-based implementations of Helmholtz-EPT (H-EPT) and convection-reaction EPT (CR-EPT) provided by the open-source library EPTlib [1] have been used. For each head model, the distribution of the transceive phase [3] (the input of phase-based EPT) is obtained by linearly combining the phases of the rotating components of B1 simulated both in transmission and in reception [1].
Since noiseless inputs are considered, the smallest filter has been used both in H-EPT and in CR-EPT. Moreover, CR-EPT has been applied for a volume tomography, with an electrical conductivity of 0.1 forced at the boundaries and an artificial diffusion coefficient equal to 10^-4.
Currently, the proposed anomaly detection algorithms have been tested only on the H-EPT results.

Figure 1: Median electrical conductivity distribution by regions. (a) Segmented healthy MRI image. (b) Median electrical conductivity distribution. (c) Detected regions (bordered red). (d) Segmented pathological MRI image (anomaly in yellow). (e) Median electrical conductivity distribution. (f) Detected regions (bordered red). Please note that not all of the regions are visible, as only a 2D slice is shown while the data is 3D.

2.3 Anomaly Detection
We developed two anomaly detection algorithms: (i) a more classical approach for anomaly detection in MR images and (ii) an EPT-based approach working with the direct quantitative properties estimated by MRI-based EPT.

2.3.1 Classical Approach. The classical approach uses standard techniques for anomaly detection in MR images. This approach could be applied (also) on standard MR images, as it is independent of the MRI technique. The algorithm uses noiseless EPT images, produced with the Helmholtz reconstruction technique, as input data.
The algorithm receives previously segmented white matter from the EPT image (this segmentation was not of interest in this research) and detects the edges in it. The edges are detected using a simple gradient edge detection technique. The gradient is calculated for each voxel based on the directional change of the electrical conductivity of neighbouring voxels. The edges are represented as borders between white matter and other brain tissues, as well as borders between white matter and anomalies. Edge voxels are ignored in order to avoid H-EPT reconstruction errors, which occur at borders between tissues [4].
The algorithm then calculates the median electrical conductivity of all regions as separated by the detected edges. Figure 1 shows the median electrical conductivity distribution by regions in a sample image.
The k-means algorithm is then employed for the classification of regions into healthy and anomaly-containing ones. The algorithm classifies an MR image based on the median electrical conductivity of each region. The anomaly location is associated with the regions detected as containing the anomaly.

2.3.2 EPT Approach. EPT differs from standard MRI techniques by representing EPs as quantitative values. EPs are a reliable biomarker of a healthy brain. Mandija et al. [4] presented the mean electrical conductivity and standard deviation of white and grey matter as a reliable measure of whether the brain contains pathological tissue.
In the input data for our experiments, the electrical conductivity is distributed from 90% to 110% of the nominal value for white matter, and from 110% to 150% for anomalies. However, it must be noted that these are the values used for setting up the phantoms, and that these values are then only approximated when the EPT reconstruction is performed. These reconstructed properties have been used as input for anomaly detection. The algorithm detects anomalies based on the difference between white matter and anomalies. The algorithm uses noiseless EPT images, produced with H-EPT, as input data.
The algorithm, like the classical one, receives as input previously segmented white matter from the whole EPT image. It then detects all voxels that have an electrical conductivity between 110% and 150% of the median electrical conductivity of white matter and marks them as a potential anomaly. These voxels, marked as potentially being an anomaly, are then grouped into regions based on their location. The algorithm ignores all smaller regions (below a set size threshold) that likely represent noise and reconstruction errors. All the remaining regions are classified as the anomaly.

3 RESULTS
Figure 2 shows the predictions of whether an image contains an anomaly or not for both algorithms – the classical approach on the left (a) and the EPT approach on the right (b). Each EPT image corresponds to one bar on the chart and they are arranged by increasing size of the anomaly; the size of the bar represents the size of the anomaly in voxels. The bars are cut off at 2,000 voxels for easier viewing. Only images actually containing the anomaly are shown; for the others the false positive (FP) rate describes the performance of the two algorithms. The green colour represents correct predictions and the red colour the incorrect ones. The yellow colour means that the algorithm correctly predicted the presence of the anomaly, but for the wrong reasons (hence the Intersection over Union (IoU) is zero) – these cannot be counted as correct performance. Some misclassifications are labeled with the most likely cause: either that the anomaly is scattered in several smaller regions (each below the detection threshold size) or, in the case of the EPT approach, that the anomaly is too close to the top border and is "overshadowed" by the cranium. For the unlabelled misclassifications the most likely reason is the small size of the anomaly.

Figure 2: Predictions of the anomaly detection algorithms. (a) Classical approach. (b) EPT approach.

Figure 2 captures rather well the minimal anomaly size at which each algorithm starts performing quite reliably. The classical approach detects anomalies larger than 350 voxels and the EPT approach detects anomalies larger than 170 voxels. Since each voxel represents a cube with a 2 mm edge, these volumes translate roughly to a cube with an edge of 14 mm for the classical approach and a cube with an edge of slightly less than 12 mm for the EPT approach.
Tables 1–4 further clarify the results. The images were split into a training set, used to optimise several internal parameters, and a test set for independent evaluation. The internal parameters of the classical approach specify: (i) the minimum gradient value for a voxel to be recognized as an edge; (ii) the electrical conductivity difference between anomaly and healthy tissue; (iii) the minimum region size. The internal parameters of the EPT approach specify: (i) how many initial slices of white matter are ignored (to avoid reconstruction errors); (ii) the minimum region size. The split, while random in nature, was made based on individual phantom heads – the same head with different simulated anomalies could not be both in the test and the training set. The training set consisted of 130 images (including 26 not containing an anomaly), and the test set consisted of 60 images (including 12 not containing an anomaly).
Table 1 shows the results of the classification evaluation of the classical approach and Table 2 shows the results of the localisation evaluation of the classical approach. The localisation results are reported as mean ± standard deviation. The values of IoU and F1 score for localisation are lower as a result of ignoring anomaly edge voxels. Anomaly edge voxels are ignored because of H-EPT reconstruction errors. This is not an issue for anomaly detection, as the values of precision are still high. The values of IoU and F1 score for localisation would be improved by acknowledging the edges of an anomaly after it is already detected.

Table 1: Classification evaluation of the classical approach.

Measure    Training data  Test data
Precision  0.976          0.971
Recall     0.769          0.708
F1 score   0.860          0.819
Accuracy   0.800          0.750

Table 2: Localisation evaluation of the classical approach.

Measure    Training data  Test data
IoU        0.197 ± 0.116  0.244 ± 0.110
Precision  0.932 ± 0.202  0.988 ± 0.050
Recall     0.204 ± 0.123  0.245 ± 0.110
F1 score   0.313 ± 0.163  0.379 ± 0.143

Analogously, Table 3 shows the results of the classification evaluation of the EPT approach and Table 4 shows the results of the localisation evaluation of the EPT approach. Again, the IoU and F1 score values are reduced as a result of ignoring anomaly edge voxels.

Table 3: Classification evaluation of the EPT approach.

Measure    Training data  Test data
Precision  0.975          1.000
Recall     0.750          0.708
F1 score   0.848          0.829
Accuracy   0.785          0.767

Table 4: Localisation evaluation of the EPT approach.

Measure    Training data  Test data
IoU        0.381 ± 0.140  0.435 ± 0.125
Precision  0.874 ± 0.208  0.900 ± 0.177
Recall     0.396 ± 0.142  0.450 ± 0.126
F1 score   0.535 ± 0.166  0.594 ± 0.142

An example of anomaly localisation is shown in Figure 3. As shown in the image, the EPT approach is generally better at anomaly localisation than the classical approach.

4 DISCUSSION AND CONCLUSIONS
The results indicate potential for the future use of the EPT technique for anomaly detection in clinical practice. The results in terms of the anomaly size are on par with what a trained radiologist is able to detect manually.
EPT, being a quantitative technique, offers the advantage of comparability of the images (e.g. in longitudinal monitoring of the patient) compared to standard qualitative MRI. Furthermore, the direct EPT approach performed better than the classical one via edge detection. It is also less complex, and this can often be a bonus in practical applications.
However, this is a pilot study and further research is required to put these approaches into actual practice. The biggest limitation of the presented study and results is that the images, while being an actual EPT reconstruction, were deliberately noiseless. With the introduction of noise the data would very much resemble actual in vivo cases; however, the obtained results will likely be worse. A lot of further work, mostly on noise reduction and detection in the presence of noise, is likely still required.
Moreover, currently only the data captured using H-EPT is used. This technique causes (large) reconstruction errors at the borders between tissues. The results could potentially be improved by combining H-EPT and CR-EPT [1], as the latter technique does not cause reconstruction errors at borders between tissues.
The anomaly localisation could also be improved by not ignoring edges. The edges would still be removed when anomalies are detected; however, once an anomaly is detected, the edges around the anomaly could be classified as anomaly, thus improving the IoU and the F1 score.
In addition to the mean value of the electrical conductivity, the standard deviation of the electrical conductivity could also be taken into account when detecting edges and anomalies.
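The EPT-based detector of Section 2.3.2 essentially thresholds against the median white-matter conductivity and keeps sufficiently large connected regions. A simplified 3D sketch under stated assumptions (the 110–150% thresholds come from the text; `min_voxels` and the toy volume are illustrative, and scipy is used here for connected-component labelling):

```python
import numpy as np
from scipy import ndimage

def detect_anomaly(sigma, wm_mask, low=1.10, high=1.50, min_voxels=50):
    """Return a boolean mask of voxels classified as anomaly.

    sigma   -- reconstructed electrical conductivity volume
    wm_mask -- boolean mask of the segmented white matter
    """
    median_wm = np.median(sigma[wm_mask])
    # Candidate voxels: conductivity between 110% and 150% of the WM median.
    candidate = wm_mask & (sigma > low * median_wm) & (sigma < high * median_wm)
    # Group candidates into connected regions; drop small regions that
    # likely represent noise and reconstruction errors.
    labels, n_regions = ndimage.label(candidate)
    keep = np.zeros_like(candidate)
    for region in range(1, n_regions + 1):
        voxels = labels == region
        if voxels.sum() >= min_voxels:
            keep |= voxels
    return keep

# Toy volume: uniform "white matter" with a 6x6x6-voxel inclusion at +30%.
vol = np.full((20, 20, 20), 0.34)
vol[5:11, 5:11, 5:11] = 0.34 * 1.3
mask = detect_anomaly(vol, np.ones(vol.shape, dtype=bool))
print(int(mask.sum()))  # 216 voxels flagged
```

Real H-EPT reconstructions only approximate the phantom values, so in practice the thresholds and the minimum region size are the internal parameters tuned on the training set.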
doi: 10.13099/VIP21000- 04- 0. [6] W.P. Segars, B.M.W. Tsui, J. Cai, F.-F. Yin, G.S.K. Fung, and E. Samei. 2018. Application of the 4-D XCAT phantoms in biomedical imaging and beyond. IEEE Transactions on Medical Imaging, 37, 3, 680–692. [7] Andrzej J. Surowiec, Stanislaw S. Stuchly, J. Robin Barr, and Arvind Swarup. 1988. Dielectric properties of breast Figure 3: Anomaly localization. (a) Segmented pathologi- carcinoma and the surrounding tissues. IEEE Transactions cal MRI image. (b) Localization of classical approach (de- on Biomedical Engineering, 35, 4, 257–263. tected anomaly is red). (c) Localization of EPT approach [8] B.A. Wilkinson, Rod Smallwood, A. Keshtar, J. A. Lee, and (detected anomaly is red). F.C. Hamdy. 2002. Electrical impedance spectroscopy and the diagnosis of bladder pathology: a pilot study. The Jour- nal of urology, 168, 4, 1563–1567. Finally, once results achieved on EPT images of phantom brain [9] Xiaotong Zhang, Jiaen Liu, and Bin He. 2014. Magnetic- are satisfactory, implemented approaches could be tested on in resonance-based electrical properties tomography: a review. vivo data. IEEE Reviews in Biomedical Engineering, 7, 87–96. doi: 10. ACKNOWLEDGMENTS 1109/RBME.2013.2297206. The results presented here have been developed in the frame- work of the EMPIR Project 18HLT05 QUIERO. This project has received funding from the EMPIR programme co-financed by the Participating States and from the European Union’s Horizon 2020 research and innovation programme. REFERENCES [1] A. Arduino. 2021. EPTlib: an open-source extensible collec- tion of electric properties tomography techniques. Applied Science, 11, 7, 3237. [2] O. Bottauscio, M. Chiampi, and L. Zilberti. 2014. Massively parallelized boundary element simulation of voxel-based human models exposed to MRI fields. IEEE Transactions on Magnetics, 50, 2, 7025504. [3] Jiaen Liu, Yicun Wang, Ulrich Katscher, and Bin He. 2017. 
Library for Feature Calculation in the Context-Recognition Domain

Vito Janko, Jožef Stefan Institute, Department of Intelligent Systems, Ljubljana, Slovenia, vito.janko@ijs.si
Matjaž Boštic, Jožef Stefan Institute, Department of Intelligent Systems, Ljubljana, Slovenia, bosticmatjaz@gmail.com
Junoš Lukan, Jožef Stefan Institute, Department of Intelligent Systems, and Jožef Stefan International Postgraduate School, Ljubljana, Slovenia, junos.lukan@ijs.si
Gašper Slapničar, Jožef Stefan Institute, Department of Intelligent Systems, and Jožef Stefan International Postgraduate School, Ljubljana, Slovenia, gasper.slapnicar@ijs.si

ABSTRACT
Context recognition is a mature artificial intelligence domain with established methods for a variety of tasks. A typical machine learning pipeline in this domain includes data preprocessing, feature extraction and model training. The second of these steps is typically the most challenging, as sufficient expert knowledge is required to design good features for a particular problem. We present a Python library which offers a simple interface for feature calculation useful in a myriad of different tasks, from activity recognition to physiological signal analysis. It also offers additional useful tools for data preprocessing and machine learning, such as a custom wrapper feature selection method and prediction smoothing using Hidden Markov Models. The usefulness and usage are demonstrated on the 2018 SHL locomotion challenge, where a few simple lines of code allow us to achieve solid predictive performance with an F1 score of up to 93.1, notably surpassing the baseline performance and nearing the results of the winning submission.

KEYWORDS
feature calculation, Python library, context recognition, machine learning

1 INTRODUCTION
Context recognition is a vague term encompassing a variety of tasks where sensors are put on (or around) a person and are then used to determine something about them. For example, sensors in a smartphone can determine if a user is standing, walking, running or even falling. A wristband sensor can read physiological signals like heart rate or sweating to determine stress or blood pressure. These kinds of applications are usually used for self-monitoring in sport activities or for helping users manage various medical conditions.

The context-recognition field is quite mature and its applications often come pre-installed in many commercial devices like wristbands and smartphones. Nonetheless, the development of a new context-recognition system can be tedious and time-consuming. It usually consists of collecting relevant sensor data, parsing it into a suitable format, calculating features based on this data and finally training the model.

In this work we present a Python library focused on streamlining this process. Its main functionality is calculating features from sensor data. It can generate over a hundred different features that have proven themselves in various context-recognition projects we tackled in the past [4, 3, 5]. Loosely, the features can be divided into two categories: those suitable for motion data (e.g. generated by an accelerometer or gyroscope) and those specialized for physiological signals.

Furthermore, the library implements some other functionalities that are often used in context-recognition pipelines: reshaping data into windows, re-sampling the data, selecting the best features after generating them, and a method for smoothing the final predictions of the classifier using a Hidden Markov Model approach.

To demonstrate the usefulness of the library we used its functionalities exclusively (with the exception of a generic Random Forest classifier [11]) on the SHL Challenge dataset [16]. We demonstrate the whole pipeline, from reading in the raw data to a finished context-recognition system that is comparable to the best-performing submissions in the SHL Challenge.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

2 LIBRARY FUNCTIONALITIES
The library is implemented in Python, as this has been the most popular data science language in recent years [6]. It is available in a public repository and installable with the pip install cr-features command. Its main and most valuable functionality lies in feature generation. The 'motion features' are listed in Section 2.1, while the 'physiological features' are described in Section 2.2. The remaining non-feature-related functionalities are explained in Section 2.3.

2.1 Motion Sensor Features
Features listed in the first two subsections are general and can be applied to any sensor data time-series. The last subsection (2.1.3), on the other hand, lists features that have an additional semantic interpretation for acceleration and require data from all three (x, y, z) axes.
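As a rough stand-alone illustration of what such general window-level features look like in code, the sketch below computes a handful of the statistical measures described in this section with NumPy and SciPy. It is an independent example under our own naming, not the cr-features API, and the feature definitions follow the paper's descriptions only loosely.

```python
import numpy as np
from scipy.signal import find_peaks

def basic_window_features(x):
    """Illustrative statistical features for one window of sensor data.

    A sketch, not the cr-features API: definitions loosely follow the
    general statistical features described in Section 2.1.1.
    """
    x = np.asarray(x, dtype=float)
    peaks, _ = find_peaks(x)                      # local maxima (e.g. steps)
    above = x > x.mean()                          # samples above the mean
    crossings = np.count_nonzero(np.diff(above))  # times data crossed its mean
    # longest run of consecutive samples on one side of the mean
    runs, current = [0], above[0]
    for a in above:
        if a == current:
            runs[-1] += 1
        else:
            runs.append(1)
            current = a
    return {
        "max": x.max(), "min": x.min(), "std": x.std(),
        "median": float(np.median(x)),
        "mean_diff": float(np.mean(np.diff(x))),
        "peak_count": len(peaks),
        "mean_crossings": crossings,
        "longest_run": max(runs),
    }
```

In a real pipeline such a function would be applied row-wise to a matrix of windows, yielding one feature vector per instance.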
The library defines similar sensor subsets for some other sensors (e.g. gyroscope). Only a subset of features is listed for brevity, while the full list can be found in the documentation [1].

2.1.1 General Statistical Features.
• Basic statistical measures: maximum, minimum, standard deviation, median, mean difference between samples.
• Number of peaks – useful for detecting and counting steps, estimating the energy expenditure and determining the frequency of motion: peak count, number of times data crossed its mean value, longest time data was above or below its mean value.
• Different data aggregations that can indicate the intensity of the activity: (squared) sum of values, sum of absolute values.
• Autocorrelations (i.e. how similar the data is to a shifted version of itself), which indicate periodicity: autocorrelation for raw data, for peak positions, for mean crossings.
• Data shape: skewness (a measure of symmetry, or more precisely, the lack of symmetry), kurtosis (a measure of whether the data is heavy-tailed or light-tailed relative to a normal distribution), interquartile range.

2.1.2 Frequency Features. They are calculated by first computing an estimate of the power spectral density of the signal via a periodogram. We used Welch's method, which is an improvement over the traditional methods in that it reduces noise in the estimated power spectra.

Once the periodogram is obtained, the following features are computed: the magnitude values of the three highest peaks in the periodogram, the three frequencies corresponding to the highest peaks, energy of the signal calculated as the sum of squared FFT component magnitudes, entropy of the signal computed as the information entropy of the normalized FFT component magnitudes, and the distribution of the FFT magnitudes into 10 equal-sized bins ranging from 0 Hz to F_s/2, where F_s is the sampling frequency. Finally, we also computed the previously described skewness and kurtosis for the periodogram.

Most of the described features are useful for finding different periodic patterns, how often they occur and how intense they are.

2.1.3 Accelerometer Features.
• Phone rotation estimation. First, roll and pitch are calculated, then we calculate their characteristics: mean, standard deviation, peaks, autocorrelations.
• Physical interpretations: velocity, kinetic energy.
• Comparing data axes; useful for determining the sensor orientation relative to the direction of motion: correlation between axis data, comparing their means, mean direction of the vector they form.

2.2 Physiological Features
Physiological features are useful for obtaining information about a person's physiological state, typically reflected in their cardiovascular response. We computed several features from signals obtainable from many modern wristbands, as described in the sections below.

2.2.1 Heart Rate and Heart Rate Variability. Cardiovascular measures are widely used to predict both medical problems as well as psychological processes [7]. They range from simple heart rate calculations to more complex heart rate variability indicators. Heart rate variability is a measure of how quickly heart rate itself changes and it is usually calculated on a beat-by-beat basis, considering the inter-beat interval (IBI). It reflects the interaction between sympathetic and parasympathetic regulation of heart beat [10] and is thus an especially useful physiological indicator.

Calculation of features related to cardiovascular activity follows recommendations by Malik, Bigger, Camm, Kleiger, Malliani, Moss and Schwartz [8]. To describe heart rate variability, the Fourier transform of inter-beat intervals is calculated and then several frequency features are derived from the spectrum [5].

2.2.2 Skin Conductivity. Electrical conductivity of the skin varies due to physiological changes in sweat glands, which are controlled by the autonomic nervous system. In a simple model of the resistive properties of skin and sweat glands, whenever the level of sweat in the glands is increased, its conductivity also increases [2]. Sweat glands thus act as variable resistors, and actual sweating, that is, sweat secretion from the glands, is not needed for this change to be measurable.

Changes in skin conductivity are not only triggered by other physiological changes, such as the ones in (skin) temperature, but also reflect psychological processes. Skin conductivity can indicate cognitive activity or emotional responses and can do so with good sensitivity [see 7, for an exhaustive review].

Sweat glands continuously adapt to their environment and their reactions can be slow or fast. Two main modes of fluctuation are therefore distinguished: skin conductance level changes, which are slow variations of the general trend, also called tonic electrodermal measures, and skin conductance responses, quick reactions, also called phasic electrodermal measures [13].

To calculate skin conductivity features, the two components are first separated. This is done using the EDA Explorer library [14], which enables searching for peaks (SCRs) in the signal by specifying their desired characteristics. The signal is first filtered using a Butterworth low-pass filter from SciPy [15]. Next, the peaks are detected by considering their amplitude, onset, and offset time.

Once the SCRs are found, their characteristics are calculated, which can be used as features. These include their number and rate (relative frequency in time) as well as the means and maxima of various characteristics, such as their maximum amplitude, their duration, increase time etc.

The tonic component is calculated using peakutils [9]. It is detected as the signal baseline, fitting a 10th-degree polynomial to the signal. Similarly to the phasic component, statistical features are calculated, such as the difference between this component and the raw signal, and the sum of its derivative.

2.2.3 Skin Temperature. Skin temperature is a fairly simple physiological parameter, both from the point of view of measurement as well as feature calculation. It can still serve as an indicator of affect [7]. Unlike the other physiological parameters, which make use of expert features, only some generic statistical features are calculated for this indicator.

2.3 Other Functionalities
The following functionalities are not directly related to feature generation but are nonetheless often used in conjunction with it – and can thus make the workflow more straightforward.

2.3.1 Resize, Resample. The presented library works with raw data in matrix form: each row representing one window of data, i.e. one instance. If the original data is in the form of a 1D time-series, the convertInputInto2d function can reformat it into the required format. It can work both with windows of a fixed number of data samples as well as windows representing a fixed time interval. Another frequent pre-processing step is down-sampling the data, and it can be done with the resample function.

2.3.2 Wrapper Feature Selection. While many feature selection libraries already exist (e.g. scikit-learn [11]), we implemented another one in this library as it was frequently used in our previous work [4, 3]. It combines the relatively common 'wrapper' approach with reducing the feature count using correlations. It works in three steps:
(1) Calculate the information gain for every feature and rank the features based on it.
(2) Calculate the correlation between each feature pair. If the correlation exceeds the given threshold, discard the one with lower information gain.
(3) Create the classifier using only the highest-ranking feature and measure the accuracy using a validation set. Then add the second feature and measure the accuracy again. If it was the same or higher, keep the feature, otherwise discard it. Repeat for all other remaining features.

2.3.3 Hidden Markov Model Smoothing. The final functionality is a tool to post-process the predictions of the context-recognition system, taking into account the temporal dependencies between the instances.

Take an example in which the classifier predicts the following minute-by-minute sequence: 'subway', 'subway', 'bus', 'subway', 'subway'. It is far more likely that the 'bus' prediction is a misclassification than that the vehicle was switched for just a minute.

Such a sequence can be corrected using a Hidden Markov Model (HMM). This model assumes that there are hidden states corresponding to real activities which emit visible signals – classifications. The parameters of this model can be inferred from the matrix of transition probabilities and the confusion matrix of the predictor.

Once the parameters are estimated, the Viterbi algorithm is used in the background to determine the most likely sequence of hidden states (activities) given the visible emissions (predictions). In many domains [4, 12] this method significantly improves the final prediction accuracy.

While this method is least connected to the feature generation, we have not seen it implemented in a different library and have found it greatly useful.

3 USAGE EXAMPLE
We illustrate the usage of our library with an example: the Sussex-Huawei Locomotion (SHL) Challenge 2018 [16]. This was a worldwide open activity recognition challenge with monetary incentives, organized as part of the HASCA workshop within the UbiComp conference. 17 teams participated with 19 submissions. The goal was to train a recognition pipeline on the provided training data and then use it to classify the withheld test data as well as possible in terms of the F1 score metric.

3.1 SHL Dataset
The challenge used a subset of the full dataset, which was recorded over a period of 7 months by 3 participants engaging in 8 different modes of transportation (still, walk, run, bike, car, bus, train and subway). The phones were worn on 4 body positions, namely in the hand, on the torso, in the hip pocket and in a backpack, and recorded 16 sensor modalities simultaneously. This totalled 2812 hours of labelled data, which is considered one of the largest such datasets openly available [16].

In the actual challenge, the subset used was the data recorded by one of the three participants, which included 82 days of recording, split into the training set (271 hours) and test set (95 hours). Raw data from 7 sensors was provided: accelerometer, gyroscope, magnetometer, linear acceleration, gravity, orientation and air pressure. All were sampled at 100 Hz [16]. Data was split into 1-minute segments using a sliding window without overlap and then randomly shuffled, providing consistent instances. Finally, the training data had 16 310 such instances and the test data had 5698, where each instance contained 6000 samples. This highlights the sheer size of the data and the challenges in processing it in full.

3.2 Methods
We used a traditional ML pipeline for this task: first preprocessing the data, then computing informative features, selecting the best of them and finally using them to train and evaluate a classification model. We added another, not so traditional step: smoothing the predictions using an HMM.

All steps except training and evaluation were done in a few lines using the presented library; the Python code (with some missing steps in comments) is given below. All classification was done using the scikit-learn implementation of Random Forest with default parameters.

import pandas as pd
from CalculatingFeatures import resample, \
    calculateFeatures, selectFeatures, hmm_smoothing

# Data was already windowed
# Data was resampled from 100 Hz to 20 Hz
acc_x = pd.read_csv(path, sep=" ")
acc_x = resample(acc_x, 6000, 1200)
# Repeat for all data types (and axes)

features_train = calculateFeatures(
    acc_x,
    acc_y,
    acc_z,
    featureNames=accelerationNames,
    prefix="acc",
)
# Repeat for all data types and train/test/valid sets
# Merge in one dataframe

selected = selectFeatures(
    features_train, features_validation
)

f1, cf, predictions = evaluate(
    features_train[selected],
    features_test[selected],
    labels_train,
    labels_test,
)

smoothed = hmm_smoothing(labels_train, cf, predictions)
# smoothed is an array representing final output

3.3 Results
We compared the results – in terms of F1 score – of the different stages in the machine learning pipeline against the top three submissions in the competition.

In the first stage we used just the mean and standard deviation as features (calculated for each data modality) to provide a baseline solution. Next, we calculated some features using the presented library. We then selected only a subset of them and again measured the performance. Finally, we used the HMM smoothing, a post-processing step described in Section 2.3.3.

Table 1: A comparison of different versions of the pipeline against the best submissions in the SHL Challenge. The number of features used in our methods is also listed.

Experiment         | # features | F1 score
Baseline           | 38         | 80.3
All features       | 298        | 87.7
Feature selection  | 130        | 87.1
HMM                | 130        | 93.1
Third place        | /          | 87.5
Second place       | /          | 92.4
First place        | /          | 93.9

Results are shown in Table 1. It shows that the features generated by the library substantially improve the performance. The feature selection, on the other hand – while significantly reducing the number of features required – did not increase performance. Of note, the performance did increase on the internal validation set, but this gain did not translate to the test set. The final jump in performance was achieved using the HMM smoothing, and we highly recommend this method in this and similar domains.

Using just the methods in the presented library and no parameter or method tuning, we achieved results comparable with the first-placed submission to the challenge.

4 CONCLUSION
In this paper we demonstrated the basic usage of a Python library capable of calculating features suitable for the context-recognition domain. The most important features that can be calculated are listed in this paper, with the specialized ones thoroughly described. We also showed on a topical example (the SHL Challenge dataset) how only a few lines of code can generate a very capable context-recognition system that can compete with the best entries submitted to this challenge. Such a system can be improved with extensive tuning, but we provide a solid starting point.

It is our hope that by making this library publicly available we can help the workflow of many future context-recognition researchers.

ACKNOWLEDGMENTS
We acknowledge the financial support from the Slovenian Research Agency (research core funding No. P2-0209).

REFERENCES
[1] Matjaž Boštic, Vito Janko, Gašper Slapničar, Jakob Valič and Junoš Lukan. 2021. cr-features: A library for feature calculation in the context-recognition domain. https://repo.ijs.si/matjazbostic/calculatingfeatures. Accessed: 2021-09-20.
[2] Wolfram Boucsein. 2012. Electrodermal Activity. Springer Science & Business Media.
[3] Božidara Cvetković, Robert Szeklicki, Vito Janko, Przemyslaw Lutomski and Mitja Luštrek. 2018. Real-time activity monitoring with a wristband and a smartphone. Information Fusion, 43, 77–93.
[4] Martin Gjoreski, Vito Janko, Gašper Slapničar, Miha Mlakar, Nina Reščič, Jani Bizjak, Vid Drobnič, Matej Marinko, Nejc Mlakar, Mitja Luštrek et al. 2020. Classical and deep learning methods for recognizing human activities and modes of transportation with smartphone sensors. Information Fusion, 62, 47–62.
[5] Martin Gjoreski, Mitja Luštrek, Matjaž Gams and Hristijan Gjoreski. 2017. Monitoring stress with a wrist device using context. Journal of Biomedical Informatics, 73, 159–170. doi: 10.1016/j.jbi.2017.08.006.
[6] Harshil. 2021. Tools of the trade: a short history. https://www.kaggle.com/haakakak/tools-of-the-trade-a-short-history/. Accessed: 2021-09-20.
[7] Sylvia D. Kreibig. 2010. Autonomic nervous system activity in emotion: a review. Biological Psychology, 84, 3, 394–421. doi: 10.1016/j.biopsycho.2010.03.010.
[8] M. Malik, J. T. Bigger, A. J. Camm, R. E. Kleiger, A. Malliani, A. J. Moss and P. J. Schwartz. 1996. Heart rate variability: standards of measurement, physiological interpretation, and clinical use. European Heart Journal, 17, 3, 354–381. doi: 10.1093/oxfordjournals.eurheartj.a014868.
[9] Lucas Hermann Negri and Christophe Vestri. 2017. Lucashn/peakutils: v1.1.0. doi: 10.5281/ZENODO.887917.
[10] M. Pagani, F. Lombardi, S. Guzzetti, O. Rimoldi, R. Furlan, P. Pizzinelli, G. Sandrone, G. Malfatto, S. Dell'Orto and E. Piccaluga. 1986. Power spectral analysis of heart rate and arterial pressure variabilities as a marker of sympatho-vagal interaction in man and conscious dog. Circulation Research, 59, 2, 178–193. doi: 10.1161/01.res.59.2.178.
[11] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot and E. Duchesnay. 2011. Scikit-learn: machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
[12] Clément Picard, Vito Janko, Nina Reščič, Martin Gjoreski and Mitja Luštrek. 2021. Identification of cooking preparation using motion capture data: a submission to the cooking activity recognition challenge. In Human Activity Recognition Challenge. Springer, 103–113.
[13] Society for Psychophysiological Research Ad Hoc Committee on Electrodermal Measures. 2012. Publication recommendations for electrodermal measurements. Psychophysiology, 49, 8, 1017–1034. doi: 10.1111/j.1469-8986.2012.01384.x.
[14] Sara Taylor, Natasha Jaques, Weixuan Chen, Szymon Fedor, Akane Sano and Rosalind Picard. 2015. Automatic identification of artifacts in electrodermal activity data. In 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE. doi: 10.1109/embc.2015.7318762.
[15] Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, ..., Paul van Mulbregt and SciPy 1.0 Contributors. 2020. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nature Methods, 17, 261–272. doi: 10.1038/s41592-019-0686-2.
[16] Lin Wang, Hristijan Gjoreski, Kazuya Murao, Tsuyoshi Okita and Daniel Roggen. 2018. Summary of the Sussex-Huawei locomotion-transportation recognition challenge. In Proceedings of the 2018 ACM International Joint Conference and 2018 International Symposium on Pervasive and Ubiquitous Computing and Wearable Computers. ACM, 1521–1530. doi: 10.1145/3267305.3267519.

Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia. © 2021 Copyright held by the owner/author(s).
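The HMM smoothing idea of Section 2.3.3 can be sketched as a minimal Viterbi decoder. This is our own stand-alone illustration, not the cr-features implementation: we assume, as the paper describes, that transition probabilities are estimated from consecutive training labels and emission probabilities P(predicted | true) from the classifier's confusion matrix; all names are ours.

```python
import numpy as np

def smooth_predictions(train_labels, confusion, preds, n_states):
    """Minimal Viterbi-based smoothing of classifier predictions.

    A sketch of the idea in Section 2.3.3, not the cr-features API.
    """
    eps = 1e-12
    # transition probabilities from consecutive ground-truth labels
    trans = np.full((n_states, n_states), eps)
    for a, b in zip(train_labels[:-1], train_labels[1:]):
        trans[a, b] += 1
    trans /= trans.sum(axis=1, keepdims=True)
    # emissions: row i = distribution of predictions given true class i
    emit = confusion + eps
    emit = emit / emit.sum(axis=1, keepdims=True)
    # Viterbi over the predicted sequence (log domain for stability)
    lt, le = np.log(trans), np.log(emit)
    score = np.log(np.full(n_states, 1.0 / n_states)) + le[:, preds[0]]
    back = []
    for p in preds[1:]:
        cand = score[:, None] + lt        # cand[i, j]: best path ending i -> j
        back.append(cand.argmax(axis=0))
        score = cand.max(axis=0) + le[:, p]
    # backtrack the most likely hidden-state (activity) sequence
    path = [int(score.argmax())]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return path[::-1]
```

With sticky transitions and a reasonably reliable classifier, a lone 'bus' inside a run of 'subway' predictions is relabelled, while genuine longer segments survive.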
26 Določanje slikovnega prostora na umetniških slikah Reconstruction of image space depicted on artistic paintings Nadezhda Komarova Borut Batagelj Gregor Anželj Narvika Bovcon nadezhdakomarova7@gmail.com Franc Solina gregor.anzelj@gmail.com borut.batagelj@f ri.uni- lj.si Gimnazija Bežigrad narvika.bovcon@f ri.uni- lj.si 1000 Ljubljana, Slovenia franc.solina@f ri.uni- lj.si Fakulteta za računalništvo in informatiko Univerza v Ljubljani POVZETEK dostopne na medmrežju, na primer Google Arts and Culture, Wi- kimedia Commons, Getty Open Content Program, ADA (Archive V članku poročamo o analizi slikovnega prostora na umetniških of Digital Art) in druge [4]. Z analizo in vizualizacijo velikih ume-slikah s pomočjo metod računalniškega vida. Naš cilj je bil, da tniških zbirk se je prvi začel ukvarjati Lev Manovich [8]. Leta ugotovimo, ali je možno zgolj na osnovi zaznave obrazov na sli- 2012 je preučeval vizualizacijske metode za družboslovne vede kah določiti prostorsko organizacijo slike. Analiza je potekala in medijske raziskave. Ukvarjal se je z informativno, uporabno na izbranem vzorcu 3356 slik. Najprej smo določili tridimenzi- in estetsko vrednostjo vizualizacij [9]. onalne koordinate zaznanih obrazov na posamezni sliki. Nato Analiza razlik med predstavitvijo prostora s fotografijo in ume- smo tem točkam priredili ravnino. Slikovni prostor smo tako do- tniško sliko je bila narejena leta 2014 [11]. S statistično analizo ločili z enačbo prirejene ravnine oziroma kotom med to ravnino slik tihožitij, ki so jih ustvarili udeleženci eksperimenta, so ugoto-in slikovno ravnino. Bolj kot je ravnina, ki jo določajo obrazi, vili, da so predmeti, na katere so udeleženci usmerjali pozornost, nagnjena od navpične smeri, globlji je prikazani slikovni prostor. naslikani večji kot so na fotografijah. Zato je vprašanje, ali je KLJUČNE BESEDE dosledna uporaba linearne perspektive najbolj primerna metoda za posnemanje sveta [1]. 
Umetnostna zgodovina nam nazorno računalniški vid, slikovni prostor, zaznava obrazov, umetnostna prikaže, da so umetniki za posnemanje sveta uporabljali zelo zgodovina različne pristope. ABSTRACT Pri naši analizi slikovnega prostora smo izhajali iz dveh pred- postavk: In the article, we report on the analysis of the image space de- (1) v raziskavi želimo analizirati veliko število umetniških slik picted on artistic paintings utilizing methods of computer vision. v smislu današnjega trenda Big Data, Our aim was to find out whether one can recover the spatial (2) uporabiti želimo take metode računalniškega vida, ki de- organization of a picture based on detection of faces. The anal- lujejo hitro in čimbolj zanesljivo. ysis was conducted on the sample of 3356 paintings. First, 3D coordinates of faces were determined. Then, a plane was fitted to Med hitre in zanesljive metode računalniškega vida zagotovo the faces on every painting. Images were therefore described in sodi zaznava in identifikacija oseb na osnovi njihovih obrazov. terms of the angle between the fitted plane and the picture plane. Zaradi varnostnih razlogov se je teh problemov na področju bio- The bigger the angle between both planes, the deeper the picture metrije lotilo že zelo veliko znanstvenikov. Danes obstajajo hitre space depicted. in zanesljive metode za zaznavo in analizo obrazov na slikah [10]. Za navdih nam je služil članek Irvinga Zupnicka iz leta 1959 KEYWORDS [14], objavljen še veliko pred uporabo računalnikov v likovni computer vision, image space, face detection, art history umetnosti, ki opisuje kako je na slikah iz različnih umetnostnih obdobjih organiziran slikovni prostor. Zato smo si zastavili vpra- 1 UVOD IN MOTIVACIJA šanje, ali je mogoče s pomočjo metod računalniškega vida rekon- struirati slikovni prostor na umetniških slikah? 
Bolj konkretno, Odločili smo se povezati dve raziskovalni področji, ki sta si na- ali ga je mogoče rekonstruirati na osnovi zaznave obrazov na videz zelo vsaksebi, to je umetnostna zgodovina in umetna in- slikah? Določitve 3D razsežnosti prostora, upodobljenega na sliki, teligenca. Metode računalniškega vida se že redno uporabljajo smo se lotili na osnovi pozicije obrazov na sliki (𝑥 in 𝑦 koorditudi za analizo umetniških slik [12]. Večina teh raziskav je osre- nate) in njihove velikosti, kar nam daje grobo informacijo o tretji dotočena na analizo posameznih ali manjšega števila umetniških dimenziji 𝑧 — to je oddaljenosti obraza od opazovalca. Ta pristop slik. Po drugi strani smo danes v dobi velepodatkov (angl. Big Data seveda temelji na predpostavki, da so na slikah ljudje oziroma ), saj je vedno več informacij dostopnih v digitalni obliki. da so upodobljeni njihovi dovolj veliki obrazi. Resda v zgodovini Tudi velike zbirke reprodukcij umetniških slik so danes prosto likovne umetnosti poznamo veliko tihožitij ali pokrajinskih slik, Permission to make digital or hard copies of part or all of this work for personal na katerih ni obrazov. Toda velika večina umetniških slik iz obdo-or classroom use is granted without fee provided that copies are not made or bja pred izumom fotografije dejansko upodablja ljudi oz. njihove distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this obraze. work must be honored. For all other uses, contact the owner /author(s). Iz javno dostopnih baz umetniških slik smo za našo študijo Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia izbrali testno množico 3356 slik iz različnih umetnostnozgodo- © 2021 Copyright held by the owner/author(s). vinskih obdobij in žanrov. 
27 Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia Nadezhda Komarova, Gregor Anželj, Borut Batagelj, Narvika Bovcon, and Franc Solina 2 SLIKOVNI PROSTOR NA UMETNIŠKIH SLIKAH Slika 3: Vijoličasta ravnina, ki se prilega 3D pozicijam obrazov na Renoirjevem Plesu v Le Moulin de la Galette in in rdeča ravnina 𝑧 = 0 – ploskev slikarskega platna, na kateri smo zaznali obraze. upodabljanje prostora: velikosti oseb niso določene s prostorskim oddaljevanjem, temveč npr. z družbenim statusom. Slika 1: Auguste Renoir, Ples v Le Moulin de la Galette; vi- dijo se zaznani obrazi. Velikost obrazov jasno odraža glo- 3 ZAZNAVA OBRAZOV bino slikarskega prostora. Predpostavili smo, da so resnični obrazi pri vseh osebah pribli- žno enako veliki. Zato so bili večji obrazi obravnavani kot bližji površini slike in manjši kot bolj oddaljeni od površine slike oz. od opazovalca. Zaznani so bili z orodjem RetinaFace, ki izvede dvodimenzio- nalno poravnavo in tridimenzionalno rekonstrukcijo obraza [2]. Zasnovan je na osnovi globoke nevronske mreže. Detektor vrne podatke o obrazih v dvodimenzionalnem pro- storu površine slike, torej imajo središča obraznih okvirjev in točke oči, nosu ter ust samo 𝑥 in 𝑦 koordinate. Toda za rekonstrukcijo tridimenzionalnega prostora slike potrebujemo tudi globine obrazov oz. koordinato 𝑧 . Tridimenzionalni prostor, kot ga prikazuje umetniška slika, se razlikuje od fotografskega predvsem zato, ker slikarji redko dosledno upoštevajo linearno perspektivo. Na fotografijah je perspektiva po drugi strani bolj konsistentno dolo- čena. Zato je na njih mogoče z enačbo (1) [6] določiti oddaljenost predmeta od kamere: 𝑓 · ℎ · ℎ 𝑟 Slika 2: Poslikava v grobnici Unsu. Vsi obrazi se enake 𝑑 = (1) ℎ · ℎ 𝑖 𝑠 velikosti, ves slikarski prostor je zgoščen kar v ravnini po- Z enačbo (1) izračunamo oddaljenost 𝑑 objekta v milimetrih, če slikave. 
je 𝑓 goriščna razdalja fotoaparata, ℎ resnična višina objekta v 𝑟 Vsakemu obrazu na slikah smo priredili tridimenzionalne koor- milimetrih, ℎ višina slike v pikslih, ℎ višina objekta na sliki v 𝑖 dinate, ki pa niso bile zanesljive v absolutnih vrednostih, temveč pikslih in ℎ višina senzorja fotoaparata v milimetrih. Z njo so 𝑠 odražajo zgolj relativne razdalje. Nato smo tem obraznim točkam bile določene tudi oddaljenosti obrazov na slikah v vzorcu, pri priredili ravnine s smislu vsote najmanjših kvadratov razdalj med čemer so bile uporabljene vrednosti goriščne razdalje in višine točkami in iskano ravnino. Pri tistih slikah, ki prikazujejo obraze, senzorja, kvocient katerih opiše, kako vidijo človeške oči. Četudi ki so v spodnjem delu slike opazovalcu blizu, in se višje na sliki je bilo po tem postopku nemogoče določiti natančne tridimen-postopno oddaljujejo (Slika 1), so dobljene ravnine bolj nagnjene zionalne koordinate obrazov na sliki, so bile določene relativne v globino kot pri tistih, kjer so vsi obrazi približno na enaki razda-oddaljenosti med obrazi in površino slike. Za namen te raziskave lji od opazovalca (Slika 2). V takih primerih je dobljena ravnina tudi niti ni pomembno, če zaznamo vse obraze na sliki. skorajda vzporedna s površino slike. Rafaelova Atenska šola in staroegipčanska poslikava v grobnici Unsu imata zelo različni 4 GEOMETRIJSKA INTERPRETACIJA prostorski ureditvi. Na prvi sliki se obrazi zmanjšujejo z odda- PROSTORA ljevanjem ljudi. Ravnina, prirejena točkam na tej sliki, je zato Parametre nagnjena v globino (Slika 3). 𝐴, 𝐵 in 𝐶 enačbe ravnine 𝑧 = 𝐴𝑥 + 𝐵𝑦 + 𝐶 smo določili z minimizacijo funkcije Po drugi strani tudi poslikava na Sliki 2 prikazuje množico ljudi, vendar so vsi enake višine in njihovi obrazi so enako veliki. 𝑚 Õ Ravnina, prirejena obrazom na egipčanski sliki, je zato vzporedna 𝐸 (𝐴, 𝐵, 𝐶 ) = (𝐴𝑥 + 𝐵𝑦 + 𝐶 − 𝑧 )2, (2) 𝑖 𝑖 𝑖 ravnini 𝑧 = 0. 
Za egipčansko slikarstvo je značilno konceptualno 𝑖 =1 28 Določanje slikovnega prostora na umetniških slikah Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia kjer 𝑚 pomeni število točk in 𝑥 , 𝑦 ter 𝑧 koordinate točk. 𝑖 𝑖 𝑖 Funkcija (2) doseže minimum, ko je ∇𝐸 = (0, 0, 0) [3]. Za gradi- 𝜕𝐸 𝜕𝐸 𝜕𝐸 𝜕𝐸 𝜕𝐸 ent te funkcije velja ∇𝐸 = ( 𝜕𝐸 , , ), kjer so , in 𝜕𝐴 𝜕𝐵 𝜕𝐶 𝜕𝐴 𝜕𝐵 𝜕𝐶 naslednji. 𝑚 𝜕𝐸 Õ = 2 𝑥 (𝐴𝑥 + 𝐵𝑦 + 𝐶 − 𝑧 ) (3) 𝑖 𝑖 𝑖 𝑖 𝜕𝐴 𝑖 =1 𝑚 𝜕𝐸 Õ = 2 𝑦 (𝐴𝑥 + 𝐵𝑦 + 𝐶 − 𝑧 ) (4) 𝑖 𝑖 𝑖 𝑖 𝜕𝐵 𝑖 =1 𝑚 𝜕𝐸 Õ = 2 (𝐴𝑥 + 𝐵𝑦 + 𝐶 − 𝑧 ) (5) 𝑖 𝑖 𝑖 𝜕𝐶 𝑖 =1 Slika 4: Razporeditev razredov pri gručenju na osnovi rav- Tako množici 3D točk priredimo ravnino z minimizacijo razdalj nin. Gruče so razpršene in izrazite razmejitve med njimi med temi točkami in njihovimi slikami na ploskvi v smeri 𝑧 . ni. Koeficienti A, B in C so zato rešitve sistema linearnih enačb (6), (7) in (8). 𝑚 𝑚 𝑚 𝑚 Õ Õ Õ Õ 2 𝐴 𝑥 + 𝐵 𝑥 𝑦 + 𝐶 𝑥 = 𝑥 𝑧 (6) 𝑖 𝑖 𝑖 𝑖 𝑖 𝑖 𝑖 =1 𝑖 =1 𝑖 =1 𝑖 =1 𝑚 𝑚 𝑚 𝑚 Õ Õ Õ Õ 2 𝐴 𝑥 𝑦 + 𝐵 𝑦 + 𝐶 𝑦 = 𝑦 𝑧 (7) 𝑖 𝑖 𝑖 𝑖 𝑖 𝑖 𝑖 =1 𝑖 =1 𝑖 =1 𝑖 =1 𝑚 𝑚 𝑚 Õ Õ Õ 𝐴 𝑥 + 𝐵 𝑦 + 𝐶 = 𝑧 (8) 𝑖 𝑖 𝑖 𝑖 =1 𝑖 =1 𝑖 =1 5 REZULTATI Slika 5: Zastopanost posameznih kotov za slike v testni Slike smo izbrali iz prostodostopne zbirke WikiArt (https://www. množici. wikiart.org), kjer so umetnine med drugim razdeljene po žanrih. ustrezajo slikam, kjer se upodobljene osebe enotno oddaljujejo Izbrana so bila slikarska dela (potrebno je bilo izločiti npr. kipar-oz. približujejo. Če je bil kot med ravnino, ki je bila prirejena ska), kjer je bilo upodobljenih več ljudi. Iz zbirke WikiArt so bila obrazom na sliki, in ravnino 𝑧 = 0 izračunan kot natančno 0 zato izbrana dela iz žanrov pastorale (77 slik), allegorical painting stopinj, je to pomenilo, da na sliki ni bilo zaznanega nobenega (1225 slik), history painting (1377 slik) in literary painting (667 obraza, samo en obraz ali pa so imeli vsi obrazi enake globine. Na slik), in sicer skupaj 3356 slik. 
In addition to the genre, we also had information about the art-historical period to which each picture belongs. We were interested in how, on the basis of these data alone, the test set of pictures can be meaningfully divided with a clustering method, and whether this division is relevant from the viewpoint of art history. As the clustering criteria we used the equations of the fitted planes and the angle between the fitted plane and the picture plane z = 0. The RetinaFace detector describes the latter with three parameters, the rotations about the x, y and z axes (in the positive and negative directions); for each picture, the rotations with the largest absolute values in each direction were selected. The clustering was performed with the BIRCH algorithm as implemented in the scikit-learn library. BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) is a clustering algorithm specifically adapted to working with larger data samples [7]. In Figure 4 the extreme values are excluded; the figure shows the distribution of the pictures after clustering based on the planes.

On the interval from 0 to 5 degrees (Figure 5) the most frequent style was Baroque; on the remaining intervals it was Romanticism. No single style strongly dominated any interval, however, as the percentage of pictures belonging to the most frequent style of an interval was between 20 and 30%. To determine the correlation between the time a picture was created and the angle between the two planes for that picture, Spearman's rank correlation coefficient was used. It is a non-parametric measure of the association between two variables, i.e. of how well their relationship can be described by a monotonic function [13]. The coefficient was 0.183, which represents a weak positive correlation; the p-value was close to 0, which means that the correlation between the year a picture was created and the angle reflecting its painted depth is statistically significant. The plot in Figure 6 shows that, over the period from roughly 1700 until today, the average angle between the planes rises mildly from decade to decade.
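Spearman's coefficient used above is simply the Pearson correlation of rank-transformed variables. A minimal, tie-free sketch follows (our own illustration; in practice scipy.stats.spearmanr handles ties and also returns the p-value):

```python
import numpy as np

def spearman_rho(a, b):
    """Spearman rank correlation for samples without ties:
    rank both variables, then take the Pearson correlation of the ranks."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return np.corrcoef(rank_a, rank_b)[0, 1]

# Any monotonically increasing relationship gives rho = 1,
# even when it is far from linear:
years = [1700, 1750, 1800, 1900]
angles = [2.0, 2.1, 7.5, 30.0]
```

This is why the coefficient measures how well the relationship between the year of creation and the depth angle can be described by a monotonic function, rather than by a straight line.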
We then compared which art styles the pictures in the individual classes belong to. This was possible because, in addition to the genre, each picture in the collection was labelled with its year of creation and its art style (Baroque, Romanticism, etc.). We limited the number of classes to ten. Because of the markedly different spatial arrangement of some pictures, these were separated into classes of their own (2, 5, 6, 7 and 8); these classes contain only one picture each and are not visible in Figure 4. The histogram in Figure 5 shows how the different intervals of angles are represented in the studied sample. The largest share of pictures is seen to be those in which the angle between the two planes is between 15 and 20 degrees, which seems relatively small. Larger angles between the planes mostly correspond to pictures in which the depicted persons uniformly recede or approach.

6 DISCUSSION

The main hypothesis of our research was whether we can determine, in some simple way, the nature of the pictorial space, that is, how pronounced the depth dimension is in a given artistic painting. Pictorial space is related both to the art-historical period to which a picture belongs and to its genre. This opens up the possibility of automatically classifying a large number of pictures, whether with statistical methods or, even more fittingly, with machine-learning methods. We decided to determine the pictorial space indirectly, by means of face detection. When an individual face was detected with the RetinaFace tool, this determined a face bounding box at particular x and y coordinates in the picture plane.

Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia · Nadezhda Komarova, Gregor Anželj, Borut Batagelj, Narvika Bovcon, and Franc Solina

REFERENCES

[1] Katarina Bebar. "Upodabljanje prostora po načelih linearne perspektive s pomočjo obogatene resničnosti". In: Likovne besede 114 (2020), pp. 14–21.
[2] Jiankang Deng et al. "RetinaFace: Single-Shot Multi-Level Face Localisation in the Wild". In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2020, pp. 5202–5211. doi: 10.1109/CVPR42600.2020.00525.
Figure 6: Angles between the planes as a function of the time of a picture's creation. The red points represent the average angle for each decade.

The size of the face bounding box additionally gave us information about the relative distance z of the face from the picture plane. The reliability of face detection on artistic paintings was probably somewhat lower, since RetinaFace was trained on photographs of faces and not on artistic depictions [2]. A future study could use the additional information that the RetinaFace face-detection tool provides: the orientation of the face; the positions of the eyes, nose and mouth; and the sex and age of the person as estimated from the face. Future analyses of paintings could also take into account the colour composition and other image features that can be robustly determined with computer-vision methods [12]. We ourselves have worked, for example, on detecting the lines of perspective projection in photographs [5, 1].

[3] David Eberly. "Least Squares Fitting of Data". In: Magic Software, Inc. (Sep. 2001). url: http://www.sci.utah.edu/~balling/FEtools/doc_files/LeastSquaresFitting.pdf.
[4] Image resources: Free image resources. Sotheby's Institute of Art. url: https://sia.libguides.com/images/freeimageresources (accessed 1 Mar. 2021).
[5] Jure Kovač, Peter Peer and Franc Solina. "Automatic natural and man-made scene differentiation using perspective geometrical properties of the scenes". In: Proceedings of the 15th International Conference on Systems, Signals and Image Processing. Bratislava, 2008, pp. 507–510.
[6] Yun Liang. How to measure the real size of an object from an unknown picture? Jan. 2015. url: https://www.researchgate.net/post/How-to-measure-the-real-size-of-an-object-from-an-unknown-picture.
[7] Cory Maklin. "BIRCH Clustering Algorithm Example In Python". In: Towards Data Science (Jul. 2019). url: https://towardsdatascience.com/machine-learning-birch-clustering-algorithm-clearly-explained-fb9838cbeed9.
Although our test of the method grouped the artworks into classes by the similarity of their spatial arrangement, no strict boundaries between the art styles of the pictures emerged. The correlation between the time of a work's creation and the angle between the planes, however, was informative. In the selected sample the various art-historical styles were not represented entirely evenly; there were, for instance, many works from Romanticism. For each historical period, certain mutual relations among these characteristics are most likely distinctive. The established art-historical approach to analysing paintings is the simultaneous observation of two or more works, from which the researcher, on the basis of prior knowledge, extracts characteristic traits, differences and the like [8]. Machine learning would become effective at this point: on the one hand it makes it possible to analyse large amounts of data and to discover concurrent relations among different features, and on the other hand it guarantees the objectivity of mathematical approaches. In further work it would therefore be useful to exploit, besides the faces, other information in the pictures as well.

[8] Lev Manovich. "Data Science and Digital Art History". In: International Journal for Digital Art History 1 (Jun. 2015). doi: 10.11588/dah.2015.1.21631.
[9] Lev Manovich. Museum without walls, art history without names: visualization methods for Humanities and Media Studies. Oxford Handbooks Online, 2013. doi: 10.1093/oxfordhb/9780199757640.013.005.
[10] Mohd Nayeem. "Exploring Other Face Detection Approaches (Part 1): RetinaFace". In: Analytics Vidhya (Jul. 2020). url: https://medium.com/analytics-vidhya/exploring-other-face-detection-approaches-part-1-retinaface-9b00f453fd15.
[11] Robert Pepperell and Manuela Braunagel. "Do Artists Use Linear Perspective to Depict Visual Space?" In: Perception 43 (Aug. 2014), pp. 395–416. doi: 10.1068/p7692.
It must be borne in mind, however, that any division of artworks cannot be absolute: art history is made up of individual artists, each of whom creates in a personal style that may follow the general trends of a period to some degree, but never completely. Individual artists can, moreover, change their artistic style over the course of their careers.

7 CONCLUSION

In this paper we have presented a new approach to the automatic analysis of artistic paintings using computer-vision methods. We have demonstrated that face detection makes it possible to address more complex questions about paintings as well, in our case the organisation of their pictorial space. Although the results of this study are perhaps not so clearly expressed and did not reproduce the findings of art historians, the use of computers in art history, as in the humanities in general, is only now truly beginning.

[12] David G. Stork. "Computer Vision and Computer Graphics Analysis of Paintings and Drawings: An Introduction to the Literature". In: Computer Analysis of Images and Patterns. Ed. Xiaoyi Jiang and Nicolai Petkov. Berlin, Heidelberg: Springer, 2009, pp. 9–24. doi: 10.1007/978-3-642-03767-2_2.
[13] Eric W. Weisstein. "Spearman Rank Correlation Coefficient". In: MathWorld, a Wolfram Web Resource (n.d.). url: https://mathworld.wolfram.com/SpearmanRankCorrelationCoefficient.html.
[14] Irving L. Zupnick. "Concept of Space and Spatial Organization in Art". In: The Journal of Aesthetics and Art Criticism (Dec. 1959), pp. 215–221. doi: 10.2307/427268.
Computer-based analytical methods will make it possible to answer questions that art historians have until now not even dared to pose.

Automated Hate Speech Target Identification

Andraž Pelicon, Blaž Škrlj, Petra Kralj Novak (all authors contributed equally to this research)
andraz.pelicon@ijs.si · blaz.skrlj@ijs.si · petra.kralj.novak@ijs.si
Jožef Stefan Institute, Jamova cesta 39, Ljubljana, Slovenia

ABSTRACT

We present a new human-labelled Slovenian Twitter dataset annotated for hate speech targets, and attempt automated hate speech target classification with different machine-learning approaches. This work represents, to our knowledge, one of the first attempts to solve a Slovene text classification task with an autoML approach. Our results show that the classification task is a difficult one, both in terms of annotator agreement and in terms of classifier performance. The best-performing classifier is SloBERTa-based, followed by autoBOT-neurosymbolic-full.

KEYWORDS

hate speech targets, autoML, text features

To our knowledge, this is one of the first attempts to solve a Slovene text classification task with an autoML approach. Finally, we trained a model based on the SloBERTa pre-trained language model [11], a state-of-the-art transformer-based language model pre-trained on a Slovenian corpus, as well as a set of baselines. Our results show that the context-aware SloBERTa model significantly outperforms all the other models. This result, together with the low inter-annotator scores, confirms our initial assumption that hate speech target identification is a complex semantic task, requiring an understanding of the text that goes beyond simple pattern matching.
The SloBERTa model reaches annotator agreement in terms of classification accuracy, indicating fair performance of the model.

1 INTRODUCTION

Hate speech and offensive content have become pervasive in social media and a serious concern for government organizations, online communities and social media platforms [13]. As the amount of user-generated content steadily increases, the research community has been focusing on developing computational methods to moderate hate speech on online platforms [6, 1, 8]. While several of the proposed methods achieve good performance in distinguishing hateful from respectful content, several important challenges remain, some of them related to the data itself. Studies report both low numbers of hate speech instances in the labelled datasets and relatively low agreement scores between annotators [9]. The low agreement between annotators indicates that recognizing hate speech is a hard task even for humans, suggesting that the task requires a broader semantic interpretation of the text and its context, beyond simple pattern matching of linguistic features. To test this assumption, we have gathered a new Slovenian dataset containing tweets annotated for hate speech targets.¹

2 DATA

We collected almost three years' worth of all Slovenian Twitter data in the period from December 1, 2017 to October 1, 2020: 11,135,654 tweets in total. The period includes several government changes, elections and the first Covid-19-related lockdown. We used the TweetCat tool [5], which was developed for harvesting Twitter data of less frequent languages.

2.1 Annotation Schema

Our annotation schema is adapted from OLID [13] and FRENK [4]. It is a two-step annotation procedure. After reading a tweet, without any context, the annotator first selects the type of speech. We differentiate between the following speech types:

0 acceptable (non-hate-speech type): speech that does not contain uncivil language;
This dataset builds on the dataset used for detecting hate speech communities [3] and topics [2] on Slovenian Twitter. The dataset is available in the clarin.si dataset repository under the handle https://www.clarin.si/repository/xmlui/handle/11356/1398. [¹ Slovenian Twitter dataset 2018-2020 1.0: http://hdl.handle.net/11356/1423] Next, we addressed the hate speech target classification task with the autoML approach autoBOT [10]. The key idea of autoBOT is that, instead of evolving at the learner level, evolution is conducted at the representation level. The approach consists of an evolutionary algorithm that jointly optimizes various sparse representations of a given text (including word, subword, POS-tag, keyword-based, knowledge-graph-based and relational features) and two types of document embeddings (non-sparse representations).

1 inappropriate (hate speech type): contains terms that are obscene or vulgar, but the text is not directed at any person specifically;
2 offensive (hate speech type): includes offensive generalization, contempt, dehumanization and indirect offensive remarks;
3 violent (hate speech type): the author threatens, indulges, desires or calls for physical violence against a target; it also includes calling for, denying or glorifying war crimes and crimes against humanity.

If the annotator chooses either the offensive or the violent hate speech type, they also select one of the twelve possible targets of hate speech:

• Racism (intolerance based on nationality, ethnicity, language, towards foreigners; and based on race, skin color)
• Migrants (intolerance of refugees or migrants, offensive generalization, calls for their exclusion, restriction of rights, non-acceptance, denial of assistance . . . )
• Islamophobia (intolerance towards Muslims)
• Antisemitism (intolerance of Jews; also includes conspiracy theories, Holocaust denial or glorification, offensive stereotypes . . . )
• Religion (other than the above)
• Homophobia (intolerance based on sexual orientation and/or identity, calls for restrictions on the rights of LGBTQ persons)
• Sexism (offensive gender-based generalization, misogynistic insults, unjustified gender discrimination)
• Ideology (intolerance based on political affiliation, political belief or ideology . . . e.g. “communists”, “leftists”, “home defenders”, “socialists”, “activists for . . . ”)
• Media (journalists and the media; also includes allegations of unprofessional reporting, false news and bias)
• Politics (intolerance towards individual politicians, the authorities, the system, political parties)
• Individual (intolerance towards any other individual due to individual characteristics, e.g. a commentator, neighbor or acquaintance)
• Other (intolerance towards members of other groups due to belonging to this group; the annotator writes in a blank column which group it is)

Figure 1: Number of annotated examples for hate speech type and target. The class distribution is severely unbalanced.

2.2 Sampling for Training and Evaluation
The training set is sampled from data collected before February 2020. The sampling was intentionally biased to contain as much hate speech as possible, in order to obtain enough organic examples to train the model successfully. A simple model was used to flag potential hate speech content and, additionally, filtering by users and by tweet length (number of characters) was applied. 50,000 tweets were selected for annotation.

The evaluation set is sampled from data collected between February 2020 and August 2020. Contrary to the training set, the evaluation set is an unbiased random sample. Since the evaluation set is from a later period than the training set, the possibility of data linkage is minimized. Furthermore, the estimates of model performance made on the evaluation set are realistic, or even pessimistic, since the model is tested on a real-world distribution of data in which hate speech is less prevalent than in the biased training set. The evaluation set is also characterized by a new topic, COVID-19; this ensures that our model is robust to small contextual shifts that may be present in the test data. For the evaluation set, 10,000 tweets were selected to be annotated.

The training dataset for hate speech type includes 34,204 examples and the evaluation dataset includes 6,430 examples. Many of the examples are repeated (two annotations of the same tweet), yet conflicting (due to annotator disagreement). The training and evaluation sets for hate speech type and target are summarized in Table 1. The overall annotator agreement for hate speech target on the training set is 63.1%, and the nominal Krippendorff alpha is 0.537.² The annotator agreement for hate speech target on the evaluation set is 62.8%, and the nominal Krippendorff alpha is 0.503. These scores indicate that the dataset is of high quality compared to other datasets annotated for hate speech, yet the relatively low agreement indicates that the annotation task is difficult and ambiguous even for humans. [² Some annotators skipped some examples.]

3 EXPERIMENTS

We compare different machine learning algorithms on the hate speech target identification task.
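The inter-annotator statistics above pair raw percent agreement with nominal Krippendorff's alpha. For orientation, a simplified pure-Python sketch of the nominal alpha follows (our own illustration via the coincidence-matrix formulation; dedicated packages such as krippendorff are preferable in practice):

```python
from collections import Counter
from itertools import permutations

def nominal_alpha(units):
    """Nominal Krippendorff's alpha; `units` is a list of items,
    each item being the list of labels its annotators assigned."""
    o = Counter()  # coincidence matrix o[(c, k)]
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # items with a single rating carry no information
        for c, k in permutations(labels, 2):
            o[(c, k)] += 1.0 / (m - 1)
    n_c = Counter()  # marginal totals per label
    for (c, _), v in o.items():
        n_c[c] += v
    n = sum(n_c.values())
    d_o = sum(v for (c, k), v in o.items() if c != k)  # observed disagreement
    d_e = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k) / (n - 1)
    return 1.0 - d_o / d_e if d_e else 1.0
```

Perfect agreement yields alpha = 1, while agreement at chance level yields alpha around 0, which is why an alpha above 0.5 alongside ~63% raw agreement still signals a usable, if difficult, annotation task.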
The compared algorithms belong to one of the following three categories: classical, representation optimization and deep learning. The results are presented in Table 1.

2.3 Annotation Procedure

Each tweet was annotated twice: in 90% of the cases by two different annotators (to estimate inter-annotator agreement) and in 10% of the cases twice by the same annotator (to assess self-agreement). Special attention was devoted to evening out the overlap between annotators, so as to obtain agreement estimates on equally sized sets. Ten annotators were engaged for our annotation campaign. They were given annotation guidelines, a training session and a test on a small set to evaluate their understanding of the task and their commitment before starting the annotation procedure. The annotation process lasted four months, and it required about 1,200 person-hours for the ten annotators to complete the task. In the training set, intentionally biased in favour of hate speech, about 1% of the tweets were labelled as violent, 34% as offensive (towards either individuals or groups), 4% as inappropriate (mostly containing swear words), and the remaining 61% as acceptable. In the evaluation set, a random selection of 10,000 Slovenian tweets, only 69 tweets were labelled as violent by at least one annotator, which is about 0.3%.

3.1 autoBOT – an autoML for texts

With the increasing amounts of available computing power, automation of machine learning has become an active research endeavor. Commonly, this branch of research focuses on automatic model selection and configuration; recently, however, it has also addressed the task of obtaining a suitable representation when less-structured inputs such as texts are considered. This work represents, to our knowledge, one of the first attempts to solve a Slovene text classification task with an existing autoML approach. The in-house developed method, called autoBOT [10], has already shown promising results on multiple shared tasks and in extensive empirical evaluation.
Albeit it commonly scores on average worse than large, multi-million-parameter neural networks, it remains interpretable and does not need any specialized hardware. Thus, this system serves as an easy-to-obtain baseline which commonly performs better than ad hoc approaches such as word-based features coupled with a Support Vector Machine (SVM). The tool has multiple configurations which determine the feature space that is evolved during the search for an optimal configuration of both the representation of a given document and the most suitable learner. We left all settings at their defaults, varying only the representation type, which was either symbolic, neuro-symbolic-lite, neuro-symbolic-full or neural. Detailed descriptions of these feature spaces are available online.³ The main difference between these variants is that the neuro-symbolic ones simultaneously consider both symbolic and sub-symbolic feature spaces (e.g. tokens and embeddings of the documents), whilst the symbolic and neural variants consider only one type.

The SloBERTa-based predictor performed the best; however, it is also the one with the highest number of tunable parameters (more than 100M). The next series of learners are based on autoBOT's evolution and perform reasonably well. Interestingly, the autoBOT variants which exploit only symbolic features perform better than the second neural-network-based baseline, which was not pre-trained specifically for Slovene – the mpnet. The remaining baselines perform worse, albeit having a similar number of final parameters to the final autoBOT-based models (tens of thousands at most). The autoBOT-neural variant, which implements the two main doc2vec variants, performs better than the naïve doc2vec implementation, though not notably better.
The neural variant is based on the two non-contextual doc2vec variants and commonly does not perform particularly well on its own.

To better understand the key properties of the data set which carry information relevant for the addressed predictive task, we additionally explored autoBOT-symbolic's "report" functionality, which offers insight into the importance of the individual feature subspaces. Each subspace, and each feature in a subspace, has a weight associated with it: the larger the weight, the more relevant the given feature type was for the learner. A visualization of these importances is shown in Table 2. It can be observed that character-based features were the most relevant for this task. This result is in alignment with many previous results on tweet classification, where, for example, punctuation-level features can be surprisingly effective. Furthermore, relational token features were also relevant. This feature type can be understood as skip-grams with dynamic distances between the two tokens, which indicates that short phrases might have been of relevance.

3.2 Deep Learning

We trained a model based on the SloBERTa pre-trained language model [11]. SloBERTa is a transformer-based language model that shares the same architecture and training regime as the CamemBERT model [7] and is pre-trained on Slovenian corpora. For fine-tuning the SloBERTa language model, we first split the original training set into training and validation folds in a 90%:10% ratio. We used the suggested hyperparameters for this model: the Adam optimizer with a learning rate of 2e−5 and learning-rate warmup over the first 10% of the training instances, and a weight decay of 0.01 for regularization. The model was trained for a maximum of 3 epochs with a batch size of 32, and the best model was selected based on the validation-set score.
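The warmup mentioned above (learning rate ramped up linearly over the first 10% of training instances) can be sketched as follows. This is our own illustration; the exact post-warmup decay is not stated in the text, so this sketch simply holds the rate constant afterwards.

```python
def lr_at_step(step, total_steps, base_lr=2e-5, warmup_frac=0.1):
    """Linear learning-rate warmup over the first warmup_frac of steps,
    then a constant rate (post-warmup schedules vary in practice)."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr
```

For example, with 100 optimizer steps in total, the rate grows from 2e−6 at step 0 to the full 2e−5 by step 9, and stays there for the remaining steps.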
We performed the training of the models using the HuggingFace Transformers library [12]. We tokenized the textual input for the neural models with the language model's tokenizer. For performing matrix operations efficiently, all inputs were adjusted to the same length: after tokenization, the maximum length was set to 256 tokens; longer sequences were truncated, while shorter sequences were zero-padded. The fine-tuned model is available in the HuggingFace repository.⁴

Interestingly, keyword-based features were not relevant for the learner. Further, autoBOT, being effectively a fine-tuned linear learner, also offers direct insight into fine-grained performances; examples of the top five features per type are shown in Table 2.

3.3 Other Baseline Approaches

The two mentioned approaches have demonstrated state-of-the-art performance; however, to establish their performance on this new task, we also implemented the following baselines. First, a simple majority classifier, to establish the worst-case performance. Next, a doc2vec-based representation learner was coupled with a linear SVM (doc2vec). The svm-word baseline is a sparse TF-IDF representation of the documents coupled with a linear SVM.

5 CONCLUSION

In this work we present a new dataset of Slovenian tweets annotated for hate speech targets. To develop effective computational models for this task, we use two approaches: an autoML approach combining symbolic and neural representations, and the contextually-aware language model SloBERTa. The results show that the context-aware SloBERTa model significantly outperforms all the other trained models. This result, together with the low inter-annotator scores, confirms our initial assumption that hate speech target identification is a complex semantic task that requires an understanding of the text that goes beyond simple pattern matching.
However, the seemingly simpler models may still offer distinct advantages over the more complex neural models. First, the autoML models tested in this work are easily interpretable, offering insights into the textual features which contribute to the classification; the neural language models, on the other hand, generally work as black boxes, and the extent of their interpretability is still an open research question. Second, the autoML models are significantly more straightforward to deploy, as they tend to be much less computationally demanding in terms of both RAM and CPU usage. Neural language models are able to solve harder tasks, but their increased number of parameters usually makes them a considerable challenge to deploy in a scalable fashion.

Similarly, the svm-char baseline couples a linear SVM with representations based on characters. Two further alternatives use logistic regression (lr-word, lr-char). As another strong baseline, we used a multilingual language model called MPNet to obtain contextual representations, coupled with an SVM classifier. The baseline doc2vec model was trained for 32 epochs with eight threads; the min_count parameter was set to 2, the window size to 5 and the vector size to 512. For the SVM- and logistic regression (LR)-based learners, a grid search over the following regularization values was traversed: {0.1, 0.5, 1, 5, 10, 20, 50, 100, 500}.

4 RESULTS

The classification results for the discussed learning algorithms are given in Table 1. The results are sorted by learner complexity.

ACKNOWLEDGEMENTS

We would like to thank the Slovenian Research Agency for the financing of the second researcher (young researcher grant) and for the financial support from research core funding no. P2-103.

³ autoBOT feature spaces: https://skblaz.github.io/autobot/features.html
⁴ Hate speech target classification model: https://huggingface.co/IMSyPP/hate_speech_targets_slo
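For orientation, the lr-char baseline with the regularization grid listed in Section 3.3 can be sketched with scikit-learn as follows. This is an illustrative reconstruction under our own assumptions; the paper does not specify the exact preprocessing or n-gram ranges.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Character n-gram TF-IDF features fed into logistic regression,
# with the inverse-regularization strength C chosen by grid search.
pipe = Pipeline([
    ("tfidf", TfidfVectorizer(analyzer="char", ngram_range=(1, 3))),
    ("clf", LogisticRegression(max_iter=1000)),
])
grid = GridSearchCV(
    pipe,
    {"clf__C": [0.1, 0.5, 1, 5, 10, 20, 50, 100, 500]},
    cv=2,
)
```

Calling `grid.fit(texts, labels)` and then `grid.predict(new_texts)` reproduces the general shape of such a baseline; swapping LogisticRegression for a linear SVM gives the svm-char variant, and `analyzer="word"` the word-level ones.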
Table 1: Overview of the classification results. The SloBERTa model significantly outperforms all the other models and reaches inter-annotator agreement.

    Classification model                                  Accuracy   Macro Rec   Macro Prec   Macro F1
    majority                                              40.79%      8.33%       3.40%        4.83%
    doc2vec                                               43.25%     20.65%      20.67%       19.76%
    autoBOT-neural (9h)                                   45.79%     15.37%      20.00%       16.10%
    svm-word                                              50.39%     21.40%      25.75%       22.02%
    lr-word                                               50.39%     21.40%      25.75%       22.02%
    lr-char                                               51.21%     25.14%      28.17%       26.10%
    svm-char                                              51.90%     23.47%      27.59%       24.20%
    autoBOT-neurosymbolic-lite (4h)                       54.26%     27.34%      35.06%       28.90%
    paraphrase-multilingual-mpnet-base-v2 + linear SVM    55.40%     40.24%      44.29%       41.20%
    autoBOT-symbolic (9h)                                 55.99%     29.68%      37.86%       31.32%
    autoBOT-neurosymbolic-full (4h)                       56.28%     32.29%      37.83%       33.07%
    SloBERTa                                              63.81%     53.03%      45.63%       48.28%

Table 2: Most relevant features per feature subspace. The subspaces are ordered by their importance, and the numeric value next to each feature is that feature's importance for the final learner; the features are sorted per type. Note the word_features and their alignment with what a human would associate with hate speech.

    char_features:             "ta s" 3.56, "ni d" 2.73, "lič" 2.69, "ola" 2.58, "ne m" 2.5
    relational_features_token: pa–3–je 2.23, pa–2–se 2.12, v–2–pa 1.78, ne–1–pa 1.75, v–2–se 1.71
    pos_features:              nnp nn nnp 1.77, nnp jj nn 1.75, nnp jj 1.57, cc 1.46, nn nn rb 1.45
    word_features:             idioti 1.09, riti 0.95, tole 0.95, sem 0.94, fdv 0.93
    relational_features_char:  e–3–d 1.74, i–3–s 1.56, n–3–z 1.48, h–5–v 1.43, z–4–t 1.4
    topic_features:            topic_12 0.14, topic_2 0.02, topic_0 0.0, topic_1 0.0, topic_3 0.0
    keyword_features:          007amnesia 0.0, 15sto 0.0, 24kitchen 0.0, 2pira 0.0, 2sto7 0.0

The work was also supported by the European Union's Horizon 2020 research and innovation programme project EMBEDDIA (grant no. 825153) and the European Union's Rights, Equality and Citizenship Programme (2014–2020) project IMSyPP (grant no. 875263).⁵
REFERENCES

[1] P. Badjatiya, S. Gupta, M. Gupta, and V. Varma. 2017. Deep learning for hate speech detection in tweets. In Proceedings of the 26th International Conference on World Wide Web Companion, 759–760.
[2] B. Evkoski, I. Mozetič, N. Ljubešić, and P. Kralj Novak. 2021. Community evolution in retweet networks. arXiv preprint arXiv:2105.06214.
[3] B. Evkoski, A. Pelicon, I. Mozetič, N. Ljubešić, and P. Kralj Novak. 2021. Retweet communities reveal the main sources of hate speech. arXiv: 2105.14898 [cs.SI].
[4] N. Ljubešić, D. Fišer, and T. Erjavec. 2019. The FRENK datasets of socially unacceptable discourse in Slovene and English. arXiv: 1906.02045 [cs.CL].
[5] N. Ljubešić, D. Fišer, and T. Erjavec. 2014. TweetCaT: a tool for building Twitter corpora of smaller languages. In Proceedings of the Ninth International Conference on Language Resources and Evaluation. European Language Resources Association (ELRA), Reykjavik, Iceland, May 2014.
[7] L. Martin, B. Muller, P. J. Ortiz Suárez, Y. Dupont, L. Romary, É. de la Clergerie, D. Seddah, and B. Sagot. 2020. CamemBERT: a tasty French language model. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, July 2020, 7203–7219.
[8] A. Pelicon, R. Shekhar, B. Škrlj, M. Purver, and S. Pollak. 2021. Investigating cross-lingual training for offensive language detection. PeerJ Computer Science, 7, e559.
[9] B. Ross, M. Rist, G. Carbonell, B. Cabrera, N. Kurowsky, and M. Wojatzki. 2017. Measuring the reliability of hate speech annotations: the case of the European refugee crisis. arXiv preprint arXiv:1701.08118.
[10] B. Škrlj, M. Martinc, N. Lavrač, and S. Pollak. 2021. autoBOT: evolving neuro-symbolic representations for explainable low resource text classification. Machine Learning, 110, 5, 989–1028. issn: 1573-0565. doi: 10.1007/s10994-021-05968-x.
[11] M. Ulčar and M. Robnik-Šikonja. 2021. Slovenian RoBERTa contextual embeddings model: SloBERTa 2.0. Slovenian language resource repository CLARIN.SI. http://hdl.handle.net/11356/1397.
http:// In Proceedings of the Ninth International Conference on hdl.handle.net/11356/1397. Language Resources and Evaluation. European Language [12] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Resources Association (ELRA), Reykjavik, Iceland, (May Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, 2014). S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, [6] S. Malmasi and M. Zampieri. 2018. Challenges in discrimi- T. Le Scao, S. Gugger, M. Drame, Q. Lhoest, and A. M. nating profanity from hate speech. Journal of Experimental Rush. 2019. HuggingFace’s Transformers: State-of-the-art & Theoretical Artificial Intelligence, 30, 2, 187–202. Natural Language Processing. ArXiv, abs/1910.03771. [13] M. Zampieri, S. Malmasi, P. Nakov, S. Rosenthal, N. Farra, 5 The content of this publication represents the views of the authors only and is their and R. Kumar. 2019. Predicting the Type and Target of sole responsibility. The European Commission does not accept any responsibility for use that may be made of the information it contains. Offensive Posts in Social Media. In Proceedings of NAACL. 34 SiDeGame: An Online Benchmark Environment for Multi-Agent Reinforcement Learning Jernej Puc Aleksander Sadikov jernej.puc@fs.uni- lj.si aleksander.sadikov@fri.uni- lj.si University of Ljubljana University of Ljubljana Faculty of Mechanical Engineering Faculty of Computer and Information Science Ljubljana, Slovenia Ljubljana, Slovenia ABSTRACT “capture the flag” in first-person view, while using similar input and output schemes to those of human players. However, the Modern video games present a challenging benchmark for ar- project is based on an inaccessible implementation of a funda- tificial intelligence research. Various technical limitations can mentally shallow game mode, which makes it untenable as a often lead to playing interfaces that are heavily biased in terms benchmark for reinforcement learning. 
ABSTRACT
Modern video games present a challenging benchmark for artificial intelligence research. Various technical limitations can often lead to playing interfaces that are heavily biased in terms of ease of learning for either humans or computers, and it is difficult to strike the right balance. In this paper, a new benchmark environment is presented, which emphasises the role of strategic elements by enabling more equivalent interfaces, is suitable for reinforcement learning experiments on widely distributed systems, and supports imitation learning, as is demonstrated. The environment is realised as a team-based competitive game and its source code is openly available in a public repository.

KEYWORDS
simulation environment, multi-agent system, deep neural networks, imitation learning, reinforcement learning

1 INTRODUCTION
Reinforcement learning is a powerful concept that can be used to take on highly complex challenges. In its advancement, video games have emerged as suitable benchmarks: they define clear goals, allow agents to be compared between themselves and with humans, and, in comparison to preceding milestones [7], they begin to incorporate complexities of the real world.

Success has been achieved even in notably difficult tasks, such as the modern games of StarCraft II [8] and Dota 2 [1]. However, these being modern games, their authors were forced to compromise: the intricate and graphically intensive input spaces had to be simplified and transformed, while combinatorially overwhelming action spaces were functionally changed, until superhuman performances could also be attributed to advantages of different playing conditions.

The search for examples that could compare in strategic depth and cultivate a competitive player base, while enabling consistent interfaces and being open to researchers, leaves few options but to create one anew. This has led us to create SiDeGame, the "simplified defusal game" (abbrev. SDG), which incorporates key rules of an established video game title in a computationally and perceptively simpler simulation environment, accessible at: https://github.com/JernejPuc/sidegame-py

2 RELATED WORK
The importance of an even playing field has been emphasised by the authors of the For The Win (FTW) agents [4], which play a form of "capture the flag" in first-person view while using input and output schemes similar to those of human players. However, the project is based on an inaccessible implementation of a fundamentally shallow game mode, which makes it untenable as a benchmark for reinforcement learning. Nonetheless, it shows a type of game that can suit the given requirements.

The first-person shooter (FPS) genre has many interesting representatives, some of which have already been repurposed as reinforcement learning environments [4, 5]. Unsuitably, they tend to revolve around simpler content, such as single-player or deathmatch scenarios, and are not straightforward for researchers to customise. Indeed, accessibility and modifications generally require developer support and cooperation [6].

Confronted with this barrier, recent work on Counter-Strike: Global Offensive (CSGO) [6] resigned itself to the limits of imitation learning, which could be facilitated by external recording of public matches. Although CSGO's standard competitive mode is fittingly strategic, it instead focused on the mentioned deathmatch, and withheld information from agents by ignoring sound and having them use cropped and downscaled image inputs with common information omitted or rendered unrecognisable.

This paper also considers imitation learning, in an attempt to establish a baseline and starting point for eventual reinforcement learning, akin to the approach of AlphaGo [7] and AlphaStar [8]. The deep neural network architecture used in these experiments accepts audio inputs similarly to instances from the literature [3], which convert sounds into their frequency-domain representations using the discrete Fourier transform.

3 THE SDG ENVIRONMENT
SiDeGame relies on the game rules of CSGO to provide a foundation of notable depth. Crucially, the observation space is simplified by viewing the environment from a top-down perspective and in low resolution, to allow modern deep neural networks to process it directly. Consequently, not all aspects of the game could be reasonably adapted and the action space could not be fully preserved, yet the playing experience remains egocentric and is largely consistent with true first-person control schemes.

3.1 Description
By the rules carried over from CSGO, two teams of 5 players each compete asymmetrically in attack and defence: the goal of one team is to detonate a bomb at one of two preset locations, while the goal of the other is to prevent them from doing so. After a certain number of rounds, the teams switch sides, and the first to pass a threshold of rounds won is declared the winner.

In the course of a round, players must navigate a map, an artificial environment with carefully placed tactical elements of various degrees of passage and cover. Besides weaponry, a player can utilise auxiliary equipment, the availability of both of which depends on prior survival and economic rewards.

Figure 1: Screenshots of various views encountered in SiDeGame.

Additionally interesting for AI research are aspects of the game that encourage or demand active coordination, such as shared economy, unassigned roles, and imperfect information on teammates' status and surroundings.
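The round-and-match structure described above can be sketched as a small state machine. This is only an illustration of the flow (side switch after a fixed half, win at a fixed threshold); the constants below are placeholders, not values taken from SiDeGame or CSGO.

```python
# Illustrative sketch of the round-based match flow described in 3.1.
# HALF_LENGTH and WIN_THRESHOLD are assumed placeholder values.
HALF_LENGTH = 15      # rounds played before the teams switch sides
WIN_THRESHOLD = 16    # rounds a team must win to take the match

def play_match(round_results):
    """round_results: iterable of 'attack'/'defence', naming the side
    that won each round. Returns the winning team ('A' or 'B'),
    or None if neither team reached the threshold."""
    score = {"A": 0, "B": 0}
    sides = {"attack": "A", "defence": "B"}   # team A starts on attack
    for i, winner_side in enumerate(round_results):
        if i == HALF_LENGTH:                  # teams switch sides
            sides = {"attack": "B", "defence": "A"}
        team = sides[winner_side]
        score[team] += 1
        if score[team] >= WIN_THRESHOLD:
            return team
    return None
```

For example, if the attacking side wins every round, team A collects the first half and team B collects rounds after the switch, so the match remains undecided until one of them passes the threshold.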
3.2 Observations
The majority of information is provided through the image display, several screenshots of which can be seen in Figure 1. Images are generated at a low base resolution of 256x144 pixels, constraining the visual elements to be small and carefully placed, while remaining easily distinguishable. The human interface simply upscales the display with nearest-neighbour interpolation, ensuring equivalence of available information.

The main view is based on a projection of the radar image of a classic CSGO map, Cache, which has only minor vertically overlapping components and thus proved easiest to adapt. Alternative views include the inventory wheel, map plan, and communication wheels. The latter are used to construct short messages of grounded signs, which are appended to the chat log in the sidebar and allow explicit coordination within the team.

Since the projection is egocentric, the prominent role of sound is retained: other agents out of line of sight may still give off some information regarding their relative position, equipment, and preparedness. To support the advantages of awareness of sound, spatial audio is implemented by convolving sound signals with HRIR filters [2], while amplitude and frequency attenuation characteristics were empirically formulated. SiDeGame supports conversion of sounds into spectral vectors, which were used directly in the experiments of this work, but can also be accumulated and later processed in the form of a spectrogram.

If there is a delay between action inference and its effect in the environment, an input analogous to proprioception can also be considered. It can be trivially simulated by tracking the effective mouse and keyboard states, i.e. which keys are pressed and how the cursor is moving at a given time.

3.3 Actions
The game expects 19 binary inputs, corresponding to distinct key presses, one ternary value for scrolling the chat log, and two real values for controlling cursor movement. In general, combinations of these can legitimately be executed simultaneously, providing no benefit to the use of compound actions.

It should be noted that some of the keys, pertaining to alternative views or otherwise functional when kept held down, expect unperturbed presses lasting several seconds. For stochastic policies, where actions during training are sampled, this duration could be long enough for even minute probabilities to be eventually expressed, causing unintended consequences and leading to practically unplayable conditions. Training regimes should, for example, reduce the regularity of sampling, bound sampling within acceptable thresholds, or use more sophisticated contextual rules to confirm the agent's intent.

3.4 Execution
Multi-agent interaction is built upon separate server and client processes regularly exchanging state and event information via packet communication using the UDP protocol. Simulations are intended to run in real time, but can have their tick rate and time scale adjusted on both authoritative and local ends.

With the exception of pixel-wise iteration for tracing lines of sight, and disregarding the dependencies of imported extensions, the environment is fully implemented in the Python programming language. Despite clear inefficiencies, this development choice streamlines integration with machine learning solutions, which predominantly relate to the Python ecosystem, and eases code readability and customisation. Server and client processes are spawned as single Python processes that are restricted to the CPU, enabling mass parallelisation and preserving GPU resources for learning processes.

For AI agents, development targeted 30 updates per second, which had been deemed acceptable to human opponents, although higher tick rates can be achieved at both the original (144p) and reasonably upscaled (e.g. 720p) resolutions. This could also be used to speed up the simulation, subject to the computational stability and potential overhead of a specific configuration.

3.5 Online Play
In the context of agent evaluation and comparison, the capability of online play, where actors, both human and artificial, can compete remotely and without having to share their program, is an essential component, as outcomes of adversarial games cannot be compared in isolation.

Feasible physical distance between actors in a match is experientially limited by temporal delays that arise from communication steps in the client loop. Inclusion of select networking concepts, such as client-side state prediction and reconciliation, foreign entity interpolation, and server-side lag compensation, should maintain playable conditions to a large extent even among international participants.

In extrapolation, online play could also support widely distributed multi-agent reinforcement learning experiments in the form of large-scale population-based training [4, 8]. These are subject to training and inference data transfer constraints, which can be alleviated by slowing down the simulation and having the data pass fewer bottlenecks.
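The state-and-event packets exchanged over UDP (Section 3.4) can be illustrated with a fixed-layout binary encoding. The field layout below is an invented example, not SiDeGame's actual wire format; it only shows the general pattern of packing a snapshot into a compact byte string of the kind that could also be gathered for replays.

```python
import struct

# Hypothetical fixed-layout state packet: tick, player id, x, y.
# "<" requests little-endian byte order with no padding.
STATE_FMT = "<IHff"

def encode_state(tick, player_id, x, y):
    """Pack one state snapshot into a compact byte string."""
    return struct.pack(STATE_FMT, tick, player_id, x, y)

def decode_state(payload):
    """Unpack a byte string back into the snapshot fields."""
    return struct.unpack(STATE_FMT, payload)

packet = encode_state(120, 3, 12.5, -4.25)
tick, player_id, x, y = decode_state(packet)
```

Such byte strings are small enough (14 bytes here) to fit many per UDP datagram, and they can be appended verbatim to a binary log, which is essentially what a replay file amounts to.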
Moreover, visual encod- simulations ing splits off into focused encoding by cropping the input image as specified by the cutout coordinates (orange). Figure 2: Online multiplayer reinforcement learning: a) The global controller process oversees all of the models many of them and is relatively dense, hinting at the inevitability in a population of agents, ensuring they are not simultane-that not all bits of visual information can be equally accounted ously being updated by any two process groups. b) Process for at any given time. Generally, this could be addressed with groups consist of a local controller and locally distributed sufficiently high model capacity and appropriate use of attention- inference, optimisation, and actor processes. c) All actor based layers. In this work, however, the visual pathway was instances may interact through remote environments sim- explicitly split into primary and focused visual encoding, based ulated by authoritative servers. on the intuition of human visual perception, where only a small part of our field of view is perceived in sharp detail. Instead of ingesting full-scale image data, focused visual en- 3.6 Replay System coding processes cutouts of much smaller size, so that singular entities can be unambiguously observed. The cutout coordinates The packets of information that a client exchanges with the are obtained from a spatial probability distribution along with fu-server in the course of a session are made to be sufficient to ture mouse and key states as outputs of the network. If they were, faithfully reproduce the player’s perspective. Byte strings can be instead, determined internally, the cropping operation would gathered, annotated, and saved as binary files, which can then be need to be differentiable, which could prove hard to satisfy. 
replayed in real-time or manually stepped to inspect and extract the player’s observations and actions, statistics, or other aspects 4.2 Imitation Learning of the underlying game state. Replays are an important resource Imitation learning aims to align the agent’s behaviour to that of for review and analysis of competitive games, but were primarily a number of demonstrators, e. g. experienced humans. Among its included in SiDeGame for the purposes of imitation learning. basic methods is behavioural cloning, which relies on a dataset 4 SUPERVISED LEARNING BASELINE 𝐷 = {{𝑜 } }} of pairs of observations 1, 𝑎1 , . . . , {𝑜 , 𝑎 𝑜 and target 𝑁 𝑁 actions 𝑎. The agent with parameterisation 𝜃 is tasked to predict Within the limits of available computational resources and in for each observation 𝑜 such an action ˆ 𝑎 to satisfy the following 𝑖 𝑖 view of the scale of exemplary projects [1, 8], the estimated level optimisation problem: of parallelisation, required for meaningful results of reinforce- 𝑁 ment learning experiments in an acceptable time frame, could ∗ 1 Õ 𝜃 = arg min 𝐿 (𝑎 , ˆ 𝑎 ), (1) not be reached. Instead, a baseline and a starting point for rein- 𝑖 𝑖 𝑁 𝜃 𝑖 = 1 forcement learning was attempted to be achieved with imitation learning, a form of supervised learning from demonstrations. where the loss function 𝐿, evaluating similarity between predicted and imitated actions, is dependant on the form of the action space. 4.1 Agent Model Architecture In this experiment, all outputs of the model were made discrete and the loss function formulated as an average of cross-entropy The agent’s policy was modelled as a parameterised deep neural terms for 𝑇 sub-actions of 𝐶 categories: network according to the architecture depicted in Figure 3. 
The model is composed of common elements: residual convolu- 𝑇 𝐶𝑡 Õ Õ 𝑡 ,𝑐 𝑡 ,𝑐 tional blocks, recurrent cells, and fully-connected layers, forming 𝐿 (𝑎 , ˆ 𝑎 ) = − 𝑎 log ˆ 𝑎 (2) 𝑖 𝑖 𝑖 𝑖 recognisable sub-networks, such as the recurrent core, which 𝑡 = 1 𝑐 = 1 provides the agent with memory and delay compensation, input After the gradients are numerically computed with regards to encoding pathways, and distinct output heads. the depth of truncated backpropagation through time, parame- The irregularity of visual encoding stems from the considera- ter updates are applied using one of the standard optimisation tion that, while visual elements are simple, the display includes algorithms. 37 Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia Jernej Puc and Aleksander Sadikov 4.3 Demonstrations It seemed to respond to the presence and movement of other entities in its vicinity, was able to navigate across the map towards A collection of replays was recorded from a short session between a tactical objective without hindering collisions and seemingly 10 demonstrators of negligible experience with SiDeGame, but hide behind cover, but failed to demonstrate offensive behaviour. with varying degrees of familiarity with related video games. Seven hours or 770,000 samples of total play were obtained at 5 CONCLUSIONS & FUTURE WORK 30 frames per second, which is unideally low, especially since samples and episodes are highly correlated. Attributing the shortcomings of recent works in deep reinforce- Main sub-actions were extracted from mouse and keyboard ment learning to inconsistencies between human and AI inter- states, while focused cutout coordinates would require logistical faces, a new benchmark environment has been created in the and sensory measures that were infeasible to procure. Instead, form of a lightweight multi-agent game with various tools for the coordinates were manually labelled by viewing replays at 75% training and evaluation of agents. 
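The behavioural cloning loss of Eqs. (1)-(2) can be transcribed directly in numpy. The toy targets and predictions below are invented for illustration: one-hot target sub-actions and softmax-style predicted distributions over two sub-actions with 2 and 3 categories.

```python
import numpy as np

def bc_loss(target, predicted):
    """Cross-entropy behavioural-cloning loss of Eq. (2): summed over
    T sub-actions, each a categorical distribution over C_t categories.
    `target` and `predicted` are lists of per-sub-action probability
    vectors (targets one-hot, predictions strictly positive)."""
    return -sum(float(np.sum(t * np.log(p)))
                for t, p in zip(target, predicted))

# Toy example: T = 2 sub-actions with 2 and 3 categories respectively.
target    = [np.array([1.0, 0.0]), np.array([0.0, 1.0, 0.0])]
predicted = [np.array([0.8, 0.2]), np.array([0.1, 0.7, 0.2])]
loss = bc_loss(target, predicted)   # equals -(log 0.8 + log 0.7)
```

With one-hot targets, each sub-action contributes the negative log-probability the model assigned to the demonstrated choice, so the loss in Eq. (1) is minimised by matching the demonstrators' action distribution.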
Amid data extraction, observation-action pairs had their actions shifted by 6 steps, conditioning the model to predict actions after a temporal delay close to the human response time.

4.4 Results
The neural network, consisting of approx. 2.9M parameters, and the training procedure were implemented using the PyTorch package. For training, a machine with 4 Nvidia 1080Ti GPUs was available. Each GPU corresponded to an optimisation process, which received an approximately equal share of training sequences and progressed them chronologically in batches of 12 sequences and epochs of 30 steps. After every epoch, the gradients with regard to the loss were computed with truncated backpropagation through time separately on each GPU, synchronously averaged between them, and used to separately update their copy of the model parameters using the AdamW optimisation algorithm with a cosine 1-cycle learning rate schedule.

The main training process ran for 300,000 steps over 6 days. The large variance in the loss in Figure 4 can be attributed to differences between game phases and subtler characteristics of demonstrators, which were found to be distinct from degrees of capability and activity.

Figure 4: Loss progression over the course of training. Left: average loss value, enveloped by minimum and maximum evaluations. Right: averages of the constituent cross-entropy terms (focal coordinates, keys, and vertical and horizontal mouse movement).

Figure 4 shows that, by the end of the training schedule, only imitation of focal coordinates leaves room for improvement, while the other terms in the loss function have already overfitted. Due to the relatively small size of the network, overfitting had been underestimated, although the outcome could have been inevitable with the given amount of data.

In practice, the trained agent's behaviour was greatly sensitive to even imperceptibly slight changes in starting conditions. Its switching between alternative views was debilitatingly chaotic and had to be suppressed to allow expression of other behaviours. It seemed to respond to the presence and movement of other entities in its vicinity, was able to navigate across the map towards a tactical objective without hindering collisions and seemingly hide behind cover, but failed to demonstrate offensive behaviour.

5 CONCLUSIONS & FUTURE WORK
Attributing the shortcomings of recent works in deep reinforcement learning to inconsistencies between human and AI interfaces, a new benchmark environment has been created in the form of a lightweight multi-agent game with various tools for training and evaluation of agents. In addition to addressing these concerns, the simulation environment is based on a renowned tactical video game, providing interesting challenges for AI research, particularly in the domains of sound and explicit communication.

In approaching the game with imitation learning, the trained agent failed to develop practically meaningful behaviours when trained on arguably few demonstrations, and was found lacking as a starting point for reinforcement learning experiments. Nevertheless, the presented agent model architecture is general enough to be applicable to other common tasks with standard computer peripherals and lends itself to further experimentation. Online characteristics of the created environment hint at its potential for large-scale reinforcement learning experiments, with its accessibility and adaptability allowing the AI community to explore this and other directions. At the same time, certain components of the environment that are not specific to AI research could also prove useful to a wider community, outside the scope of its primary intent.

REFERENCES
[1] Christopher Berner et al. 2019. Dota 2 with large scale deep reinforcement learning. CoRR, abs/1912.06680. arXiv: 1912.06680. http://arxiv.org/abs/1912.06680.
[2] Fabian Brinkmann et al. 2017. A high resolution and full-spherical head-related transfer function database for different head-above-torso orientations. J. Audio Eng. Soc, 65, 10, 841-848. DOI: 10.17743/jaes.2017.0033. http://www.aes.org/e-lib/browse.cfm?elib=19357.
[3] Shashank Hegde, Anssi Kanervisto, and Aleksei Petrenko. 2021. Agents that listen: high-throughput reinforcement learning with multiple sensory systems. CoRR, abs/2107.02195. arXiv: 2107.02195. https://arxiv.org/abs/2107.02195.
[4] Max Jaderberg et al. 2018. Human-level performance in first-person multiplayer games with population-based deep reinforcement learning. CoRR, abs/1807.01281. arXiv: 1807.01281. http://arxiv.org/abs/1807.01281.
[5] Michal Kempka et al. 2016. ViZDoom: a Doom-based AI research platform for visual reinforcement learning. CoRR, abs/1605.02097. arXiv: 1605.02097. http://arxiv.org/abs/1605.02097.
[6] Tim Pearce and Jun Zhu. 2021. Counter-Strike deathmatch with large-scale behavioural cloning. CoRR, abs/2104.04258. arXiv: 2104.04258. https://arxiv.org/abs/2104.04258.
[7] David Silver et al. 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529, 7587, 484-489. ISSN: 1476-4687. DOI: 10.1038/nature16961. https://doi.org/10.1038/nature16961.
[8] Oriol Vinyals et al. 2019. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575, 7782, 350-354. ISSN: 1476-4687. DOI: 10.1038/s41586-019-1724-z. https://doi.org/10.1038/s41586-019-1724-z.

Question Ranking for Food Frequency Questionnaires

Nina Reščič (nina.rescic@ijs.si), Department of Intelligent Systems, Jožef Stefan Institute; Jožef Stefan International Postgraduate School, Ljubljana, Slovenia
Mitja Luštrek (mitja.lustrek@ijs.si), Department of Intelligent Systems, Jožef Stefan Institute, Ljubljana, Slovenia

ABSTRACT
Food Frequency Questionnaires (FFQs) are probably the most
In the WellCo project, that quality of predictions will improve with each additional we developed the Extended Short Form Food Frequency Question- answer and we are not limited with the constraint that certain naire (ESFFFQ), integrated into a mobile application, in order to number of questions should be answered. We addressed the prob- monitor the quality of users’ nutrition. The developed question- lem as a single-target problem for classification and regression. naire returns diet quality scores for eight targets — fruit intake, vegetable intake Additionally, we tested the algorithms on different representa- , fish intake, salt intake, sugar intake, fat intake, fi- bre intake tions of features for both type of problem. The findings of this and protein intake. This paper explores the single-target paper could be used for setting the baseline for our future re- problem of question ranking. We compared the question ranking search. of the machine learning algorithms on three different types of features for classification and regression problems. Our findings 2 METHODOLOGY showed that the addressing problem as a regression problem performs better than treating it as a classification problem and 2.1 Problem outline the best performance was achieved by using a Linear Regression In our previous research [6, 7] we tried to find subsets of questions on features, where answers were transformed to frequencies of that would allow us to ask the users about their dietary habits consumption of certain food groups. with as few questions as possible and still get sufficient information to evaluate their nutrition. For this we used the Extended KEYWORDS Short Form Food Frequency Questionnaire (ESFFFQ) [5]. The nutrition monitoring, FFQs, question ranking questionnaire returns diet quality scores scores for fruit intake, vegetable intake, fish intake, salt intake, sugar intake,fat intake, fibre intake and protein intake. 
We calculate the nutrient intake 1 INTRODUCTION amounts and from there we further calculate the diet quality Adopting and maintaining a healthy lifestyle has become ex- scores. tremely important and healthy nutrition habits represent a major The questionnaire was included in a mobile application, where part in achieving this goal. Self-assessment tools are playing a big the system asked the users about their diet with one or two role in nutrition monitoring and many applications are including questions per day. The answers were saved into a database and Food Frequency Questionnaires (FFQs) as a monitoring tool, due every fortnight the quality scores were recalculated. As it could to they in-expensiveness, simplicity and reasonably good assess- happen that the users did not answer all the questions by the ment [8, 3]. An FFQ is a questionnaire that asks the respondents time the recalculation was done, it was of great importance to ask about the frequency of consumption of different food items (e.g., the questions in the right order. In the terminology of machine "How many times a week do you eat fish?"). In the EU-funded learning this would be a feature ranking problem. We explored project WellCo we developed and validated an Extended Short the problem as a set of single-target problems — separately for Form Frequency questionnaire (ESFFFQ) [5] that was included individual outcome scores. As three of the diet quality scores in a health coaching application for seniors. (fruit, vegetable and fish intake) are only dependent on one or two Cade et al. [2] suggest that for assessment of dietary data short questions, the problem of feature ranking is trivial. Therefore we FFQs could be sufficient and that marginal gain in information is explored the problem for the remaining five targets — fat intake, decreasing with extensive FFQs. Block et al. [1] concluded that sugar intake, fibre intake, protein intake and salt intake. 
longer and reduced return comparable values of micronutrients intake. Taking this idea a step forward, we explored the possi- 2.2 Dataset bilities to get the most information even if one does not answer We got the answers to ESFFFQ from 92 adults as a part of the the whole questionnaire. In our previous work we explored how WellCo project and additionally from 1039 adults included in to find the smallest set of questions that still provides enough SIMenu, the Slovenian EUMenu research project [4]. The ques-information by applying different feature selection techniques tions included in the ESFFFQ were a subset of the questions in the [6, 7]. FFQ in SIMenu. Furthermore, the answers (consumption frequen- cies) were equivalent in both questionnaires, and consequently Permission to make digital or hard copies of part or all of this work for personal extracting the answers from SIMenu and adding them to the or classroom use is granted without fee provided that copies are not made or answers from the ESFFFQ was a very straightforward task. distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner /author(s). 2.3 Feature ranking Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia To do the experiments, we first randomly split the data into © 2021 Copyright held by the owner/author(s). validation and training sets in ratio 1:3. To train the models and 39 Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia Reščič et al. rank the features we then used 4-fold cross-validation on the frequencies or amounts, we get better results on the validations training set and used the average feature importance from all 4 set than with RF. folds as the final feature ranking. 
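The fold-averaged ranking scheme just described can be sketched with numpy alone. Here an ordinary least-squares fit stands in for the paper's Linear/Logistic Regression models, and the data is synthetic; only the procedure (cross-validation folds, averaged absolute coefficients, descending rank) mirrors the text.

```python
import numpy as np

def rank_features(X, y, n_folds=4):
    """Rank features by absolute linear-model coefficients, averaged
    over cross-validation folds (plain least squares stands in for the
    scikit-learn models used in the paper)."""
    indices = np.arange(len(X))
    importance = np.zeros(X.shape[1])
    for fold in np.array_split(indices, n_folds):
        train = np.setdiff1d(indices, fold)
        coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        importance += np.abs(coef)
    importance /= n_folds
    return np.argsort(importance)[::-1]   # best-ranked feature first

rng = np.random.default_rng(42)
X = rng.standard_normal((200, 5))                    # 5 synthetic "questions"
y = 3.0 * X[:, 2] + 0.1 * rng.standard_normal(200)   # question 2 dominates
ranking = rank_features(X, y)
```

Since the target depends almost entirely on the third column, that question should come out on top of the ranking, and the remaining questions would then be added in the resulting order, as in the experiments below.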
The ranked features were used to predict quality scores (classification problem) and nutrient amounts (regression problem), by adding the questions as they were ranked. In this paper we present the results for two commonly used machine learning algorithms: Logistic/Linear Regression and Random Forest Classifier/Regressor. To rank the features we used the absolute values of the coefficients in the Linear/Logistic Regression and the feature_importances_ attribute as implemented in the Random Forest Classifier/Regressor in the sklearn library.

Additionally, we compared different feature representations: features where answers are represented with nominal discrete equidistant values (once per week is represented as the integer 2), features where answers were transformed into frequencies of consumption (once per week is represented as approx. 0.14 per day) and features where answers were transformed into amounts of nutrients (once per week is represented as grams/day). In the last representation, the features differed between the targets sugar, fat, salt, fibre and protein.

We ran the experiments for five diet categories (fat intake, sugar intake, fibre intake, protein intake and salt intake) for both the classification and the regression problem. In both cases we started with the best ranked question, trained the model and compared results on the train and validation sets. Then we added the second best ranked question, trained the models and compared the results. We added the questions one by one until the last one.

3 RESULTS

3.1 Classification problem
For classification we tried to predict the quality scores for each of the five nutrition categories. There were three scores: 2 (good), 1 (medium) and 0 (bad). The distribution of the scores for all the categories is shown in Table 1.

Table 1: Distribution of target values for classification

Score  Fat  Sugar  Fibre  Protein  Salt
2      51%  74%    26%    79%      32%
1      31%  14%    22%    13%      47%
0      18%  12%    52%    8%       21%

We compared the Random Forest Classifier and Logistic Regression for three different types of features: discrete equidistant answers, answers transformed to frequencies and answers transformed to amounts.

Fat. For Random Forest (RF) there was not a big difference between the three representations of the features. With all three, the highest accuracy on the validation set (79%) is achieved with 5 questions; afterwards the accuracy starts falling and stays in the interval between 75% and 79%. This clearly indicates overfitting, which is confirmed by the fact that the accuracy for RF on the training set was 100% from the fifth question on. A similar situation happened for all the remaining targets and will not be repeated in the following subsections. On the training set Logistic Regression (LR) had worse results than the RF, and it also performed the worst of all algorithms when run on the discrete features. However, when the features are transformed into frequencies or amounts, we get better results on the validation set than with RF.

Figure 1: Results on validation set for fat intake

Sugar. For sugar intake the story is very similar. RF performed fairly well for the first few questions and then the accuracy began to fall. The best performing algorithm was the LR on the features where the answers were transformed into frequencies (Figure 2).

Figure 2: Results on validation set for sugar intake

Fibre. For fibre intake the RF algorithms performed better for a very long time (Figure 3), and RF reached the best accuracy after 6 questions. The LR performed worse, and it did similarly badly on the training set as well.

Figure 3: Results on validation set for fibre intake

Protein. For protein intake (Figure 4) the results are similar to those for fibre intake. However, in the case of protein intake the majority class is 79% and most of the algorithms almost never exceeded this value.

Figure 4: Results on validation set for protein intake

Salt. For salt intake the best model is the LR on the answers transformed to amounts. As seen in Figure 5, it exceeded the RF algorithms by almost 20% from the eleventh added question on, and predicted the quality scores with more than 90% accuracy with only 14 questions, which is half of the questionnaire.

Figure 5: Results on validation set for salt intake

3.2 Regression problem
While the quality score provides a valid first indication of whether one's diet is good or not, generally more interesting information is how good (or how bad) it really is. Therefore it is reasonable to look at the same problem as a regression problem, where we try to predict the actual amount (in grams) of consumed nutrients. Again we explored the performance of the Random Forest Regressor (RF) and Linear Regression (LR) on the three previously described feature sets.

Table 2: Nutrient intake in grams/day corresponding to quality scores

Score  Fat [g]  Sugar [g]  Fibre [g]  Protein [g]  Salt [g]
2      ≤ 74     ≤ 55       ≥ 30       ≥ 55         ≤ 6
1      else     else       else       else         else
0      ≥ 111    ≥ 82       ≤ 25       ≤ 45         ≥ 9

Fat. The best performing algorithm for fat intake was the LR on the answers transformed to frequencies. The overfitting of the RF is even more visible than with the classification problem, as the errors for these models did not fall under 20 grams even if all the questions were used, while the error of the LR on the feature sets where the answers are transformed to frequencies or amounts was smaller than 5 grams from eleven included questions on (Figure 6).

Figure 6: Results on validation set for fat intake

Sugar. Similarly to fat intake, LR with the 'frequency features' performed best (Figure 7). The LR on the 'amounts features' also performed well for more than 15 questions, but predicted the worst for the first eleven included questions.

Figure 7: Results on validation set for sugar intake

Fibre. Classification for fibre intake was very bad; however, when considering it as a regression problem, the LR on the 'frequency features' predicted the amounts with an error smaller than 2 grams when more than eleven questions were used (Figure 8). Considering Table 2, this means that predicting how bad/good the fibre intake is was done better than predicting whether it is bad or good.

Figure 8: Results on validation set for fibre intake

Protein. For protein intake all algorithms had a similar performance up to ten included questions; however, the LR on the 'frequency features' started to perform better and better with each added question and predicted the amount of protein consumption with an error of 5 grams (Figure 9).

Figure 9: Results on validation set for protein intake

Salt. Similarly to protein intake, all algorithms performed with a comparable error up to nine included questions, and after that the LR using the features transformed to frequencies started to perform way better and predicted salt intake with an error smaller than 1 gram with eleven included questions (Figure 10).

Figure 10: Results on validation set for salt intake
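The three feature representations used throughout these experiments can be illustrated on a single answer. The answer categories and the grams-per-serving value below are hypothetical examples, not taken from the ESFFFQ:

```python
# Three representations of one FFQ answer, as compared in the experiments above
# (the category labels and the portion size are hypothetical illustration values).
DISCRETE = {"never": 0, "once per month": 1, "once per week": 2, "once per day": 3}
PER_DAY = {"never": 0.0, "once per month": 1 / 30, "once per week": 1 / 7,
           "once per day": 1.0}

def to_amount(answer, grams_per_serving):
    """Amount representation: consumption frequency times a target-specific
    nutrient content per serving, giving grams/day."""
    return PER_DAY[answer] * grams_per_serving

answer = "once per week"
print(DISCRETE[answer])                   # discrete equidistant value: 2
print(round(PER_DAY[answer], 2))          # frequency: approx. 0.14 per day
print(round(to_amount(answer, 35.0), 2))  # e.g. 35 g of sugar per serving -> 5.0 g/day
```

Note that the frequency representation is the same for every target, while the amount representation depends on the nutrient content per serving and is therefore target-specific, which matters for the multi-target discussion below.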
3.3 Discussion
We compared the performance of feature ranking for two different machine learning algorithms on three different types of features for both the classification and regression problems. While the classification problem might give a general idea about one's dietary habits, it is inclined towards overfitting even for very simple models, such as Logistic Regression, while more complex algorithms, the Random Forest Classifier in our case, are even more subject to this deficiency. By predicting amounts instead of quality scores, one gets information about how good/bad the dietary habits are instead of just whether they are good or bad.

Transforming features from discrete equidistant values to frequencies or amounts of nutrients proved to be a very good approach. The transformation gave better results for both the classification and the regression problem, for both Random Forest Regressor/Classifier and Logistic/Linear Regression. While the performance of both algorithms on features transformed to frequencies and features transformed to amounts for the classification problem was comparable, and Linear Regression on features transformed to amounts gave markedly better results for salt intake, Linear Regression on features transformed to frequencies outperformed all other combinations of features and algorithms for the regression problem for all of the targets. The reason for this is that linear regression on amounts is a very good match in the sense that the target variable (total amount) is the sum of all features (partial amounts).

Transforming the features to frequencies instead of to amounts has another advantage: features transformed to amounts are specific to each target, while features transformed to frequencies are equal for all targets. This is an important finding for possible future research where one would address ranking of questions as a multi-target problem. Additionally, the regression problem using Linear Regression on features transformed to frequencies could serve as a baseline for future experiments.

4 CONCLUSION AND FUTURE WORK
Ranking the questions of FFQs when it can be expected that not all of the questions will be answered is an important step when building models for predicting the quality of one's diet. In this paper we compared two feature ranking algorithms on three different types of features, for the classification and regression problem, for five targets. The findings of this paper show that considering the problem as a regression problem on features transformed to frequencies and using a simple machine learning algorithm (Linear Regression) gives the best results for all five targets and provides a baseline for future experiments.

There are several possibilities for future work. As hinted in the previous section, the question of multi-target question ranking is one of the first that appears — one might want to monitor several nutrition quality scores but still would want to avoid answering too many questions. The next, probably more important and interesting, research problem is how to use the answers already provided to our advantage — so instead of statically ranking the questions we would rather explore how we could improve the prediction performance by dynamically ranking and asking the questions.

ACKNOWLEDGMENTS
The WellCo project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 769765.
The authors acknowledge the financial support from the Slovenian Research Agency (research core funding No. P2-0209).
The WideHealth project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 95227.

REFERENCES
[1] Block G, Hartman AM, and Naughton D. 1990. A reduced dietary questionnaire: development and validation. Epidemiology, 1, 58–64. doi: 10.1097/00001648-199001000-00013.
[2] Cade J., Thompson R., Burley V., and Warm D. 2002. Development, validation and utilisation of food-frequency questionnaires – a review. Public Health Nutrition, 5, 4, 567–587. doi: 10.1079/PHN2001318.
[3] Shim JS, Oh K, and Kim HC. 2014. Dietary assessment methods in epidemiologic studies. Epidemiol Health, 36. doi: 10.4178/epih/e2014009.
[4] Gregorič M., Blaznik U., Delfar N., Zaletel M., Lavtar D., Koroušić-Seljak B., Golja P., Zdešar Kotnik K., Pravst I., Fidler Mis N., Kostanjevec S., Pajnkihar M., Poklar Vatovec T., and Hočevar-Grom A. 2019. Slovenian national food consumption survey in adolescents, adults and elderly: external scientific report. EFSA Supporting Publications, 16, 11, 1729E. doi: 10.2903/sp.efsa.2019.EN-1729.
[5] Reščič N., Valenčič E., Mlinarič E., Koroušić Seljak B., and Luštrek M. 2019. Mobile nutrition monitoring for well-being. In UbiComp/ISWC '19 Adjunct. Association for Computing Machinery, London, United Kingdom, 1194–1197. doi: 10.1145/3341162.3347076.
[6] Reščič N., Eftimov T., Koroušić Seljak B., and Luštrek M. 2020. Optimising an FFQ using a machine learning pipeline to teach an efficient nutrient intake predictive model. Nutrients, 12, 12. doi: 10.3390/nu12123789.
[7] Reščič N., Eftimov T., and Koroušić Seljak B. 2020. Comparison of feature selection algorithms for minimization of target specific FFQs. In 2020 IEEE International Conference on Big Data (Big Data), 3592–3595. doi: 10.1109/BigData50022.2020.9378246.
[8] Thompson T. and Byers T. 1994. Dietary assessment resource manual. The Journal of Nutrition, 124, (December 1994), 2245S–2317S. doi: 10.1093/jn/124.suppl_11.2245s.

Daily Covid-19 Deaths Prediction For Slovenia
David Susič
"Jožef Stefan" Institute
Ljubljana, Slovenia
david.susic@ijs.si

ABSTRACT
In this paper, models for predicting daily Covid-19 deaths for Slovenia are analysed. Two different approaches are considered. In the first approach, the models were trained on the first-wave dataset of state intervention plans, cases and country-specific static data for 10 other European countries. The models with the best performance in this case were the k-Nearest Neighbors regressor and the Random Forest regressor. In the second approach, a time-series analysis was performed. The models used in this case were the Seasonal Autoregressive Integrated Moving Average Exogenous model and a Feed-forward Neural Network. For comparison, all 4 models were tested on the second wave for Slovenia, and the model with the best performance was the Feed-forward Neural Network, with a mean absolute error of 1.34 deaths.

KEYWORDS
Covid-19, deaths, predictions, machine learning

1 INTRODUCTION
The aim of this analysis is to find out whether we can predict Covid-19 deaths for Slovenia based on the characteristics of the epidemic in other European countries, and whether we can predict deaths based on a time series analysis of historical data (e.g. predicting the second wave based on the first-wave information). The main advantage of the first approach is that we do not need historical case and death data for the country for which we are making a prediction (in this case Slovenia), while the second approach is generally more accurate but relies on historical death data. The aim is also to find out which of the two approaches provides more accurate predictions. It is important to note that although this is a study for Slovenia, the results can be interpreted as a general assessment of the effectiveness of the described methods for predicting Covid-19 deaths and can be applied to any country for which the data are available.

The data used in this analysis are described in Section 2. Section 3 provides a description of the approaches and the models. Section 4 contains a discussion of the determination of the optimal parameters of the selected models. The results are given in Section 5. The conclusion, along with ideas for possible improvements, is given in Section 6.

2 DATA DESCRIPTION AND PREPARATION
The data used in this paper consist of daily Covid-19 related features at the country level. They contain 12 different Covid-19 related government interventions (school closing, workplace closing, cancel public events, restrictions on gatherings, close public transport, stay at home requirements, restrictions on internal movement, international travel controls, public information campaigns, testing policy, contact tracing, and facial coverings), Covid-19 related cases and deaths, and some static data, in particular the country's population, population density, median age, percentage of people over 65, percentage of people over 70, GDP per capita, cardiovascular death rate, diabetes prevalence, percentage of female and male smokers, hospital beds per thousand people, and life expectancy. To suppress anomalies in registered cases on Sundays and holidays, a 7-day moving average was used for both cases and deaths. The dataset covers the European countries of Slovenia, Italy, Hungary, Austria, Croatia, France, Germany, Poland, Slovak Republic, Bosnia and Herzegovina, and the Netherlands, from January 22, 2020 to December 11, 2020. All of the countries chosen for this study are geographically next to one another and are thus expected to have a similar course of the epidemic. The data on government interventions, cases and deaths are derived from the "COVID-19 Government Response Tracker" database, collected by the Blavatnik School of Government at Oxford University [4]. The intervention values range between 0 and 4 and represent their strictness, for example, whether only some or all schools are closed. The static data are collected from a variety of sources (United Nations, World Bank, Global Burden of Disease, Blavatnik School of Government, etc.) [3]. The original data are publicly available online. The processed data used for the purpose of this study can be found online at https://repo.ijs.si/davidsusic/covid-seminar-data.

3 METHODS AND MODELS
Two different approaches were considered for the analysis. For the first part of the analysis, referred to as the country-specific approach, the models were trained on the data of government intervention plans, cases, deaths and country-specific static data for the 10 other European countries, with the aim of predicting deaths for Slovenia. In this case, the predictions were made for each day, disregarding the time order. For the second part of the analysis, a time series prediction was performed, using only the daily deaths for Slovenia as data.

3.1 Country-Specific Approach
In the country-specific approach, the selection of the base model was very important, as models that perform worse than the base model are not worthy of interpretation. The baseline was defined as

N_deaths(t) = N_cases(t − 14) · M,    (1)

where M = 0.023 is the mortality rate factor of those infected, calculated as a weighted average of the mortality rates of the countries included in this study [2], and t denotes a specific day. This simple model implies that the number of deaths on a given day t is equal to the number of new infections on the day t − 14, multiplied by the mortality rate factor. The regressor models that were tested are: Random Forest (RF), k-Nearest Neighbors (KNN), Stochastic Gradient Descent, Ridge, Lasso, and Epsilon-Support Vector. Descriptions of all of the models can be found in the Python scikit-learn documentation [5].
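The baseline of Eq. (1), together with the 7-day smoothing applied to the case counts, can be sketched as follows. This is a minimal illustration: the case series is hypothetical, and only M = 0.023 is taken from the paper.

```python
# A sketch of the baseline in Eq. (1): deaths(t) = cases(t - 14) * M.
M = 0.023  # weighted-average mortality-rate factor from the paper

def smooth7(series, t):
    """7-day moving average ending at day t (suppresses Sunday/holiday anomalies)."""
    window = series[max(0, t - 6):t + 1]
    return sum(window) / len(window)

def baseline_deaths(cases, t, lag=14):
    """Predicted deaths on day t: smoothed new cases on day t - lag, times M."""
    return smooth7(cases, t - lag) * M

cases = [1000.0] * 30  # hypothetical constant 1000 new cases per day
print(baseline_deaths(cases, 20))  # 1000 * 0.023 = 23.0
```

Any candidate regressor that cannot beat this two-line rule on the held-out data is, as the paper puts it, not worthy of interpretation.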
The two models that performed significantly better than the baseline were the KNN regressor and the RF regressor. The other regression models performed the same as or worse than the baseline model and were thus not used in the further analysis. All models were tested in 10-fold cross-validation with the performance measures mean absolute error (MAE), mean squared error (MSE) and R² score on the data subset that does not include Slovenia. The measures are defined as:

MAE(y, ŷ) = (1/n) · Σ_{i=0..n−1} |y_i − ŷ_i|,    (2a)

MSE(y, ŷ) = (1/n) · Σ_{i=0..n−1} (y_i − ŷ_i)²,    (2b)

R²(y, ŷ) = 1 − Σ_{i=1..n} (y_i − ŷ_i)² / Σ_{i=1..n} (y_i − ȳ)²,    (2c)

where ŷ_i is the predicted value of the i-th sample, y_i is the corresponding true value, n is the sample size and ȳ is the average true value, ȳ = (1/n) · Σ_{i=1..n} y_i.

For each sample, additional features of the government interventions and cases were added for the previous days. The number of previous days was defined using the lookback parameter. Models were tested for lookback values between −28 and 0 days. The comparison is shown in Figure 1. It can be seen that the performance decreases in the range where the lookback is shorter than 14 days, but does not increase in the range where the lookback exceeds this value. The main reason for this is probably the fact that most deaths occur within the first 14 days of infection. A lookback of 14 days was used for the further analysis as it was found to be the most appropriate.

Figure 1: 10-fold cross-validation performance measures of the models for different lookback parameters. The measures and their units are: MAE [deaths/100k] (top), MSE [deaths²/100k²] (middle) and R² score (bottom).

3.2 Time-Series Approach
In the second approach, a time series analysis was performed. In this case, only the daily deaths for Slovenia were used as data. The models used in this case were the Seasonal Autoregressive Integrated Moving Average Exogenous model (SARIMAX(p,d,q)(P,D,Q,m)) [6] and a Feed-forward Neural Network (FFNN) [1].

The former is a combination of several different algorithms. The first is the autoregressive AR(p) model, which is a linear model that relies only on the past p values to predict current values. The next is the moving average MA(q) model, which uses the residuals of the past q values to fit the model accordingly. The I(d) represents the order of integration: the number of times we need to difference the time series to ensure stationarity. The X stands for an exogenous variable, i.e., it suggests adding a separate external variable to help predict the target variable. Finally, the S stands for seasonal, meaning that we expect our data to have a seasonal aspect. The parameters P, D, and Q are the seasonal versions of the parameters p, d, and q, and the parameter m represents the length of the cycle.

The FFNN structure included 10 input perceptrons, one for each death value in the last 10 days, a hidden layer of 64 perceptrons, and 1 output perceptron.

Since the future data of a time series contain information about the past, a forward chaining approach was used for the n-fold cross-validation. This means that there is no random shuffling of the data: the test set must always be the final portion of the data, i.e. the final part of the date range. The concept of forward chaining is shown in Figure 2. The results of the 10-fold cross-validation of the predictions for 21 days are shown in Table 1.

Table 1: 10-fold cross-validation performance measures of the predictions for 21 days for the SARIMAX and FFNN algorithms.

          MAE [deaths]  MSE [deaths²]  R² score
SARIMAX   1.13          4.81           0.71
FFNN      0.53          1.15           0.88
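The forward-chaining scheme can be sketched as a split generator. This is an illustrative reconstruction: the series length and fold count below are example values, and the exact split sizes used by the author are not specified beyond the 21-day prediction horizon.

```python
# Forward chaining for time-series n-fold cross-validation: each fold trains on
# an initial segment and tests on the block that immediately follows it, so the
# test set is always the final portion of the data seen so far (no shuffling).
def forward_chaining_splits(n_samples, n_folds, horizon):
    """Yield (train_indices, test_indices) pairs; horizon is the prediction window."""
    for fold in range(1, n_folds + 1):
        train_end = n_samples - (n_folds - fold + 1) * horizon
        if train_end <= 0:
            continue  # not enough history for this fold
        train = list(range(0, train_end))
        test = list(range(train_end, train_end + horizon))
        yield train, test

# e.g. 100 days, 3 folds, 21-day predictions (the paper uses a 21-day horizon)
for train, test in forward_chaining_splits(100, 3, 21):
    print(len(train), test[0], test[-1])
```

Later folds therefore always see strictly more history than earlier ones, which is what prevents future information from leaking into training.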
Figure 2: Forward chaining approach to time-series n-fold cross-validation.

4 MODELS' PARAMETERS SELECTION
The next step was to determine the optimal parameters of the selected models. For this purpose, the regressor models were trained on the same dataset used in the 10-fold cross-validation and tested on the data for Slovenia. For this particular case, different model parameters were tested to see which performed best. The MAE [deaths/100k] as a function of the parameter K for the KNN and as a function of the number of trees for RF are shown in Figures 3 and 4, respectively.

Figure 3: MAE of the KNN regressor as a function of K.

Figure 4: MAE of the RF regressor as a function of the number of trees.

For the KNN regressor, the MAE has a minimum at K = 55, while for RF the fitting function shows that the appropriate number of trees is 100, since the model does not improve with additional trees beyond this point. It is important to note that since RF is random in the sense that it randomly selects a subset of features at each splitting decision, the results and hence the performance measures are also somewhat random. However, they do follow a certain trend that becomes apparent when a polyfit is applied. To reduce the randomness of the results, the average of 3 separate predictions was calculated for each number of trees.

To determine the best parameters of the SARIMAX model, the auto_arima algorithm from the Python pmdarima library was used [7]. The algorithm analyzes the given data and determines the best model and its parameters for that data. In this case, the selected model was SARIMAX(2, 1, 4)(4, 1, 1, 12). In the case of the FFNN, the parameter selection was omitted; the same model structure was always used.

5 RESULTS
With the optimal parameters selected, the graphs of the predictions can be plotted. The predictions of the country-specific approach are shown in Figure 5.

Figure 5: Deaths for Slovenia from 22.1.2020 to 11.12.2020. Models' predictions, compared to true values.

All models predicted the number of deaths for the first epidemic wave fairly accurately. As a result of the unrepresentative reporting of Covid-19 cases for the second wave, the base model predicts a much lower number of daily deaths. We can also see that the KNN regressor predicts the same value from a certain day forward. The reason for this is most probably that the algorithm always finds the same k = 55 neighbors and thus always predicts the same value. To avoid this, a larger dataset would be required. The MAE for RF, KNN and the baseline are shown in Table 2.

Table 2: MAE comparison of the country-specific models for the interval from 22.1.2020 to 11.12.2020.

              RF    KNN   baseline
MAE [deaths]  5.41  5.39  5.48

The predictions for the time interval between 21.11.2020 and 11.12.2020 for the time-series approach are shown in Figure 6. The MAEs for FFNN and SARIMAX, shown in Table 3, are substantially lower than the MAEs of the country-specific models. However, the accuracy decreases as the prediction time interval increases.

Table 3: MAE comparison of the time-series models for the interval from 21.11.2020 to 11.12.2020.

              FFNN  SARIMAX
MAE [deaths]  1.24  2.27

Table 4: MAE comparison of the models for the interval from 1.11.2020 to 11.12.2020.

              FFNN  SARIMAX  RF Reg.  KNN Reg.
MAE [deaths]  1.34  1.67     6.46     8.85

It can be seen that in this case the time-series approach is more accurate than the country-specific one.
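The error measures defined in Eqs. (2a)–(2c) and used in Tables 1–4 are straightforward to implement. A minimal sketch with toy values (not the paper's data):

```python
# The error measures from Eqs. (2a)-(2c), used for all model comparisons above.
def mae(y, y_hat):
    """Mean absolute error, Eq. (2a)."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def mse(y, y_hat):
    """Mean squared error, Eq. (2b)."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

def r2(y, y_hat):
    """Coefficient of determination, Eq. (2c)."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]  # toy daily death counts
y_pred = [1.0, 2.0, 2.0, 5.0]
print(mae(y_true, y_pred))  # 0.5
print(mse(y_true, y_pred))  # 0.5
print(r2(y_true, y_pred))   # 0.6
```

Note that MAE is in the units of the target (deaths) while MSE is in squared units, which is why the tables report them separately.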
However, for longer time intervals, the country-specific approach is better because it does not rely on past data. It is important to note that the country-specific models' errors are actually lower when making predictions from the start of the epidemic. The reason for this is that for the first 6 months the numbers of deaths were very low, as can be seen in Figure 5.

Figure 6: Slovenia deaths from 21.11.2020 to 11.12.2020. Time-series models' predictions, compared to true values.

To determine the overall best model for such predictions, all 4 models were tested on the second epidemic wave. The predictions are visualized in Figure 7 and the MAEs [deaths] are listed in Table 4. The best performing model overall is the FFNN, with an MAE of 1.34 deaths. The reason for the best performance of this model is probably that it had a relatively high number of input parameters: the input layer consisted of 10 perceptrons, i.e. each FFNN prediction was based on the values of the last 10 days.

6 CONCLUSION
In this paper, two different approaches to predicting Covid-19 deaths for Slovenia were tested. Both approaches turned out to be reliable. The main implication of the presented study is that for short time intervals the time series approach is much more accurate than the country-specific approach. The advantage of the country-specific approach is that it can predict the number of deaths for a given day based on the number of cases, countermeasures and country-specific static data, without necessarily having information about the past. On the other hand, for the prediction of the second wave, where we already know the course of the epidemic in the first wave, the time series approach is better, at least for the prediction for Slovenia. In future studies, predictions for the third and fourth waves will be analysed.

REFERENCES
[1] Francois Chollet et al. 2015. Keras.
https : / / github . com / fchollet/keras. [2] Ensheng Dong et al. 2020. An interactive web-based dash- board to track covid-19 in real time. The Lancet Infectious 50 Diseases, 20, 5. doi: 10.1016/S1473-3099(20)30120-1. http: //doi.acm.org/10.1016/S1473- 3099(20)30120- 1. [3] Thomas Hale et al. 2020. A cross-country database of covid- 40 19 testing. Scientific Data, 7, 345. doi: 10.1038/s41597-020- 00688- 8. http://doi.acm.org/10.1038/s41597- 020- 00688- 8. [4] Thomas Hale et al. 2021. A global panel database of pan- Deaths FFNN 30 demic policies (oxford covid-19 government response tracker). SARIMAX Nature Human Behaviour, 5, 3529–538. doi: 10.1038/s41562- RF Reg. 021- 01079- 8. http://doi.acm.org/10.1038/s41562- 021- 01079- 20 KNN Reg. 8. Truth [5] Fabian Pedregosa et al. 2012. Scikit-learn: machine learn- ing in python. Journal of Machine Learning Research, 12, ( January 2012). [6] Skipper Seabold and Josef Perktold. 2010. Statsmodels: econo- metric and statistical modeling with python. Proceedings of Nov-01 Nov-07 Nov-13 Nov-19 Nov-25 Dec-01 Dec-07 the 9th Python in Science Conference, 2010, (January 2010). [7] Taylor G. Smith et al. 2017. pmdarima: arima estimators for Python. [Online; accessed 9.1.2021]. (2017). http : / / www. Figure 7: Slovenia deaths from 1.11.2020 to 11.12.2020. alkaline- ml.com/pmdarima. Models’ predictions, compared to true values. 46 Iris Recognition Based on SIFT and SURF Feature Detection Alenka Trpin Bernard Ženko Faculty of Information Studies Department of Knowledge Technologies Ljubljanska cesta 31A Jožef Stefan Institute 8000 Novo mesto, Slovenia 1000 Ljubljana, Slovenia alenka.trpin@fis.unm.si bernard.zenko@ijs.si ABSTRACT (3) Feature extraction, where a feature vector is generated using different filters, and (4) comparison, based on different distances Human iris recognition is generally considered to be one of the (Hamming distance in specific cases) between pairs of most effective approaches for biometric identification. 
Iris recognition based on SIFT and SURF feature detection
Trpin and Ženko

ABSTRACT
Identification is required in numerous areas such as security (e.g., airports and other buildings), identity verification (e.g., banking, electoral registration), and the criminal justice system. This paper presents an approach for iris image classification that is based on two popular algorithms for image feature construction: the Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF). Both algorithms were used in combination with the bag of visual words approach to create descriptive image features that can be used by supervised machine learning methods, and a set of standard machine learning methods (k-Nearest Neighbor, random forest, support vector machines and neural networks) was evaluated on a publicly available iris dataset.

KEYWORDS
Iris recognition, image classification, SIFT features, SURF features

1 INTRODUCTION
Biometrics is the science of determining a person's identity and is an important approach for forensic and security identity management. Face, fingerprints, voice and iris are the most commonly used biometric identifiers for personal identification. They provide characteristics in terms of personal appearance. The biometric system first scans the biometric characteristic and then, typically based on a library of scans or a classification model, identifies the person [5].

A typical iris recognition system consists of four key modules: (1) image pre-processing, where the system detects the boundary of the pupil and the outer iris, and (2) normalization, where the inner and outer circle parameters obtained from iris localization are given as input and a transformation from polar to Cartesian coordinates is applied, which maps the circle (iris) into a rectangle. The later modules operate on the transformed iris images and the corresponding masks [10]. The comparison step is nowadays frequently implemented with a machine-learned classification model.

This work first uses the Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF) algorithms to extract image keypoints or descriptors, and then the bag of visual words to generate image features that can be used by standard supervised machine learning methods. We evaluate our method on a publicly available iris image dataset.

2 RELATED WORK
Iris recognition is frequently used for gender recognition and personal biometric authentication [6, 8, 9]. Ali et al. applied contrast-limited adaptive histogram equalization to the normalized image; they used SURF and investigated the necessity of iris image enhancement on the CASIA-Iris-Interval dataset [1]. Păvăloi and Ignat present experiments with a new approach for iris image classification based on matching SIFT on iris occlusion images; they used the UPOL iris dataset to test their methods [6]. Bansal and Sharma use a statistical feature extraction technique based on the correlation between adjacent pixels, combined with a 2-D wavelet tree feature extraction technique, to extract significant features from iris images; support vector machines (SVM) were used to classify iris images into male or female classes [2]. Salve et al. used an artificial neural network and SVM as classifiers for iris patterns. Before applying the classifier, the region of interest, i.e., the iris region, is segmented using a Canny edge detector and a Hough transform, so that the eyelid and eyelash effects are kept to a minimum. A Daugman rubber-sheet model is used to normalise the iris to improve computational efficiency and obtain appropriate dimensionality. Furthermore, the discriminative feature sequence is obtained by feature extraction from the segmented iris image using a 1D Log-Gabor wavelet [14]. Adamović et al. applied an approach that classifies biometric templates as numerical features in the CASIA iris image collection. These templates are generated by converting a normalised iris image into a one-dimensional fixed-length code set, which is then subjected to stylometric feature extraction. The extracted features are further used in combination with SVM and random forest (RF) classifiers [15].

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia
© 2021 Copyright held by the owner/author(s).

3 METHODOLOGY
Our iris recognition approach combines the image feature generation algorithms SIFT and SURF, the bag of visual words model, and standard supervised machine learning classification methods. In the following subsections we briefly describe each of these components and then explain how they are combined.

3.1 SIFT
The SIFT algorithm detects a set of local features in an image.
These features represent local areas of the image, and the algorithm also computes their description in the form of a vector. The algorithm proceeds in several stages. The first stage of computation is scale-space extrema detection, which searches over all scales and image locations; it employs the so-called difference-of-Gaussians function to identify potential interest points that are invariant to scale and orientation. The second stage localizes each candidate: keypoints are extracted by detecting scale-space extrema. The main idea behind scale-space extrema detection is to identify stable features which are invariant to changes in scale and viewpoint. At this point the keypoint descriptors are extracted [4, 6]. In essence, SIFT describes each image with a set of keypoints, and each keypoint is described with a vector of dimension 128. It is worth mentioning that SIFT can detect different numbers of keypoints in different images.

3.2 SURF
The SURF algorithm is based on similar ideas as SIFT, but its implementation is different. It can be used for similar tasks as SIFT, but it is faster and produces highly accurate results when provided with appropriate reference images. Instead of the difference-of-Gaussians function, SURF uses approximate Laplacian-of-Gaussian images and a box filter. Determinants of the Hessian matrix are then used to detect the keypoints. A neighbourhood around each keypoint is selected and divided into sub-regions, and for each sub-region the wavelet responses are taken and combined to form the SURF feature descriptor [1, 4]. In the end, each image is again represented with a set of keypoints, which are described with vectors.

3.4 Classification Methods
The image classification phase of image analysis can in principle be performed with any machine learning method for classification. We have decided to evaluate a diverse set of standard methods, which we briefly describe in the following paragraphs.

The kNN is a supervised method that can be used for classification and regression. It is a simple algorithm where the classification of new instances is based on the majority class of the k closest training examples. The closeness is measured with a distance measure, usually the Euclidean, Minkowski or Manhattan distance [9].

RF is a supervised learning algorithm based on the ensemble principle of using decision trees as the basic classifier and creating a learning model by combining multiple decision trees. The main idea of the RF classifier is to create multiple decision trees using bootstrap sampling and to introduce randomness into the individual tree-building process. The class label of a new example is determined by majority voting of all trees in the ensemble [11].

Neural networks (NN) consist of several layers of simple units (neurons), which are simple functions with weight and bias parameters. Each neuron in one layer is connected to all neurons in the next layer. The network is trained with back-propagation, which uses gradient descent on a loss function (e.g., the cross-entropy loss) to update the weights. NNs can have different structures, but typically have an input layer, one or more hidden layers and an output layer; each of these layers contains one or more neurons [9, 12, 13]. In this work, we used the Adam solver because it is fast and gives good results; it is an optimisation algorithm that uses running averages of the gradients and other moments of the gradients [13]. For the activation function, we use the logistic or sigmoid activation function, which determines how nodes in a network layer convert a weighted sum of input data into output data. The logistic or sigmoid activation function accepts any real value as input, and the output values are from 0 to 1 [12].
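To make the kNN description above concrete, here is a minimal majority-vote sketch in NumPy (toy data; an illustration only, not the authors' implementation):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training examples."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distances to x
    nearest = np.argsort(dists)[:k]              # indices of the k closest points
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]

# Toy 2-D data: class 0 near the origin, class 1 near (5, 5)
X = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([0.5, 0.5])))  # → 0
print(knn_predict(X, y, np.array([5.5, 5.5])))  # → 1
```

The same idea scales directly to the 500-dimensional BoVW vectors used in the experiments; only the distance computation changes in cost.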
3.3 Bag of Visual Words
The bag of visual words (BoVW) approach can be used for transforming or tokenizing keypoint-based image features, such as SIFT or SURF, into a fixed number of features, which is typically required by supervised machine learning methods. It first generates a visual word vocabulary from a (training) set of images and then describes each image with these visual words. The visual word vector of an image contains the presence or absence information for each visual word in the image. In the case of SIFT or SURF keypoints, for example, the visual word vector contains the numbers of keypoints in an image that are similar to a given visual word. The process for extracting BoVW features from images involves the following steps: automatically detect regions or points of interest; compute local descriptors over those points (in our case, this means employing the SIFT or SURF algorithm); quantize the descriptors into words to form the visual vocabulary, for example with a clustering algorithm; and find the occurrences in the image of each specific visual word in the vocabulary (generate a vector of visual word frequencies) [15]. It is worth mentioning that a specific BoVW model is based on a given training dataset and only includes visual words that appear in the training images.

Support vector machines (SVM) are a discriminant technique, which means that the classification function takes a data point and assigns it to one of the different classes of the classification task. SVM transforms the original data with a kernel function into a higher-dimensional space and then tries to find a hyperplane that optimally separates the two classes. This hyperplane is defined by support vectors, and the distances to the support vectors are maximised. SVM is a very effective method for high-dimensional problems [2, 14].

3.5 Our Method
Our approach for iris image classification is based on the bag of visual words model, and we use either the SIFT or the SURF algorithm for image keypoint detection. In the training phase we perform the following steps.
1. For each image i, the SIFT or SURF algorithm is run, which detects K_i keypoints (each keypoint has D = 128 dimensions).
2. We collect the keypoints from all n training images, that is, sum_{i=1}^{n} K_i keypoints.
3. We cluster the above set of keypoints with the k-means clustering algorithm. Based on preliminary experiments we decided to use k = 500. The clusters, or their centroids, represent the visual words for our problem of iris recognition.
4. Now, we use the clustering model to assign each keypoint in an image to its nearest centroid (visual word) and sum up the occurrences of these visual words for each image. We end up with image descriptions, where each image is described with a vector of length k.
5. The dataset derived in the previous step can now be used to train a classification model with an arbitrary machine learning method. In our experiments, we have used four methods: k-Nearest Neighbor, Random Forest, Support Vector Machines and Neural Networks.

In our experiments we have used available Python implementations of the included algorithms (scikit-learn for machine learning) with their default parameters, except the following:
- k-means: k = 500,
- kNN: k = 15, Euclidean metric,
- RF: number of estimators = 100,
- SVM: linear kernel function,
- NN: "adam" solver function, 8 hidden layers with 8 neurons each, "logistic" activation function.

The classification accuracy was evaluated with 5-fold stratified cross-validation. The results are presented separately for the small and big Ubiris.v1 datasets in Tables 1 and 2, respectively.

Table 1: Classification accuracy on the small dataset with standard deviation

classifier / keypoint method | SIFT        | SURF
kNN                          | 0.37 ± 0.0  | 0.46 ± 0.0
RF                           | 0.43 ± 0.06 | 0.63 ± 0.0
SVM                          | 0.67 ± 0.0  | 0.86 ± 0.0
NN                           | 0.63 ± 0.0  | 0.77 ± 0.0
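The training steps above can be sketched with scikit-learn (random stand-in descriptors instead of real SIFT/SURF output, and k reduced from 500 to keep the toy example small; this is a sketch under those assumptions, not the authors' code):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Stand-in for SIFT/SURF output: each "image" yields a variable number
# of 128-dimensional descriptors (hypothetical random data).
def fake_descriptors(n_keypoints):
    return rng.normal(size=(n_keypoints, 128))

train_images = [fake_descriptors(int(rng.integers(20, 40))) for _ in range(12)]
labels = np.array([i % 3 for i in range(12)])  # three toy classes

# Steps 2-3: pool all training keypoints and cluster them into k visual words.
k = 16  # the paper uses k = 500; reduced here for the toy example
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(np.vstack(train_images))

# Step 4: describe each image as a histogram of visual-word occurrences.
def bovw_histogram(descriptors):
    words = kmeans.predict(descriptors)
    return np.bincount(words, minlength=k)

X_train = np.array([bovw_histogram(d) for d in train_images])

# Step 5: train any standard classifier on the fixed-length features.
clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, labels)
print(clf.predict([bovw_histogram(fake_descriptors(25))]))
```

The classification phase reuses the same fitted k-means model: a new image's keypoints are mapped to their nearest centroids, histogrammed, and passed to the trained classifier.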
In the classification phase, when we need to classify a new image, we need to perform three steps.

1. Run the SIFT or SURF algorithm on the new image to detect keypoints (analogous to step 1 in training).
2. Use the clustering model to assign each keypoint to its nearest centroid and sum up their occurrences to derive the visual words vector (analogous to step 4 in training).
3. Classify the image with the trained classification model.

We have performed experiments with two keypoint detection algorithms (SIFT and SURF) and four classification algorithms (kNN, RF, SVM and NN); the results are presented in the next section.

4 RESULTS
For evaluating our approach, we have used the Ubiris.v1 dataset (http://iris.di.ubi.pt/ubiris1.html). It contains 1865 images of 200 x 150 resolution in 24-bit colour. They are grouped in two subsets: the first contains 1205 images in 241 classes and the second one contains 660 images in 132 classes. Images in the first subset have minimal noise factors, especially those related to reflections, luminosity, and contrast, because they were captured inside a dark room. The second subset of images was collected in a less controlled setting to introduce natural luminosity variation. This resulted in more heterogeneous images with reflection, contrast, luminosity and focus problems. Images collected at this stage simulate the ones captured by a vision system without or with minimal active participation from the subjects [7]. These two subsets of images do not have the same classes. For our experiments we used the examples belonging to a subset of all classes: for the small subset we selected 7 classes (the first seven) and for the big subset 127 classes (the first 127). In the resulting datasets the examples were evenly distributed among the selected classes.

The baseline accuracy for the small dataset is 0.14 (i.e., 1/number of classes = 1/7), and in Table 1 we can see that all instantiations of our method give better results than chance. The NN and SVM classifiers perform much better than RF and especially kNN. Comparing the keypoint detectors, we can see that SURF gives consistently better results than SIFT, although the difference is not very large.

The results on the big dataset are, as expected, worse. The default accuracy in this case is 0.0079 (i.e., 1/127), and again all instantiations of our method give better results than chance. Again, SVM and NN perform best, but for some reason, NN performs very poorly in combination with SURF keypoints. RF in this case performs only slightly worse than SVM, while kNN is much worse. Also, on this data we can see that SURF keypoints give somewhat better results than SIFT; the only exception is NN, where SURF fails.

Table 2: Classification accuracy on the big dataset with standard deviation

classifier / keypoint method | SIFT         | SURF
kNN                          | 0.02 ± 0.025 | 0.06 ± 0.039
RF                           | 0.10 ± 0.018 | 0.11 ± 0.014
SVM                          | 0.08 ± 0.039 | 0.13 ± 0.014
NN                           | 0.17 ± 0.01  | 0.25 ± 0.005

In summary, we can conclude that for iris recognition the more complex learning algorithms (SVM, NN) outperform the simpler ones (kNN and even RF), and that the SURF algorithm slightly outperforms SIFT. However, we can also conclude that iris recognition is a hard problem, which would probably benefit from the application of state-of-the-art deep learning approaches.

To investigate whether any of the observed differences is statistically significant, we applied the Friedman and Nemenyi tests as recommended in [8]. The results in the form of an average rank diagram with the estimated critical distance are presented in Figure 1 for the big dataset and Figure 2 for the small dataset.
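The critical distance used in this analysis (CD = 4.695605 for eight methods at the 0.05 level) can be reproduced from the Nemenyi formula CD = q_alpha * sqrt(k(k+1)/(6N)). Here k = 8 is the number of compared method instantiations, and N = 5 is assumed to be the number of cross-validation folds over which the ranks were averaged (an assumption, but one consistent with the reported value):

```python
import math

q_alpha = 3.031  # critical value for eight classifiers at the 0.05 level
k = 8            # compared instantiations (4 classifiers x 2 keypoint detectors)
N = 5            # rank measurements per method (assumed: the 5 CV folds)

cd = q_alpha * math.sqrt(k * (k + 1) / (6 * N))
print(round(cd, 6))  # → 4.695605
```

Two methods whose average ranks differ by more than this CD are considered significantly different by the Nemenyi test.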
The critical value for eight classifiers and a confidence level of 0.05 is 3.031, and the critical distance is CD = 4.695605. Based on the size of the CD we can only claim that the top-ranked methods are significantly better than the low-ranked ones. For example, NN-SURF, NN-SIFT and SVM-SURF are better than kNN-SIFT. On the other hand, the differences among neighboring methods on the diagram are not significant.

Figure 1: Average rank diagram with the estimated critical distance for the evaluated methods (big dataset)

Figure 2: Average rank diagram with the estimated critical distance for the evaluated methods (small dataset)

5 CONCLUSION
The paper presents an evaluation of a typical bag of visual words approach on a specific dataset for human iris recognition. The results show that iris recognition is a relatively hard task, and in order to improve the accuracy we would need a dataset with more examples of each class.

REFERENCES
[1] Ali, H.S., Ismail, A.I., Farag, F.A. 2016. Speeded up robust features for efficient iris recognition. SIViP 10, 1385–1391 (2016).
[2] Atul Bansal, Ravinder Agarwal and R. K. Sharma. 2012. SVM based gender classification using iris images. 2012 Fourth International Conference on Computational Intelligence and Communication Networks, 425–429.
[3] David G. Lowe. 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60, 2, 91–110.
[4] Ebrahim Karami, Siva Prasad, Mohamed Shehata. 2017. Image matching using SIFT, SURF, BRIEF and ORB: Performance comparison for distorted images. Newfoundland Electrical and Computer Engineering Conference.
[5] Hájek J., Drahanský M. 2019. Recognition based on eye biometrics: Iris and retina. In: Obaidat M., Traore I., Woungang I. (eds) Biometric-Based Physical and Cybersecurity Systems. Springer, Cham.
[6] Ioan Păvăloi and Anca Ignat. 2019. Iris image classification using SIFT features. 23rd International Conference on Knowledge-Based and Intelligent Information & Engineering Systems, Elsevier, 159 (2019), 241–250.
[7] Hugo Pedro Proença and Luís A. Alexandre. 2005. UBIRIS: A noisy iris image database. 13th International Conference on Image Analysis and Processing (ICIAP 2005), Springer, (Sept. 2005), 970–977.
[8] Janez Demšar. 2006. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res., 7, 1–30.
[9] Jiawei Han, Micheline Kamber and Jian Pei. 2012. Data Mining: Concepts and Techniques (3rd ed.). Morgan Kaufmann.
[10] John Daugman. 2004. How iris recognition works. IEEE Trans. Circuits Syst. Video Technol., 14, 1, 21–30.
[11] Leo Breiman. 2001. Random forests. Machine Learning, 45, 1, 5–32.
[12] Saša Adamović, Vladislav Mišković, Nemanja Maček, Milan Milosavljević, Marko Šarac, Muzafer Saračević, Milan Gnjatović. 2020. An efficient novel approach for iris recognition based on stylometric features and machine learning techniques. Future Generation Computer Systems, 107 (2020), 144–157.
[13] Shervin Minaee and Amirali Abdolrashidi. 2019. DeepIris: Iris recognition using a deep learning approach.
[14] Sushilkumar S. Salve and S. P. Narote. 2016. Iris recognition using SVM and ANN. 2016 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), 474–478.
[15] Wadhah Ayadi, Wajdi Elhamzi, Imen Charfi, Mohamed Atri. 2019. A hybrid feature extraction approach for brain MRI classification based on bag-of-words. Biomedical Signal Processing and Control, 48, 144–152.
In future work we plan to evaluate additional feature extractors, like Oriented FAST and Rotated BRIEF (ORB) or Local Binary Patterns (LBP), and, given their success in image recognition in general, also convolutional neural network approaches. With the latter, we will be especially interested in evaluating and comparing the performance vs. computational cost trade-off.

Analyzing the Diversity of Constrained Multiobjective Optimization Test Suites

Aljoša Vodopija, Tea Tušar, Bogdan Filipič
aljosa.vodopija@ijs.si, tea.tusar@ijs.si, bogdan.filipic@ijs.si
Jožef Stefan Institute and Jožef Stefan International Postgraduate School
Jamova cesta 39, Ljubljana, Slovenia

ABSTRACT
A well-designed test suite for benchmarking novel optimizers for constrained multiobjective optimization problems (CMOPs) should be diverse enough to detect both the optimizers' strengths and shortcomings. However, until recently there was a lack of methods for characterizing CMOPs, and measuring the diversity of a suite of problems was virtually impossible. This study utilizes the landscape features proposed in our previous work to characterize frequently used test suites for benchmarking optimizers in solving CMOPs. In addition, we apply the t-distributed Stochastic Neighbor Embedding (t-SNE) dimensionality reduction approach to reveal the diversity of these test suites. The experimental results indicate which ones express sufficient diversity.

KEYWORDS
constrained multiobjective optimization, benchmarking, landscape feature, t-SNE

1 INTRODUCTION
Real-world optimization problems frequently involve multiple objectives and constraints. These problems are called constrained multiobjective optimization problems (CMOPs) and have been gaining a lot of attention in the last years [13]. As with other theoretically-oriented optimization studies, a crucial step in testing novel algorithms in constrained multiobjective optimization is the preparation of a benchmark test.

One of the key elements of a benchmark test is the selection of suitable test CMOPs [1]. A well-designed benchmark suite should include "a wide variety of problems with different characteristics" [1]. This way the benchmark problems are diverse enough to "highlight the strengths as well as weaknesses of different algorithms" [1].

In this study, we employ the landscape features proposed in [13] to express and discuss the diversity of frequently used test suites of CMOPs. This is achieved by first computing the landscape features and then employing t-distributed Stochastic Neighbor Embedding (t-SNE), a dimensionality reduction technique, to embed the 29-D CMOP feature space into the 2-D space. Note that due to space limitations, only selected results are shown in this paper. The complete results can be found online (see footnote 1).

The rest of this paper is organized as follows. Section 2 provides the theoretical background. In Section 3, we present the landscape features and the t-SNE algorithm. Section 4 is dedicated to the experimental setup, while the results are discussed in Section 5. Finally, Section 6 summarizes the study and provides an idea for future work.

2 THEORETICAL BACKGROUND
A CMOP can be formulated as:

    minimize    f_m(x),  m = 1, ..., M
    subject to  g_i(x) <= 0,  i = 1, ..., I                    (1)

where x = (x_1, ..., x_D) is a search vector, f_m : S -> R are objective functions, g_i : S -> R are constraint functions, S ⊆ R^D is a search space of dimension D, and M and I are the numbers of objectives and constraints, respectively.

If a solution x satisfies all the constraints, g_i(x) <= 0 for i = 1, ..., I, then it is a feasible solution. For each of the constraints g_i we can define the constraint violation as v_i(x) = max(0, g_i(x)). In addition, an overall constraint violation is defined as

    v(x) = sum_{i=1}^{I} v_i(x).                               (2)

A solution x is feasible iff v(x) = 0.
However, until recently there existed only a few limited techniques for exploring CMOPs [13]. For this reason, the test suites of CMOPs were insufficiently understood and measuring their diversity was virtually impossible. To overcome this situation, in our previous work [13] we experimented with various exploratory landscape analysis (ELA) techniques and proposed 29 landscape features to characterize CMOPs, including their violation landscapes, a concept similar to the fitness landscape where fitness is replaced by the overall constraint violation.

A feasible solution x ∈ S is said to dominate a solution y ∈ S if f_m(x) <= f_m(y) for all 1 <= m <= M, and f_m(x) < f_m(y) for at least one 1 <= m <= M. In addition, x* ∈ S is a Pareto-optimal solution if there exists no x ∈ S that dominates x*. All feasible solutions represent the feasible region, F = {x ∈ S | v(x) = 0}. Besides, all nondominated feasible solutions form the Pareto-optimal set, S_o. The image of the Pareto-optimal set is the Pareto front, P_o = {f(x) | x ∈ S_o}. A connected component (a maximal connected subset with respect to the inclusion order) of the feasible region is called a feasible component, ℱ ⊆ F.

In [13], we introduced analogous terms from the perspective of the overall constraint violation. A local minimum-violation solution is a solution x* for which there exists a δ > 0 such that v(x*) <= v(x) for all x ∈ {x | d(x, x*) <= δ}. If there is no other solution x ∈ S for which v(x*) > v(x), then x* is a (global) minimum-violation solution.
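The overall constraint violation (2) and the dominance relation can be written directly in code; a minimal sketch on hypothetical constraint and objective values (minimization assumed, as in formulation (1)):

```python
import numpy as np

def overall_violation(g_values):
    """v(x) = sum_i max(0, g_i(x)) for a vector of constraint values g_i(x)."""
    return np.maximum(0.0, np.asarray(g_values)).sum()

def is_feasible(g_values):
    """A solution is feasible iff its overall constraint violation is zero."""
    return overall_violation(g_values) == 0.0

def dominates(f_x, f_y):
    """True if objective vector f_x Pareto-dominates f_y (minimization)."""
    f_x, f_y = np.asarray(f_x), np.asarray(f_y)
    return bool(np.all(f_x <= f_y) and np.any(f_x < f_y))

print(overall_violation([-1.0, 0.5, 2.0]))  # only positive g_i contribute → 2.5
print(is_feasible([-1.0, -0.2]))            # → True
print(dominates([1.0, 2.0], [1.0, 3.0]))    # → True
```

Note that satisfied constraints (negative g_i) contribute nothing to v(x), so v(x) = 0 exactly characterizes the feasible region F.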
We denoted the set of all local minimum-violation solutions by F_l and called a connected component M ⊆ F_l a local minimum-violation component.

In order to express the modality of a violation landscape, we defined a local search procedure to be a mapping from the search space to the set of local minimum-violation solutions, μ : S -> F_l, such that μ(x) = x for all x ∈ F_l. A basin of attraction of a local minimum-violation component M and local search μ is then the subset of S in which μ converges towards a solution from M, i.e., B(M) = {x ∈ S | μ(x) ∈ M}. The violation landscape is unimodal if there is only one basin in S and multimodal otherwise.

1 https://vodopijaaljosa.github.io/cmop-web/

3 METHODOLOGY

3.1 ELA Features
The landscape features used in this study were introduced in our previous work [13] and can be categorized into four groups: space-filling design, information content, random walk and adaptive walk features. They are summarized in Table 1.

The space-filling design features are used to quantify the feasible components and the relationship between the objectives and constraints, and to measure the feasibility ratio and the proportion of boundary Pareto-optimal solutions. Next, the information content features are mainly used to express the smoothness and ruggedness of violation landscapes. They are derived by analyzing the entropy of sequences of overall violation values as obtained from a random sampling of the search space. Then, the random walk features considered in this study are used to quantify the number of boundary crossings from feasible to infeasible regions; they categorize the degree of segmentation of the feasible region. Finally, the features from the last group are derived from adaptive walks through the search space and are used to describe various aspects of basins of attraction in the violation landscapes.

Table 1: The ELA features used to characterize CMOPs, categorized into four groups: space-filling design, information content, random walk, and adaptive walk [13].

Space-filling design features
  N_F          Number of feasible components
  F_min        Smallest feasible component
  F_med        Median feasible component
  F_max        Largest feasible component
  O(F_max)     Proportion of Pareto-optimal solutions in F_max
  F_opt        Size of the "optimal" feasible component
  ρ_F          Feasibility ratio
  ρ_min        Minimum correlation
  ρ_max        Maximum correlation
  ρ_∂So        Proportion of boundary Pareto-optimal solutions

Information content features
  H_max        Maximum information content
  ε_s          Settling sensitivity
  M_0          Initial partial information

Random walk features
  (ρ_∂F)_min   Minimal ratio of feasible boundary crossings
  (ρ_∂F)_med   Median ratio of feasible boundary crossings
  (ρ_∂F)_max   Maximal ratio of feasible boundary crossings

Adaptive walk features
  N_B          Number of basins
  B_min        Smallest basin
  B_med        Median basin
  B_max        Largest basin
  (B_F)_min    Smallest feasible basin
  (B_F)_med    Median feasible basin
  (B_F)_max    Largest feasible basin
  ∪B_F         Proportion of feasible basins
  v_med(B)     Median constraint violation over all basins
  v_max(B)     Maximum constraint violation of all basins
  v(B_max)     Constraint violation of B_max
  O(B_max)     Proportion of Pareto-optimal solutions in B_max
  B_opt        Size of the "optimal" basin

3.2 Dimensionality Reduction with t-SNE
The t-SNE algorithm is a popular nonlinear dimensionality reduction technique designed to represent high-dimensional data in a low-dimensional space, typically the 2-D plane [12]. First, it converts similarities between data points to distributions. Then,
it tries to find a low-dimensional embedding of the points that minimizes the divergence between the two distributions that measure neighbor similarity: one in the original space and the other in the projected space. This means that t-SNE tries to preserve the local relationships between neighboring points, while the global structure is generally lost.

Finding the best embedding is an optimization problem with a non-convex fitness function. To solve it, t-SNE uses a gradient descent method with a random starting point, which means that different runs can yield different results. The output of t-SNE also depends on other parameters, such as the perplexity (similar to the number of nearest neighbors in other graph-based dimensionality reduction techniques), the early exaggeration (the separation of clusters in the embedded space) and the learning rate (also called ε). The gradients can be computed exactly or estimated using the Barnes-Hut approximation, which substantially accelerates the method without degrading its performance [11].

4 EXPERIMENTAL SETUP
We studied eight suites of CMOPs which are most frequently used in the literature. These are CTP [2], CF [14], C-DTLZ [5], NCTP [7], DC-DTLZ [8], LIR-CMOP [3], DAS-CMOP [4], and MW [9]. In addition, we also included a novel suite named RCM [6]. In contrast to the other suites, which consist of artificial test problems, RCM contains 50 instances of real-world CMOPs based on physical models. Note that we actually used only 11 RCM problems, since only continuous and low-dimensional problems were suitable for our analysis. We considered three dimensions of the search space: 2, 3 and 5. Large-scale CMOPs were not taken into account, since the methodology described in Section 3 is not sufficiently scalable; this limits our results to low-dimensional CMOPs. Table 2 shows the basic characteristics of the studied test suites.

Table 2: Characteristics of test suites: number of problems, dimension of the search space D, number of objectives M, and number of constraints I. The characteristics of selected RCM problems are shown in parentheses.

Test suite     | #problems | D          | M    | I
CTP [2]        | 8         | *          | 2    | 2, 3
CF [14]        | 10        | *          | 2, 3 | 1, 2
C-DTLZ [5]     | 6         | *          | *    | 1, *
NCTP [7]       | 18        | *          | 2    | 1, 2
DC-DTLZ [8]    | 6         | *          | *    | 1, *
DAS-CMOP [4]   | 9         | *          | 2, 3 | 7, 11
LIR-CMOP [3]   | 14        | *          | 2, 3 | 2, 3
MW [9]         | 14        | *          | 2, * | 1–4
RCM [6]        | 50 (11)   | 2–34 (2–5) | 2–5  | 1–29 (1–8)
*Scalable parameter.

For dimensionality reduction, we used the t-SNE implementation from the scikit-learn Python package [10] with default parameter values. That is, we used the Euclidean distance metric, random initialization of the embedding, a perplexity of 30, an early exaggeration of 12, a learning rate of 200, a maximum of 1000 iterations, and a maximum of 300 iterations without progress before aborting. The gradient was computed by the Barnes-Hut approximation with an angular size of 0.5.

5 RESULTS AND DISCUSSION
The results obtained by t-SNE are shown in Figures 1 and 2.

On the other hand, DC-DTLZ, LIR-CMOP, and MW are biased towards highly multimodal violation landscapes or those with small basins of attraction (Figures 2e, 2g, and 2h). Nevertheless, MW is one of the most diverse suites considering other characteristics (Figure 2h). The C-DTLZ and DAS-CMOP suites are mainly located in the green and orange regions and fail to sufficiently represent the characteristics of the red and blue regions. Finally, the results show that CF and RCM are well spread through the whole embedded feature space (Figures 2b and 2i). As we can see, they have at least one representative CMOP instance in each region. Therefore, CF and RCM are the most diverse test suites according to the employed landscape features.

6 CONCLUSIONS
In this paper, we analyzed the diversity of the frequently used test suites for benchmarking optimizers in solving CMOPs. For this purpose, we considered 29 landscape features for CMOPs that were proposed in our previous work. In addition, the t-SNE algorithm was used to reduce the dimensionality of the feature space and reveal the diversity of the considered test suites.

The experimental results show that the most diverse test suites of CMOPs according to the applied landscape features are CF and RCM. Indeed, they include the widest variety of CMOPs with different characteristics. In addition, MW also proved to be a diverse suite, except for unimodal CMOPs. Nevertheless, we suggest considering CMOPs from various test suites for benchmarking optimizers in constrained multiobjective optimization.

One of the main limitations of our study is that only low-dimensional CMOPs were used in the analysis. Therefore, we were unable to adequately address the issue of scalability. For this reason, a crucial task that needs to be addressed in the future is the extension of this work to large-scale CMOPs.

Figure 1: Embedding of the feature space as obtained by t-SNE. The four regions are depicted in green, red, blue, and orange. The points that are not contained in any region are considered to be outliers.

Specifically, the figures show the 2-D embedding of the 29-D feature space consisting of the landscape features presented in Table 1.
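The t-SNE setup reported in Section 4 corresponds roughly to the following scikit-learn call (random stand-in data instead of the real 29-D feature vectors; the maximum-iteration settings, 1000 and 300 without progress, are the library defaults and their argument names vary across scikit-learn versions, so they are not passed explicitly):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 29))  # stand-in for the 29-D landscape-feature vectors

# Parameters as reported in Section 4: Euclidean metric, random init,
# perplexity 30, early exaggeration 12, learning rate 200; gradients are
# estimated with the Barnes-Hut approximation at an angular size of 0.5.
tsne = TSNE(
    n_components=2,
    metric="euclidean",
    init="random",
    perplexity=30,
    early_exaggeration=12.0,
    learning_rate=200.0,
    method="barnes_hut",
    angle=0.5,
    random_state=0,
)
embedding = tsne.fit_transform(X)
print(embedding.shape)  # → (60, 2)
```

Because the objective is non-convex and the initialization is random, different `random_state` values can produce visibly different (but locally similar) embeddings, as the paper notes.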
Each subfigure in Figure 2 corresponds to one of the test suites. For example, Figure 2a exposes the embedding of the CTP suite in blue, while the gray points correspond to the rest of the test suites. Points with the shape of a plus (+) correspond to CMOPs with two variables, points with the shape of a triangle (▲) to CMOPs with three variables, and points with the shape of a pentagon to CMOPs with five variables.

An additional analysis shows that the embedding of the feature space can be, based on the corresponding characteristics, split into four regions: green, red, blue and orange (Figure 1). The green region corresponds to CMOPs with severe violation multimodality, small basins of attraction, and rugged violation landscapes. The red region corresponds to CMOPs with moderate violation multimodality, rugged violation landscapes, and small feasibility ratios. The blue region corresponds to relatively low violation multimodality, rugged violation landscapes, small feasibility ratios, and positive correlations between objectives and constraints. Finally, the orange region corresponds to unimodal CMOPs with large feasible components, smooth violation landscapes, and large feasible regions.

As we can see from Figure 2a, almost all CTP problems are located in the orange region. Therefore, many relevant characteristics are poorly represented by CTP, e.g., violation multimodality, small feasibility ratios, etc. Similarly, NCTP fails to sufficiently represent severe multimodality, since it contains no problems from the green region (Figure 2d).

ACKNOWLEDGMENTS
We acknowledge financial support from the Slovenian Research Agency (young researcher program and research core funding no. P2-0209). This work is also part of a project that has received funding from the European Union's Horizon 2020 research and innovation program under Grant Agreement no. 692286.

REFERENCES
[1] T. Bartz-Beielstein, C. Doerr, J. Bossek, S. Chandrasekaran, T. Eftimov, A. Fischbach, P. Kerschke, M. López-Ibáñez, K. M. Malan, J. H. Moore, B. Naujoks, P. Orzechowski, V. Volz, M. Wagner, and T. Weise. 2020. Benchmarking in optimization: Best practice and open issues. arXiv:2007.03488v2.
[2] K. Deb, A. Pratap, and T. Meyarivan. 2001. Constrained test problems for multi-objective evolutionary optimization. In Evolutionary Multi-Criterion Optimization (EMO 2001), 284–298.
[3] Z. Fan, W. Li, X. Cai, H. Huang, Y. Fang, Y. You, J. Mo, C. Wei, and E. Goodman. 2019.
An improved epsilon constraint- feasibility ratios, and positive correlations between objectives handling method in MOEA/D for CMOPs with large in- and constraints. Finally, the yellow region corresponds to uni- feasible regions. Soft Comput., 23, 23, 12491–12510. doi: modal CMOPs with large feasible components, smooth violation 10.1007/s00500- 019- 03794- x. landscapes, and large feasible regions. [4] Z. Fan, W. Li, X. Cai, H. Li, C. Wei, Q. Zhang, K. Deb, and As we can see from Figure 2a, almost all CTP problems are E. Goodman. 2019. Difficulty adjustable and scalable con-located in the orange region. Therefore, many relevant character- strained multiobjective test problem toolkit. Evol. Comput., istics are poorly represented by CTP, e.g., violation multimodality, 28, 3, 339–378. doi: 10.1162/evco- a- 00259. small feasibility ratios, etc. Similarly, NCTP fails to sufficiently 53 Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia Vodopija, et al. (a) CTP (b) CF (c) C-DTLZ (d) NCTP (e) DC-DTLZ (f) DAS-CMOP (g) LIR-CMOP (h) MW (i) RCM Figure 2: Embedding of the feature space as obtained by t-SNE. Each subfigure exposes the embedding of a selected suite. [5] H. Jain and K. Deb. 2014. An evolutionary many-objective [9] Z. Ma and Y. Wang. 2019. Evolutionary constrained mul- optimization algorithm using reference-point based non- tiobjective optimization: Test suite construction and per- dominated sorting approach, Part II: Handling constraints formance comparisons. IEEE Trans. Evol. Comput., 23, 6, and extending to an adaptive approach. IEEE Trans. Evol. 972–986. doi: 10.1109/TEVC.2019.2896967. Comput., 18, 4, 602–622. doi: 10.1109/TEVC.2013.2281534. [10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. [6] A. Kumar, G. Wu, M. Z. Ali, Q. Luo, R. Mallipeddi, P. N. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, Suganthan, and S. Das. 2020. A Benchmark-Suite of Real- V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. 
World Constrained Multi-Objective Optimization Prob- Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: lems and some Baseline Results. Technical report. Indian machine learning in Python. J. Mach. Learn. Res., 12, 2825– Institute of Technology, Banaras Hindu University Cam- 2830. pus, India. [11] L. van der Maaten. 2014. Accelerating t-SNE using tree- [7] J. P. Li, Y. Wang, S. Yang, and Z. Cai. 2016. A comparative based algorithms. J. Mach. Learn. Res., 15, 1, 3221–3245. study of constraint-handling techniques in evolutionary [12] L. van der Maaten and G. Hinton. 2008. Visualizing data constrained multiobjective optimization. In IEEE Congress using t-SNE. J. Mach. Learn. Res., 9, 2579–2605. on Evolutionary Computation (CEC 2016), 4175–4182. doi: [13] A. Vodopija, T. Tušar, and B. Filipič. Characterization 10.1109/CEC.2016.7744320. of constrained continuous multiobjective optimization [8] K. Li, R. Chen, G. Fu, and X. Yao. 2019. Two-archive evo- problems: A feature space perspective. arXiv:2109.04564, lutionary algorithm for constrained multiobjective opti- (2021). mization. IEEE Trans. Evol. Comput., 23, 2, 303–315. doi: [14] Q. Zhang, A. Zhou, S. Zhao, P. N. Suganthan, W. Liu, and 10.1109/TEVC.2018.2855411. S. Tiwari. 2008. Multiobjective optimization test instances for the CEC 2009 special session and competition. Techni- cal report CES-487. The School of Computer Science and Electronic Engieering, University of Essex, UK. 54 Corpus KAS 2.0: Cleaner and with New Datasets Aleš Žagar, Matic Kavaš, Marko Robnik-Šikonja University of Ljubljana, Faculty of Computer and Information Science Ljubljana, Slovenia {ales.zagar,matic.kavas,marko.robnik}@fri.uni-lj.si ABSTRACT wrongly marked to contain both abstracts or switched Slovene Corpus of Academic Slovene (KAS) contains Slovene BSc/BA, and English abstracts. Several entries did not contain the abstract; MSc/MA, and PhD theses from 2000 - 2018. 
We present a cleaner instead, there was front or back matter like copyright statement, version of the corpus with added text segmentation and updated table of contents, list of abbreviations etc. POS-tagging. The updated corpus of abstracts contains fewer Our analysis has shown that the corpora can be improved in artefacts. Using machine learning classifiers, we filled in miss- many aspects. Besides addressing the above-mentioned weak- ing research field information in the metadata. We used the full nesses, the main improvements in the updated KAS 2.0 and KAS- texts and corresponding abstracts to create several new datasets: Abs 2.0 corpora are chapter segmentation and improved meta- monolingual and cross-lingual datasets for long text summariza- data with machine learning methods (described in Sections 2 and tion of academic texts and a dataset of aligned sentences from 3). A further motivation for our work is the opportunity to ex-abstracts in English and Slovene, suitable for machine transla- tract valuable new datasets for text summarization (monolingual tion. We release the corpora, datasets, and developed source code and cross-lingual) and a sentence-aligned machine translation under a permissible licence. dataset created from matching Slovene and English abstracts (see Section 4). We present conclusions and ideas for further KEYWORDS improvements in Section 5. KAS corpus, academic writing, machine translation, text summa- rization, CERIF classification 2 UPDATES: KAS 2.0 AND KAS-ABS 2.0 1 INTRODUCTION We first describe methods for extracting text and abstracts from The Corpus of Academic Slovene (KAS 1.0)1 is a corpus of Slove-PDF, followed by the differences between the versions 1.0 and nian academic writing gathered from the digital libraries of Slove-2.0 of corpora. nian higher education institutions via the Slovenian Open Science portal2 [3]. 
It consists of diploma, master, and doctoral theses from Slovenian institutions of higher learning (mostly from the 2.1 Extraction of Text Body University of Ljubljana and the University of Maribor). It contains 82,308 texts with almost 1.7 billion tokens. As many texts in corpora version 1.0 contained several hard to The KAS texts were extracted from the PDF formatted files, fix faults (like gibberish due to extracted tables and figures), we which are not well-suited for the acquisition of high-quality raw decided to extract texts once again from the PDFs. We used the texts. For that reason, the KAS corpus is noisy. Our analysis pdftotext tool, which is a part of the poppler-utils. The software showed that most original texts contain tables, images, and other proved to be accurate and reliable. Its important feature is keeping kinds of figures which are transformed into gibberish when con-the original text layout and excluding the areas where we detected verted from the PDF format. The extracted figure captions also figures, tables, and other graphical elements. do not give any helpful information. Some texts contain front or In the first step, we converted PDF files to images, one page back matter (for example, a table of contents at the beginning at a time and used the OpenCV computer vision library to detect or references at the end), which shall not be present in the main text and non-text areas. We marked the text areas on each page. text body. For each document, we also calibrated the size of the header and The Corpus of KAS abstracts (KAS-Abs 1.0)3 contains 47,273 footer areas and removed them from the text areas together with only Slovene, 49,261 only English, and 11,720 abstracts in both the page numbers. In this process, we removed 2,467 out of the languages. We observed several shortcomings of this corpus. 
A original 91,019 documents due to the documents containing less vast majority of abstracts contain keywords or the word "Ab- than 15 pages or some unchecked exceptions in the code. stract" somewhere in the abstract text. Many texts contain other Next, we searched for the beginning and the end of the main kinds of meta-information, e.g., the name of the author or super- text body. We observed that practically all bodies start with some visor and the title of the thesis. Several corpus entries contain variation of the Slovene word "Uvod" (i.e. introduction). If we English and Slovene abstracts in the same unit, only one of them found the beginning, we searched for the ending in the same way but with different keywords (viri, literatura, povzetek, etc). For 1https://www.clarin.si/repository/xmlui/handle/11356/1244 texts with found beginning and end, the areas were clipped and 2https://www.openscience.si/ 3https://www.clarin.si/repository/xmlui/handle/11356/1420 the extracted texts were normalized. The normalization included handling Slovene characters with the caret (č, š, ž), ligattures Permission to make digital or hard copies of part or all of this work for personal (tt, ff, etc.), removal of remaining figure and table captions, and or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and empty lines. The obtained text was segmented into the structure the full citation on the first page. Copyrights for third-party components of this extracted from the table of contents. We matched headings in the work must be honored. For all other uses, contact the owner/author(s). text with the entries in the table of contents and used page num- Information Society 2021, 4–8 October 2021, Ljubljana, Slovenia © 2020 Copyright held by the owner/author(s). bers as guidelines. We ended with 83,884 successfully extracted documents. 
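The body-clipping step described in Section 2.1 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' released code: the function name `clip_body` and the exact heading patterns are hypothetical, and the input is assumed to be text already produced by `pdftotext -layout` with figure and table areas excluded.

```python
import re
from typing import Optional

# Headings that open and close the main body of a Slovene thesis.
# These regular expressions are illustrative guesses, not the corpus scripts.
BODY_START = re.compile(r"^\s*(?:\d+\s*\.?\s*)?UVOD\b", re.IGNORECASE | re.MULTILINE)
BODY_END = re.compile(
    r"^\s*(?:\d+\s*\.?\s*)?(?:VIRI|LITERATURA|POVZETEK)\b",
    re.IGNORECASE | re.MULTILINE,
)

def clip_body(raw_text: str) -> Optional[str]:
    """Return the text between the introduction heading and the first
    closing heading (references, literature, or summary). None signals
    a rejected document, as boundary-less documents were dropped."""
    start = BODY_START.search(raw_text)
    if start is None:
        return None
    end = BODY_END.search(raw_text, start.end())
    if end is None:
        return None
    body = raw_text[start.start():end.start()]
    # Part of the normalization: collapse blank runs left by removed figures.
    body = re.sub(r"\n{3,}", "\n\n", body)
    return body.strip()
```

For example, `clip_body("1 UVOD\nBesedilo naloge ...\n\n\n\nVIRI\n[1] ...")` keeps only the span from the introduction heading up to (but excluding) the references heading, while a document missing either boundary yields `None`.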
2.2 Extraction of Abstracts
We tried to improve the KAS-abstracts corpus by cleaning the existing documents and extracting the abstracts directly from the PDFs. An initial analysis of the existing texts showed different formattings (71 different organizations publish the works in the KAS corpus). We identified five major patterns of problems and created scripts for resolving them. This produced approximately 40,000 cleaned texts, while 20,000 were still problematic. The direct extraction from the PDFs followed the same procedure as for the main text body (described above). We considered figures, headers, footers, page numbers, keywords, meta-information, abstract placement at the beginning and end of the documents, multiple abstracts of different lengths, etc. This resulted in 71,567 collected Slovene abstracts. A similar procedure was applied to English abstracts and yielded 53,635 abstracts.

2.3 Differences from Version 1.0 to 2.0
Besides cleaner texts, excluded gibberish from figures and tables, and excluded front and back matter, the most important difference between KAS versions 1.0 and 2.0 is that the texts are segmented by structure, i.e., by headings. Unfortunately, some documents present in the original KAS were lost due to the different extraction, and for some documents appearing only in version 2.0, there is no metadata.

KAS-abstracts is greatly improved and no longer contains large quantities of unusable text and various artefacts (e.g., metadata, keywords, or front and back matter). Again, for some abstracts present only in version 2.0, there is no metadata. Still, they are usable for several tasks, including machine translation studies. Table 1 gives a quantitative overview of the obtained body texts and abstracts.

Table 1: Statistics of the obtained body texts and abstracts in version 2.0 of the KAS corpora.

                Sum      Same as in 1.0   Missing from 1.0   With metadata
Slo abstracts   71,567   56,610            2,383             67,533
Eng abstracts   53,635   44,685           16,296             50,674
Body text       83,884   79,320            2,988             79,320

3 SUB-CERIF CLASSIFICATION
CERIF (Common European Research Information Format) is the standard that the EU recommends to member states for recording information about research activity (https://www.dcc.ac.uk/resources/metadata-standards/cerif-common-european-research-information-format). The top level has only five categories (humanities, social sciences, physical sciences, biomedical sciences, and technological sciences). In comparison, the lower level distinguishes 363 categories. As Slovene libraries use the UDC classification, only 17% of the documents in the KAS corpus 1.0 also contain the CERIF and sub-CERIF codes in their metadata. These are mapped from UDC codes by heuristics produced by the Slovene Open Science Portal. Below, we describe how we automatically annotated documents with missing sub-CERIF codes using a machine learning approach.

We built a dataset for the automatic annotation of sub-CERIF codes from the body texts of the documents. A document may have more than one sub-CERIF code, which means that the classes are not mutually exclusive. Thus, we tackle a multi-label classification problem. In the corpus, there are 13,738 documents with high-confidence CERIF codes, which we use in machine learning. Our dataset contains 64 labels out of 363 possible. We used 10% or 1,374 samples as the test set and the remaining 90% as the training set.

As several studies have shown that recent neural embedding approaches are not yet competitive with standard text representations in document-level tasks, we decided to use the standard bag-of-words representation with TF-IDF weighting. In the preprocessing step, we lemmatized the texts using the CLASSLA lemmatizer (https://github.com/clarinsi/classla) and removed stop-words (using the list from https://github.com/stopwords-iso/stopwords-sl) and punctuation.

We compared four classifiers. For logistic regression (LR), k-nearest neighbours (KNN), and support vector machines (SVM), we used Scikit-learn [6], and for the multi-layer perceptron (MLP), we tried the Keras implementation. For the first three, we preliminarily tried several different parameter values but found that they perform best with the default ones. The MLP neural network consists of one hidden layer with 256 units, a sigmoid activation function on the hidden and output layers, the Adam optimizer [5] with an initial learning rate of 0.01, and binary cross-entropy as the loss function. We used early stopping (5 consecutive epochs with no improvement) and reduction of the learning rate on a plateau (halving the learning rate after every 2 epochs with no improvement) as callbacks during the learning process.

In Table 2, we report the pattern accuracy and binary accuracy of the trained classifiers. A model predicts a correct pattern if it assigned all true sub-CERIF codes to a document. For binary accuracy, a model predicts a sub-CERIF code correctly if it assigns a true single sub-CERIF code to the document. For example, let us assume that we have four sub-CERIF codes and an example with the label sequence '1010'. If a model predicts '1010', it receives 100% for both pattern and binary accuracy. If a model predicts '0010', it gets 0% pattern accuracy and 75% binary accuracy, since it misclassified only the first label.

Table 2: Results on the sub-CERIF multi-label classification task. The best result for each metric is in bold.

Algorithm   Binary accuracy   Pattern accuracy
LR          98.48             38.36
KNN         98.52             43.75
SVM         98.68             47.82
MLP         98.66             46.58

Using the pattern accuracy metric, SVM and MLP are significantly better than KNN and LR. LR is the worst performing model, and KNN is in the middle. SVM is the best, and MLP is behind by 1.24 points. We assume that we do not have enough data for MLP to beat SVM. It is difficult to assess the models regarding binary accuracy. In the test set, we have 761 examples with 1 label, 466 with 2 labels, 107 with 3 labels, 26 with 4 labels, 10 with 5, and 4 with 6. A dummy model that predicts all zeros achieves a binary accuracy of 97.51. All our models are better than this baseline, and their ranks correspond with the pattern accuracy.

We conclude that given 64 labels and 10k training instances, our best model (SVM) correctly predicts almost half of the label patterns, which is a useful result.

4 NEW DATASETS
We created two types of new datasets, described below: summarization datasets and machine translation datasets.

4.1 Summarization Datasets
We created two new datasets appropriate for long-text summarization in the monolingual and cross-lingual settings. The monolingual slo2slo dataset contains 69,730 Slovene abstracts and Slovene body texts and is suitable for training Slovene summarization models for long texts. The cross-lingual slo2eng dataset contains 52,351 Slovene body texts and English abstracts. It is suitable for the cross-lingual summarization task.

4.2 Machine Translation Datasets
For the creation of a sentence-aligned machine translation dataset, we used the neural approach proposed by Artetxe & Schwenk [1]. The main difference to other text alignment approaches is the use of margin-based scoring of candidates in contrast to a hard threshold on cosine similarity. We improved the approach by replacing the underlying neural model. Instead of the BiLSTM-based LASER [2] representation, we used the transformer-based LaBSE [4] sentence representation, which has significantly improved average bitext retrieval accuracy. We used the implementation from UKPLab (https://github.com/UKPLab/sentence-transformers/blob/master/examples/applications/parallel-sentence-mining/bitext_mining.py). This approach requires a threshold that omits candidate pairs below a certain value. This value represents a trade-off between the quantity and quality of the aligned pairs. The higher the threshold, the better the quality of the alignments, but more samples are discarded.

In text alignment, sentences do not always exhibit a one-to-one mapping: a source sentence can be split into two or more target sentences and vice versa. To address the problem, we iteratively ran the alignment process until all sentences above the chosen threshold were assigned to each other. In cases of more than one sentence assigned to a single sentence, we merged them and thus created a translation pair.

We manually inspected the alignments consisting of more than one sentence in either the source or target text on a small subset of abstracts. We observed that the merging process produces better results than imposing a restriction allowing only one-to-one mapping. In Table 4, we present an example of the alignment. The first column represents the margin-based score. If an aligned pair contains more than one sentence in the source or target, the score is the average margin-based score between the single sentence and the multiple sentences. The last column indicates whether merging was applied.

We used the ratio variant of margin-based scoring and set the default threshold to 1.1. We manually tested the alignment on our internal dataset. Of 2,015 examples, we successfully aligned 2,002 (99.3%), misaligned 1 (0.1%), and omitted 12 (0.6%). The analysis of the 12 omitted cases showed that some pairs do not match each other or are not accurate translations of each other, e.g., a large part of the original sentence is omitted, phrases are only distantly related, etc. However, approximately half of the 12 cases should have been aligned, which means that our model works very well but conservatively, and may fail for free-translation pairs.

With the default value of the threshold (1.1), we produced 496,102 sentence pairs. We believe the threshold is strict enough to produce a good-quality dataset (especially compared to many other sentence alignments in existing translation datasets). However, if one prefers an even more certain alignment, the value of the threshold can be further increased at the expense of fewer sentences in the dataset. We released three such datasets that reflect a trade-off between the quality and quantity of the data. The sizes of the obtained datasets are given in Table 3.

Table 3: Size of the machine translation datasets based on the margin-based quality threshold.

Dataset                 Threshold   Size
Normal alignment        1.1         496,102
Strict alignment        1.2         474,852
Very strict alignment   1.3         425,534

5 CONCLUSIONS
In this work, we created version 2.0 of Corpus KAS and Corpus KAS-Abstracts. We cleaned the texts and abstracts, introduced text segmentation based on document structure, and improved the metadata. We created two new long text summarization datasets and a dataset of aligned sentences for machine translation. The latest versions of the corpora and datasets are available on CLARIN.SI. The corpora are annotated with the CLASSLA tool and released in txt, JSON, and TEI formats. The source code for producing the new versions of the corpora (https://github.com/korpus-kas) and the created datasets are publicly available (KAS 2.0: https://www.clarin.si/repository/xmlui/handle/11356/1448; KAS-Abs 2.0: https://www.clarin.si/repository/xmlui/handle/11356/1449; summarization datasets: https://www.clarin.si/repository/xmlui/handle/11356/1446; MT datasets: https://www.clarin.si/repository/xmlui/handle/11356/1447).

In future work, the extraction of metadata for entries where it is missing would be beneficial. There could be further improvements in cleaning the texts, and this would increase the number of available documents. When the corpora are extended with data post-2018, the software might need further modifications due to new formats and templates used in academic works. Further experiments on the created MT datasets would clarify the setting of the parameters and show whether current MT systems benefit more from better quality or larger quantity of data.

ACKNOWLEDGMENTS
The research was supported by CLARIN.SI (2021 call), the Slovenian Research Agency (research core funding program P6-0411), the Ministry of Culture of the Republic of Slovenia through the project Development of Slovene in Digital Environment (RSDO), and the European Union's Horizon 2020 research and innovation programme under grant agreement No 825153, project EMBEDDIA (Cross-Lingual Embeddings for Less-Represented Languages in European News Media). We thank Tomaž Erjavec (JSI, Department of Knowledge Technologies) for providing data access and for his assistance in building the TEI format of the corpus.

REFERENCES
[1] Mikel Artetxe and Holger Schwenk. 2019. Margin-based parallel corpus mining with multilingual sentence embeddings. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 3197–3203.

Table 4: Examples from sentence-aligned Slovene-English abstracts.

Score | Slovene source sentence | English target sentence | Mrg
1.670 | Moški pa pogosteje opravljajo opravila, ki se tičejo mehanizacije na kmetiji. | Men, however, often perform tasks related to machinery on the farm. | No
1.612 | Zanimala nas je tudi prisotnost tradicionalnih vzorcev pri delu. | Additionally, I have also focused on the presence of traditional work patterns. | No
1.520 | Želeli smo izvedeti, ali se kmečke ženske počutijo preobremenjene, cenjene in kako preživljajo prosti čas (če ga imajo). | I wanted to know whether rural women feel overwhelmed or valued, and how they spend their free time (if they have it). | No
1.441 | Dotaknili smo se tudi problemov, s katerimi se srečujejo kmečke ženske med javnim in zasebnim življenjem. | Moreover, I have tackled the problems that rural women face when it comes to their public and private life. | No
1.437 | Na koncu teoretičnega dela smo opisali še predloge za izboljšanje položaja kmečkih žensk v družbi. | At the end of the theoretical part, I have denoted further proposals for improving the situation of rural women in today's society. | No
1.388 | V diplomskem delu obravnavamo položaj žensk v kmečkih gospodinjstvih v Sloveniji. | The thesis deals with the situation of women in rural households of Slovenia. | No
1.354 | V empiričnem delu pa smo s pomočjo anketnega vprašalnika, na katerega so kot respondentke odgovarjale kmečke ženske, ugotavljali, kako je delo na kmetiji porazdeljeno med spoloma. | In the empirical part, I have conducted a survey on peasant women to determine the gender division of farm labour. | No
1.271 | V teoretičnem delu predstavljamo pojme, kot so gospodinja, kmečko gospodinjstvo ter kmečka družina, kjer smo opisali tudi tipologijo kmečkih družin. | In the theoretical part, I have presented the following concepts: ˝housewife˝, ˝rural household˝ and ˝rural family˝. In addition, I have described the typology of rural families. | Yes
1.249 | V nadaljevanju smo predstavili tradicionalno dojemanje kmečkih žensk, njihovo obravnavo skozi čas v slovenski literaturi, pojasnili smo procese, ki so vplivali na spremembo položaja kmečkih žensk skozi zgodovino ter se osredotočili na delo kmečkih žensk (delovni dan, delitev dela, vrednotenje dela). | I have explained the processes that have influenced the change in the situation of rural women through history and focused on their work (working day, divison of labour, work evaluation). Furthermore, I have shed light on the traditional perception of peasant women and their treatment over time in Slovene literature. | Yes
1.217 | Ugotovili smo, da so tradicionalni vzorci delitve dela na kmetiji še vedno prisotni, saj smo iz analize anket in literature ugotovili, da ženske opravljajo večino del vezanih na dom in družino, to pa so gospodinjska dela in vzgoja otrok. | Hence, the majority of work related to home and family (housework and child-rearing) is performed by women. By analyzing the conducted survey and examining the literature, I have come to the conclusion that the division of farm labour more or less still follows traditional patterns. | Yes

[2] Mikel Artetxe and Holger Schwenk. 2019. Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond. Transactions of the Association for Computational Linguistics, 7, 597–610.
[3] Tomaž Erjavec, Darja Fišer, and Nikola Ljubešić. 2021. The KAS corpus of Slovenian academic writing. Language Resources and Evaluation, 55, 2, 551–583.
[4] Fangxiaoyu Feng, Yinfei Yang, Daniel Cer, Naveen Arivazhagan, and Wei Wang. 2020. Language-agnostic BERT sentence embedding. arXiv preprint arXiv:2007.01852.
[5] Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. In International Conference on Learning Representations.
[6] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.

Indeks avtorjev / Author index

Andonovic Viktor, 7
Andova Andrejaana, 11
Anželj Gregor, 27
Arduino Alessandro, 19
Batagelj Borut, 27
Boshkoska Biljana Mileva, 7
Boškoski Pavle, 7
Boštic Matjaž, 23
Bottauscio Oriano, 19
Bovcon Narvika, 27
Cergolj Vincent, 15
De Masi Carlo M., 15
Filipič Bogdan, 11, 51
Golob Ožbej, 19
Janko Vito, 23
Kavaš Matic, 55
Komarova Nadezhda, 27
Kralj Novak Petra, 31
Lukan Junoš, 23
Luštrek Mitja, 15, 39
Pelicon Andraž, 31
Puc Jernej, 35
Reščič Nina, 39
Robnik-Šikonja Marko, 55
Sadikov Aleksander, 19, 35
Škrlj Blaž, 31
Slapničar Gašper, 23
Solina Franc, 27
Stankoski Simon, 15
Susič David, 43
Trpin Alenka, 47
Tušar Tea, 51
Vodopija Aljoša, 51
Žagar Aleš, 55
Ženko Bernard, 47
Zilberti Luca, 19

Slovenska konferenca o umetni inteligenci
Slovenian Conference on Artificial Intelligence
Mitja Luštrek, Matjaž Gams, Rok Piltaver
4.4 Drinking Detection Using a Wearable device 5 Results and Discussion 5.1 Intent Recognition and Local Implementation of Drinking Detection 5.2 Wearable Sensing Results 6 Conclusions GolobEtal Abstract 1 Introduction 2 Methods 2.1 Data Acquisition 2.2 Reconstruction Techniques 2.3 Anomaly Detection 3 Results 4 Discussion and Conclusions Acknowledgments JankoEtal Abstract 1 Introduction 2 Library Functionalities 2.1 Motion Sensors Features 2.2 Physiological Features 2.3 Other Functionalities 3 Usage Example 3.1 SHL Dataset 3.2 Methods 3.3 Results 4 Conclusion Acknowledgments KomarovaEtal Abstract 1 Uvod in motivacija 2 Slikovni prostor na umetniških slikah 3 Zaznava obrazov 4 Geometrijska interpretacija prostora 5 Rezultati 6 Razprava 7 Zaključek PeliconEtal Abstract 1 Introduction 2 Data 2.1 Annotation Schema 2.2 Sampling for Training and Evaluation 2.3 Annotation Procedure 3 Experiments 3.1 autoBOT - an autoML for texts 3.2 Deep Learning 3.3 Other Baseline Approaches 4 Results 5 Conclusion Puc+Sadikov Abstract 1 Introduction 2 Related work 3 The SDG Environment 3.1 Description 3.2 Observations 3.3 Actions 3.4 Execution 3.5 Online Play 3.6 Replay System 4 Supervised Learning Baseline 4.1 Agent Model Architecture 4.2 Imitation Learning 4.3 Demonstrations 4.4 Results 5 Conclusions & Future Work Rescic+Lustrek Abstract 1 Introduction 2 Methodology 2.1 Problem outline 2.2 Dataset 2.3 Feature ranking 3 Results 3.1 Classification problem 3.2 Regression problem 3.3 Discussion 4 Conclusion and future work Acknowledgments Susic Abstract 1 Introduction 2 Data description and preparation 3 Methods and models 3.1 Country-Specific Approach 3.2 Time-Series Approach 4 Models' parameters selection 5 Results 6 Conclusion Trpin+Ženko VodopijaEtal Abstract 1 Introduction 2 Theoretical Background 3 Methodology 3.1 ELA Features 3.2 Dimensionality Reduction with t-SNE 4 Experimental Setup 5 Results and Discussion 6 Conclusions Acknowledgments ŽagarEtal Abstract 1 Introduction 2 
Updates: KAS 2.0 and KAS-Abs 2.0 2.1 Extraction of Text Body 2.2 Extraction of Abstracts 2.3 Differences from Version 1.0 to 2.0 3 Sub-CERIF classification 4 New datasets 4.1 Summarization Datasets 4.2 Machine Translation Datasets 5 Conclusions Acknowledgments 12 - Index - A Blank Page Blank Page Blank Page Blank Page Blank Page Blank Page