Proceedings of the 25th International Multiconference INFORMATION SOCIETY – IS 2022, Volume H: Pervasive Health and Smart Sensing (Vseprisotne zdravstvene storitve in pametni senzorji)
Editors: Nina Reščič, Oscar Mayora, Daniel Denkovski
http://is.ijs.si
13 October 2022, Ljubljana, Slovenia

Editors:
Nina Reščič, Department of Intelligent Systems, Jožef Stefan Institute, Ljubljana
Oscar Mayora, Digital Health Lab, Fondazione Bruno Kessler, Trento, Italy
Daniel Denkovski, Computer Science and Computer Engineering, Faculty of Electrical Engineering and Information Technologies, Skopje, North Macedonia

Publisher: Jožef Stefan Institute, Ljubljana
Preparation of the proceedings: Mitja Lasič, Vesna Lasič, Lana Zemljak
Cover design: Vesna Lasič
Access to the e-publication: http://library.ijs.si/Stacks/Proceedings/InformationSociety
Ljubljana, October 2022
Informacijska družba, ISSN 2630-371X
The cataloguing-in-publication (CIP) record was prepared by the National and University Library in Ljubljana, COBISS.SI-ID 127505923, ISBN 978-961-264-247-1 (PDF)

FOREWORD TO THE INFORMATION SOCIETY 2022 MULTICONFERENCE
The 25th Information Society multiconference has come through the problems caused by COVID-19. Credit for the almost normal running of the conference goes above all to the conference chairs who, despite the first pandemic of the modern world, courageously maintained a high professional standard.
From 2020 to the present day, the pandemic has barely limited the remarkable growth of ICT, the information society, artificial intelligence and science in general; on the contrary, the growth of knowledge, computing and artificial intelligence continues at what has become a customary, unimaginable speed. On the other hand, the erosion of social values continues, as does the tragic war in Ukraine, which could spill over into Europe. Yet most people are increasingly aware that the profession needs support. After all, a new research act came into force in 2022; it will improve conditions, above all by increasing research funding year after year. This year we have brought together eleven excellent independent conferences in the multiconference, among them "Legends of Computing", with which we are establishing a new mechanism for promoting the information society. IS 2022 comprises around 200 presentations, abstracts and papers within the independent conferences and workshops, and 400 visitors. The event was accompanied by round tables and discussions, and by special events such as the award ceremony. Selected papers will also be published in a special issue of the journal Informatica (http://www.informatica.si/), which boasts a 46-year tradition as an excellent scientific journal. The Information Society 2022 multiconference consists of the following independent conferences:
• Slovenian Conference on Artificial Intelligence
• Data Mining and Data Warehouses
• Demographic and Family Analyses
• Cognitive Science
• Cognitonics
• Legends of Computing
• Pervasive Health and Smart Sensing
• International Technology Transfer Conference
• Education in the Information Society
• Student Computer Science Research Conference
• Matcos 2022
The conference is co-organized and supported by various research institutions and associations, among them ACM Slovenia, SLAIS, DKZ, and the second Slovenian national academy, the Slovenian Academy of Engineering (IAS).
On behalf of the conference organizers, we thank the associations and institutions, and especially the participants for their valuable contributions and for the opportunity to share their experience of the information society with us. We also thank the reviewers for their help with reviewing. Through its awards, particularly the Michie-Turing award, the autonomous professional community identifies the most outstanding achievements in the field. The Michie-Turing award for an exceptional lifetime contribution to the development and promotion of the information society was received by Prof. Dr. Jadran Lenarčič. The achievement-of-the-year recognition goes to the NIJZ team for the zVEM portal. The "information lemon" for the least appropriate information-related move went to censorship on social networks, and the "information strawberry" for the best move went to the new electronic identity card. Congratulations to the award winners!
Mojca Ciglarič, Programme Committee Chair
Matjaž Gams, Organizing Committee Chair

FOREWORD - INFORMATION SOCIETY 2022
The 25th Information Society Multiconference (http://is.ijs.si) survived the COVID-19 problems. It survived thanks to the conference chairs who bravely decided to continue with their conferences despite the first pandemic of the modern era. The COVID-19 pandemic, from 2020 until now, did not slow the growth of ICT, the information society, artificial intelligence and science overall; quite the contrary, the progress of computers, knowledge and artificial intelligence continued at a fascinating rate. However, the decline of societal norms and progress seems to continue slowly but surely, along with the tragic war in Ukraine. On the other hand, the awareness of the majority that science and development are the only path to a prosperous future is growing substantially. In 2022, a new law regulating Slovenian research came into force, promoting an increase in funding year by year.
The Multiconference is running parallel sessions with 200 presentations of scientific papers at eleven conferences, many round tables, workshops and award ceremonies, and 400 attendees. Among the conferences, "Legends of Computing" introduces the "Hall of Fame" concept for computer science and informatics. Selected papers will be published in the Informatica journal, with its 46-year tradition of excellent research publishing. The Information Society 2022 Multiconference consists of the following conferences:
• Slovenian Conference on Artificial Intelligence
• Data Mining and Data Warehouses
• Cognitive Science
• Demographic and Family Analyses
• Cognitonics
• Legends of Computing
• Pervasive Health and Smart Sensing
• International Technology Transfer Conference
• Education in the Information Society
• Student Computer Science Research Conference 2022
• Matcos 2022
The multiconference is co-organized and supported by several major research institutions and societies, among them ACM Slovenia, i.e. the Slovenian chapter of the ACM, SLAIS, DKZ and the second national academy, the Slovenian Engineering Academy. On behalf of the conference organizers, we thank all the societies and institutions, and particularly all the participants for their valuable contributions and their interest in this event, and the reviewers for their thorough reviews. The award for life-long outstanding contributions is presented in memory of Donald Michie and Alan Turing. The Michie-Turing award was given to Prof. Dr. Jadran Lenarčič for his life-long outstanding contribution to the development and promotion of the information society in our country. In addition, the yearly recognition for current achievements was awarded to NIJZ for the zVEM platform. The information lemon goes to censorship on social networks. The information strawberry, for the best information service of the last year, went to the electronic identity card. Congratulations!
Mojca Ciglarič, Programme Committee Chair
Matjaž Gams, Organizing Committee Chair

CONFERENCE COMMITTEES

International Programme Committee
Vladimir Bajic (South Africa), Heiner Benking (Germany), Se Woo Cheon (South Korea), Howie Firth (UK), Olga Fomichova (Russia), Vladimir Fomichov (Russia), Vesna Hljuz Dobric (Croatia), Alfred Inselberg (Israel), Jay Liebowitz (USA), Huan Liu (Singapore), Henz Martin (Germany), Marcin Paprzycki (USA), Claude Sammut (Australia), Jiri Wiedermann (Czech Republic), Xindong Wu (USA), Yiming Ye (USA), Ning Zhong (USA), Wray Buntine (Australia), Bezalel Gavish (USA), Gal A. Kaminka (Israel), Mike Bain (Australia), Michela Milano (Italy), Derong Liu (Chicago, USA), Toby Walsh (Australia), Sergio Campos-Cordobes (Spain), Shabnam Farahmand (Finland), Sergio Crovella (Italy)

Organizing Committee
Matjaž Gams (chair), Mitja Luštrek, Lana Zemljak, Vesna Koricki, Mitja Lasič, Blaž Mahnič

Programme Committee
Mojca Ciglarič (chair), Nikola Guid, Andrej Ule, Bojan Orel, Marjan Heričko, Boštjan Vilfan, Franc Solina, Borka Jerman Blažič Džonova, Baldomir Zajc, Viljan Mahnič, Gorazd Kandus, Blaž Zupan, Cene Bavec, Urban Kordeš, Boris Žemva, Tomaž Kalin, Marjan Krisper, Leon Žlajpah, Jozsef Györkös, Andrej Kuščer, Niko Zimic, Tadej Bajd, Jadran Lenarčič, Rok Piltaver, Jaroslav Berce, Borut Likar, Toma Strle, Mojca Bernik, Janez Malačič, Tine Kolenik, Marko Bohanec, Olga Markič, Franci Pivec, Ivan Bratko, Dunja Mladenič, Uroš Rajkovič, Andrej Brodnik, Franc Novak, Borut Batagelj, Dušan Caf, Vladislav Rajkovič, Tomaž Ogrin, Saša Divjak, Grega Repovš, Aleš Ude, Tomaž Erjavec, Ivan Rozman, Bojan Blažica, Bogdan Filipič, Niko Schlamberger, Matjaž Kljun, Andrej Gams, Stanko Strmčnik, Robert Blatnik, Matjaž Gams, Jurij Šilc, Erik Dovgan, Mitja Luštrek, Jurij Tasič, Špela Stres, Marko Grobelnik, Denis Trček, Anton Gradišek

TABLE OF CONTENTS
Pervasive Health and Smart Sensing
• Foreword
• Programme Committee
• Optimized method for walking detection by wristband with accelerometer sensor / Hrastič Aleksander, Kranjec Matej, Kocuvan Primož
• Android Integration of a Machine Learning Pipeline for Human Activity Recognition / Srbinoski Viktor, Denkovski Daniel, Kizhevska Emilija, Gjoreski Hristijan
• Speaking Recognition with Facial EMG Sensors / Nikoloski Antonio, Poposki Petar, Kiprijanovska Ivana, Stankoski Simon, Gjoreski Martin, Nduka Charles, Gjoreski Hristijan
• Machine-learning models for MDS-UPDRS III Prediction: A comparative study of features, models, and data sources / Lobo Vítor, Branco Diogo, Guerreiro Tiago, Bouça Raquel, Ferreira Joaquim
• Elements of a System for Holistic Monitoring of Mental Health Characteristics at Home / Kirsten Kristina, Arnrich Bert
• Towards Multi-Modal Recordings in Daily Life: A Baseline Assessment of an Experimental Framework / Anders Christoph, Moontaha Sidratul, Arnrich Bert
• Assessing Sources of Variability of Hierarchical Data in a Repeated-Measures Diary Study of Stress / Lukan Junoš, Bolliger Larissa, Clays Els, Šiško Primož, Luštrek Mitja
• Academic Performance Relation with Behavioral Trends and Personal Characteristics from Wearable Device Perspective / Saylam Berrenur, Ekmekci Ekrem Yusuf, Altunoğlu Eren, Durmaz İncel Özlem
• Detection of postpartum anemia using machine learning / Susič David, Bombač Tavčar Lea, Hrobat Hana, Gornik Lea, Lučovnik Miha, Gradišek Anton
• Covid symptoms home questionnaire classification and outcome verification by patients / Jakimovski Goran, Nikolova Dragana
• Piloting ICT Solutions for Integrated Care / Luštrek Mitja, Angelopoulou Efthalia, Guzzi Pietro Hiram, Drobne Samo, Matkovic Roberta, Miljkovic Miodrag, Papageorgiou Sokratis G, Blažica Bojan
• Network Anomaly Detection using Federated Learning for the Internet of Things / Cholakoska Ana, Jakimovski Bojan, Pfitzner Bjarne, Gjoreski Hristijan, Arnrich Bert, Kalendar Marija, Efnusheva Danijela
• Author index

FOREWORD
The importance of digital health services has been growing steadily over the past decades. Population aging is directly linked to a growing number of chronic patients; advances in medicine make their care possible and thereby also extend their lifespan, but at the same time they place an additional burden on the healthcare system. The development of digital technology has brought more and more accessible tools for maintaining and improving health and quality of life cost-effectively, and it has also helped relieve the burden on healthcare. The recent COVID-19 pandemic further highlighted the need to provide healthcare services remotely. Technological progress is, however, somewhat slowed by legislation, since digital technologies cannot bear responsibility for wrong medical decisions; data protection and respect for patients' privacy are also very important. The cooperation of all relevant social, medical and legal actors therefore helps lay more stable and reliable foundations for the development, deployment and use of digital health technologies and services. Pervasive health services and the use of smart sensors are thus key parts of digital health. Smart sensors and various wearable devices enable remote monitoring and thereby further support the monitoring of patients' health in and outside clinics. In addition, smart and pervasive health-monitoring systems can reduce certain risks and detect problems at earlier stages of disease. The conference "Pervasive Health and Smart Sensing" is organized by the EU project WideHealth, a so-called "widening" project whose main aim is to establish a sustainable research network among the participating partners.
The project consortium consists of five partners (three "widening" and two "non-widening"), who deepen their knowledge through exchanges and other research collaborations in three main areas: data-driven healthcare, human factors in pervasive health, and federated learning. The aim of the Pervasive Health and Smart Sensing conference is to share expertise and research advances in these areas. Twelve papers will be presented at the conference, focusing on various aspects of smart sensing and pervasive health. The first part of the conference includes papers on human activity recognition using wearable devices (including newer technologies, e.g., smart glasses). Papers in the second part focus on objective and subjective monitoring of mental health. The third and final part gathers papers proposing new applications, methodologies and ICT solutions for pervasive health systems, as well as improvements to security and privacy in such systems.

FOREWORD
The importance of digital health has been growing steadily in recent decades. The reasons are well known: on the one hand, the aging of the population is producing an increasing number of chronic patients, and the progress of medicine is keeping them alive and in need of care; on the other hand, the progress of digital technology is creating an increasing number of available tools to maintain and/or improve health and quality of life cost-effectively. The recent COVID-19 pandemic has further emphasized the need to provide remote medical services to patients, which has boosted the emergence and adoption of digital technologies, especially in telehealth and telemedicine. Technological advances have been slowed mainly by legislation, since bad medical decisions cannot be blamed on digital technologies, and security and privacy issues also cannot be neglected.
However, the involvement of all the important social, medical, and legal actors helps set up a more stable and reliable foundation for developing, deploying, and using digital health technologies and services. Pervasive health and smart sensing are crucial parts of digital health. Smart sensors and wearables can augment the healthcare system, enabling remote monitoring and supporting the monitoring of patients' medical condition in and out of the clinic. Furthermore, smart and pervasive health monitoring systems can reduce the risk of death by identifying issues at earlier stages of disease. As the name suggests, they are the main focus of our "Pervasive Health and Smart Sensing" conference. The conference is organized by the EU WideHealth project, a widening project that aims to conduct research on pervasive eHealth and establish a sustainable network of research and dissemination across Europe. It connects five partners (three widening and two non-widening) to share and develop their research on three main topics: data-driven healthcare, human factors in pervasive health, and federated machine learning. The Pervasive Health and Smart Sensing conference aims to share expertise and research advancements in these areas. The 12 papers accepted at the conference focus on different aspects of smart sensing and pervasive health. Several works use wearable devices (including new types, e.g., smart glasses) and machine learning for human activity recognition. Several others focus on objective and subjective monitoring of mental health. Finally, there are papers proposing new applications, methodologies, and ICT solutions for pervasive health and for improving security and privacy in such systems.
PROGRAMME COMMITTEE
Oscar Mayora, Daniel Denkovski, Nina Reščič, Orhan Konak, Hristijan Gjoreski, Valentin Rakovic, Diogo Branco, Monika Simjanoska, Martin Gjoreski, Tiago Guerreiro, Tome Eftimov, Vito Janko, Venet Osmani, Junoš Lukan, Eftim Zdravevski

Optimized Method for Walking Detection by Wristband with Accelerometer Sensor

Aleksander Hrastič, alekshrastic@gmail.com, University of Ljubljana, Faculty of Electrical Engineering, Ljubljana, Slovenia
Matej Kranjec, matejkranjec04@gmail.com, University of Ljubljana, Faculty of Electrical Engineering, Ljubljana, Slovenia
Primož Kocuvan, primoz.kocuvan@ijs.si, Department of Intelligent Systems, Jožef Stefan Institute, Ljubljana, Slovenia

ABSTRACT
This paper presents one part of a gait impairment measurement algorithm: the walking detection algorithm. The purpose of the optimized algorithm is to improve the detection of walking. Today's embedded devices (such as wristbands) have low-level interrupts that detect steps and, consequently, walking. The problem is that these can be inaccurate in some cases; for example, a person can swing a hand while sitting, and the device will detect steps. Walking detection is crucial for gait impairment measurements, as gait data should only be collected when a person is walking in a "normal" manner and not performing any other walking-like activities. An algorithm to measure gait impairment will be developed in later stages of this study. We focused on improving the walking detection algorithm with statistical methods in both the time and frequency domains, in contrast to computationally expensive algorithms that use machine learning. The walking detection algorithm was optimized on data collected by a wristband with a 3-axis accelerometer sensor. With our optimized algorithm, we obtained an average accuracy of 89.4%. We conclude that the proposed method works well for detecting when a person is walking normally. The algorithm successfully detects "unnatural walking" scenarios, such as when the person is sitting and swinging their hand or walking with extreme hand movements.

KEYWORDS
wristband, walking detection, FFT, periodogram, activity recognition, Hamming window

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Information Society 2022, 10–14 October 2022, Ljubljana, Slovenia
© 2022 Copyright held by the owner/author(s).

1 INTRODUCTION
Every year, many older adults fall and injure themselves. For example, in Western Europe in 2017 alone, 13,840 per 100,000 adults over the age of 70 are known to have fallen and injured themselves badly enough to require medical assistance [1]. To prevent such events, methods for measuring and monitoring gait deterioration in the elderly must be developed. One part of such a method must be a walking detection algorithm that detects, non-invasively, whether a person is walking or not.

Wristbands with various sensors (e.g., accelerometer, gyroscope) have proven to be an excellent technology for automatic and non-invasive detection of daily activities. In this case, we can use the acceleration vector data from the accelerometer sensor to detect whether the person is walking or not. However, many studies have focused on machine learning algorithms, which provide high accuracy but are computationally expensive to implement in embedded systems (wristbands).

We present a computationally inexpensive algorithm for detecting whether a person is walking or not. Furthermore, the algorithm can detect walking as well as other daily activities whose pattern resembles walking, and it can run on a low-power wristband system. In our case, the most crucial requirement for the gait detection algorithm is to minimize the cases in which it predicts that the person is walking naturally while the person is actually performing other activities (false positives). An algorithm to measure gait deterioration (our next step) will help the elderly prevent falls. The algorithm will monitor a person's gait daily, and when the gait deteriorates dramatically, it will notify caregivers of an increased chance of falling. Accordingly, caregivers can take the person to rehabilitative walking therapy or give them more care.

2 RELATED WORK
Advances in the accuracy and accessibility of wearable sensing technology (e.g., fitness bands and smartwatches) have allowed researchers and practitioners to use different types of wearable sensors to detect walking.

In [2], the authors explored the possibility of detecting activity from a smartphone-based accelerometer sensor. They used smartphones placed in different positions (backpack, pocket, in hand) to collect data during an activity (walking, fast walking, slow walking, running). To reduce complexity, they computed the magnitude of the 3-axis acceleration. The magnitude vector is then processed using time- and frequency-domain statistical techniques. Finally, statistical methods on the time-domain measures are applied for state recognition, while statistical techniques on the frequency-domain features are used to distinguish walking movements.

In [3], the authors use a smartphone with a gyroscope to collect data. They propose a new algorithm based on the Fast Fourier Transform (FFT) [4] to identify the walking activity of a user who may perform different activities and hold the smartphone in different ways. The proposed FFT-based algorithm achieved better overall performance than the two other best-performing algorithms, the Short-Time Fourier Transform (STFT) and the Standard Deviation Threshold (STD TH).

The authors of [5] propose an algorithm that classifies human activity in real time based on data from an accelerometer attached to the subject. The algorithm uses dynamic linear discriminant analysis (LDA), which can dynamically update the classifier matrices without storing all training samples in memory. LDA is used to find a transformation of the extracted features that separates the data into different classes while minimizing the spread of data of the same class in the newly transformed space.

Compared to these state-of-the-art algorithms, our paper combines the FFT and threshold algorithm from [2] with the axis selection algorithm from [3], while adding an upper-bound threshold that detects exaggerated hand movements and prevents them from producing false positives.

3 METHODOLOGY
The main goal of the research was to improve, or rather optimize, the gait detection algorithm based on statistical methods and frequency coefficients obtained from the measurements of the Empatica E4 bracelet accelerometer. To achieve this, we recorded data with the wristband while performing various activities and tested the performance of our algorithm on the collected data. The data was collected using the Empatica E4 wristband [6]. The sampling frequency of its 3-axis accelerometer is 32 Hz. It has an 8-bit resolution and a default range of ±2 g, with sensitive motion detection along the three axes x, y, and z.

Figure 1: Example of raw signal from accelerometer sensor.
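The stated sensor settings (32 Hz, 8-bit, ±2 g) imply a few numbers worth keeping in mind. The short Python sketch below works them out, assuming that the signed 8-bit samples map linearly and symmetrically onto the ±2 g range; that mapping is our assumption, and the vendor documentation should be checked for the exact scaling. The helper names are illustrative and not part of any Empatica API.

```python
# Quick arithmetic for the accelerometer settings quoted above (32 Hz, 8-bit, +/-2 g).
# Assumption: raw samples are signed 8-bit codes mapped linearly onto -2 g .. +2 g.
SAMPLING_HZ = 32.0
BITS = 8
RANGE_G = 2.0


def lsb_in_g(bits: int = BITS, range_g: float = RANGE_G) -> float:
    """Size of one quantization step: the full span (2 * range) divided by 2**bits codes."""
    return 2 * range_g / 2 ** bits


def nyquist_hz(fs: float = SAMPLING_HZ) -> float:
    """Highest frequency that can be represented without aliasing."""
    return fs / 2


def counts_to_g(counts: int, bits: int = BITS, range_g: float = RANGE_G) -> float:
    """Convert a raw signed sample to g under the linear-mapping assumption."""
    return counts * lsb_in_g(bits, range_g)


print(lsb_in_g())       # 0.015625, i.e. 1/64 g per step
print(nyquist_hz())     # 16.0
print(counts_to_g(64))  # 1.0
```

At 32 Hz the Nyquist limit of 16 Hz lies far above the 0.6–2 Hz walking band used by the algorithm, so the sampling rate is not a constraint for this task; the 1/64 g step is, however, small compared with the 0.3 g and 0.7 g thresholds.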
threshold to prevent false walk detection when a subject is swing- 3.1 Data collection ing a hand uncontrollably, shown in the main algorithm 2 on line (18). This is more thoroughly described below. An Empatica E4 bracelet was used for data collection and placed First, we use the time windowing algorithm (Algorithm 1) to on the subject’s left wrist. The wristband was connected to a process the data in a shorter time frame. Then, we need to divide smartphone via Bluetooth and streamed real-time data that was the data into time windows (W). We found empirically that it is uploaded to the Empatica server. We have designed various routes best if the data window length (𝑤 ) is 5 seconds with a 2.5-second 𝑡 and defined actions on these routes, which the subjects should overlap (𝑜 ). 𝑡 carry out. Data was then collected from different individuals The time windows are then filtered (x, y, and z axes are filtered who wore the bracelet and followed the planned route. Various separately) with a high-pass Butterworth filter to capture the sig- walking styles were performed on the designed paths, such as nal proportionally (symmetrically) with respect to the time axis. normal walking, slow walking, fast walking, and walking with The general shape of the frequency response of a Butterworth random hand movements. Some actions involved sitting in a filter is defined as equation (1). Where 𝑓 is the cutoff frequency, 𝑐 chair and performing arm swings that are similar in motion to 𝜖 is the passband gain, and 𝑛 is the order of the filter. We chose arm swings if the subject were walking. the order of 𝑛 to be 5. We chose it heuristically. For our example, In [2], data was gathered from 7 individuals doing different the cutoff frequency was set to 1 Hz. walking styles (slow walking, fast walking, normal walking). They collected 27 samples. In our case, the data was collected 1 from 4 individuals shown in Table 1. We also collected a total of 𝐻 ( 𝑓 ) = (1) √︂ 2𝑛 𝑓 27 samples. 
2 1 + 𝜖 𝑓𝑐

In the next step, we detect which of the three axes is the most sensitive in each time window. We calculate the standard deviation (STD) of each filtered axis separately and select the axis with the highest STD value.

Afterward, we compute modified periodogram coefficients from the most sensitive axis of each window. To calculate the modified periodogram in Algorithm 2, we multiplied the signal windows by a Hamming window, defined in (2). The Hamming window is a modification of the Hann window and has a raised-cosine, bell-shaped curve.

$$ w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \qquad 0 \le n \le N-1 \tag{2} $$

where N is the total length of the window.

Table 1: Table of participants
Participant | Gender | Age | Disability
A | Male | 22 | None
B | Male | 24 | None
C | Male | 83 | Difficulty walking
D | Female | 79 | None

Figure 1 shows all three axes of raw accelerometer data collected from the Empatica wristband. Between 20 and 70 seconds, the subject wearing the Empatica walked in a straight line.

3.2 Algorithm
Our optimized algorithm combines aspects of two papers [2][3]. From the first paper, we used the modified periodogram thresholding algorithm. It detects walking only when the minimum required hand activity is reached in the frequency range that corresponds to human walking. From the other paper, we adopted its application to the 3-axial accelerometer: for each time window, we select and process only the data on the axis with the most variance. Our contribution to walking detection is the combination of the two, with an added upper bound.

For each time window, two main conditions had to be met for it to be classified as "walking." Modified periodogram coefficients are computed using equation (3):

$$ S_{xx}(f) = |F(f)|^2 \tag{3} $$

where F(f) is the output of the FFT at the desired frequency f.

The first condition (4) concerns the modified periodogram coefficients in the interval 0.6 to 2 Hz, S_xx(f_i), where f_i ranges over all frequencies inside the interval. Their mean must be higher than the mean of the coefficients outside that interval, S_xx(f_o), where f_o ranges over all frequencies outside the interval:

$$ \overline{S_{xx}(f_i)} > \overline{S_{xx}(f_o)} \tag{4} $$

The second condition (5) requires that the STD of the vector norm of the unfiltered signal lies between 0.3 g and 0.7 g:

$$ 0.3\,\mathrm{g} < \sigma_{norm} < 0.7\,\mathrm{g} \tag{5} $$

The lower limit (0.3 g) ensures that walking is not falsely detected when the subject is not moving. The upper limit (0.7 g) prevents walking detection when subjects move their arms uncontrollably. Both limits were determined empirically on our collected data set.

The norm is calculated with equation (6). Here x, y and z are the windowed accelerometer signal vectors, one per axis. The index i is the same on all three axes and ranges from 1 to N, the number of samples in a time window:

$$ norm_i = \sqrt{x_i^2 + y_i^2 + z_i^2} \tag{6} $$

The standard deviation of the norm is then calculated from the raw signal using (7):

$$ \sigma_{norm} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(norm_i - \overline{norm}\right)^2} \tag{7} $$

Time windows that satisfy both conditions are classified as "walking"; all other windows are classified as "not walking."

Algorithm 2 for detection of walking
    function Stationary(d)
        n ← norm(d)                              ⊲ per-sample vector norm of the unfiltered window, Eq. (6)
        m ← n − mean(n)
        return std(m)                            ⊲ σ_norm, Eq. (7)
    end function
    Require: W                                   ⊲ list of time windows
    Ensure: boolean[]
    for all W_i do                               ⊲ i represents the index of the current window
        σ ← Stationary(W_i)
        W_x ← ButterworthFilter(W_i(x))
        W_y ← ButterworthFilter(W_i(y))
        W_z ← ButterworthFilter(W_i(z))
        a ← argmax{std(W_x), std(W_y), std(W_z)}  ⊲ most sensitive axis
        pg ← periodogram(W_a, hamming)           ⊲ hamming is the windowing function
        if 0.3 < σ < 0.7 and mean(pg(0.6 ≤ f ≤ 2)) > mean(pg(f < 0.6 or f > 2)) then
            boolean ← boolean + [1]
        else
            boolean ← boolean + [0]
        end if
    end for

4 RESULTS
We ran the algorithm on different recordings taken with the Empatica wristband: slow and fast straight walking, stair climbing, and sitting with arm swinging. Figure 2 shows a dot plot in which zero on the y-axis represents "no walking" and one represents "walking." The x-axis represents time in seconds, and the dots are spaced 2.5 seconds apart. During the first 8 seconds, the subject was standing, and the algorithm correctly classified this part of the signal as "not walking." After 8 seconds, the subject started to walk in a straight line, and the algorithm correctly detected this activity as "walking." This example indicates that the algorithm works correctly under normal walking conditions.
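The per-window decision above can be illustrated with the following minimal, dependency-free Python sketch. It applies condition (4) to the Hamming-windowed periodogram of the most sensitive axis and condition (5) to σ_norm. It also includes a full-window version of the windowing in Algorithm 1. This is a sketch under stated assumptions, not the authors' implementation, and several values are assumptions rather than figures from the paper:

- the 32 Hz sampling rate;
- the 5 s window length;
- the 2.5 s step between window starts;
- the use of simple mean removal in place of the Butterworth filter, whose parameters are not given here.

The function names `sliding_windows`, `modified_periodogram`, `is_walking` and `classify` are hypothetical.

```python
import cmath
import math
from statistics import fmean, pstdev

# ASSUMED values (not stated in the paper): sampling rate, window length and step.
FS = 32.0      # Hz, assumed accelerometer sampling rate
WIN_S = 5.0    # s, hypothetical window length
STEP_S = 2.5   # s, hypothetical step between window starts


def sliding_windows(signal, length, step):
    """Algorithm 1: cut a signal into windows, keeping only full windows."""
    return [signal[s:s + length] for s in range(0, len(signal) - length + 1, step)]


def hamming(n):
    """Hamming window of Eq. (2): w(k) = 0.54 - 0.46 cos(2*pi*k / (n - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def modified_periodogram(x, fs):
    """Return (frequencies, S_xx) of one window, Eq. (3), via a direct DFT."""
    n = len(x)
    mu = fmean(x)  # ASSUMPTION: mean removal stands in for the omitted Butterworth filter
    xw = [(v - mu) * w for v, w in zip(x, hamming(n))]
    freqs, power = [], []
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        f_k = sum(xw[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        power.append(abs(f_k) ** 2)
    return freqs, power


def is_walking(wx, wy, wz, fs=FS, band=(0.6, 2.0), limits=(0.3, 0.7)):
    """Apply conditions (4) and (5) to one window of x, y, z samples (in g)."""
    axis = max((wx, wy, wz), key=pstdev)  # most sensitive axis = highest STD
    freqs, power = modified_periodogram(axis, fs)
    inside = [p for f, p in zip(freqs, power) if band[0] <= f <= band[1]]
    outside = [p for f, p in zip(freqs, power) if not band[0] <= f <= band[1]]
    condition_4 = fmean(inside) > fmean(outside)
    norm = [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(wx, wy, wz)]  # Eq. (6)
    sigma = pstdev(norm)  # population STD (divides by N), Eq. (7)
    condition_5 = limits[0] < sigma < limits[1]
    return condition_4 and condition_5


def classify(acc_x, acc_y, acc_z, fs=FS):
    """Label every window 1 ("walking") or 0 ("not walking")."""
    length, step = int(WIN_S * fs), int(STEP_S * fs)
    windows = zip(*(sliding_windows(a, length, step) for a in (acc_x, acc_y, acc_z)))
    return [1 if is_walking(wx, wy, wz, fs) else 0 for wx, wy, wz in windows]
```

Calling `classify` on three equally long lists of x, y and z samples (in g) returns one 0/1 label per window, analogous to the dots in Figure 2. Evaluating the DFT bins directly keeps the sketch free of dependencies. For longer windows, an FFT routine such as `numpy.fft.rfft` would be the natural replacement.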
Algorithm 1 for windowing
    Require: (acc_x, acc_y, acc_z), w_t, o_t     ⊲ w_t is the length of the window, o_t is the shift between consecutive windows
    Ensure: (W_x, W_y, W_z)
    for all acc_i in (acc_x, acc_y, acc_z) do    ⊲ i represents the axis
        W_i ← []
        s_t ← 0                                  ⊲ s_t = start index of window
        e_t ← s_t + w_t                          ⊲ e_t = end index of window
        while e_t ≤ N do                         ⊲ N is the number of samples in the recording
            acc'_i ← acc_i[s_t : e_t]
            W_i ← W_i + [acc'_i]
            s_t ← s_t + o_t
            e_t ← e_t + o_t
        end while
    end for

Figure 2: Proposed algorithm used on straight walking activity, recorded on Empatica E4 wristband

Table 2: Table of activities and their accuracy
Expected class | Activity | Detected as "Walking" | Detected as "Not walking" | Accuracy
Walking | Walking and swinging hand | 47.5 s | 7.5 s | 86.4%
Walking | Slow walking | 52.5 s | 2.5 s | 95.5%
Walking | Stair climbing | 17.5 s | 37.5 s | 68%
Not walking | Fast walking | 5 s | 52 s | 91.2%
Not walking | Standing | 7.5 s | 47.5 s | 86.4%
Not walking | Sitting | 0 s | 55 s | 100%
Not walking | Sitting and swinging hand | 2.5 s | 52.5 s | 95.5%
Not walking | Walking and uncontrollably moving hand | 0 s | 55 s | 100%

We want as few false positives as possible in our data set, because we want to detect only the scenarios in which a person walks most naturally. This is a typical binary classification problem, and the final results are shown in Table 2. The first three activities (walking and swinging hand, slow walking, ...) are considered natural walking and should be detected as walking. The next five (fast walking, standing, sitting, sitting and swinging hand, walking and uncontrollably moving hand) should be considered "not walking." These activities are less suitable for collecting features for the algorithm that will be implemented in the next stages of this study. The study is primarily meant for the elderly, so we categorized the "fast walking" scenario as not walking, because it is not common for the elderly to walk fast. In the stair climbing case, the algorithm did not perform very well, but this case is not relevant for our purposes. More importantly, in the last five cases the algorithm performs well in detecting true negatives.

Figure 3 shows an example of a recording in which several activities are present: standing, sitting on a chair with random hand movements, walking with hand movements, and walking with exaggerated hand movements. The algorithm had difficulty classifying gait when high-amplitude arm movements occurred during walking, because the gait characteristics are lost in the noise of these movements. For our purposes this is not critical. The end goal is to measure the subjects' gait impairment, so discarding the parts of the signal where the person does not walk in a "natural" way is acceptable. However, there was also a deviation when the subject sat down and started swinging his arm. At the 42nd second, the algorithm predicted "walking" where it should have predicted "not walking." Similarly, in Figure 4 at about the 78th second, the algorithm detected sitting as walking.

Figure 3: Proposed algorithm used on multiple activities, recorded on Empatica E4 wristband

Figure 4: Proposed algorithm used when sitting and swinging hand, recorded on Empatica E4 wristband

5 CONCLUSION
In the related work, we described the state-of-the-art algorithms used in many of today's applications. For this research, we selected two of these algorithms and expanded (optimized) them for our purposes. Our algorithm was able to detect when a person was walking normally, slowly, and quickly. In addition, it correctly classified as "not walking" the cases in which a person sits and swings his arm. To measure gait impairment, we only want to use the time windows of the signal where we are certain that the person is walking without additional "unnecessary" hand movements. In the future, we will further improve the algorithm so that the deterioration of walking, our final goal, can be measured correctly.

REFERENCES
[1] Juanita A. Haagsma et al. 2020. Falls in older aged adults in 22 European countries: incidence, mortality and burden of disease from 1990 to 2017. Injury Prevention, 26, (Feb. 2020), i67–i74. doi: 10.1136/injuryprev-2019-043347.
[2] Chalne Törnqvist. 2017. Walking movement detection using stationary stochastic methods on accelerometer data. MA thesis. Lund University.
[3] Guodong Qi and Baoqi Huang. 2018. Walking detection using the gyroscope of an unconstrained smartphone. In (Jan. 2018), 539–548. isbn: 978-3-319-66627-3. doi: 10.1007/978-3-319-66628-0_51.
[4] E. O. Brigham and R. E. Morrow. 1967. The fast Fourier transform. IEEE Spectrum, 4, 12, 63–70. doi: 10.1109/MSPEC.1967.5217220.
[5] Yen-Ping Chen, Jhun-Ying Yang, Shun-Nan Liou, Gwo-Yun Lee, and Jeen-Shing Wang. 2008. Online classifier construction algorithm for human activity detection using a tri-axial accelerometer. Applied Mathematics and Computation, 205, 2, 849–860. Special Issue on Advanced Intelligent Computing Theory and Methodology in Applied Mathematics and Computation. doi: 10.1016/j.amc.2008.05.099.
[6] [n. d.] Medical devices, AI and algorithms for remote patient monitoring. Empatica. https://www.empatica.com/.

Android Integration of a Machine Learning Pipeline for Human Activity Recognition
Viktor Srbinoski, Daniel Denkovski, Emilija Kizhevska, Hristijan Gjoreski
Faculty of Electrical Engineering and Information Technologies, Ss. Cyril and Methodius University in Skopje, N.
Macedonia, Jozef Stefan Institute, Slovenia
viktor_srbinoski@hotmail.com, danield@feit.ukim.edu.mk, emilija.kizhevska@ijs.si, hristijang@feit.ukim.edu.mk

ABSTRACT
In the last decade, smartphones have seen significant growth in processing power. Coupled with greater affordability, this has led to worldwide smartphone ubiquity. Alongside the advances in processing and battery technology, there have been great advances in sensor technology. Every smartphone today comes equipped with multiple sensors: accelerometer, gyroscope, magnetometer, etc. The sensor data is already used in a variety of applications, several of which focus on human activity recognition. In this paper, we propose an Android smartphone integration of a machine learning pipeline for recognizing human activities. The proposed approach uses the smartphone's 3-axis accelerometer and processes the data in real time. A machine learning model then recognizes the user's activities in real time: walking, running, jumping, cycling and standing still. The proposed Recurrent Neural Network model and its machine learning pipeline are developed on a publicly available activity dataset. They are then implemented in the Android application and validated once again on a dataset recorded with the smartphone itself.

KEYWORDS
Human activity recognition, machine learning, Android integration, TensorFlow Lite, recurrent neural network, accelerometer, magnitude.

1 INTRODUCTION
Human Activity Recognition (HAR) is the process of examining data from one or multiple sensors and determining which activity, if any, is being performed. The sensors are traditionally placed at key points on the human body and provide composite data (accelerometer, gyroscope, magnetometer data, etc.). Advances in sensor technology have made sensors more compact and precise over the years, but most importantly more affordable. Today these sensors are part of the standard package of any smartphone.

The purpose of this paper is to leverage these smartphone sensors to perform HAR in real time. We use an Android application that continuously reads the phone's own sensor data instead of relying on traditional dedicated wearable sensors. The premise is that smartphone sensors have reached the quality required to be comparable to wearable sensors in accuracy [1]. The benefit of this approach is that smartphone sensors are much more convenient for the common user, because smartphones have become ubiquitous.

Human activity recognition is a popular topic that has been studied extensively in recent years [2]. Practical applications of HAR are mainly in improving quality of life and in medicine. A good example of HAR models used in medicine is [3], which focuses on fall detection, mainly for the elderly population.

Using dedicated wearable sensors to recognize activities is the most common approach. A smartwatch is usually equipped with the same sensors as a smartphone and has a much more fixed position on the body (tightly around the wrist). The drawback is that the arms are more prone to random movement, which introduces noise into the system and makes HAR more difficult. A detailed analysis of these issues can be found in [4].

Using data from smartphone sensors to train HAR models has also been explored recently in [5], where a deep neural network is trained on data from multiple smartphone sensors. In our study, we go a step further and analyze and compare a simplified subset of the sensor data: only the accelerometer magnitude. This yields a model that works regardless of the smartphone's orientation, and a simple yet effective way of integrating a model into an Android application.

We propose an Android integration of a Machine Learning (ML) pipeline for recognizing human activities in real time on a smartphone. The proposed approach uses the smartphone's 3-axis accelerometer and processes its data in real time. The ML model then recognizes the user's activities: walking, running, jumping, cycling and standing still. The proposed Recurrent Neural Network (RNN) model and its ML pipeline are first developed on a publicly available activity dataset and then implemented in an Android application. Finally, the application is evaluated once again on a dataset recorded with the smartphone itself. As part of this study, we also release an Android application [6]. Other researchers can use it to gather data easily with a smartphone, and it serves as a practical demonstration of how to integrate an ML model with an Android application and use the built-in accelerometer data.

2 DATASET
The models were trained on a publicly available dataset that was originally used to evaluate the impact of sensor placement on activity recognition [7]. The dataset consists of wearable sensor readings from 17 healthy subjects performing any of 33 different activities. A total of 9 wearables are placed on the body: two on each arm and leg, and one on the back. Each wearable sensor reads 13 values at a frequency of 50 Hz: three for acceleration, three for rotation, three for the magnetic flux vector and four for orientation in quaternion format. This brings the total number of readings to 117 per frame (9 wearable sensors with 13 values each).

Of all these measurements, only six are used: the 3 accelerometer values from each of the two upper-leg sensors (left and right). These sensors were chosen because they are approximately where a smartphone would be carried (in a side pocket). The magnitude of each sensor is added as an additional feature, calculated as:

$$ magnitude = \sqrt{acc_x^2 + acc_y^2 + acc_z^2} \tag{1} $$

Because of the position of the sensors, it is impossible to recognize motion expressed mainly with the upper torso and arms. The dataset is therefore truncated to the activities that depend on the legs: walking, running, jumping, cycling and standing still.

3 METHODOLOGY
To adapt the dataset to the needs of this application, we apply preprocessing and feature extraction, described in the following subsections.

3.1 Preprocessing and segmentation
The dataset contains a disproportionate number of readings for standing still compared with all other activities. To correct this, random under-sampling is performed, and only 5% of the standing-still data is used. Similar activities are also grouped together. Jogging and running are grouped as running. Jumping upwards, jumping front and back, jumping side to side, and jump rope are grouped as jumping. The resulting distribution of data is illustrated in Figure 1, with running having the most data (1760 s) and cycling the least (860 s).

Figure 1: Activity distribution after selection

Once selection has been performed, the data is grouped into 3-second windows. Since the data is collected at 50 Hz, each window contains 150 records.

3.2 Feature extraction
After the data has been split into 3-second windows, five statistical features are calculated per window. The first two are the mean and the standard deviation of the 150 values in the window. The three additional statistical features are:
• Mean first-order difference: the average difference between consecutive values in the window. It is computed by first creating a list of first-order differences between consecutive values and then calculating the mean of this list.
• Mean second-order difference: the average difference between consecutive elements in the list of first-order differences.
• Min-max difference: the difference between the minimum and the maximum value in the window.

The feature extraction is performed on every sensor signal (x, y, z axis and magnitude of both accelerometers, left and right). This gives a total of 40 features (8 signals × 5 features). The features are then separated into three datasets:
• left accelerometer, with 20 features;
• right accelerometer, with 20 features;
• combined accelerometers, which contains the data from both the left and right datasets. Features are matched by their respective signal: for example, the x-axis of the left accelerometer and the x-axis of the right accelerometer are treated as the same feature, x-axis. The combined dataset therefore also contains 20 features but is twice as long.

To compare the effectiveness of a simplified, orientation-independent version of the model, a second version of the dataset is created. This dataset uses only the features extracted from the magnitudes of both accelerometers (5 features each). It is further split into three parts: magnitude-only left, magnitude-only right and magnitude-only combined, each containing five features.

3.3 ML Models
Multiple ML models were evaluated, such as K-NN, Linear SVM, Random Forest, Naïve Bayes and Neural Networks (DNN and RNN). The RNN model had the best performance, so a simple RNN was chosen as the ML model for this application. The model is created using Keras and contains two RNN layers with 512 nodes each and a tanh activation function. The final decision layer is a Dense layer with 5 nodes and a softmax activation function. The model is trained for 100 epochs with a sparse categorical cross-entropy loss function.

4 EXPERIMENTS
With the dataset prepared, the following experiments were conducted:
• Accuracy comparison between the magnitude-only and full-featured versions of the dataset.
• Evaluation of models trained on data from the left accelerometer and evaluated on data from the right, and vice versa.

4.1 Evaluation and metrics
The models were evaluated using K-fold cross-validation, where K equals the number of subjects and a different subject's data is used as the validation set in each iteration (leave-one-subject-out). Consecutive windows from the same subject are very similar, so this split ensures that the training and test data never contain windows from the same subject. Using a separate subject's data as the validation set gives a good estimate of how the model will behave on data from a previously unseen person.

In every iteration of the cross-validation, a confusion matrix is generated from the predicted values. From it, the precision and recall are calculated for every activity, together with the overall accuracy. These metrics are compiled for every iteration, and their average across all iterations forms the overall evaluation of the model.

4.2 Results
Initially, nine models were considered and evaluated on both the full-featured dataset and the magnitude-only dataset (for combined accelerometers). The results are illustrated in Figure 2, sorted by accuracy.

Figure 2: Accuracy comparison of all inspected ML models

As expected, the models with full features were more accurate than their magnitude-only versions, with an average drop in accuracy of 7%. K-NN was the exception, with an increase in accuracy of 2%. The RNN had the highest accuracy in both cases: 98.8% on the full-featured dataset and 95.8% on the magnitude-only dataset. Therefore, the following results focus on the RNN model.

The comparison in accuracy between the full-featured and magnitude-only versions was made on all three datasets (left, right and combined). The results for the RNN are displayed in Figure 3.

Figure 3: Comparing full-featured and magnitude-only datasets

The average drop in accuracy for the RNN was 3%, which is well within acceptable boundaries. As a side note, the right side generally shows slightly weaker results. The difference is at most 1.5% (when comparing the left and right simplified sets) and could be due to random noise.

We wanted to determine whether the model picks up a bias from the side on which it is trained, or whether the two sides carry an intrinsic difference. To do so, the model was trained on one side and evaluated on the other. This was done twice: trained on left and evaluated on right, and trained on right and evaluated on left. The results are displayed in Figure 4, along with a control set that was trained and evaluated on the same side.

Figure 4: Comparison between same and opposite side evaluation

The accuracy differences are within 2%, which is negligible. For the right accelerometer dataset, evaluating on the left actually increased the overall accuracy. We attribute this to the slight difference in quality between the left and right sides rather than to switching sides during evaluation. These results suggest that the models have no significant side bias, so activity recognition should work regardless of which side the smartphone is carried on. Together with the simplified model's independence from orientation, this makes the magnitude-only model a suitable choice for smartphone integration.

5 ANDROID INTEGRATION
To integrate the model with an Android smartphone, the magnitude-only model with combined accelerometers was converted into the tflite format using the TensorFlow Lite library, a widely used library for running ML models on Android. The converted model is added to the file structure of an Android application, which loads it into memory at startup and uses it in real time to recognize activities.

Android smartphones come equipped with accelerometers (along with many other sensors). These can be accessed through the built-in class SensorManager, which is part of the default package android.hardware. The data read through the SensorManager is reported per axis, in the standard unit of m/s². The orientation of the x, y and z axes is illustrated in Figure 5. The sampling rate of the sensor is adjustable, trading higher-quality data against lower battery consumption. In our implementation, the sensor delay is set to 20 ms between reads (50 Hz), matching the sampling rate of the training data.

There is no way to predict how the smartphone will be oriented in the pocket, so only the magnitude of the accelerometer is used in the feature calculation. The magnitude readings are kept in memory until 150 samples (exactly 3 s) are accumulated, which is the window size used to train the models. The same statistical features are then calculated on the collected window: mean, standard deviation, mean first-order and second-order differences, and min-max difference. These values are placed in a tensor that is sent as input to the model, which is also kept in memory as an object. The output of the model is also a tensor, produced by the output layer with its softmax activation function. It is converted into a single result, the node with the highest value, which is displayed on screen.

Figure 5: Accelerometer axis orientation in smartphones

Because 150 samples must be accumulated before the features are calculated and the model is called, the value displayed on screen lags by 3 s. In other words, the activity the user is currently performing appears on screen 3 s later. All the data read from the accelerometer is kept in memory, along with the prediction and a timestamp. A single entry contains all the features calculated from the 3-second window, the model prediction and a timestamp. The user can choose to export this data to CSV and use it as a dataset.

The model was evaluated on a practically collected dataset recorded with a Samsung Galaxy S20 smartphone (5 minutes of each activity). The predicted value was compared with the actual activity by cross-referencing the timestamps, since the activities were performed at specific times. A confusion matrix was then created, from which the precision, recall and F1 score were calculated, as well as the overall accuracy. The results are displayed in Figure 6.

The overall accuracy of the model was 90.2%, a noticeable drop from the 95.8% obtained on the original training dataset. This is expected, because the smartphone is not fixed in place as rigidly as the wearables, which introduces a certain amount of noise into the system.

6 CONCLUSION
This paper presented a practical way of training a HAR model and implementing it in an Android application. It also addressed the practical issues of reading smartphone accelerometer data, such as unpredictable orientation and whether the phone is kept on the left or right side. We tested whether the left and right sides differ intrinsically or the models develop a side bias by evaluating models on the opposite side from the one they were trained on. No such bias was found. To gain independence from orientation, we created a simplified dataset that used only the magnitude readings. Training on this dataset resulted in the expected drop in accuracy, but within an acceptable margin. An RNN was trained on the magnitude-only dataset and integrated into an Android application that reads the accelerometer data and calculates the features in real time. The calculated features are used as input to the model, which outputs the predicted activity, and the prediction is then shown on screen. The sensors in the smartphone we used proved to be of comparable quality to the wearable sensors. The model recognized activities recorded with smartphone sensors with a solid accuracy of 90.2%, even though it was trained on a dataset from wearable sensors.

ACKNOWLEDGEMENT
This research was partially supported by the WideHealth project - EU Horizon 2020, under grant agreement No 952279.

REFERENCES
[1] P. Silsupadol, K. Teja, and V. Lugade, "Reliability and validity of a smartphone-based assessment of gait parameters across walking speed and smartphone locations: Body, bag, belt, hand, and pocket," Gait & Posture, vol. 58, 2017.
[2] O. D. Lara and M. A. Labrador, "A Survey on Human Activity Recognition using Wearable Sensors," IEEE Communications Surveys & Tutorials, vol. 15, no. 3, pp. 1192–1209, Third Quarter 2013.
[3] S. Kozina, H. Gjoreski, M. Gams, and M. Luštrek, "Efficient activity recognition and fall detection using accelerometers," in International Competition on Evaluating AAL Systems through Competitive Benchmarking, pp. 13–23, Springer, Berlin, Heidelberg, September 2013.
[4] M. Gjoreski, H. Gjoreski, M. Luštrek, and M. Gams, "How Accurately Can Your Wrist Device Recognize Daily Activities and Detect Falls?" Sensors 2016, 16, 800.
https://doi.org/10.3390/s16060800
[5] C. A. Ronao and S.-B. Cho, "Human activity recognition with smartphone sensors using deep learning neural networks," Expert Systems with Applications, vol. 59, 2016, ISSN 0957-4174.
[6] https://github.com/ViktorSrbinoski/SmartphoneActivityRecognition
[7] O. Banos, M. Damas, H. Pomares, I. Rojas, M. A. Tóth, and O. Amft, "A benchmark dataset to evaluate sensor displacement in activity recognition," in Proceedings of the 2012 ACM Conference on Ubiquitous Computing (UbiComp '12), pp. 1026–1035, New York, NY, USA, 2012. ACM.

Figure 6: Precision, recall and F1 score results on the practically collected dataset on a Galaxy S20 smartphone

Speaking Recognition with Facial EMG Sensors
Antonio Nikoloski1, Petar Poposki1, Ivana Kiprijanovska2,*, Simon Stankoski2, Martin Gjoreski3, Charles Nduka1, Hristijan Gjoreski1,2
1 Ss. Cyril and Methodius University in Skopje, N. Macedonia
2 Emteq Ltd, Sussex Innovation Centre, Science Park Square, Brighton, UK
3 Università della Svizzera Italiana, Switzerland
ivana.kiprijanovska@emteqlabs.com*

ABSTRACT
With the advent of interactive virtual reality (VR) applications, interest is growing in tools that let users engage with VR environments unobtrusively and intuitively. One such interfacing tool for VR applications is speech recognition, which can contribute to enhanced human-computer interaction. In this study, we explore the use of a novel VR facial mask equipped with seven surface electromyography (sEMG) sensors to recognize, with machine learning, whether the user is speaking. We collected speaking and non-speaking data from 30 participants. The machine learning pipeline we developed included data preprocessing, de-noising, filtering, segmentation, feature engineering, and training of a binary classification model. The experimental results indicate that the mask can be used to recognize speaking activity. On the test data of five unseen participants, the best-performing model achieved an accuracy of 89% and an F1-macro score of 91%. We also analyzed the individual influence of each sensor on the models' outcomes by removing each sensor from the dataset in turn. We did not observe a significant drop in the accuracy of the models, which indicates that the mask can detect speaking even if some of the sensors are not used.

KEYWORDS
speaking recognition, machine learning, classification, wearable sensors, surface EMG, facial muscles

1 INTRODUCTION
Virtual reality (VR) is an emerging technology that has introduced immersive user experiences in virtual environments and is expected to revolutionize the way we interact with the digital world. VR applications are already widely used in many disciplines, ranging from research and training facilities to entertainment and healthcare. With the emergence of interactive VR applications, there is increasing interest in new immersive tools that let users interact with VR surroundings unobtrusively and intuitively. One such interfacing tool for VR applications is speech recognition. Incorporating it into VR gives users more flexibility in interfacing with VR environments and can contribute to improved human-computer interaction.

In recent years, surface electromyography (sEMG)-based interfaces have been used for unobtrusive interaction in VR environments. sEMG measures muscle contractions with sensors applied directly on the skin, detecting changes in surface voltage when muscle activation occurs. Partly because it can be applied non-invasively, facial sEMG has been used to detect the activation of the facial muscles involved in speaking. However, most sEMG sensors in conventional speaking recognition systems have been attached around the user's lips and neck. This poses several practical issues, including the need for extra wearable devices in addition to the VR headset, limited facial muscle movement, and user discomfort.

To overcome these issues, in this study we explore the use of a novel facial mask equipped with sEMG sensors. The mask is incorporated into a VR headset to recognize whether the user is speaking. Our approach is based on signal processing and machine learning (ML), which are used to develop a binary classification model.

2 RELATED WORK
The first studies with sEMG sensors were performed by Piper [1]. Since then, researchers have widely used sEMG sensors to measure the electrical signal that emanates from contracting muscles. Inman [2] demonstrated the usefulness of the sEMG signal for measuring human performance by investigating the technical aspects of human locomotion. By the early 1960s, improvements in signal quality and convenience had made sEMG sensors a common tool in clinical and research laboratories. Despite their popularity, current recording methods can have trouble maintaining signal fidelity when vigorous or long-duration activities are monitored [3] [4].

Speech recognition using sEMG was first attempted in the 1980s [4] [6]. The results of these studies were preliminary but important for the further progress of the field. Jorgensen and Binsted [6] showed that speaking can be recognized even if the words are spoken silently or without any actual sound. Jou et al. [7] showed that not just words but also phonemes can be recognized to a certain degree. Additional works include direct synthesis of speech via sEMG, which aids people who have problems with their vocal cords or airways [8] [9].

Our study differs from previous work in that we use a novel facial mask, emteqPRO™, which is equipped with seven sEMG sensors. sEMG sensors may be more error-prone than intramuscular EMG sensors, so here we study their utility. In addition, the location of our sEMG sensors makes speaking recognition more challenging. The facial mask sits on the upper part of the face as part of the VR headset, not over the mouth and lips, where sensors would be more convenient for speech recognition.

3 DATASET
In the data collection protocol, healthy participants were asked to read a predefined text (a news article). We also recorded a baseline session in which the participants sat still with a neutral face. This data was recorded while the participants watched a neutral video, without moving their facial muscles or speaking. A total of 30 participants were recorded, 18 male and 12 female, aged between 19 and 25 years. The native language of all participants was Macedonian.

During data collection, we used the emteqPRO™ mask [10] [11] to record sEMG sensor data. The mask has seven EMG sensors (Figure 1):
• two frontalis sensors (6 and 0 in Figure 1), used to monitor eyebrow movement;
• two orbicularis sensors (4 and 2 in Figure 1), used to monitor eye movements;
• two zygomaticus sensors (5 and 1 in Figure 1), used to monitor mouth and cheek movements;
• one corrugator sensor (3 in Figure 1), used to monitor forehead movements.

Figure 1: emteqPRO face mask with all 7 EMG sensors

4 DATA PREPROCESSING AND MODELING
The sEMG data were continuously recorded at a fixed rate of 1000 Hz. These data underwent a preparation process that included filtering, segmentation, and feature engineering.

To improve the quality of the sensor data, we performed signal de-noising and filtering. The EMG signals were first filtered with a Hampel filter to eliminate sudden peaks caused by quick movements. We then applied a frequency-based filtering method based on spectrum interpolation [12] to reduce the noise caused by electromagnetic interference.

A sliding-window technique was used for data segmentation. The data were segmented into windows of 0.5 seconds with 0.4 seconds of overlap, that is, a 0.1-second slide. Finally, for each sEMG channel we extracted 34 features. These included various amplitude-based features, amplitude derivatives, auto-regressive coefficients, frequency-based features, and statistical features. In total, the feature extraction procedure produced 238 features (34 features × 7 channels).

The extracted features were used as input to four classification algorithms:
(i) K-Nearest Neighbors [13], a simple statistical algorithm that assigns a data point the most numerous class among its k nearest neighbors;
(ii) Support Vector Machine (SVM) classifier [14], an algorithm that finds a hyperplane in an N-dimensional space that separates two classes of data points;
(iii) Random Forest [15], an ensemble learning method that trains N decision trees on random subsets of data and features and determines an instance's class by majority voting among the trained trees;
(iv) Extreme Gradient Boosting (XGBoost) [16], a gradient boosting algorithm that trains decision tree models sequentially, each subsequent model striving to correct the errors of its predecessors.

5 EXPERIMENTS
5.1 Evaluation Setup
The recorded data was split into training (20 participants), validation (5 participants) and test (5 participants) datasets. The training dataset was used to train the models, the validation dataset to optimize hyperparameters, and the test dataset to report the results. The evaluation metrics were accuracy and F1 score. The subsets do not share participants: each participant's data appears in only one of the three subsets. This replicates a practical scenario in which the model is used on participants who are not in the training dataset.

5.2 Default Hyperparameters Results
Figure 2 presents the accuracy and F1 score achieved by each algorithm with its default hyperparameters. As a reference, we also included a Dummy classifier, which predicts the majority class. All the algorithms clearly improve on the Dummy classifier. The Random Forest and the SVM achieved similar results, while the XGBoost classifier achieved the best results overall (87% accuracy and 89% F1 score). This classifier also scaled efficiently with the size of the datasets, creating and training models quickly. This was also beneficial for the hyperparameter optimization explained in the next subsection.

Figure 2: Algorithm comparison (accuracy and F1-score) using default hyperparameters, for the Dummy, KNN, SVM, Random Forest and XGBoost classifiers

portion of the baseline sessions, the model falsely predicts speaking activity. We speculate that these two subjects may have moved their heads during the baseline session. This may have shifted the sensors from their original position and worsened their contact with the skin.

Figure 4: Continuous recognition results for the XGBoost algorithm. The blue line represents true classes (1 – speaking, 0 – not speaking), and the orange line represents the predictions (1 – speaking)

5.3 Optimized Hyperparameters Results
In the next step, we performed hyperparameter optimization.
5.5 Sensor Analysis Results This process involves iterative changes of certain parameters of We additionally analyzed the results achieved by the models a classifier. During this process, an interval for every if a certain sensor is missing. This way, we were able to check hyperparameter is defined, and afterward, each parameter is the importance of each sensor for the given task. Knowing the iteratively updated, and the performance of the models is positions of the sensors on the face, we wanted to learn how the monitored. During this step, all 238 features of the datasets were data would change if we were to drop data from a certain sensor used, and a large number of numerical and other parameters while keeping the rest. (such as kernel for SVM, booster for XGB, etc.) were tuned. The results are shown in Figure 5, which in general, show that Figure 3 presents the results (accuracy and F1-score) the drop in accuracy and F1 score is not significant for all the achieved by each of the algorithms after the hyperparameter sensors. The accuracy drops from 87% to 85% at most. A more optimization. The results show slight improvement for the KNN, detailed analysis shows that the sensors placed on left and right SVM, and XGBoost algorithms, the latest one achieving 89% orbicularis, corrugator, and left frontalis have the most impact on accuracy and 91% F1-score – which was the best score that we accuracy, i.e., the accuracy drops the most when one of these achieved on this dataset. sensors is missing. One of the reasons for this is that while the participants were speaking, they were actually reading – which 1 0.91 means they activated their eyes which is recorded by the 0.89 0.86 0.86 0.9 0.82 0.83 0.83 orbicularis muscles. This analysis shows us that certain muscles 0.78 0.8 activate more while speaking compared to others, so that is why 0.7 the model itself gains or loses accuracy more, depending on 0.6 which sensor is dropped. 
0.5 0.4 0.3 90 88.8 89.0 89.3 0.2 87.6 87.9 87.8 88 87.5 87.3 87.3 0.1 86.5 0 86 85.2 KNN SVM Random Forest XGBoost 84.7 84.9 84.5 Accuracy F1 Score 84 82 Figure 3: Algorithm comparison (accuracy and F1-score) 80 using optimized hyperparameters Left Left Left Corrugator Right Right Right Frontalis Zygomaticus Orbicularis Orbicularis Zygomaticus Frontalis 5.4 Continuous Recognition Results Accuracy F1-score Figure 4 illustrates the continuous recognition results for the five subjects from the test set achieved by the best-performing Figure 5: Sensor analysis showing the performance when a XGBoost classifier. A comparison was made between the true particular sensor is missing. and the predicted class on a time scale, i.e., with a blue line, the true classes are presented (1 represents speaking, 0 represents not 6 CONCLUSION speaking). Additionally, the orange color presents the speaking predictions by the model. Each subject’s data is separated with In this work, we presented a ML approach for speaking black dashed lines in the figure. The results show that a large recognition using facial sEMG sensors integrated into a VR portion of the error is down to the baseline sessions of the last headset. The dataset was collected with 30 healthy participants two subjects in the test dataset, marked with red circles. In a large while reading a news article and watching videos. The results 17 show that the best performing model is XGBoost, which Proceedings of the 2021 ACM International Joint achieved 89% accuracy. Additionally, the error analysis per Conference on Pervasive and Ubiquitous Computing and participant showed that most of the misclassifications were Proceedings of the 2021 ACM International Symposium incorrect speaking predictions in the baseline (non-speaking) on Wearable Computers (pp. 23-25). sessions of two participants. 
We speculate that this is caused by [11] Gnacek, Michal & Broulidakis, John & Mavridou, Ifigeneia & Fatoorechi, Mohsen & Seiss, Ellen & the head movement of the participants and we plan to tackle this Kostoulas, Theodoros & Balaguer-Ballester, Emili & using the IMU sensor on the emteqPROtm mask. Kiprijanovska, Ivana & Rosten, Claire & Nduka, Charles. An additional problem was that while the participants were 2022. emteqPro-Fully Integrated Biometric Sensing Array reading, they were making small breaks, which were for Non-Invasive Biomedical Research in Virtual Reality. automatically labeled as speaking – but in fact were not speaking. Frontiers in Virtual Reality. 3. (Mar. 2022) This labeling problem will be tackled in future by using audio to [12] Mewett, D. T., Reynolds, K. J., & Nazeran, H. Reducing exactly label the speaking segments. power line interference in digitised electromyogram Finally, we plan to implement person-specific normalization recordings by spectrum interpolation. Medical and on the EMG data. This is an important step given that different Biological Engineering and Computing, 42(4), 524-531, participants have different facial muscles, and even more, those (2004). [13] D. Aha, D. Kibler (1991). Instance-based learning muscles are activated differently while doing the same facial algorithms. Machine Learning. 6:37-66. expressions or speaking. [14] Zhang, Yongli. (2012). Support Vector Machine Classification Algorithm and Its Application. 179-186. ACKWNOLEDGEMENT [15] Breiman, “Random Forests”, Machine Learning, 45(1), 5- Part of this study was supported by the Innovate UK Project no. 32, 2001. 81376: Virtual Reality rehabilitation tailored to older brain injury [16] Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A patients (Healthy Ageing), and part by the WideHealth project Scalable Tree Boosting System. In Proceedings of the 22nd no. 
952279 - European Union’s Horizon 2020 research and ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). Association for innovation programme. Computing Machinery, New York, NY, USA, 785–794. REFERENCES [1] Piper H (1912) Elektrophysiologie menschlicher Muskeln. Springer, Berlin, pp 1–163. [2] Inman, V. T., Saunders, J. B., & Abbot, L. C. (1944). Observations on the function of the shoulder joint. Journal of Bone and Joint Surgery, 26, 1-30. [3] M. Wand, M. Janke, and T. Schultz, “Investigations on Speaking Mode Discrepancies in EMG-based Speech Recognition,” in Proc. Interspeech, 2011, pp. 601–604. [4] N. Sugie and K. Tsunoda, “A speech prosthesis employing a speech synthesizer—Vowel discrimination from perioral muscle activities and vowel production,” IEEE Trans. Biomed. Eng., vol. BME-32, no. 7, pp. 485–490, Jul. 1985. [5] M. S. Morse and E. M. O’Brien, “Research summary of a scheme to ascertain the availability of speech information in the myoelectric signals of neck and head muscles using surface electrodes,” Comput. Biol. Med., vol. 16, no. 6, pp. 399–410, 1986. [6] C. Jorgensen and K. Binsted, “Web browser control using EMG based sub vocal speech recognition,” in Proc. 38th Annu. Hawaii Int. Conf. Syst. Sci., 2005, p. 294c. [7] S.-C. Jou, T. Schultz, M. Walliczek, F. Kraft, and A. Waibel, “Towards continuous speech recognition using surface electromyography,” in Proc. Interspeech, 2006, pp. 573–576. [8] J. Freitas, A. Teixeira, and M. S. Dias, “Towards a silent speech interface for portuguese,” in Proc. Biosignals, 2012, pp. 91–100. [23] A. Toth, M. Wand, and T. Schultz, “Synthesizing speech from electromyography using voice transformation techniques,” in Proc. Interspeech, 2009, pp. 652–655. [9] K.-S. Lee, “Prediction of acoustic feature parameters using myoelectric signals,” IEEE Trans. Biomed. Eng., vol. 57, no. 7, pp. 1587–1595, Jul. 2010. [10] Gjoreski, H., I. 
Mavridou, I., Fatoorechi, M., Kiprijanovska, I., Gjoreski, M., Cox, G., & Nduka, C. EmteqPRO: Face-mounted Mask for Emotion Recognition and Affective Computing. In Adjunct 18 Machine-learning models for MDS-UPDRS III Prediction: A comparative study of features, models, and data sources Vitor Lobo1, Diogo Branco1, Tiago Guerreiro1, Raquel Bouça-Machado2,3, Joaquim Ferreira2,3,4 and CNS Physiotherapy Study Group2 1LASIGE, Faculdade de Ciências, Universidade de Lisboa 2CNS—Campus Neurológico, 3Instituto de Medicina Molecular João Lobo Antunes, 4Faculdade de Medicina, Universidade de Lisboa vitormarqueslobo@gmail.com;djbranco@fc.ul.pt;tjvg@di.fc.ul.pt;raquelbouca@gmail.com;jferreira@medicina.ulisboa.pt ABSTRACT require a visit to a clinic or hospital. Clinicians use validated as-Parkinson’s disease is the second most common neurodegenera- sessments for PD to characterize a patient’s current disease stage tive disease worldwide. Symptoms tend to fluctuate during the [9]. These assessments occur spaced in time and can be hard to day and through disease progression. Clinical evaluations tend capture all the fluctuations that may have happened between to occur spaced in time. Further, the assessments used are mostly appointments. Further, instruments used in clinical practice fo- subjective. The gold standard for evaluating disease severity is cus on subjective evaluations. Namely, visual assessments during MDS-UPDRS. The increase in sensor usage enabled objective clinical visits that are supported by clinical scales. evaluation and continuous monitoring of the disease fluctuations. The gold standard for evaluating disease severity in PD is the One of the symptoms that most affect mobility are gait disor- Movement Disorder Society-Sponsored Revision of the Unified ders. The use of gait characteristics started to become popular to Parkinson’s Disease Rating Scale (MDS-UPDRS). This is a com- monitor the disease. 
However, the approaches used lack in-depth prehensive rating scale that assesses both motor and non-motor knowledge of machine learning models for disease staging. In symptoms associated with Parkinson’s [7]. To optimize disease our work, we try to estimate the MDS-UPDRS part III score from management, close monitoring of symptom fluctuations is crucial. accelerometer data. We collected data from 74 patients using the However, today this monitoring is usually performed through Axitvity AX3 device both on the wrist and lower back. We did medical appointments, every six months, with a mean duration experiments with different models, features, and windows size. of 30 minutes. Additionally, what published evidence suggests is We achieved a 4.26 Mean Absolute Error on the on left out 10% that patients perform differently during these moments, provid- data using both devices with a 2.5-second sliding window and a ing only information about their best capacity, rather than their random forest model for prediction. We contribute with a com- usual performance in their daily lives. parison of the performed experiments and provide, according to The democratization of sensors’ usage, namely the body-worn our experiments, the optimal models for MDS-UPDRS part III devices, that measure acceleration, and angular velocity allowed estimation using only accelerometer data. the increase of objective evaluations [10]. These devices passively monitor patients during clinical evaluation and in free-living KEYWORDS environments. Furthermore, allow movement metrics and feature gait, accelerometer, mds-updrs, Parkinson’s disease, features, ma- extraction that can be related to motor symptoms or clinical chine learning, models scales used for disease assessments [6]. Gait disorders are one of the symptoms that most affect mobility. Inertial measuring units can help to identify fluctuations. 
There have been studies 1 INTRODUCTION that leverage the identification of walking bouts to extract gait Parkinson’s Disease (PD) is a neurodegenerative disease that metrics like step length or step variability [1, 4]. affects around 1% of the world’s population. This disease is char- Research using these gait characteristics as a marker for PD acterized by motor and non-motor symptoms [15]. Motor symp-has demonstrated the potential for monitoring the disease in toms include bradykinesia, tremor, rigidity, and gait impairment. several ways [2]. While the use of these gait characteristics has These are present in the early stages of the disease and worsen become a popular approach for monitoring PD, novel research as the disease progresses. has started to analyze signal processing metrics that could also Although there is no cure, the available pharmacological and be of use for this purpose. In a 2019 study, the contributions of non-pharmacological therapeutic interventions effectively con- signal-based features and gait characteristics for the classification trol symptoms. However, as the disease progresses their efficacy of PD were analyzed [13]. Another emerging method to stage tends to reduce and motor complications, such as motor fluc-PD is the use of total scores of the entire MDS-UPDRS or sub- tuations and dyskinesia, appear [11]. These have been labeled parts of the scale. Specifically, MDS-UPDRS III scores have been as ’ON’ and ’OFF’ stages [4]. To minimize the impact of these empirically demonstrated as a good metric for monitoring the fluctuations and inform better the clinicians there is the need to progression of PD [12]. As such, several studies have focused periodically assess the symptoms. Generally, these evaluations on the prediction of this score to monitor disease progression. 
A recent example of this approach for monitoring PD progression is a 2021 study. It used a convolutional neural network (CNN) model, trained on inertial data collected from the lower back during gait, to estimate MDS-UPDRS III scores [14]. While these results are promising, the authors suggest that future work could compare this approach with traditional feature-engineered machine learning models, as a step towards deploying such technologies for continuous monitoring of PD. Other studies have shown that it is possible to estimate PD progression using gait data collected with accelerometers [8]. However, different approaches to data collection, data processing, and machine learning pipeline design have not been clearly compared. As a result, there is still no consensus on their relative efficacy that could help inform future research in this field.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Information Society 2022, 10–14 October 2022, Ljubljana, Slovenia
© 2021 Copyright held by the owner/author(s).

In our work, we estimate the MDS-UPDRS part III score from accelerometer data. We collected the data using the Axivity AX3 device on both the wrist and the lower back [3]. Our dataset contains data from 74 patients (Hoehn and Yahr (H&Y) stage between 2 and 4) collected at Campus Neurológico (CNS), a tertiary specialized movement disorders center in Portugal. The final subset of data contained 267 instances of gait from 104 evaluation sessions. We experimented with 4 models (Random Forest, XGBoost, SVM, Linear Regression) and 59 features from the statistical, spectral, and temporal domains, using non-overlapping windows of 2.5 and 5 seconds. To validate the trained models, we used Leave One Subject Out (LOSO) cross-validation.

Our results showed that the best configuration, with the lowest prediction error on the left-out 10% of the data, achieved an MAE of 4.26. It used the Random Forest model and a 2.5-second sliding window on combined data from the wrist and the lower back. For all of the selected models, the configurations that achieved the best results under either validation scheme used data collected from the lower back or from both sensors. Most models performed better using a 5-second window, with the exception of the XGBoost model. The best-performing linear regression and SVM-based models used the SURF and ReliefF feature selection methods.

Therefore, we contribute a comparison of different models, features, sensor placements, and window sizes. According to our experiments, we also provide the optimal models for MDS-UPDRS part III estimation using only accelerometer data.

2 METHODS
The MDS-UPDRS III estimation was performed using different approaches to data collection and signal processing, and with different machine learning pipelines. In this section, we describe the steps taken and the variables of each step. This enables a comparison between different design decisions and their effect on the estimation of disease stage.

2.1 Data Collection
We collected data from 74 patients with PD at CNS during periodic evaluations conducted by trained physiotherapists. Each participant wore an Axivity AX3 on the wrist and on the lower back during a set of clinical assessments. The accelerometers were set to record at 100 Hz. Our dataset includes 267 instances of gait from 104 evaluation sessions of the 10-meter walk test. The MDS-UPDRS was also applied to each patient in each session. Among these patients, 49 were male and 23 were female, while the gender of the remaining 2 patients was not reported. The average patient age was 70.4 years (SD=13.12), the average weight was 71.76 kg (SD=13.89), and the average height was 166.49 cm (SD=9.26). Finally, the average MDS-UPDRS III score was 40.92 (SD=14.31), and the average H&Y stage was 2.57 (SD=0.97).

2.2 Data Pre-Processing
To isolate gait instances, the selected data files were segmented using the annotated timestamps for the 3 trials of the 10-meter walk test. Each segmented gait instance was then visualized to exclude session data that contained sensor failures and misalignment, or mismatched timestamps. During this step, the vector magnitude of the accelerometry signal was computed with the Euclidean norm, $\sqrt{x^2 + y^2 + z^2}$, and appended to each segment. To avoid possible temporal drift associated with the process, a resampling step was performed after segmentation to ensure even sampling. Even sampling is required for the extraction of some of the time- and frequency-domain features used. Finally, all segments were filtered with a fourth-order digital low-pass Butterworth filter with a cut-off frequency of 20 Hz to remove possible "machine noise" [5].

2.3 Evaluated Models and Features
We used 16 statistical, 26 temporal, and 17 spectral domain features, 59 in total. They were computed from all accelerometry axes and from the vector magnitude. A sliding window technique was used to segment the signal into non-overlapping windows from which the features were extracted. Separate feature data frames were created using 2.5- and 5-second windows, both of which have been used in the literature [14], to assess the effect of window size on the estimation task. During feature extraction, the MDS-UPDRS III scores were also appended to the corresponding windows in both data frames.

The first feature selection step was a variance filter that excluded features with low (<0.025%) or zero variance. This reduced the feature space from 2081 to 266 features for the 2.5-second windows and from 3081 to 452 for the 5-second windows. While this reduction may seem drastic, it is expected given how the Time Series Feature Extraction Library works. The library computes the same feature several times (for example, for different frequencies), which results in many feature columns with hardly any variability and, thus, hardly any descriptive power. A further selection step used four different feature selection methods that implement different feature-ranking strategies. Each of these algorithms was used to rank the features and select the top 10, 25, or 50 for the linear regression and support vector-based regression models. The complete feature subset was also used with these models as a baseline for comparison with the tree-based models. The tree-based models are less affected by the number of features because they perform intrinsic feature selection.

For each model, a set of parameters was selected and used in a grid search that tested all possible combinations. This procedure was carried out for each sensor placement, for the combined sensors, and for each of the sliding window lengths used during feature extraction, to compare the effect of these variables on the estimation task. LOSO cross-validation was used during the grid search to avoid overfitting and to optimize the models for generalizability. Finally, the optimal models for each combination of these variables were saved and used for the subsequent validation tasks.

To validate the trained models, the original dataset was split into training and testing subsets. The training subset comprised 90% of the data and was used in the grid search to train the models with LOSO cross-validation. The remaining 10% of the data was then used as a validation set. It tests each model's performance on unseen data from patients whose other data the model had already seen. This shows how well the model estimates MDS-UPDRS III scores for patients already known to it. These steps yield two scores for each optimal model, both using the Mean Absolute Error (MAE) metric: the average MAE over all LOSO splits during training, and the MAE on the held-out validation set. In this study, the MAE is the mean absolute difference between the real ($x$) and estimated ($y$) MDS-UPDRS III scores over the $n$ samples used for estimation, $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - y_i \rvert$.

3 RESULTS AND DISCUSSION
This section presents the results of all the steps taken toward MDS-UPDRS III estimation: data processing, feature extraction and selection, and finally model training and validation.

3.1 Optimal configurations
The configuration with the lowest prediction error on the left-out 10% of data used data from both devices, a 2.5-second sliding window, and a Random Forest model. It achieved an MAE of 4.26 and a strong correlation (ρ = 0.93), as illustrated in Figure 1. The best-performing configuration under LOSO CV was a Support Vector-based model using data from both sensors but a 5-second feature extraction window, which achieved an MAE of 9.99. This model's predictions on the validation set (7.94 MAE) were less accurate than those of some other options, but it achieved the best balance across both validation schemes. Table 1 summarizes, for the 10% left-out and LOSO validation tasks, the optimal results achieved by each model together with the data sources and sliding window length used.

Figure 1: Overall optimal predictions on the 10% of left-out data using a Random Forest model on data collected from both sensors and a 2.5 s sliding window. Each point represents a window.

Table 1: Optimal configurations used by each model to achieve the optimal MAE on the left-out 10% of data (val_m = 1) and in LOSO (val_m = 2). win_length is given in samples at 100 Hz (250 = 2.5 s, 500 = 5 s).

val_m  model    device_placement  win_length  ft_sel   num_fts  loso_mae  val_mae
1      rf       combined          250         -        266      11.50     4.26
1      xgboost  trunk             500         -        229      11.67     4.39
1      svm      combined          500         SURF     25       9.99      7.95
1      lin_reg  combined          500         reliefF  25       10.21     8.98
2      rf       combined          500         -        452      11.39     11.39
2      xgboost  trunk             250         -        133      11.49     5.74
2      svm      combined          500         SURF     25       9.99      7.95
2      lin_reg  combined          500         reliefF  25       10.21     8.98

3.2 Sensor placement and window size
Both device placement and the window length used during feature extraction substantially affected the performance of all models. For all of the selected models, the configurations that achieved the best results under either validation scheme used data collected from the lower back or from both sensors combined. Specifically, all of the non-tree-based models performed better under both validation schemes when using data from both sensors. The one exception was the SVM-based model with a 2.5-second window, which achieved a negligibly lower validation MAE using data from the wrist than with the other options at that window length. For the tree-based models, the optimal validation MAE was attained using both sensors with 2.5-second sliding windows, and using data from the lower back with 5-second windows. Figures 2a and 2b illustrate the intra- and inter-model comparisons for both validation schemes using different window lengths. The fluctuations were relatively small under LOSO CV, and most models performed better using a 5-second window, with the exception of the XGBoost model. The MAE on the left-out 10% of validation data fluctuated more, but it was also lowest with 5-second windows for all models except RF.

Figure 2: (a) LOSO CV MAE values (Y-axis) for different device placements using 5-second windows. (b) LOSO CV MAE values (Y-axis) for different device placements using 2.5-second windows.

3.3 Optimal parameters
Apart from linear regression, the models required different parameters to achieve their best performance during LOSO CV:
- Random Forest: criterion: mae; max_features: 0.333; n_estimators: 250.
- XGBoost: colsample_bynode: 1; eta: 0.1; importance_type: total_gain; max_depth: 3; num_parallel_tree: 100; tree_method: gpu_hist.
- SVM: C: 10; epsilon: 0.3; gamma: auto; kernel: rbf.

Of these models, XGBoost was the one that used only the trunk sensor; the other models used both devices. We used a grid search that exhaustively tested all parameter combinations for each model, independently of the device placement and sliding window length. The exhaustive nature of grid search makes this method of parameter optimization computationally expensive. For this reason, and because the procedure was used for several models, the parameter space for each model was not as comprehensive as in some other works with a smaller scope and narrower focus. Nevertheless, the present results should serve as a good starting point for model tuning in future research.
3.4 Feature importance
For the models that benefited from it, several feature selection methods were tested, along with different numbers of features to select. The best-performing linear regression and SVM-based models used the SURF and ReliefF feature selection methods (see Table 1), both selecting 25 as the optimal number of features. We then selected the top 20 features of each model. Among the 8 top-performing models across the two tested window lengths, no model used data exclusively from the wrist, and only 3 models used data exclusively from the trunk. For the remaining models, the majority of top-ranking features were extracted from the device mounted on the lower back. In some cases, no wrist features ranked among the top 20. This suggests that although wrist features were used for the estimation task, their contribution is minimal. It is also in line with the minimal performance gain of these models over their counterparts that use data exclusively from the lower back. Features from the anteroposterior plane of movement (z-axis) were the most prevalent among the top 20 features extracted from the trunk sensor, accounting for 50 of the 140 features considered in this analysis. The vertical plane of movement (x-axis) contributed the fewest features, with only 22 ranking among the top contributing features. Spectral-domain features were the most prevalent, making up almost half of the 140 features considered, with temporal domain features coming in second by a small margin, and temporal features last, making up a quarter of this total.

3.5 Limitations
The dataset used in this study consisted of data collected from 74 patients. While this number of patients is sufficient for preliminary results, a larger sample could improve the estimation and further validate the present findings. Beyond the volume of training data, a wider range of MDS-UPDRS III and Hoehn and Yahr scores could also improve the results. It would include a wider variety of walking patterns that, in smaller samples, might be treated as outliers and negatively affect performance. Furthermore, including a healthy cohort in the dataset could give the models a baseline for recognizing healthy gait, accentuating the difference between data from healthy and affected subjects. Therefore, future work should include a longitudinal study in free-living environments with a larger sample size to address these limitations and extend our conclusions.

[…] opportunities for longitudinal studies in free-living environments with larger datasets.

ACKNOWLEDGEMENTS
We would like to thank all the participants who kindly took part in the studies. This project was partially supported by FCT through LASIGE Research Unit funding refs. UIDB/00408/2020 and UIDP/00408/2020 and SFRH/BD/144242/2019 to Diogo Branco. It was also supported by the WideHealth project, which has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 952279.

REFERENCES
[1] Raquel Bouça-Machado, Diogo Branco, Gustavo Fonseca, Raquel Fernandes, Daisy Abreu, Tiago Guerreiro, Joaquim J Ferreira, and CNS Physiotherapy Study group. 2021. Kinematic and clinical outcomes to evaluate the efficacy of a multidisciplinary intervention on functional mobility in Parkinson's disease. Frontiers in neurology 12 (2021), 637620.
[2] Raquel Bouça-Machado, Constança Jalles, Daniela Guerreiro, Filipa Pona-Ferreira, Diogo Branco, Tiago Guerreiro, Ricardo Matias, and Joaquim J Ferreira. 2020. Gait kinematic parameters in Parkinson's disease: a systematic review. Journal of Parkinson's disease 10, 3 (2020), 843–853.
[3] Clare L Clarke, Judith Taylor, Linda J Crighton, James A Goodbrand, Marion ET McMurdo, and Miles D Witham. 2017. Validation of the AX3 triaxial accelerometer in older functionally impaired people. Aging Clinical and Experimental Research 29, 3 (2017), 451–457.
[4] Silvia Del Din, Alan Godfrey, Brook Galna, Sue Lord, and Lynn Rochester. 2016. Free-living gait characteristics in ageing and Parkinson's disease: impact of environment and ambulatory bout length. Journal of neuroengineering and rehabilitation 13, 1 (2016), 1–12.
[5] Aiden Doherty, Dan Jackson, Nils Hammerla, Thomas Plötz, Patrick Olivier, Malcolm H Granat, Tom White, Vincent T Van Hees, Michael I Trenell, Christoper G Owen, et al. 2017. Large scale population assessment of physical activity using wrist worn accelerometers: the UK biobank study. PloS one 12, 2 (2017), e0169649.
[6] Alberto J Espay, Paolo Bonato, Fatta B Nahab, Walter Maetzler, John M Dean, Jochen Klucken, Bjoern M Eskofier, Aristide Merola, Fay Horak, Anthony E Lang, et al. 2016. Technology in Parkinson's disease: challenges and opportunities. Movement Disorders 31, 9 (2016), 1272–1282.
[7] Christopher G Goetz, Barbara C Tilley, Stephanie R Shaftman, Glenn T Stebbins, Stanley Fahn, Pablo Martinez-Martin, Werner Poewe, Cristina Sampaio, Matthew B Stern, Richard Dodel, et al. 2008. Movement Disorder Society-sponsored revision of the Unified Parkinson's Disease Rating Scale (MDS-UPDRS): scale presentation and clinimetric testing results. Movement disorders: official journal of the Movement Disorder Society 23, 15 (2008), 2129–2170.
[8] Murtadha D Hssayeni, Joohi Jimenez-Shahed, Michelle A Burack, and Behnaz Ghoraani. 2021. Ensemble deep model for continuous estimation of Unified Parkinson's Disease Rating Scale III. Biomedical engineering online 20, 1 (2021), 1–20.
[9] Anthony E Lang, Shirley Eberly, Christopher G Goetz, Glenn Stebbins, David Oakes, Ken Marek, Bernard Ravina, Caroline M Tanner, Ira Shoulson, and LABS-PD investigators. 2013. Movement disorder society unified Parkinson disease rating scale experiences in daily living: longitudinal changes and correlation with other assessments.
Movement Disorders 28, 14 (2013), 1980– 1986. [10] Walter Maetzler, Josefa Domingos, Karin Srulijes, Joaquim J Ferreira, and Bas-4 CONCLUSIONS tiaan R Bloem. 2013. Quantitative wearable sensors for objective assessment of Parkinson’s disease. Movement Disorders 28, 12 (2013), 1628–1637. This paper presents a study that compares the different models, [11] C Warren Olanow, Yves Agid, Yoshi Mizuno, Alberto Albanese, U Bonucelli, features, and window sizes to estimate MDS-UPDRS part III using Philip Damier, Justo De Yebenes, Oscar Gershanik, Mark Guttman, F Grandas, et al. 2004. Levodopa in the treatment of Parkinson’s disease: current contro-acceromeleter data. One of the most common disorders for people versies. Movement disorders 19, 9 (2004), 997–1005. with PD is gait. The increase in sensor usage opened the oppor- [12] Antoine Regnault, Babak Boroojerdi, Juliette Meunier, Massimo Bani, Thomas tunity for increasing objective evaluations. However, there is a Morel, and Stefan Cano. 2019. Does the MDS-UPDRS provide the precision to assess progression in early Parkinson’s disease? Learnings from the Parkin-lack of knowledge of the current machine learning approaches. son’s progression marker initiative cohort. Journal of neurology 266, 8 (2019), In our work, we compare 4 machine learning models (random 1927–1936. [13] Rana Zia Ur Rehman, Christopher Buckley, Maria Encarna Micó-Amigo, forest, xgboost, svm, and linear regression), 59 features (16 statis-Cameron Kirk, Michael Dunne-Willows, Claudia Mazzà, Jian Qing Shi, Lisa tical domain, 26 spectral domain, and 17 temporal domain), and Alcock, Lynn Rochester, and Silvia Del Din. 2020. Accelerometry-based digital windows size (2.5 and 5 seconds). To validate our models we used gait characteristics for classification of Parkinson’s disease: what counts? IEEE open journal of engineering in medicine and biology 1 (2020), 65–73. LOSO cross-validation. 
We showed that the configuration with [14] Rana Zia Ur Rehman, Lynn Rochester, Alison J Yarnall, and Silvia Del Din. the lowest prediction error on the left out 10% of data used data 2021. Predicting the Progression of Parkinson’s Disease MDS-UPDRS-III from both devices processed using a 2.5-second sliding window Motor Severity Score from Gait Data using Deep Learning. In 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society and a Random Forest model for prediction, achieving 4.26 MAE. (EMBC). IEEE, 249–252. This work opens the opportunity to improve the knowledge of [15] Ole-Bjørn Tysnes and Anette Storstein. 2017. Epidemiology of Parkinson’s disease. Journal of neural transmission 124, 8 (2017), 901–905. machine learning approaches. However, in future work, there are 22 Elements of a System for Automatic Monitoring of Specific Mental Health Characteristics at Home Kristina Kirsten, Bert Arnrich Hasso Plattner Institute University of Potsdam Potsdam, Germany {kristina.kirsten,bert.anrich}@hpi.de ABSTRACT assessment, has many advantage as it can minimize retrospective Addressing one’s mental health has never been more important. bias. On the one hand, it enables long-term monitoring which The incidences of mental diseases, such as depression or anxiety makes it easier to detect small changes. On the other hand, data disorders, have drastically increased in recent years. The longer can be collected at the time of occurrence and do not have to an adequate treatment is delayed, the greater the impact on the be remembered and described retrospectively when the actual severity of the illness which often results in long absences from condition has already passed [17]. work. With the development of smart devices and wearables, it This paper presents a collection of elements that can be in- is already possible to measure many physiological parameters cluded in a system for automatic monitoring of mental health in everyday life. 
In addition, monitoring people in their natural characteristics in the home environment. These approaches go environment offers many advantages, e.g. it is not based on retro- beyond conventional questionnaires and refer to technical pos- spective feelings and memories but can measure and reflect the sibilities for measuring individual characteristics. For this, we momentary state. This conceptual paper presents an overview look at various characteristics of individual mental disorders of possible elements of a system for automated monitoring of and present ways in which these can be measured in an auto- mental health characteristics in the home. We describe examples matic way. However, questionnaires, for example in the form of typical parameters for various mental disorders and present of ecological momentary assessments (EMAs), can always be different systems and methods to measure them. Furthermore, considered as an additional tool for comparison with the auto- we show how the individual components of a system can be matic measurements. Finally, we also review different solutions connected to get a holistic view of specific mental health charac- for measurability and propose a potential system overview. teristics. Finally, we also discuss challenges and limitations. 2 BACKGROUND KEYWORDS Mental illnesses are disorders that are very diverse and individual mental health, wearables, ubiquitous sensing, monitoring concept and can affect thinking, mood, and behavior. In 2019, 280 million people were living with depression, 301 million people had an 1 INTRODUCTION anxiety disorder, 40 million people had a bipolar disorder and Being mindful of mental health is more important than ever. 14 million people suffered from an eating disorder [20]. 
But also In 2019, according to the World Health Organization (WHO), lesser-known disorders, such as OCD, which affects about 2.3% of one in eight people worldwide suffered from a mental disorder people at least once in their lifetime [11], should not be ignored. [20]. That is associated with significant impairments in thinking, There are characteristics or behavioral patterns that can be emotion regulation, or behavior. The WHO also states that in observed in various mental illnesses and also generally indicate a 2020, the number of people with depression and anxiety disorders bad mental health state. These include, but are not limited to, sad-increased significantly, due to the COVID-19 pandemic. ness and dejection, excessive anxiety or worry, decreased ability The most common mental illnesses include depression, anxiety to concentrate, significant fatigue, low energy, sleep problems, disorders, bipolar disorder, and obsessive-compulsive disorder and inability to cope with everyday problems or stress [2]. (OCD), among others. Often, initial symptoms are not recognized Nevertheless, each mental disorder also has very specific char-and, consequently, diagnoses are made late, which in many cases acteristics. Depressed patients, for example, often describe feeling leads to a worsening of the symptoms [6]. Nevertheless, mental empty and worthless inside and experiencing hopelessness, sad-illnesses have, partly overlapping, typical characteristics. For ness, and restlessness. Sleep is also affected in most patients, but example, fatigue, and lack of energy are among the most common it can go both ways, with insomnia or excessive need for sleep symptoms of depression, or checking things over repeatedly are as symptoms. Furthermore, a loss of interest in hobbies and so- signs of OCD. Some of these characteristics are measurable and cial activities may also indicate depression. 
Sometimes patients interpretable with modern sensors, devices, and machine learning even report unexplained physical problems such as back pain or models especially when it comes to behavioral or determining headaches [1]. People with bipolar disorder also experience the physiological parameters. In addition, studying people in their above symptoms during the depressive phase. But in addition natural environment respectively at home, so-called ambulatory to this, patients also go through manic episodes. In this phase, many characteristics of the depressive episode reverse. Patients Permission to make digital or hard copies of part or all of this work for personal often experience an energetic and euphoric phase where their or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and motivation is increased, concentration is improved, less sleep is the full citation on the first page. Copyrights for third-party components of this required, and they feel the drive to be active [3]. work must be honored. For all other uses, contact the owner/author(s). There are several types of anxiety disorders, including gen- Information Society 2022, 10–14 October 2022, Ljubljana, Slovenia © 2022 Copyright held by the owner/author(s). eralized anxiety disorder (GAD), panic disorder, social anxiety disorder, and phobia-related disorders [4]. They have in common 23 Information Society 2022, 10–14 October 2022, Ljubljana, Slovenia Kirsten et al. that people suffer from anxiety over a long period of time, which sensors, heart rate sensors but also Global Positioning System also often increases and interferes with daily activities ranging (GPS) and inertial measurement units (IMUs). The latter is a com-from the job to personal relationships. In anxiety disorders, indi- bination of several inertial sensors such as a 3D accelerometer viduals often experience physical symptoms. 
GAD often comes and a 3D gyroscope. However, the term IoT covers many more with headaches, muscle and stomach pain, or other unexplained areas and intelligent devices, such as connected personal scales, aches. During a panic attack, affected people may feel a racing smart ovens, and stoves, or smart lighting systems which can be heart, sweat intensely, tremble, experience loss of control, or feel grouped together under the term smart home. chest pain. In addition, people with social anxiety disorder tend to blush, adopt a rigid posture, or speak with an overly soft voice. For OCD, patients suffer from recurrent obsessive thoughts 3.2 Human Activity Recognition or compulsive acts. Obsessive thoughts are ideas, images, or impulses that repeatedly appear in the mind of the affected person. The topic of HAR has been widely researched as it offers enor-The patient cannot successfully suppress these thoughts. Further, mous potential and numerous use cases [8, 9, 12]. It comprises the more obvious symptoms of OCD are compulsive acts or rituals. research field of automatic detection and differentiation of vari- They are closely related to the obsessions and serve to alleviate ous everyday activities and can be divided into video-based and them and the anxiety that is constantly present. The patient is sensor-based HAR. With the development of new and increas-aware of the unusualness of these actions. Most compulsive acts ingly powerful smart devices and wearables, HAR is becoming involve cleaning (especially hand washing), repetitive checking less expensive, easily accessible, and unobtrusive. Research shows to ensure that a potentially dangerous situation does not occur, that when combining data from different devices, such as smart- or order and cleanliness [5]. phone and smartwatch, the results become even more accurate For any mental illness, not every patient needs to experience [13]. 
These days, HAR goes far beyond simple classifications, all of the characteristic symptoms. Because symptoms can overlap such as the distinction between sitting, standing, and walking. between disorders, it can be difficult to clearly assign them to Among others, HAR also finds great application in the healthcare a single mental illness. By having a system that automatically sector, e.g. through gait analyses that indicate diseases such as monitors a range of characteristics, a more holistic picture of Alzheimer’s [18] or in systems that focus on elderly care to detect mental status can be created, and changes can be detected early. falls [10], for example. Diagnoses for mental illness can only be made by professionals. Experts often use various forms of questionnaires and scales to determine the severity of an illness (e.g. Beck Depression 3.3 Indoor Positioning Systems Inventory for depression or Yale-Brown Obsessive Compulsive The ability to determine a person’s exact location in a home can Scale for OCD). However, collecting and analyzing sensor data help better identify activities that are connected to specific loca-to monitor mental health in general, is a topic that has been tions, for example, compulsive or eating behavior. Although GPS studied a lot in recent years but is still very relevant and has offers high coverage, it is not suitable for indoor localization be-great potential. The majority of studies are related to the analysis cause the receiver and satellite have to be in the line of sight, and of smartphone data, but wearables are also increasingly used for walls, roofs, and other objects prevent this. That is why in recent mental health studies. 
When it comes to the specific monitoring years approaches for IPS have been designed which use vari-of certain mental illnesses, the vast majority of these studies ous available technologies such as radio-frequency identification relate to anxiety disorders, depression, bipolar disorder or stress (RFID), Wireless Local Area Networks (WLAN), Bluetooth Low in general [14]. This paper focuses on technical possibilities to Energy (BLE) beacons, and more recently Ultra Wideband (UWB) unobtrusively measure certain mental health characteristics in [15, 21, 22]. Localization techniques can be divided into triangu-the home environment by using the latest technologies. lation algorithms (e.g. Time of Arrival (ToA), Time Differences of Arrival (TDoA), Received Signal Strength Indicators (RSSI)-based, 3 MONITORING SYSTEM ELEMENTS Angle of Arrival (AoA)), scene analysis (e.g. Fingerprinting-based To monitor certain mental health characteristics in the home techniques) and proximity detection algorithms [21]. The latter environment, it is possible to use various new wearable devices, is the process of determining whether a user is close to a cer- human activity recognition (HAR), indoor positioning systems tain range. This concept is often found in combination with BLE (IPSs) and already derived parameters from consumer devices. beacons, which are installed stationary at points of interest and send Bluetooth packets that are picked up and processed by the user’s smartphone, calculating the distance. In a scene analysis 3.1 Smart Devices and Wearables with using Fingerprints, measurements as e.g. RSSI-values, are The smartphone is an integral part of everyday life and almost collected in an offline phase for different positions and stored all of us carry it with us all the time. Although it is the most com-in a map. 
For position determination in real-time, the current mon everyday smart device, the use of so-called wearables has measurements are then compared with offline measurements to also been rising rapidly in recent years [19]. The term Internet determine the user’s location [22]. of Things (IoT) is shaping the technological development of the Different localization techniques have advantages and dis-last decade. It includes devices such as activity trackers, smart- advantages and it depends on the use case which methods are watches, and smart rings. Since these are worn on the body and suitable. Most triangulation techniques (e.g. AoA) provide high therefore often called wearables, they can measure physiological accuracy but require complex hardware and extensive synchro- parameters such as heart rate variability (HRV), blood oxygen nization. Whereas RSSI- and Fingerprinting-based methods are level, or skin conductivity. The modern smart devices contain fairly easy to use but with lower accuracy or, in the case of Finger-a variety of sensors, such as oximetry sensors, skin tempera- printing, with a dependence on a predefined map that is sensitive ture, and ambient temperature sensors, electrodermal activity to any change in the home environment [22]. 24 Elements of a System for Automatic Monitoring of Specific Mental Health Characteristics at Home Information Society 2022, 10–14 October 2022, Ljubljana, Slovenia 3.4 Derived Parameters Table 1: Listing of exemplary mental health characteristics In addition to using raw sensor data for use cases like HAR or IPS, and possibilities of monitoring them. HAR corresponds to consumer devices often provide pre-calculated values and derived the detection of human movements with motion sensors, parameters, such as about sleep. Many device manufacturers PP stands for measuring physiological parameters and IPS try to draw conclusions about sleep duration, sleep quality, and implies the positioning of a person in the room or home. 
sleep phases. Additionally, information such as screen time, the frequency with which the phone is picked up, or the number of Characteristic HAR PP IPS Others calls and messages is also documented. Even though many of derived smartphone these values are pre-calculated and in some cases do not provide Sleeping and smartwatch much information on their own, they can give insights when Behaviour x x (x) parameters (sleep combined with each other and with data from additional devices. hours, sleep phases, sleep quality) 4 EXEMPLARY SYSTEM OVERVIEW Compulsive Handwashing x (x) This section describes example characteristics and their monitor- Compulsive ing possibilities, and proposes a connected system architecture. Checking x x Stress 4.1 Characteristics Monitoring Level (x) x The possible elements of a monitoring system presented in the interaction with previous section, offer particular value when combining them. Eating IoT devices, e.g. Different systems and methods are needed to measure specific Behavior x x x personal scale, psychological characteristics. To illustrate this, we looked at some microwave symptoms and characteristics of mental illnesses and considered derived smartphone how these can be measured. The following Table 1 shows a short and smartwatch list of mental health characteristics and possible ways of mea- Social parameters (screen suring them. This table represents an exemplary overview and Interaction (x) (x) time, pick up times, therefore does not claim to be complete. With this table, we show phone call and that different characteristics can be measured and documented messages frequencies) with the same sensors, wearables, and systems but also that one characteristic can be determined with more than one measure- ment. We focused on the three main elements for monitoring, namely a HAR system, measuring and evaluating physiological individual expressions. 
For this purpose, it can be helpful to train parameters (abbreviated with PP in the table), and using an IPS. a personalized machine learning model for a potential patient Additionally, we list other parameters or devices which can sup- in order to observe variations from normal behavior. In general, port the measurement of the respective characteristic. For some personalized models are well suited to represent the individual characteristics, additional information might increase the accu- aspects of everyday activities. racy and lead to a greater knowledge gain (indicated by (x) in the table). In general, it can be said that oftentimes the combination of different input signals and parameters leads to a better system 4.2 Connected System quality [7]. We do not present the exact algorithms and devices, In Figure 1 we demonstrate how the individual components of a as these depend heavily on other external factors (availability of system for monitoring characteristics of mental disorders can be devices, overall use case, acceptance of the user, privacy aspects). connected. Depending on the concrete use case, data from mul- It has long been known that sleep, e.g. in form of insomnia, tiple devices will be constantly collected. For energy efficiency, is an essential feature of mental disorders such as depression or it makes sense to store the collected data on the respective de- anxiety [16]. Sleeping behavior can be observed across a variety vice first, and only send it to a data hub from time to time. For of systems and devices. By means of a HAR system, for example, this, smartphone applications like SensorHub [7] are very useful. it is possible to document how often a person wakes up at night, Multiple (wearable) sensors can be connected via Bluetooth, col- how restful the sleep is, and when and whether one gets out lecting and storing the data in a central place and a unified format of bed in the morning. 
Monitoring this behavior can help in to provide complete control over the data. Additionally, systems observing depressive phases, where patients sometimes find it like SensorHub provide the possibility to get point-in-time feed- difficult to get out of bed at all. But beyond that, it can also make back from the user by repeatedly querying certain conditions (be-sense to include other information, such as the position in the havior, feelings, experiences), so-called EMAs. This is extremely apartment in order to get more contextual information. valuable and these subjective sensations could be supported and The measurement of physiological parameters can help for the enriched by objective, quantifiable sensor measurements. majority of the characteristics. By measuring skin conductance, A system designed to give a holistic view of a current state is for example stress, which plays a major role in many mental not intended to make assessments or provide results at any time. illnesses, could be detected. Furthermore, it is also known that That means these kinds of systems have a long-term character social behavior changes in some mental disorders. For example, rather than being a snapshot. Moreover, when working with raw social interaction decreases in depressive or anxiety patients but sensor data, this often means that it needs a lot of pre-processing increases in people in a manic phase. and cleaning. This includes e.g. filtering and de-noising. When it For some characteristics, it is particularly interesting to look comes to machine learning, domain-specific knowledge is also at changes over time because mental illnesses often have very helpful in order to come up with meaningful features. 25 Information Society 2022, 10–14 October 2022, Ljubljana, Slovenia Kirsten et al. Figure 1: System Overview to Monitor Mental Health Characteristics at Home 5 CHALLENGES AND LIMITATIONS REFERENCES Each component of an overall system has its advantages and [1] 2018. 
Depression (major depressive disorder). https://www.mayoclinic.org/ disadvantages. It always has to be determined which features diseases-conditions/depression/symptoms-causes/syc-20356007 [2] 2019. Mental illness. https://www.mayoclinic.org/diseases-conditions/ predominate for the specific use case. It should also be noted that mental-illness/symptoms-causes/syc-20374968 issues like data security, especially with such sensitive topics as [3] 2020. Bipolar Disorder. https://www.nimh.nih.gov/health/topics/bipolar- disorder mental disorders, play a tremendous role. For this, all actions [4] 2022. Anxiety Disorders. https://www.nimh.nih.gov/health/topics/anxiety- should be transparent to the user. The consumer must be in- disorders formed in advance about all processes, devices and measurements, [5] The American Psychiatric Association (APA). [n.d.]. What Is Obsessive-Compulsive Disorder? https://www.psychiatry.org/patients-families/ocd/ and be able to stop the monitoring at any time. This complete what-is-obsessive-compulsive-disorder/ transparency can result in the user consciously or subconsciously [6] Elisabetta Burchi, Eric Hollander, and Stefano Pallanti. 2018. From treat-adapting his/her behavior when he/she feels observed. However, ment response to recovery: a realistic goal in OCD. International Journal of Neuropsychopharmacology 21, 11 (2018), 1007–1013. these effects should be negligible, as this type of monitoring [7] Jonas Chromik, Kristina Kirsten, Arne Herdick, Arpita Mallikarjuna Kappat-happens over a longer period of time and thus integrates into tanavar, and Bert Arnrich. 2022. SensorHub: multimodal sensing in real-life enables home-based studies. Sensors 22, 1 (2022), 408. everyday life over time. [8] Maria Cornacchia, Koray Ozcan, Yu Zheng, and Senem Velipasalar. 2016. A Furthermore, it should be kept in mind that systems that in-survey on activity detection and classification using wearable sensors. 
tegrate everyday user devices (smartphone, smartwatch, and activity tracker) are also always limited in their battery power, especially if they are in constant use. Here, a balance must be found between monitoring frequency and power consumption. The times when the devices have to be charged (usually daily) must also be taken into account in the system design. In general, one of the most important factors is that the monitoring system is as pleasant and unobtrusive as possible for the user. It must be installable with as little effort as possible and integrate seamlessly into everyday life.

6 CONCLUSION
This paper presented possible ways to measure various characteristics of mental disorders. We want to emphasize that systems of this type are not diagnostic tools and are in no way equivalent to professional assessments. However, they can help to describe a given state and to perceive and document changes. In general, it is helpful to make psychological characteristics measurable and thus to support the subjective feelings of patients with objective measurements. Moreover, even small changes can be detected and documented at an early stage, which helps to take countermeasures in time. Such systems could also provide new insights into behavioral patterns, overlaps of different diseases, and personal aspects. Furthermore, these monitoring systems can be used not only for early detection but also for relapse supervision.

In future work, an exemplary monitoring system will be built for detecting compulsive behavior as it occurs in patients suffering from OCD. We also want to determine to what extent such systems are accepted by potential patients and what other limitations and possibilities are encountered.

IEEE Sensors Journal 17, 2 (2016), 386–403.
[9] L Minh Dang, Kyungbok Min, Hanxiang Wang, Md Jalil Piran, Cheol Hee Lee, and Hyeonjoon Moon. 2020. Sensor-based and vision-based human activity recognition: A comprehensive survey. Pattern Recognition 108 (2020), 107561.
[10] Miguel Ángel Álvarez de la Concepción, Luis Miguel Soria Morillo, Juan Antonio Álvarez García, and Luis González-Abril. 2017. Mobile activity recognition and fall detection system for elderly people using Ameva algorithm. Pervasive and Mobile Computing 34 (2017), 3–13.
[11] Wayne K Goodman, Dorothy E Grice, Kyle AB Lapidus, and Barbara J Coffey. 2014. Obsessive-compulsive disorder. Psychiatric Clinics 37, 3 (2014), 257–267.
[12] Oscar D Lara and Miguel A Labrador. 2012. A survey on human activity recognition using wearable sensors. IEEE Communications Surveys & Tutorials 15, 3 (2012), 1192–1209.
[13] Felipe Barbosa Araújo Ramos, Anne Lorayne, Antonio Alexandre Moura Costa, Reudismam Rolim de Sousa, Hyggo O Almeida, and Angelo Perkusich. 2016. Combining Smartphone and Smartwatch Sensor Data in Activity Recognition Approaches: an Experimental Evaluation. In SEKE. 267–272.
[14] Mahsa Sheikh, Meha Qassem, and Panicos A Kyriacou. 2021. Wearable, environmental, and smartphone-based passive sensing for mental health monitoring. Frontiers in Digital Health (2021), 33.
[15] Santosh Subedi and Jae-Young Pyun. 2020. A survey of smartphone-based indoor positioning system using RF-based wireless technologies. Sensors 20, 24 (2020), 7230.
[16] Daniel J Taylor, Kenneth L Lichstein, H Heith Durrence, Brant W Reidel, and Andrew J Bush. 2005. Epidemiology of insomnia, depression, and anxiety. Sleep 28, 11 (2005), 1457–1464.
[17] Timothy J Trull and Ulrich Ebner-Priemer. 2013. Ambulatory assessment. Annual Review of Clinical Psychology 9 (2013), 151.
[18] Ramachandran Varatharajan, Gunasekaran Manogaran, Malarvizhi Kumar Priyan, and Revathi Sundarasekar. 2018. Wearable sensor devices for early detection of Alzheimer disease using dynamic time warping algorithm. Cluster Computing 21, 1 (2018), 681–690.
[19] Vini Vijayan, James P Connolly, Joan Condell, Nigel McKelvey, and Philip Gardiner. 2021. Review of wearable devices and data collection considerations for connected health. Sensors 21, 16 (2021), 5589.
[20] World Health Organization (WHO). 2022. Mental disorders. https://www.who.int/news-room/fact-sheets/detail/mental-disorders
[21] Ali Yassin, Youssef Nasser, Mariette Awad, Ahmed Al-Dubai, Ran Liu, Chau Yuen, Ronald Raulefs, and Elias Aboutanios. 2016. Recent advances in indoor localization: A survey on theoretical approaches and applications. IEEE Communications Surveys & Tutorials 19, 2 (2016), 1327–1346.
[22] Faheem Zafari, Athanasios Gkelias, and Kin K Leung. 2019. A survey of indoor localization systems and technologies. IEEE Communications Surveys & Tutorials 21, 3 (2019), 2568–2599.

Towards Multi-Modal Recordings in Daily Life: A Baseline Assessment of an Experimental Framework

Christoph Anders∗, Sidratul Moontaha∗, Bert Arnrich
firstname.lastname@hpi.de
Hasso Plattner Institute, Potsdam, Germany

ABSTRACT
Background: Wearable devices can record physiological signals from humans to enable an objective assessment of their Mental State. In the future, such devices will enable researchers to work on paradigms outside, rather than only inside, of controlled laboratory environments. This transition requires a paradigm shift in how experiments are conducted, and it introduces new challenges.
Method: Here, an experimental framework for multi-modal baseline assessments is presented. The developed test battery covers stimulus and questionnaire presenters, and multi-modal data can be recorded in parallel, such as Photoplethysmography, Electroencephalography, Acceleration, and Electrodermal Activity data. The multi-modal data is extracted using a single platform and synchronized using a shake detection tool. A baseline was recorded from eight participants in a controlled environment. Using Leave-One-Out Cross-Validation, the resampling of data, the ideal window size, and the applicability of Deep Learning for Mental Workload Classification were evaluated. In addition, participants were polled on the acceptance of using the wearable devices.
Results: The binary classification performance declined by an average of 7.81% when using eye-blink removal, underlining the importance of data synchronization, correct artefact identification, evaluating and developing artefact removal techniques, and investigating the robustness of the multi-modal setup. Experiments showed that the optimal window size for the acquired data is 30 seconds for Mental Workload classification. With this window size, a Random Forest classifier and an optimized Deep Convolutional Neural Network achieved the best balanced classification accuracies of 70.27% and 74.16%, respectively.
Conclusions: This baseline assessment gives valuable insights into how to prototype stimulus presentation with different wearable devices and suggests future work packages, paving the way for researchers to investigate new paradigms outside of controlled environments.

1 INTRODUCTION
The concept of Mental Workload (MW) originates from the field of psychology. It refers to the amount of working memory used in the brain and has historically been researched in laboratory settings [1]. High levels of MW experienced over an extended period of time lead to Mental Fatigue (MF). It can be assumed that the onset of MF depends on contextual factors such as the amount of sleep during previous nights, overall health, emotional state, and more. MF can increase the number of mistakes an individual makes and hinder work performance, amongst other effects. The economic impact of MF can be estimated from the finding that a fatigued workforce costs the US economy approximately 18 billion USD per year [2]. Methods that quantify the level of MW an individual experiences in and outside of laboratory environments are of interest to a broad community.

MF can be mitigated in various ways, e.g. by taking more micro-breaks [2]. To quantify the impact of interventions, measurement frameworks have to be developed in controlled environments and evaluated for use in uncontrolled environments. Subjective measurements of MW can be performed using questionnaires or discussions with individuals. However, these approaches take time, require active and truthful participation, and are therefore not suited for every context. To overcome this hurdle, objective measurement methods are being researched, amongst which EEG seems promising [3].

Measurement frameworks still to be developed for experiments mainly conducted in controlled environments, such as MW quantification, need to be combined with research on the quality and amount of sensor data needed, accurate synchronization between different modalities, and precise data labeling. Merging research on all these aspects into one skeleton would increase the overall usability of the resulting framework. This paper presents an experimental framework for baseline assessment, using objective measurements of MW in university students as the use case. Data storage, compression, and transmission consume a lot of battery power [4]. For this reason, the length of the time windows required for accurate classifications, the required sampling rate, and the time-series classification performance were evaluated. Finally, participants of this study were surveyed about their experiences with the two well-established wearable devices used. This matters because the framework can be customized in terms of stimulus presentation and modalities for use by the Affective Computing research community in general. The measurement framework is presented in detail, and necessary steps towards an experimental framework for multi-modal recordings in uncontrolled environments are outlined.

∗ Both authors contributed equally to this research.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Information Society 2022, 10–14 October 2022, Ljubljana, Slovenia
© 2022 Copyright held by the owner/author(s).

2 EXPERIMENTAL FRAMEWORK
The experimental framework for this study was built using PsychoPy (v2022.2.0) [5] running under Python 3.10.4 in a controlled environment, as a preliminary step for recordings in daily life. PsychoPy is among the most frequently used software packages for visual stimulus presentation¹. It was preferred because of its usability, its automated calibration feature, and its real-time stimulus presentation [6]. The setup was implemented to induce MW in line with common practice from state-of-the-art studies (e.g. [7]).

1 http://hans-strasburger.userweb.mwn.de/psy_soft.html#imagen

As a first step, participants were asked to put all the devices into a box and shake it, to synchronize the devices. Then, high-magnitude tapping on the space bar was performed to synchronize with PsychoPy. After participants were instructed to minimize movement, a five-minute relaxation video² was presented for baseline recording. An eye-closing session of one minute followed, before MW was induced. Participants had to work on the N-Back task (n=3) for five minutes. Afterwards, participants had to work for five minutes on the Stroop task, where four colors (yellow, green, blue, and red) were shown for 3 seconds each. For every wrong answer, a buzz sound was played to intensify the workload and provide feedback to the participants. Both tasks were followed by the pairwise NASA Task Load Index (NASA-TLX) questionnaire [8]. Physiological data recorded during the relaxation video and the eye-closing session formed the 'Low-to-No-Workload' class, and data from both MW tasks formed the 'High-Workload' class. Together, these defined a binary classification task. Physiological data recorded while participants answered the questionnaires or read instructions for the MW tasks was excluded. With a ratio of 4:10 for 'Low-to-No-Workload' to 'High-Workload', the recorded data was imbalanced.

Two wearable devices were used. The Empatica E4 records skin temperature (4 Hz), PPG (64 Hz), and GSR (4 Hz), alongside acceleration readings (32 Hz) that can be used for the identification and removal of artefacts. The Muse S device³ records EEG (256 Hz) and accelerometer data (50 Hz). Following the 10/20 system for electrode placement [9], the EEG electrodes of the Muse S device are located at TP9, AF7, AF8, and TP10, with a reference electrode at FPz.

Figure 1: Study design of the experimental paradigm utilized for the multi-modal framework. [Diagram; recovered block labels: Synchronization, Instructions, Video (5 min), Eye-Closing (1 min), N-back, Stroop Instructions, NASA TLX (after each task), task durations of 5 min.]

3 METHODS
The Muse S data was recorded using MindMonitor⁴ and loaded by devicely⁵, whereas the Empatica E4 data was recorded using the SensorHub application [10]. Synchronization was performed at simultaneous peaks in the accelerometer data of both wearable devices, using Jointly⁶. These acceleration peaks were caused once at the beginning and once at the end of the experimental protocol, by placing the devices in the same box and shaking the box. Potential offsets and time shifts in the recordings were automatically corrected by Jointly. Labeling of the sensor data was performed using the information contained in the logs from PsychoPy. How data labeling will be performed for recordings in uncontrolled environments remains an open question.

Once the data was labeled, data cleaning needed to be performed. Because time-series data is not uniform over time (e.g. due to a temporary loss of connection), missing values needed to be interpolated. Linear interpolation was performed by filling missing data with the mean value of the two neighboring data points. Head movements and eye-blinks predominantly compromised the EEG recordings, while hand movements predominantly compromised readings from the Empatica E4.

Removal of artefacts in the data from the Empatica E4 was performed in three steps. First, the raw acceleration and BVP values were normalized to the range [-1, 1]. Second, a fourth-order Butterworth band-pass filter with 0.5 Hz and 3.5 Hz cutoff frequencies was applied. Third, a Savitzky-Golay filter was applied, using a 101-sample window and a 5th-degree polynomial. These steps removed the baseline drift in the recorded BVP signal. Additionally, adaptive noise cancellation was performed to remove movement artefacts from the BVP signal, using linear recursive least-squares filtering. Removal of artefacts from the EEG signal was performed using spectral filtering with an infinite impulse response filter. Following parameter recommendations from the literature [11], a Chebyshev type 2 band-pass filter with 0.5 Hz and 48.5 Hz cutoff frequencies and 40 dB attenuation in the stop-band was applied. This removed power-line interference and other artefacts such as jaw clenching. Eye-blinks cause strong artefacts in EEG recordings, especially in the frontal channels [12]. Here, eye-blinks were removed using independent component analysis (ICA) [13].

Spatial filtering of the EEG data was investigated using the common spatial pattern (CSP) algorithm [14] implemented in the meet⁷ repository [15]. CSP performs a generalized eigenvalue decomposition of two distinct multivariate sets of data, for which an additive underlying mixture of sources is assumed. In essence, CSP maximizes power differences between the two conditions 'Low-to-No-Workload' and 'High-Workload'. After the filter values for each channel are derived, the filter with the highest eigenvalue is chosen and applied to both the 'Low-to-No-Workload' and the 'High-Workload' classes. The result is the sum of the respective scalar filter weights multiplied with the corresponding electrode channels. This yields one single channel that best describes the phenomenon being optimized for.

Temporal filtering describes the process of either excluding recordings from trial construction altogether (e.g. physiological data recorded while answering questionnaires), or building trials from the recorded data. Two important parameters have to be taken into account: window size and window overlap. Here, several window sizes were evaluated: 5 sec, 10 sec, and 30 sec. The window overlap was always chosen to be 0.5 sec smaller than the respective window size: 4.5 sec, 9.5 sec, and 29.5 sec.

To extract features, the cleaned BVP signal was passed to the NeuroKit2⁸ package [16]. NeuroKit2 locates the signal peaks, derives the peak-to-peak (RR) intervals, and calculates various time- and frequency-domain heart rate variability features, some of which are listed below. Additionally, the mean and standard deviation (SD) of GSR and skin temperature were extracted.
The different feature-sets utilized 2 https://www.youtube.com/watch?v=S6jCd2hSVKA 3 https://choosemuse.com/de/muse-s/ 4 https://mind-monitor.com/ 5 7 https://github.com/hpi-dhc/devicely https://github.com/neurophysics/meet 6 8 https://github.com/hpi-dhc/jointly https://neuropsychology.github.io/NeuroKit/ 28 Experimental Framework for Multi-Modal Recordings: Baseline Assessment Information Society 2022, 10–14 October 2022, Ljubljana, Slovenia were extracted from the training data only, and can be summa- Hidden (2D convolution, 5x5, ReLu), 3rd Hidden (2D Max Pooling, rized as follows: CSP features: Gamma, Beta, Alpha, Theta and 2x2), 4th Hidden (Flatten), 5th Hidden (Fully-Connected, ReLu), Delta band powers, mean over the band powers, mean and SD of 6th Hidden (Dropout), 7th Hidden (Fully-Connected, ReLu), 8th the absolute band powers; BVP features: Mean and SD of the RR Hidden (Dropout), Output (Fully-Connected, Single-Output, Sig- intervals (peak to peak of Hear Rate Variability), SD of the suc- moid). cessive differences between RR intervals, ratio of SD and mean For the RF, the default hyperparameters of the RandomForest- RR intervals, low frequency band power (0.04 - 0.15 Hz), high Classifier from scikit-learn were chosen. For the SVM, a radial frequency band power (0.15 - 0.4 Hz), very high frequency band basis function kernel was utilized, and the gamma value was power (0.4 - 0.5 Hz), ratio of low-high band power; GSR: Mean calculated for each evaluation. The best hyperparameters of and SD of absolute values, mean amplitude of Skin Conductivity DCNN were identified using the sequential model based optimiza- Response (SCR) peaks; Local Skin Temperature: Mean and SD tion (SMBO) algorithm with the tree-structured parzen estimator of absolute values; and PSD features: Power spectral density of (TPE), which has been shown to outperform both grid search and raw EEG of TP9, TP10, AF7, AF8. random search [18]. The derived hyperparameters are listed in Table 1. 
The inputs to all classifiers were min-max normalized. Hyperparameter Value Range Baseline Optimized Dropout 0 - 0.5 (0.1) 0.5 0.3 Epochs 1 -200 (5) 200 25 Batch Size 1 - 1000 (50) 500 350 Conv. Layer 1 10 - 100 (10) 20 70 Conv. Layer 2 25 - 250 (25) 50 125 Hidden Layer 1 100 - 1000 (50) 500 200 Hidden Layer 2 100 - 1000 (50) 250 750 Window Size 5 - 30 5 30 Input Height 20 - 130 (10) 28 110 Input Width 20 - 130 (10) 28 110 Table 1: Hyperparameters for the DCNN. Values in paren- thesis indicate incremental steps. Window size in seconds. 4 RESULTS The first experimental evaluation used two different sets of fea- tures, each resampled to 10 Hz. Averaged results of all of the Leave-One-Out Cross-Validation for the classification tasks are shown in Table 2. Set # Window Size Blink Removal Balanced Acc. Set 1 1200 sec no 74.06 Set 1 1200 sec yes 65.52 Set 1 6000 sec no 82.21 Set 1 6000 sec yes 73.49 Set 2 1200 sec no 77.31 Set 2 1200 sec yes 72.43 Figure 2: The flowchart of the employed study protocol Set 2 6000 sec no 80.94 with the necessary intermediate steps. Set 2 6000 sec yes 71.84 Table 2: TSC Performance for RF. Set 1: Raw TP9, TP10, In total, three different evaluations were performed on the AF8, AF7, Skin Temperature, BVP features. Set 2: Set 1 + data recorded in a controlled environment. First, two different GSR. The row of the best performance is printed in bolt feature sets were investigated for data resampled to 10 Hz, using face. a Random Forest (RF) classifier. This evaluation was performed to investigate on the possibility of reducing the sampling rate required per modality. Second, the optimal time window for time The second experiment evaluated on the optimal window-size. 
series classification (TSC) of MW was investigated on by com- Results are visualized in Figure 3, where the PSD feature set refers paring the performance of different feature sets utilized by RF to all the extracted features mentioned in 3, and the FE feature-and a Support Vector Machine (SVM). Therefore, the modalities set refers to all but the PSD features. With the FE feature-set, were utilized at the respective sampling rates recorded with and while RF performed best across all time-windows, the average simply combined. Third, the application of Deep Learning to this time series classification performance increased only marginally task was investigated using a Deep Convolutional Neural Net- across all TSC models when varying the window-size. The best work (DCNN) [17]. The DCNN was built of ten layers: Input (2D performance of 70.27% balanced accuracy was achieved for RF convolution, 5x5, ReLu), 1st Hidden (2D Max Pooling, 2x2), 2nd with FE for a window-size of 30 sec. 29 Information Society 2022, 10–14 October 2022, Ljubljana, Slovenia Anders and Moontaha, et al. SVM FE SVM PSD RF FE RF PSD AVG features such as power-ratios; to recruit more participants; and 72.5 to investigate on feature-importance. Also, resampling the sen- sor data to frequencies other than 10 Hz and investigating the 70.0 effect of interventions to remove MF in controlled environments, should be performed. The presented framework needs to be ex- 67.5 tended to allow automatic randomization of the tasks, recovery from crashes, more robust data extraction, to be evaluated for 65.0 applicability to uncontrolled environments, and published. Ex- perimental paradigms for measuring MW need to be taken from ccuracy in % 62.5 controlled environments, and frameworks that are under devel- opment need to be tested and evaluated in uncontrolled settings. 
60.0 Balanced A 6 ACKNOWLEDGMENTS 57.5 We appreciate the contribution of Ahmed Azzouz, Alina Krichev- 55.0 sky, Leonhard Hennicke, Nastassia Heumann, Nikita Shishelyakin and Tanja Manlik as a part of their master project. This research 52.5 was (partially) funded by the HPI Research School on Data Sci- 5 sec 10 sec 30 sec mean ence and Engineering. Figure 3: TSC Performance for RF and SVM. The choice of REFERENCES features and windows significantly impacted inter-subject [1] Hart et al. 1988. Development of nasa-tlx (task load index): results of empir-TSC performance. ical and theoretical research. In Advances in psychology. Vol. 52. Elsevier, 139–183. doi: 10.1016/S0166- 4115(08)62386- 9. [2] Khosro Sadeghniiat-Haghighi and Zohreh Yazdi. 2015. Fatigue management in the workplace. Industrial Psychiatry Journal, 24, 1, (Jan. 1, 2015), 12. doi: The third experiment investigated on the applicability of Deep 10.4103/0972- 6748.160915. [3] Hogervorst et al. 2014. Combining and comparing EEG, peripheral phys-Learning to this task. The baseline-DCNN achieved a balanced iology and eye-related measures for the assessment of mental workload. accuracy of 59.79%, whereas the optimized-DCNN achieved a Frontiers in Neuroscience, 8. doi: 10.3389/fnins.2014.00322. balanced accuracy of 74.16%. [4] Casson. 2019. Wearable EEG and beyond. Biomedical Engineering Letters, 9, 1, (Jan. 2019), 53–71. doi: 10.1007/s13534- 018- 00093- 6. Eight participants were recruited in this baseline assessment [5] Peirce et al. 2022. Building experiments in PsychoPy. ISBN-13: 978-1473991392. and provided subjective feedback on their experiences with the Sage. [6] Rolf Kötter. 2009. A primer of visual stimulus presentation software. Frontiers setup: No participant complained about uncomfortable feelings in neuroscience, 21. due to pressure from the sensors, but sensors felt too bulky, and [7] Giorgi et al. 2021. 
Wearable technologies for mental workload, stress, and the utilization of three different devices—two sensors and one emotional state assessment during working-like tasks: a comparison with laboratory technologies. Sensors, 21, 7, 2332. doi: 10.3390/s21072332. phone for recordings—seemed too complicated. [8] Ernesto A. Bustamante and Randall D. Spain. 2008. Measurement invariance of the nasa tlx. Proceedings of the Human Factors and Ergonomics Society 5 CONCLUSION Annual Meeting, 52, 19, (Sept. 2008), 1522–1526. doi: 10.1177/1541931208052 01946. In the first experiment, it was found that eye-blink removal wors- [9] Chatrian et al. 1985. Ten percent electrode system for topographic studies of ened the TSC performance. This finding was consistent across spontaneous and evoked eeg activities. American Journal of EEG technology, 25, 2, 83–92. doi: 10.1080/00029238.1985.11080163. all test-runs, and the average loss in balanced classification accu- [10] Chromik et al. 2022. Sensorhub: multimodal sensing in real-life enables racy was with 7.81% substantial. Amongst others, reasons for this home-based studies. Sensors, 22, 1, 408. doi: 10.3390/s22010408. [11] Apicella et al. 2021. High-wearable EEG-based distraction detection in motor circumstance are: Firstly, the existence of only one eye-blink per rehabilitation. Scientific Reports, 11. doi: 10.1038/s41598- 021- 84447- 8. time window of 20 seconds duration was assumed, which proved [12] Ajay Kumar Maddirala and Kalyana C. Veluvolu. 2021. Eye-blink artifact false. Secondly, more advanced algorithms for automatic eye-removal from single channel EEG with k-means and SSA. Scientific Reports, 11. doi: 10.1038/s41598- 021- 90437- 7. blink removal and signal restoration exist, which outperformed [13] A. Hyvärinen and E. Oja. 2000. Independent component analysis: algorithms ICA-based methods [19, 20] and should have been applied. and applications. Neural Networks, 13. doi: 10.1016/S0893- 6080(00)00026- 5. 
In the second experiment, it was found that the best accu- [14] Blankertz et al. 2008. Optimizing spatial filters for robust eeg single-trial analysis. IEEE Signal Processing Magazine. doi: 10.1109/MSP.2008.4408441. racy was achieved for a time-window with window-size of 30 sec. [15] Waterstraat et al. 2017. On optimal spatial filtering for the detection of This finding is in line with findings in the literature on affective phase coupling in multivariate neural recordings. NeuroImage, 157. doi: 10.1016/j.neuroimage.2017.06.025. computing (e.g. [21]). Furthermore, the FE feature set performed [16] Makowski et al. 2021. Neurokit2: a python toolbox for neurophysiological better for this task than the PSD feature set, for which the TSC signal processing. Behavior research methods, 53. doi: 10.3758/s13428- 020- 0 performance stagnated or even declined. Future work should 1516- y. [17] Sarkar et al. 2016. Wearable eeg-based activity recognition in phm-related investigate on computing PSD features from further cleaned EEG service environment via deep learning. international Journal of Prognostics data, and on features such as power in key frequency bands. and Health Management, 7. doi: 10.36001/ijphm.2016.v7i4.2459. Finally, it was found that the optimization of the DCNN also [18] Bergstra et al. 2013. Hyperopt: a python library for optimizing the hyperparameters of machine learning algorithms. In Proceedings of the 12th Python led to choosing a window-size of 30 sec. This finding is in line in science conference. Vol. 13. Citeseer. with the results from the second experiment, where the average [19] Nguyen et al. 2020. A deep wavelet sparse autoencoder method for online and automatic electrooculographical artifact removal. Neural Computing performance also peaked for the window-size of 30 sec. However, and Applications, 32. doi: 10.1007/s00521-020-04953-0. as this was the maximum value evaluated for, it might be that the [20] Olaf Dimigen. 2020. 
Optimizing the ICA-based removal of ocular EEG arti-models would have performed better for longer time-windows. facts from free viewing experiments. NeuroImage, 207. doi: 10.1016/j.neuroi mage.2019.116117. The performed baseline assessment highlights future work, [21] Athavipach et al. 2019. A wearable in-ear EEG device for emotion monitoring. such as to investigate on better algorithms for artefact removal Sensors, 19, 18. doi: 10.3390/s19184014. (e.g. [19, 20]); on longer window-sizes, different DL models, more 30 Assessing Sources of Variability of Hierarchical Data in a Repeated-Measures Diary Study of Stress Junoš Lukan Larissa Bolliger Els Clays Jožef Stefan Institute Department of Public Health and Department of Public Health and Department of Intelligent Systems Primary Care Primary Care Jožef Stefan International Ghent University Ghent University Postgraduate School Ghent, Belgium Ghent, Belgium Ljubljana, Slovenia larissa.bolliger@ugent.be els.clays@ugent.be junos.lukan@ijs.si Primož Šiško Mitja Luštrek Jožef Stefan Institute Jožef Stefan Institute Department of Intelligent Systems Department of Intelligent Systems Ljubljana, Slovenia Jožef Stefan International sisko.primoz@gmail.com Postgraduate School Ljubljana, Slovenia mitja.lustrek@ijs.si ABSTRACT In machine learning literature, this problem falls under the topic of affective computing [19]. Typical studies settle for one There are different methodological approaches to stress recog-definition of stress and either measure it by simply asking about nition in different disciplines. In machine learning literature, a it or using one of the established psychological questionnaires typical approach is to select a target variable and try to predict it [2]. Next, stress detection is relayed to machine learning models as generally as feasible, but possibly with person-specific feature as a supervised problem in which objectively measured data are normalization or personalization of models. 
In medical, psycho- used as predictors of self-reports, serving as labels. logical, and social sciences, the nested nature of data is often The aim of this paper is to employ statistical techniques from taken into account by using multilevel models, especially with medical and social sciences to inform machine learning mod- repeated measures data. In our diary study, we asked partici- elling. Specifically, we analyse daily aggregated data collected pants to assess different aspects of stress every 90 min for 15 in our study and consider possibilities for analysis on a lower, working days. They accessed their questionnaires through an within-day level. We do this by describing the data in terms of Android application which also served to passively record phone multilevel models and then assess how each level of measure- usage and sensor data. At the same time they wore Empatica ments contributes to the overall stress variability. E4 wristbands which collected physiological data. This study de- sign lends itself well to hierarchical consideration. In this paper, 2 METHODS we use variance partitioning, a technique which is also a part of multilevel modelling, to inform a machine learning pipeline. 2.1 Data Collection We show how consideration of different sources of variability Three main data types were collected using different measuring can help us decide how to personalize normalization of data or devices. Physiological parameters were measured by Empatica machine learning models. E4 wristbands, while participants filled in questionnaires on their smartphones for 15 working days. These ecological momentary KEYWORDS assessments (EMAs) were presented at random intervals through- stress detection, ecological momentary assessment, variance par- out the working day, roughly 90 minutes apart, while an addi- titioning, hierarchical data tional, longer questionnaire was offered in the evening, asking about the day as a whole. 
The questions in each EMA session (a 1 INTRODUCTION set of questions) were selected from questionnaires that measure different aspects of stress and related constructs, such as stress ap-Chronic stress is a well researched medical, psychological, and praisal, negative affect, job demand and job control. Smartphone sociological phenomenon which has been shown to have detri- sensor data and phone usage data were continuously collected by mental health consequences [8]. It is less clear, however, how a self-developed Android application based on the AWARE frame-daily experiences of stress translate into a long-term experience work [9]. The contents of the questionnaires and the data types of chronic stress [13]. In the STRAW project, we have tackled collected have already been described in an extensive protocol this question by carrying out a longitudinal diary study [6]. paper [6]. We collected the data of 56 participants, recruited from aca- Permission to make digital or hard copies of part or all of this work for personal demic institutions in Belgium (29 participants) and Slovenia (26 or classroom use is granted without fee provided that copies are not made or participants). Only the data pertaining to 𝑁 = 55 participants distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this were complete, which included 26 women and 29 men. Their work must be honored. For all other uses, contact the owner /author(s). mean age was 34.9 years with the range from 24 years to 63 years Information Society 2022, 10–14 October 2022, Ljubljana, Slovenia and they held various positions in their institutions, such as PhD © 2022 Copyright held by the owner/author(s). students, employees in administration, and tenured professors. 
Assessing Sources of Variability of Hierarchical Data. Junoš Lukan, Larissa Bolliger, Els Clays, Primož Šiško, and Mitja Luštrek

The participants adhered to the study protocol well. Over their participation period, each participant responded to more than 96 EMA sessions on average. The median time difference between two subsequent workday EMA sessions was 93 minutes, just a bit over what was designed [12].

2.2 Classical Machine Learning Data Analysis
As the first step of the analysis, we followed a classical machine learning approach for detecting stress (see Figure 3 in [2]). After preprocessing, we calculated hand-crafted features. For phone sensor data, we used a modified Reproducible Analysis Pipeline for Data Streams (RAPIDS, [20]) library. It calculates behavioural features using R, Python, and Snakemake [16], following a well-defined set of rules (steps). For physiological data, we used our in-house developed Python library, cr-features [11].

The data were aggregated on a daily basis. The target variables were averaged, and statistics were computed over physiological features that had first been calculated on short segments. Next, we standardized the data within participants, i.e., by subtracting the participant's mean of the daily values and dividing by the standard deviation of the daily values. Finally, we used a leave-one-subject-out validation technique. With it, we tested various linear (e.g., linear regression), non-linear (e.g., support vector regression), and ensemble machine learning techniques (e.g., AdaBoost regressor) from scikit-learn [17].

2.3 Variance Partitioning
Multilevel models (also known as mixed-effect, random-effect, or mixed models) are methods commonly used in medical, biological, and social sciences to analyse hierarchical (nested) data [10]. Labels in our dataset are nested in at least three levels: each participant collected data on multiple days, and each day included several measurements. We analysed self-perceived data from questionnaires using mixed models in other publications [4, 5]. In this paper, we use the related technique of variance partitioning for exploring variability of the data at different levels.

Variance partitioning (or partitioning of sums of squared deviations) can be used to ascribe the overall variability in a dataset to different sources of variability. In multilevel models, these sources can be different levels of analysis.

2.3.1 Simple Linear Regression. To model daily stress, we can use linear regression in the following form:

$$ y_j = \beta_0 + \beta_1 x_{j1} + \cdots + \beta_p x_{jp} + \epsilon_j \tag{1} $$

Here, $\{y_j\}_{j=1}^{n}$ represents the mean of the chosen indicator of stress on day $j$, and $\beta_0$ is the intercept term. $\{x_{j1}, \dots, x_{jp}\}_{j=1}^{n}$ represent daily values of $p$ features (or predictors), and $\beta_1, \dots, \beta_p$ their corresponding regression coefficients. Finally, $\epsilon_j$ is the error term, which captures all other factors related to $y_j$ that are not described by the available features (predictors included in the model). The index $j$ runs from 1 to $n$, where $n = N \times n_d$ is the product of the number of participants ($N$) and the number of days each one participated in the study ($n_d$).

As we are interested in variance partitioning only, we can focus on the intercept and omit all the predictor terms. Equation (1) thus becomes:

$$ y_j = \beta_0 + \epsilon_j \tag{2} $$

In the context of machine learning, this is known as a baseline or a dummy model, which predicts the same value for all days and participants: the mean.

2.3.2 A Two-Level Model. To model the differences between participants using a linear regression model, we can include a personalized intercept term. The regression equation can be described in two parts, where the first level is given by¹:

$$ y_{ij} = \beta_{i0} + \epsilon_{ij} \tag{3} $$

Here, we are trying to predict the stress score for each day $j = 1, \dots, n_d$ within each participant $i = 1, \dots, N$. We model the intercepts as the sum of the overall intercept, $\gamma_{00}$, and person-specific intercepts, $u_{i0}$, also called the random error component. The second-level regression equation is given by²:

$$ \beta_{i0} = \gamma_{00} + u_{i0} \tag{4} $$

2.3.3 A Three-Level Model. Participants in our study answered the EMA prompts repeatedly throughout the day. We can therefore add a third level of analysis, that is, we consider within-day variability. In this case, we are trying to predict the score for each EMA session $k = 1, \dots, n_s$ within each day $j$ within each participant $i$. This is a more fine-grained level of analysis and includes many more instances, namely $n = N \times n_d \times n_s$. Joining the expressions for all three levels of intercept, the equation can be written as:

$$
\begin{aligned}
y_{ijk} &= \beta_{ij0} + \epsilon_{ijk} \\
&= \gamma_{i00} + v_{ij0} + \epsilon_{ijk} \\
&= (\delta_{000} + u_{i00}) + v_{ij0} + \epsilon_{ijk}
\end{aligned} \tag{5}
$$

Now, the top-level intercept, $\beta_{ij0}$, is composed of three different components. The first one, $\delta_{000}$, is fixed for all participants and days. It represents the overall intercept, corresponding to the mean of scores aggregated per EMA session. The other two are random effects: $u_{i00}$ is the person-specific intercept, while $v_{ij0}$ is the intercept specific to each day within each person.

¹ In general, this equation would include predictor terms, such as $\beta_{i1} x_{ij1}$, but they are omitted for clarity, as mentioned above.
² Similarly, we could write the equation for person-specific regression coefficients as $\beta_{1i} = \gamma_{10} + u_{1i}$ and also model person-specific predictors as $\gamma_{01} W_i$.

3 RESULTS

3.1 Machine Learning on Daily Aggregated Data
As described in Section 2.2, we followed a typical machine learning approach to detect daily stress. We chose negative affect as an indicator of stress, measured with the Positive and Negative Affect Schedule (PANAS, [22]). This is the most commonly used questionnaire in similar diary studies looking at daily measures of stress [13]. It is composed of a list of adjectives describing emotional states, which are self-assessed on a scale from 1 to 5.

This approach did not yield good predictions, as shown in Fig. 1. In fact, most of the models performed no better than the dummy model, as evaluated by the median of the $R^2$ metric across all participants. We also considered the individual rounds of the leave-one-subject-out validation scheme. Even there, the best model (in this case an instance of an XGBoost regressor) achieved a maximum of $R^2 = 0.52$, which corresponds to 52 % of explained variance for that particular participant.

[Figure 1: Median and maximum $R^2$ value as achieved by different regression methods (linear, Ridge, Lasso, Bayesian Ridge, RANSAC, support vector, kernel Ridge, Gaussian process, random forest, XGBoost, and AdaBoost regression) in a leave-one-subject-out validation scheme; horizontal axis: value, from −1.0 to 1.0.]

We considered modelling within-day stress as the natural next step. This, however, opens the possibility of processing the data on the level of days, rather than only subjects. For example, standardization, feature selection, and model cross-validation could all be done on the lowest, daily level. To get an idea of whether a more fine-grained analysis of the data might be warranted, we turned to variance partitioning.

3.2 Sources of Variability
As mentioned in Section 2.2, the data for the machine learning experiments were standardized within participants, i.e., the normalization was personalized. In multilevel modelling terms, this is equivalent to introducing a participant random effect. We defined an intercept-only linear mixed model using the lme4 library [3]. The variance explained by these person-specific intercepts was $\sigma_u^2 = 0.20$, which amounted to 57 % of the total variance.

The random effect of participants is illustrated in Figure 2. It shows that the participants differ in how they evaluated their negative affect. Their mean assessments are mostly distributed within 1 point of the overall mean, but some differed from it by almost 2 points.

[Figure 2: The offset of person random effects (roughly corresponding to person-specific means of daily stress) from the main intercept effect (roughly corresponding to the overall mean). Vertical axis: participant; horizontal axis: random effects with confidence intervals, from −1.4 to 2.0.]

Next, we considered a three-level model with data aggregated on an EMA session basis. We modelled a random effect by varying the intercept among subjects and among days within subjects. The variance explained by adding the day level was $\sigma_v^2 = 0.08$, or 11 % of the total variance. This is in addition to the proportion of variance already explained at the subject level, so the total proportion of explained variance increased to 68 %.

This is also illustrated in Figure 3, which shows that individual days differ from the overall mean by a maximum of 1.5 points. On the ordinate axis, the random effects are ordered by participant, similarly to Fig. 2. Within participants, however, the data are ordered consecutively by date. This is manifested in the noisy structure of the confidence intervals, as opposed to the monotonically increasing random effects shown in red points.

[Figure 3: The offset of random effects of interaction terms of person and day (roughly corresponding to person-day-specific means of stress in one EMA session) from the main intercept effect (roughly corresponding to the overall mean). Vertical axis: participant : day; horizontal axis: random effects with confidence intervals, from −1.4 to 2.0.]

By including day-specific intercepts, this model performs significantly better ($\chi^2 = 509$, $p < 0.001$). We next consider what that means in the context of machine learning.

4 DISCUSSION
We considered two sources of variability, the person and the day level. We showed that much of the total variance can be ascribed to differences between persons. This can be interpreted to confirm the merit of personalized normalization of the data, but other interpretations are also possible. It should be noted that we only dealt with the target variable in this work. Thus, variance partitioning does not help with deciding whether to normalize independent variables. In general, it is advised to normalize physiological data, since there is participant-inherent variability of physiological functioning in the general population [18]. Similarly, explorative analysis indicated that phone sensors vary across devices. It is also reasonable to assume that people's phone usage varies significantly (independent of their stress level).

For the target variable itself, the proportion of variance explained by differences between persons can be interpreted in at least two ways. Either the participants were on average exposed to different levels of stress, which is why their assessments differ in a systematic way. Alternatively, participants can have differing thresholds for evaluating something as stressful. Since the self-reports are completely subjective, it is not possible to differentiate between these two interpretations with the self-assessments as labels. It would be possible to explore this further by taking physiological measures as ground truth for stress and using them to explain subjective measures. Treating the physiological measures as universal is problematic, however, and they might not even be related to stress deterministically.
Physiological responses are not specific to different stress states. Rather, a more complex relationship exists between the stimuli, physiology, and the parameters that control the dynamics between them [7].

Finally, normalization is not the only option for removing the person-specific variation. Methods such as linear discriminant analysis offer alternatives that have been shown to perform better [1].

When person-day random effects are included in the three-level model, the intercept model performs better than the one with only person random effects. Following the same reasoning as for the two-level model, this could be taken to mean that day-specific normalization would be beneficial. There are several arguments against this interpretation, however.

First, as indicated in Section 2.2, participants responded to questionnaires 5 or 6 times a day. Standardizing with this little data is dubious, while using such small samples for feature selection or model validation is unacceptable. Second, the questionnaire data are not truly continuous, but in fact interval data (at best) that can take 5 possible values. Since each EMA session included only two items from each questionnaire, aggregating at this level brings the number of possible values to only 9. Aggregating on a daily level, however, summarises about 10 different measurements, increasing the resolution to 0.1 point. This makes daily means much closer to a continuous variable, which can be modelled by regression methods.

We can therefore argue that normalizing data by considering each day as a separate unit is not appropriate. We can conclude, however, that treating each EMA session as its own instance is beneficial. As stated in Section 3.2, analysis on the EMA session level can explain at least 11 % of variance that is not captured by the variability between participants. This conclusion is also illustrated in Figs. 2 and 3. The general pattern of random effects shown by red points in Fig. 3 can already be sensed in Fig. 2. The noisy structure of the confidence intervals, however, is noticeable and worth exploring further.

5 CONCLUSIONS
Multilevel models are a well-established method in medical, biological, and social sciences for analysing nested and longitudinal data. In machine learning, research on comparable methods is in its early stages [15]. Some tree-based methods, such as MixRF [21], are capable of taking into account the hierarchical (or clustered) nature of data. Least squares support vector machines (LS-SVM) have been extended to handle longitudinal data, resulting in a mixed effects LS-SVM [14].

The aim of this paper was not to build multilevel models, statistical or machine learning ones. Rather, we used variance partitioning to explore how different levels of nested data can be leveraged. Standardization and similar techniques do not lend themselves well to the lowest level due to small sample size. We have shown, however, that restricting analysis to a higher level discards an important part of variance. In this way, variance partitioning can help us build better machine learning models. It enables us to systematically explore different levels of hierarchical data and to decide what data transformations to apply to each level.

REFERENCES
[1] Folami Alamudun, Jongyoon Choi, Hira Khan, Beena Ahmed, and Ricardo Gutierrez-Osuna. 2012. Removal of subject-dependent and activity-dependent variation in physiological measures of stress. In Proceedings of the 6th International Conference on Pervasive Computing Technologies for Healthcare. IEEE. doi: 10.4108/icst.pervasivehealth.2012.248722.
[2] Ane Alberdi, Asier Aztiria, and Adrian Basarab. 2016. Towards an automatic early stress recognition system for office environments based on multimodal measurements. A review. Journal of Biomedical Informatics, 59, (Feb. 2016), 49–75. doi: 10.1016/j.jbi.2015.11.007.
[3] Douglas Bates, Martin Mächler, Ben Bolker, and Steve Walker. 2015. Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67, 1. doi: 10.18637/jss.v067.i01.
[4] Larissa Bolliger, Ellen Baele, Elena Colman, Gillian Debra, Junoš Lukan, Mitja Luštrek, Dirk De Bacquer, and Els Clays. 2022. The association between day-to-day stress experiences, recovery, and work engagement among office workers in academia. An ecological momentary assessment study. PLOS ONE. Submitted.
[5] Larissa Bolliger, Gillian Debra, Junoš Lukan, Rani Peeters, Mitja Luštrek, Dirk De Bacquer, and Els Clays. 2022. The association between day-to-day stress experiences and work–life interference among office workers in academia. An ecological momentary assessment study. International Archives of Occupational and Environmental Health. doi: 10.1007/s00420-022-01915-y. In press.
[6] Larissa Bolliger, Junoš Lukan, Mitja Luštrek, Dirk De Bacquer, and Els Clays. 2020. Protocol of the stress at work (STRAW) project: how to disentangle day-to-day occupational stress among academics based on EMA, physiological data, and smartphone sensor and usage data. International Journal of Environmental Research and Public Health, 17, 23, (Nov. 2020), 8835. doi: 10.3390/ijerph17238835.
[7] Justin Brooks, Joshua C. Crone, and Derek P. Spangler. 2021. A physiological and dynamical systems model of stress. International Journal of Psychophysiology, 166, (Aug. 2021), 83–91. doi: 10.1016/j.ijpsycho.2021.05.005.
[8] Daniel J. Brotman, Sherita H. Golden, and Ilan S. Wittstein. 2007. The cardiovascular toll of stress. The Lancet, 370, 9592, 1089–1100. doi: 10.1016/s0140-6736(07)61305-1.
[9] Denzil Ferreira, Vassilis Kostakos, and Anind K. Dey. 2015. AWARE: Mobile context instrumentation framework. Frontiers in ICT, 2, 6, 1–9. doi: 10.3389/fict.2015.00006.
[10] Andrew Gelman and Jennifer Hill. 2006. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, 648. isbn: 9780521686891.
[11] Vito Janko, Matjaž Boštic, Junoš Lukan, and Gašper Slapničar. 2021. Library for feature calculation in the context-recognition domain. In Proceedings of the 24th International Multiconference INFORMATION SOCIETY – IS 2021. Slovenian Conference on Artificial Intelligence (Ljubljana, Slovenia, Oct. 4–8, 2021). Mitja Luštrek, Rok Piltaver, and Matjaž Gams, editors. Vol. A, 23–26. https://library.ijs.si/Stacks/Proceedings/InformationSociety/2021/IS2021_Volume_A.pdf.
[12] Junoš Lukan, Larissa Bolliger, Els Clays, Oscar Mayora, Venet Osmani, and Mitja Luštrek. 2021. Participants' experience and adherence in repeated measurement studies among office-based workers. In Adjunct Proceedings of the 2021 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2021 ACM International Symposium on Wearable Computers (Virtual, Sept. 21–24, 2021). ACM. doi: 10.1145/3460418.3479367.
[13] Junoš Lukan, Larissa Bolliger, Nele S. Pauwels, Mitja Luštrek, Dirk De Bacquer, and Els Clays. 2022. Work environment risk factors causing day-to-day stress in occupational settings. A systematic review. BMC Public Health, 22, 1. doi: 10.1186/s12889-021-12354-8.
[14] Jan Luts, Geert Molenberghs, Geert Verbeke, Sabine Van Huffel, and Johan A. K. Suykens. 2012. A mixed effects least squares support vector machine model for classification of longitudinal data. Computational Statistics & Data Analysis, 56, 3, 611–628. doi: 10.1016/j.csda.2011.09.008.
[15] Daniel Patrick Martin. 2015. Efficiently Exploring Multilevel Data with Recursive Partitioning. PhD thesis. University of Virginia, Enfield, Connecticut.
[16] Felix Mölder et al. 2021. Sustainable data analysis with Snakemake. Version 2. F1000Research, 10, 33. doi: 10.12688/f1000research.29032.2. Peer review: 2 approved.
[17] F. Pedregosa et al. 2011. Scikit-learn: machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
[18] R. W. Picard, E. Vyzas, and J. Healey. 2001. Toward machine emotional intelligence: analysis of affective physiological state. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23, 10, 1175–1191. doi: 10.1109/34.954607.
[19] Jianhua Tao and Tieniu Tan. 2005. Affective computing: a review. In Affective Computing and Intelligent Interaction. Springer Berlin Heidelberg, 981–995. doi: 10.1007/11573548_125.
[20] Julio Vega, Meng Li, Kwesi Aguillera, Nikunj Goel, Echhit Joshi, Kirtiraj Khandekar, Krina C. Durica, Abhineeth R. Kunta, and Carissa A. Low. 2021. Reproducible analysis pipeline for data streams: open-source software to process data collected with mobile devices. Frontiers in Digital Health, 3. doi: 10.3389/fdgth.2021.769823.
[21] Jiebiao Wang, Eric R. Gamazon, Brandon L. Pierce, Barbara E. Stranger, Hae Kyung Im, Robert D. Gibbons, Nancy J. Cox, Dan L. Nicolae, and Lin S. Chen. 2016. Imputing gene expression in uncollected tissues within and beyond GTEx. The American Journal of Human Genetics, 98, 4, 697–708. doi: 10.1016/j.ajhg.2016.02.020.
[22] David Watson, Lee Anna Clark, and Auke Tellegen. 1988. Development and validation of brief measures of positive and negative affect. The PANAS scales. Journal of Personality and Social Psychology, 54, 6, 1063–1070. doi: 10.1037/0022-3514.54.6.1063.
Academic Performance Relation with Behavioral Trends and Personal Characteristics: Wearable Device Perspective
Berrenur Saylam, Ekrem Yusuf Ekmekci, Eren Altunoğlu, Ozlem Durmaz Incel
Computer Engineering Department, Boğaziçi University, İstanbul, Turkey
{berrenur.saylam,ozlem.durmaz}@boun.edu.tr

ABSTRACT
Understanding the relevant factors related to students' academic performance can help to construct a more precise methodology for conducting a successful academic life. Several studies in the literature examine the relationship between students' lives and academic performance. They use statistical techniques with subjective responses collected via questionnaires. In the last decade, wearable devices, such as smartwatches and smartphones, have gained popularity in the research community. They can provide objective measurements of the users' activity, sleep, and mood states with integrated sensors. From these measurements, it is possible to extract markers related to individuals' physiological and psychological states. This study uses the NetHealth dataset to explore the most important factors from wearables and questionnaires related to students' academic grades. We utilize machine learning techniques, specifically Random Forest, rather than the classical statistical analyses found in the literature. We believe that we contribute to interpreting the underlying factors related to grades by examining objectively measured multi-modal datasets. We also focus on classifying the grades with Random Forest and achieve an overall accuracy of 76%. The most important factors affecting academic performance are observed to be sleep, Big Five personality traits, health, and mental health.

KEYWORDS
Wearable computing, machine learning, multi-modality, well-being, pervasive computing, student grades, behavioral patterns, personality traits

1 INTRODUCTION
Understanding the underlying factors of academic performance may help students to perform better throughout their academic life. Many studies have investigated the factors affecting academic performance, including family history, psychological well-being, and physical activity [1, 2, 3, 4]. Some approached the problem from the perspective of family history [1], and some focused on the presence of physical activity in the curriculum [2]. Also, some studies considered sleep based on self-reported measures [3]. However, these studies are based on one modality, focusing on one factor and trying to understand its effect on the target (i.e., students' academic performance). This approach does not provide a meta-understanding across different modalities. Thus, a multi-modal approach is necessary to obtain a broader view.

This study focuses on the multi-modal analysis of data collected from two kinds of sources. The first is the sensors of wearable devices, which provide objective measurements. The second is several surveys covering the subject's origin, sex, education level, and bad habits, as well as state-of-the-art sleep, Big Five, and mental health inventories (the details are given in Table 1). We aim to explore the factors affecting students' academic performance.

We utilize the NetHealth open-source data [5]. It contains students' sleep routines, daily physical activities, communication behaviors collected with mobile phones, and a detailed survey about family history, living conditions, and personality. Data related to sleep and activity is collected from wearable devices and documented. We aim to find the relation between some of the above-mentioned aspects and academic performance.

We have a large dataset from different academic periods (waves) and various survey data. However, the surveys were not filled in during every period; hence, we focused on the period with the least amount of missing information. Before applying our models, we performed a preprocessing procedure. It imputed missing values with appropriate techniques and prepared the data for the final analysis. We utilized machine learning techniques, specifically the Random Forest (RF) algorithm, both for factor selection and classification. In addition, we identify essential parameters for students' academic performance. These are related, in this order, to sleep, Big Five personality traits, health, mental health, personal information, and origin. We believe that this information can be helpful for understanding the factors that affect performance and for further improving student life during academic studies.

One of the essential contributions of our work is bringing different factors together and trying to produce a combination of them. In that way, we aim to find the most important predictors of students' academic performance by combining other focus areas, such as sleep, mental health, and activities, in the scope of one study.

Among the studies utilizing NetHealth data, some analyze the data on different topics, such as biometric-based authentication [6] and physical activity and sleep patterns [7]. Other studies address network analysis [8, 9] and physical activity prediction [10]. To the best of our knowledge, no similar study exists among the listed papers.

The rest of the paper is organized as follows. In Section 2, we explain the state of the art on student grade studies from the point of view of the wearable domain. In Section 3, we explain dataset details and the preprocessing steps for further analyses. In Section 4, we present the classification results for academic grades with different balancing strategies and give the factors for the best case. Finally, in Section 5, we discuss our findings along with ideas for future studies.

2 RELATED WORKS
Many related works exist on students' academic performance from the point of view of different domains, such as educational, psychological, and smartphone sensing [11, 12, 13, 14].

To the best of our knowledge, research applying objectively measured signals sensed from wearables to students' mental health and academic performance starts with the StudentLife [11] project.

In [12], the authors collect the day-to-day and week-by-week impact of workload on stress, sleep, activity, mood, sociability, mental well-being, and academic performance via smartphone sensors. They found strong correlations between smartphone sensor data and students' mental health, along with their academic scores, without accounting for behavioral differences.

In [13], the authors extracted factors related to students' academic grades from academic-related behaviors, personality, affect, stress, lifestyle, and behaviors sensed with wearables. They modelled behavior change points to capture individuals' behaviors while having the same final grade. One of the findings is that study duration has a positive correlation with the final grade.

In [14], researchers examined the relation of wearable device sensors and surveys with students' grades in a similar manner. They used SVM with different kernel setups. They found that social features, such as negative email contacts and negative interactions, are lower for students with high GPA. Also, the accelerometer sensor in wearables has an impact on discriminating between higher and lower performers. This study is similar to our experiment, as it uses multi-modal data from wearable sensors and surveys. We also examine the related factors, on a different dataset, and our study additionally explores class balancing scenarios.

3 METHODOLOGY

3.1 Dataset
We utilized the NetHealth dataset¹. It was collected from undergraduate students at the University of Notre Dame (ND) between Fall 2015 and Spring 2019. Thus, there are 8 waves, one for each semester. Because of drops in participation, there are data from approximately 700 students in the 2015–2017 period and 300 in the 2017–2019 period. Data collection consists of the social network, physical activity, and sleep data from the Fitbit wearable device. It also includes ground truth data from questionnaires about physical and mental health, social-psychological states, tastes, various self-reported behaviors, demographics, and background traits. The collection procedure was approved through IRB protocols, and each participant gave consent. Nevertheless, not all collected data are publicly shared due to privacy concerns.

The details of the collected dataset per modality are as follows. We performed our study with the wearable data, courses and grades data, and survey data sub-datasets.
• Communication data: A collection of smartphone-based communication logs.
• Wearable data: Measurements regarding activity and sleep collected using Fitbit, such as the number of steps, active minutes, heart rate, sleep duration, sleep time, and awakening time.
• Courses and grades data: Administrative records from the ND Registrar's Office containing course and grade information.
• Calendar: A weekly calendar showing the beginning of classes, break weeks, holidays, etc.
• Survey data: Self-reported questionnaires related to physical and mental health, social-psychological states, tastes, various self-reported behaviors, demographics, and background traits.
• Network survey data: Interaction network data with related information such as relationship type, duration, frequency of interaction, similarity, etc.

¹ http://sites.nd.edu/nethealth/

3.2 Preprocessing
As stated in Section 3.1, there are 8 waves. Each wave has different survey questions and thus responses. For instance, waves 1, 2, 3, and 7 have no questions related to stress, while waves 4, 5, 6, and 8 do. Similarly, sleep ground truth was not collected during study waves 5 and 7. Thus, we chose to work on wave 1, as it contains relatively more responses than other waves.

Firstly, we constructed a sub-dataset from NetHealth focused on our purpose; the details are explained in Section 3.2.1. Then, we preprocessed our data by deleting highly correlated parameters (Section 3.2.3). Finally, we applied the Random Forest algorithm for the rest of the study.

3.2.1 Dataset Preparation. The dataset includes many different data types, each with various parameters. We therefore decided which parameters to use before starting our study. We considered all parameters from the wearable device and course-grades datasets. However, we selected only some of the collected data from the survey dataset. The surveys mainly cover bad habits, the Big Five personality inventory, education, exercise, health, mental health, origin, personal information, sex, and sleep-related answers. We used only the summarizing parameters provided by the survey for mental health, personal information, and sleep. We selected some parameters from the origin category manually, namely the parameters of parents' status, economic condition, number of siblings, and religion. Table 1 gives the final list of utilized parameters. Parameters whose names end in _1 refer to measurements from wave 1.

3.2.2 Handling Missing Values. Once the dataset was prepared for analysis, we noticed missing values in some columns. Since these columns are only partially missing, we preferred to keep and impute them. We applied most-frequent imputation to the categorical parameters and mean imputation to the numerical ones. For the activity-related wearable data, however, there is enough correlation to use the KNN imputation technique, so we used it there. Finally, sleep data from wearables did not contain any missing values.

3.2.3 Correlation. We checked the correlation between parameters to reduce dimensionality. We deleted the parameters that exhibit higher than 80% correlation: cardiomins, fatburnmins, lowrangemins, minsasleep, minsawake, and peakmins. The information they carry can be deduced from other parameters, for instance, from cardiocal for cardiomins and fatburnmins. We decided on the threshold value after many experiments. Increasing it keeps the highly correlated parameters, while decreasing it deletes more parameters, which causes unnecessary parameter loss. Eliminating highly correlated features prevents misleading results when detecting interactions between different features. We had 93 parameters; after removing the 6 highly correlated ones, we have 87 features.

3.2.4 Target value's distribution. In this study, we are working towards identifying important parameters and applying machine learning methods regarding students' grades. Thus, before starting the analysis, we examined the target values, i.e., the distribution of student grades, to observe whether there is class imbalance. The distribution is shown in Figure 1.
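The missing-value handling of Section 3.2.2 and the correlation filter of Section 3.2.3 can be sketched with pandas and scikit-learn as below. This is a minimal illustration under stated assumptions, not the authors' code. The text does not specify the following, so all of them are assumptions:
- the helper names `impute_missing` and `drop_correlated`;
- the split of columns into categorical, numerical, and activity lists;
- the default `n_neighbors=5`;
- the use of absolute Pearson correlation.

`DataFrame.corr(numeric_only=...)` requires pandas 1.5 or later.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer, SimpleImputer


def impute_missing(df, categorical, numerical, activity, n_neighbors=5):
    """Impute each column group with its own strategy (cf. Section 3.2.2)."""
    out = df.copy()
    if categorical:
        # Most frequent value for categorical survey answers.
        out[categorical] = SimpleImputer(strategy="most_frequent").fit_transform(out[categorical])
    if numerical:
        # Column mean for numerical survey answers.
        out[numerical] = SimpleImputer(strategy="mean").fit_transform(out[numerical])
    if activity:
        # KNN imputation for the mutually correlated wearable activity columns;
        # distances are computed only over the columns listed in `activity`.
        out[activity] = KNNImputer(n_neighbors=n_neighbors).fit_transform(out[activity])
    return out


def drop_correlated(df, threshold=0.8):
    """Drop one feature of every pair whose absolute Pearson correlation exceeds threshold."""
    corr = df.corr(numeric_only=True).abs()
    # Keep only the strict upper triangle so each pair is inspected once;
    # the later column of a correlated pair is the one removed.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop
```

The rule in `drop_correlated` removes the later column of each correlated pair. With the column order of Table 1, it would therefore drop cardiocal rather than cardiomins if the two exceed the threshold, which is the opposite of the choice reported above. An explicit priority list would be needed to reproduce that selection exactly.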
In Figure 1, it is seen that most students have an A grade, and we have very few instances of the B-, C+, C, and C- classes. More specifically, we have 41856, 19321, 10048, 7265, 2526, 1617, 1258, 354, and 4346 instances of classes A, A-, B+, B, B-, C+, C, C-, and S (satisfactory), respectively. To classify the minority classes better, we applied SMOTE (synthetic minority over-sampling technique), which produces synthetic instances of the minority classes by interpolating between existing instances of the same class [15]. After SMOTE, we have 41856 instances of each class.

Figure 1: Target value distribution: Grade

Academic Performance via Wearable Devices Information Society 2022, 10–14 October 2022, Ljubljana, Slovenia

Table 1: Details of the features
Dataset | Measured values
Wearable data (Activity) | complypercent (percent minutes using Fitbit), meanrate (mean heart rate), sdrate (st. dev. heart rate), steps, floors, sedentaryminutes, lightlyactiveminutes, fairlyactiveminutes, veryactiveminutes, lowrangemins (low range minutes), fatburnmins, cardiomins, peakmins, lowrangecal, fatburncal, cardiocal, peakcal
Wearable data (Sleep) | timetobed (time went to bed), timeoutofbed (time out of bed), bedtimedur (time in bed in minutes), minstofallasleep (minutes to fall asleep), minsafterwakeup (minutes in bed after waking), minsasleep (minutes asleep), minsawake (minutes awake during sleep period), Efficiency (minsasleep/(minsasleep + minsawake))
Courses and grades | AcademicPeriod, CourseReferenceNumber, FinalGrade
Survey data (Bad habits) | usetobacco_1 (used tobacco), usebeer_1 (drank beer), usewine_1 (drank wine or liquor), usedrugs_1 (used rec. drugs like marijuana or cocaine), usedrugs_prescr_1 (used prescription drugs not prescribed), usecaffine_1 (drank caffeinated drinks)
Survey data (BigFive/Personal inventory) | Extraversion_1, Agreeableness_1, Conscientiousness_1, Neuroticism_1, Openness_1
Survey data (Education) | hs_1 (high school type), hssex_1 (high school sex composition), hsgrade_1 (high school average grade), apexams_1 (# of high school AP exams), degreeintent_1 (highest intended degree), hrswork_1 (paid hours senior year), ndfirst_1 (Notre Dame first choice of applied colleges?)
Survey data (Exercise) | hsclubrc_1 (club activities), exercise_1 (exercise), clubsports_1 (play club, intramural or rec sports), varsitysports_1 (play varsity sports), swimming_1 (swim), Dieting_1 (special type of diet), PhysicalDisability_1 (physical disability)
Survey data (Health) | SelfEsteem_1 (on the whole, I am satisfied with myself), Trust_1 (most people can be trusted), SRQE_Ext_1 (external self-regulation (exercise)), SRQE_Introj_1 (introjective self-regulation (exercise)), SRQE_Ident_1 (identified self-regulation (exercise)), SelfEff_exercise_scale_1 (when I am feeling tired), SelfEff_diet_scale_1 (self-efficacy score (diet items)), selfreg_scale_1 (I have trouble making plans to help me reach my goals)
Survey data (Mental health) | STAITraitTotal_1 (state-trait anxiety score), CESDOverall_1 (CES depression score), BAIsum_1 (Beck anxiety score), STAITraitGroup_1 (state-trait anxiety, 2 categories), CESDGroup_1 (CES depression, 2 categories), BAIgroup_1 (Beck anxiety, 3 categories), majorevent_1 (life changes)
Survey data (Origin) | momdec_1 (is your mother deceased?), momusa_1 (was your mother born outside the USA?), daddec_1 (is your dad deceased?), dadusa_1 (was your dad born outside the USA?), parentstatus_1 (parents living together or divorced/living apart), dadage_1 (father's age), momage_1 (mother's age), numsib_1 (number of siblings), birthorder_1 (which # in birth order are you?), parentincome_1 (parents' total income last year), parenteduc_1 (combined parent education), momrace_1 (mother's race), dadrace_1 (father's race), momrelig_1 (mother's religious preference), dadrelig_1 (father's religious preference), yourelig_1 (your religious preference)
Survey data (Personal info) | selsa_rom_1 (romantic loneliness), selsa_fam_1 (family loneliness), selsa_soc_1 (social loneliness)
Survey data (Sex) | gender_1 (gender)
Survey data (Sleep) | PSQI_duration_1 (computed time in bed), PSQIGlobal_1 (PSQI total score), PSQIGroup_1 (PSQI, two categories), MEQTotal_1 (MEQ (chronotype) score; high score = morning person), MEQGroup_1 (MEQ (chronotype) groups, 5 categories)

Table 2: Classification performance details
Class | precision | recall | f1-score | support
A | 0.53 | 0.56 | 0.55 | 10434
A- | 0.52 | 0.44 | 0.47 | 10410
B | 0.66 | 0.71 | 0.68 | 10449
B+ | 0.67 | 0.60 | 0.63 | 10507
B- | 0.85 | 0.87 | 0.86 | 10507
C | 0.89 | 0.92 | 0.91 | 10494
C+ | 0.88 | 0.89 | 0.88 | 10418
C- | 0.98 | 0.98 | 0.98 | 10502
S | 0.78 | 0.83 | 0.80 | 10455
accuracy | | | 0.76 | 94176
macro avg | 0.75 | 0.75 | 0.75 | 94176
weighted avg | 0.75 | 0.76 | 0.75 | 94176

3.3 Model details and performance metrics
As a classification method, we used the RF algorithm because it is an ensemble method and has been reported to perform better than the other methods used in the literature in this domain [16]. The RF parameters are n_estimators = 1000, criterion = Gini, and max_features = sqrt, using the scikit-learn toolkit². Train and test sizes of 75% and 25% are chosen, respectively.

4 CLASSIFICATION PERFORMANCE EVALUATION
Since our target variable is already categorical, we used the dataset after preprocessing without any other change in the classification task. In Table 2, we present the per-class precision, recall, and f1-score, together with the macro and weighted averages of these metrics and the accuracy. We obtained an accuracy of 76%. We see that the best performances are achieved for the classes B-, C+, C-, and S.
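The modelling steps above (SMOTE oversampling, a 75/25 split, a random forest with n_estimators=1000, the Gini criterion and max_features="sqrt", a per-class report in the style of Table 2, and an importance ranking like Figure 3) can be sketched as below. This is a hedged sketch, not the authors' code. `smote_oversample` is a small hand-written SMOTE with a brute-force neighbour search that is only adequate for small data (the imbalanced-learn package provides a full `SMOTE` class with `fit_resample`); the function names and toy data are hypothetical; ranking by scikit-learn's impurity-based `feature_importances_` is an assumption, since the text only says "RF feature selection"; and oversampling only the training split, so that no synthetic points reach the test set, is a design choice of this sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split


def smote_oversample(X, y, k=5, seed=0):
    """SMOTE-style oversampling: grow every class to the size of the largest one.

    Each synthetic point lies on the segment between a random sample of a class
    and one of its k nearest neighbours from the same class.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        n_new = target - count
        if n_new == 0:
            continue
        X_cls = X[y == cls]
        if len(X_cls) < 2:
            raise ValueError(f"class {cls!r} needs at least two samples for SMOTE")
        k_eff = min(k, len(X_cls) - 1)
        # Brute-force pairwise distances; after sorting, position 0 is normally the sample itself.
        dists = np.linalg.norm(X_cls[:, None, :] - X_cls[None, :, :], axis=-1)
        neighbours = np.argsort(dists, axis=1)[:, 1:k_eff + 1]
        base = rng.integers(0, len(X_cls), size=n_new)
        nbr = neighbours[base, rng.integers(0, k_eff, size=n_new)]
        gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
        X_parts.append(X_cls[base] + gap * (X_cls[nbr] - X_cls[base]))
        y_parts.append(np.full(n_new, cls, dtype=y.dtype))
    return np.vstack(X_parts), np.concatenate(y_parts)


def train_grade_classifier(X, y, feature_names, n_estimators=1000, top_k=20, seed=0):
    """75/25 split, SMOTE on the training part, RF with the settings of Section 3.3."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)
    X_bal, y_bal = smote_oversample(X_train, y_train, seed=seed)
    rf = RandomForestClassifier(n_estimators=n_estimators, criterion="gini",
                                max_features="sqrt", random_state=seed)
    rf.fit(X_bal, y_bal)
    y_pred = rf.predict(X_test)
    print(classification_report(y_test, y_pred, zero_division=0))
    # Impurity-based importances, highest first.
    order = np.argsort(rf.feature_importances_)[::-1][:top_k]
    top = [(feature_names[i], float(rf.feature_importances_[i])) for i in order]
    return rf, accuracy_score(y_test, y_pred), top


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy data: three imbalanced "grade" classes, six features.
    y = np.array(["A"] * 120 + ["B"] * 40 + ["C"] * 20)
    X = rng.normal(size=(len(y), 6)) + (y == "B")[:, None] * 1.5 + (y == "C")[:, None] * 3.0
    names = [f"f{i}" for i in range(X.shape[1])]
    _, acc, top = train_grade_classifier(X, y, names, n_estimators=200, top_k=3)
    print(f"accuracy = {acc:.2f}", top)
```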
Before applying SMOTE, the accuracy was 65%; furthermore, the f1-scores of the classes indicated above were lower, but we do not present the details due to the page limit. The confused instances can be observed in Figure 2. For instance, class A is confused mostly with A-, at a considerable rate. This is expected, since these are very close classes. Class S is mostly confused with the other classes, which can be interpreted as expected, since a satisfactory result corresponds to passing the course. SMOTE generates instances based on a similarity measure rather than replicating existing ones. Thus, the bias is relatively lower than with simple replication of instances, since the added instances are newly generated. Nevertheless, we also applied an under-sampling strategy and down-sampled the larger classes to the size of the class with the fewest instances, obtaining 354 instances for each class. When we applied RF to these data, we obtained even worse performance, with 47% accuracy. This is expected, since we deleted most of the data points, and learning from few instances led to lower results.

In addition, in Figure 3 we provide the factors most critical to this classification performance, obtained by calculating the 20 most important parameters via the RF feature importance. The order is as follows: MEQTotal (sleep), Trust (health), Extraversion (big five), selsa_soc (personal info), selsa_rom (personal info), Openness (big five), Neuroticism (big five), SRQE_Ext (health), dadage (origin), PSQI_duration (sleep), PSQIGlobal (sleep), BAISum (mental health), hsgrade (education), SRQE_Introj (health), CESDOverall (mental health), SelfEff_exercise_scale (health), Agreeableness (big five), momage (origin), MEQGroup (sleep). The explanation of these parameters is presented in Table 1. We can interpret this result as showing that the most important factors come from the survey datasets. The important sub-surveys are sleep, big five, health, mental health, personal information, and origin.

Figure 2: Confusion Matrix

Figure 3: Feature Importance for Classification

5 DISCUSSION AND CONCLUSION
In this study, we applied a machine learning technique, RF, to see how accurately we can classify and predict students' grades using survey and wearable data. In addition, we extracted the most important factors affecting the model's performance. Results indicate that the sleep, big five, health, mental health, personal information, and origin survey parameters have the largest effects on performance. We differ from the state of the art [12, 13, 14] by applying SMOTE. For further research, one may examine the other waves (there are 8) to obtain more instances of each class. Also, since the dataset was collected from students of one of the top universities, higher grades, i.e., A and A-, are expected to dominate. Thus, applying a similar experimental data collection setup to students with lower performance in their courses may be helpful.

ACKNOWLEDGEMENTS
Tübitak Bideb 2211-A academic reward is gratefully acknowledged. This work is supported by the Turkish Ministry of Development under the TAM Project number DPT2007K120610.

² https://scikit-learn.org/stable/

REFERENCES
[1] Misty, Lacour, and D. Tissington Laura. "The effects of poverty on academic achievement." Educational Research and Reviews 6.7 (2011): 522–527.
[2] Shephard, Roy J. "Curricular physical activity and academic performance." Pediatric Exercise Science 9.2 (1997): 113–126.
[3] Purta, Rachael, et al. "Experiences measuring sleep and physical activity patterns across a large college cohort with fitbits." Proceedings of the 2016 ACM International Symposium on Wearable Computers. 2016.
[4] GOMES, Maria V., Luciano Francisco Sousa ALVES, and Louelson AL COSTA. "de Azevedo." Dinâmica socioespacial urbana de Cuité–PB resultante da implantação do campus de saúde e educação da UFCG. 152f. João Pessoa (2014).
[5] Purta, Rachael, et al. "Experiences measuring sleep and physical activity patterns across a large college cohort with fitbits." Proceedings of the 2016 ACM International Symposium on Wearable Computers. 2016.
[6] Vhaduri, Sudip, and Christian Poellabauer. "Multi-modal biometric-based implicit authentication of wearable device users." IEEE Transactions on Information Forensics and Security 14.12 (2019): 3116–3125.
[7] Purta, Rachael, et al. "Experiences measuring sleep and physical activity patterns across a large college cohort with fitbits." Proceedings of the 2016 ACM International Symposium on Wearable Computers. 2016.
[8] Fridmanski, Ethan, et al. "Clustering in a newly forming social network by subjective perceptions of loneliness." Journal of American College Health (2020): 1–6.
[9] Liu, Shikang, et al. "Network analysis of the NetHealth data: exploring co-evolution of individuals' social network positions and physical activities." Applied Network Science 3.1 (2018): 1–26.
[10] Faust, Louis, et al. "Physical activity trend extraction: a framework for extracting moderate-vigorous physical activity trends from wearable fitness tracker data." JMIR mHealth and uHealth 7.3 (2019): e11075.
[11] StudentLife Dataset 2014. http://studentlife.cs.dartmouth.edu/.
[12] Wang, Rui, et al. "StudentLife: assessing mental health, academic performance and behavioral trends of college students using smartphones." Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing. 2014.
[13] Wang, Rui, et al. "SmartGPA: how smartphones can assess and predict academic performance of college students." Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing. 2015.
[14] Sano, Akane, et al. "Recognizing academic performance, sleep quality, stress level, and mental health using personality traits, wearable sensors and mobile phones." 2015 IEEE 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN). IEEE, 2015.
[15] Chawla, Nitesh V., et al. "SMOTE: synthetic minority over-sampling technique." Journal of Artificial Intelligence Research 16 (2002): 321–357.
[16] Can, Yekta Said, et al. "Stress detection in daily life scenarios using smart phones and wearable sensors: A survey." Journal of Biomedical Informatics 92 (2019): 103139.

Detection of postpartum anemia using machine learning

David Susič, david.susic@ijs.si, Jožef Stefan Institute, Jamova cesta 39, Ljubljana, Slovenia
Lea Bombač Tavčar, bombac.lea@gmail.com, University Medical Centre Ljubljana, Division of Gynaecology and Obstetrics, Šlajmerjeva 3, Ljubljana, Slovenia
Hana Hrobat, hana.hrobat@icloud.com, University of Ljubljana, Faculty of Medicine, Vrazov trg 2, Ljubljana, Slovenia
Lea Gornik, lea.gornik@gmail.com, University of Ljubljana, Faculty of Medicine, Vrazov trg 2, Ljubljana, Slovenia
Miha Lučovnik, miha.lucovnik@kclj.si, University Medical Centre Ljubljana, Division of Gynaecology and Obstetrics, Šlajmerjeva 3, Ljubljana, Slovenia
Anton Gradišek, anton.gradisek@ijs.si, Jožef Stefan Institute, Jamova cesta 39, Ljubljana, Slovenia

ABSTRACT
Postpartum anemia is seen as a health problem and should be treated. We evaluate the performance of nine machine learning regression models in predicting postpartum anemia six weeks after childbirth. We focus on three key parameters: ferritin, haemoglobin, and transferrin saturation. Our models are compared with the baseline model, which always predicts the mean value of the training data. We found that the models for ferritin and transferrin saturation have good predictive performance, whereas this was not the case for haemoglobin prediction, as all of the implemented models were outperformed by the baseline model.

KEYWORDS
postpartum anemia, haemoglobin level, machine learning

1 INTRODUCTION
Postpartum anemia is a common maternal health problem globally and constitutes a significant health problem in women after birth, even in the developed world. Women may develop it either because of antepartum depletion of iron stores or because of excessive peripartum blood loss [1]. It is associated with several negative consequences, such as maternal fatigue [2, 3]. With the unacceptably high prevalence of anaemia in women after childbirth, up to 50% in developed and up to 80% in developing countries [4], it appears to be of great importance to treat iron deficiency effectively. Oral ferrous sulphate is the most commonly used iron preparation for postpartum anemia because of its low cost and simple use. The definition of postpartum anaemia relies on haemoglobin values alone: it is defined as an Hb level <100 g/L. Postpartum haemorrhage, defined as a blood loss of 500 ml or more within 24 hours after birth, is one of the most frequent complications of delivery. It makes women vulnerable and frequently results in postpartum anemia. Consequently, it increases the risk of a peripartum blood transfusion, a treatment with potentially severe adverse outcomes [5].

In addition to the increased transfusion risk, peripartum iron deficiency anaemia can affect the wellbeing of both the mother and the child. It causes cardiovascular symptoms such as palpitations and dizziness, as well as breathlessness. It increases the risk of infections as well as of excessive postpartum bleeding. Furthermore, postpartum anemia adversely affects maternal mood, cognition, and behavior, resulting in increased fatigue and reduced physical and mental performance [6]. This is associated with several negative consequences, such as impaired health-related quality of life [3], which in postpartum anemia includes depression, fatigue, and reduced cognitive abilities. All of these symptoms significantly interfere with mother-child interactions and affect a woman's ability to breastfeed [1].

Postpartum anemia should be treated by restoring iron stores. Although there are a number of treatment options for women with postpartum anaemia, the debate about iron supplementation and the ideal form of administration is ongoing, and practice is not uniform across countries. Currently, common treatment includes iron supplementation administered orally or intravenously (IV). The traditional treatment for mild to moderate iron deficiency anaemia is oral supplementation with iron sulfate because of its low cost and simple use. Both approaches have advantages and disadvantages, which we will not discuss in detail here. Since postpartum anaemia is a major healthcare problem even in developed countries, it is important to treat it efficiently [7]. IV iron may be preferred because of the non-compliance and absorption challenges of oral iron, but it involves increased drug costs and the need for supervised treatment in healthcare institutions. Recent robust studies have compared different iron preparations, and there has been a network meta-analysis of different iron medications. However, no randomized clinical trial has directly compared intravenous ferric derisomaltose, intravenous ferric carboxymaltose, and peroral ferrous sulphate for the treatment of postpartum anemia, including fatigue measurements.

In this paper, we address the question of predicting postpartum anemia six weeks after childbirth. We look at three key blood test parameters related to anemia, namely ferritin, haemoglobin, and transferrin saturation. Using a database of 296 patients who were diagnosed with anemia, we investigate the possibility of predicting these blood test values with machine-learning models. We present the results of our initial studies.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Information Society 2022, 10–14 October 2022, Ljubljana, Slovenia
© 2022 Copyright held by the owner/author(s).

Table 1: Dataset features.
Personal | Blood test
Age [years] | Haemoglobin [g/L]
Gestational age [weeks] | Serum iron [μmol/L]
Number of children born | TIBC [μmol/L]
Number of total pregnancies | Transferrin saturation [%]
Number of total childbirths | Ferritin [μg/L]
Number of total abortions | Phosphate [mg/dL]
Type of childbirth | CRP [mg/L]
Transfusion |
Marital status |
Education |
BMI before childbirth |
BMI after childbirth |
Medication |

2 DATA
The initial dataset included 296 patients who were diagnosed with anemia and 27 features, some of which had missing values. As this was our initial study, we did not perform any missing-data imputation, but rather dropped the patients that had missing values in any of the columns. We were left with 224 patients who had data for all 27 features. Based on the medications that the patients were given during their treatment, they can be separated into three groups: 80 of the patients were treated with Iroprem, 75 were treated with Monofer, and 69 were treated with Tardyfer. Both Monofer and Iroprem are IV medications with iron, while Tardyfer is administered orally as tablets.

The data included personal data and blood test results. Blood tests were performed both right after the childbirth and six weeks after. The list of personal and blood test features is given in Table 1.

In the dataset, there are 13 personal features and 2 · 7 blood test features. Among the personal features, gestational age corresponds to the number of weeks since the last period. The type of childbirth is a categorical variable and can be vaginal delivery, planned Cesarean section, or elective Cesarean section. Transfusion is a binary variable indicating whether a patient needed a blood transfusion after the childbirth. Marital status is a categorical variable and can be lives alone, married, or non-marital partnership. Education is an ordinal variable with 10 different values, the lowest representing elementary school education and the highest representing a doctoral degree. Lastly, BMI stands for body mass index.

In the blood test features, serum iron describes the amount of iron in the blood. TIBC stands for total iron binding capacity, which is a good indicator of the amount of iron in blood. If the iron level in blood is low, the TIBC is higher, as the free capacity for binding iron is higher. Transferrin saturation is the value of serum iron divided by the TIBC of the available transferrin. The higher the transferrin saturation, the bigger the iron stores in the body. Lastly, CRP stands for C-reactive protein, which is high if there is inflammation in the body. Inflammation can also be caused by an injury during childbirth or a Cesarean section. Typically, CRP levels are increased after childbirth. If a high CRP level (>8 mg/L) still persists six weeks after the childbirth, this indicates inflammation.

3 METHODOLOGY
The aim of this initial study was to evaluate the performance of several machine learning (ML) models in predicting the values of haemoglobin, ferritin, and transferrin saturation levels in the blood of the anemia patients six weeks after childbirth, as these parameters are related to anemia. The inputs of the models were the personal features and the features of the blood test taken immediately after the childbirth. In each experiment, only one of the three quantities was the output. Thus, we ran three experiments with the same input and different outputs. Additionally, we ran separate experiments for each of the three medication groups. We compared our results with the baseline, which always predicted the mean output value of the training data.

4 RESULTS
Our dataset included 224 patients with 20 predictor features. We used mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percentage error (MAPE) as the evaluation metrics, with MAE as the main metric of performance evaluation. Formulas for the calculation of MAE, RMSE, and MAPE are given in equations (1), (2), and (3). Parameter $y_i$ denotes predicted values, $x_i$ denotes true values, and $n$ denotes the total number of data points.

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - x_i \rvert \tag{1}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - x_i)^2} \tag{2}$$

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n} \left\lvert \frac{x_i - y_i}{x_i} \right\rvert \tag{3}$$

We implemented nine ML regression models. Regression models predict one or more continuous variables. Linear regression (LR), kernel ridge (KR), and elastic net regression (EN) find linear correlations between the predictor features and the output. Bayesian ridge regression (BR) formulates linear regression using probability distributions rather than point estimates. Support vector regression (SVR) finds a hyperplane in the feature space such that as many data points as possible lie within a margin around it. Gradient boosting regressor (GB), light gradient boosting machine (LGBM), extreme gradient boosting regressor (XGB), and CatBoost regressor (CB) are ensemble methods that combine the predictions of multiple decision tree regressors. A decision tree regressor uses a tree diagram for decision making, where each branch is partitioned based on a threshold for a predictor feature.

The models trained on the whole dataset were compared in a 10-fold cross-validation with the folds stratified with respect to the medication. The models trained for separate medications only were compared in a 5-fold cross-validation due to the smaller dataset size. For each of the output variables, we also show a histogram of the value distribution along with the mean and standard deviation (SD).

The models' training and performance evaluation was done using Python 3.7 and the libraries NumPy 1.18.5 [8], scikit-learn 0.24.2 [9], LightGBM 3.2.1 [10], XGBoost 1.4.2 [11], and CatBoost 0.26 [12].

4.1 Ferritin
The distribution of ferritin blood levels six weeks after childbirth is given in Figure 1. We see that the patients who were given Tardyfer had markedly lower levels than those who were given Iroprem or Monofer. The mean and SD values of the distribution are 185.88 μg/L and 141.31 μg/L, respectively. Results of the regression models are given in Table 2.

Figure 1: Distribution of ferritin blood levels in patients six weeks after childbirth.

Table 2: Results for the prediction of ferritin.
Model | MAE | RMSE | MAPE [10^-2]
CB | 61.96 | 87.44 | 80.11
XGB | 62.76 | 93.97 | 61.23
LGBM | 63.07 | 88.31 | 65.88
GB | 64.14 | 91.32 | 83.86
LR | 68.42 | 89.45 | 86.26
KR | 69.3 | 90.62 | 80.2
EN | 79.64 | 99.56 | 158.81
BR | 80.43 | 101.93 | 135.88
Baseline | 111.81 | 138.88 | 272.51
SVR | 112.91 | 140.25 | 260.76

We see that the best performing model according to both MAE and RMSE was CB. Except for SVR, the other models performed similarly to CB. Additionally, we see that most of the models significantly outperform the baseline.

The results of the models' predictions for separate medications only are shown in Table 3. The models within each medication have similar performances. In the case of Monofer, all of the models perform worse than the baseline.

Table 3: Results for the prediction of ferritin for each medication separately.
Model | Iroprem MAE | Monofer MAE | Tardyfer MAE
LR | 93.65 | 70.85 | 41.33
LGBM | 86.19 | 57.03 | 21.74
XGB | 95.48 | 62.44 | 19.43
CB | 81.26 | 58.78 | 20.48
KR | 98.90 | 69.93 | 33.11
EN | 92.76 | 63.77 | 31.81
BR | 96.24 | 61.47 | 21.99
GB | 88.37 | 70.00 | 25.69
SVR | 97.41 | 58.20 | 19.27
Baseline | 94.61 | 55.87 | 23.42

4.2 Haemoglobin
The distribution of haemoglobin blood levels six weeks after childbirth is given in Figure 2. We see that the distributions are very similar between all three medication groups. The mean and SD values of the distribution are 133.87 g/L and 8.10 g/L, respectively. Results are given in Tables 4 and 5.

Figure 2: Distribution of haemoglobin blood levels in patients six weeks after childbirth.

Table 4: Results for the prediction of haemoglobin.
Model | MAE | RMSE | MAPE [10^-2]
Baseline | 6.11 | 8 | 4.62
BR | 6.31 | 8 | 4.77
SVR | 6.33 | 8.01 | 4.80
EN | 6.56 | 8.22 | 4.96
LR | 6.67 | 8.41 | 5.03
CB | 6.74 | 8.34 | 5.10
LGBM | 7.16 | 8.93 | 5.41
XGB | 7.2 | 9.19 | 5.44
GB | 7.28 | 9.03 | 5.52
KR | 7.43 | 9.45 | 5.59

Table 5: Results for the prediction of haemoglobin for each medication separately.
Model | Iroprem MAE | Monofer MAE | Tardyfer MAE
LR | 6.99 | 7.04 | 8.17
LGBM | 5.46 | 6.41 | 8.21
XGB | 6.38 | 6.58 | 9.20
CB | 5.75 | 6.58 | 8.03
KR | 7.65 | 7.13 | 8.63
EN | 5.69 | 6.43 | 7.36
BR | 5.31 | 6.45 | 7.33
GB | 5.79 | 7.23 | 9.41
SVR | 5.42 | 6.62 | 7.28
Baseline | 5.17 | 5.85 | 7.22

We see that the models do not perform well in predicting haemoglobin, as they perform worse than the baseline both in the general case and in the separate medication cases.

4.3 Transferrin saturation
The distribution of transferrin saturation in blood six weeks after childbirth is given in Figure 3. We see that the distributions are very similar between all three medication groups. The mean and SD values of the distribution are 33.56% and 11.53%, respectively. Results of the regression models are given in Table 6.

Figure 3: Distribution of transferrin saturation in blood of patients six weeks after childbirth.

Table 6: Results for the prediction of transferrin saturation.
Model | MAE | RMSE | MAPE [10^-2]
KR | 8.74 | 10.93 | 36.74
LR | 8.78 | 10.97 | 36.80
EN | 8.82 | 11.14 | 38.45
Baseline | 8.88 | 11.12 | 39.16
SVR | 9.11 | 11.38 | 39.51
CB | 9.11 | 11.31 | 38.81
BR | 9.22 | 11.41 | 40.49
GB | 9.51 | 11.89 | 40.20
LGBM | 9.55 | 12.10 | 39.58
XGB | 9.62 | 12.11 | 39.64

We see that the top three performing models outperform the baseline, with the best model being KR. The results of the models' predictions for separate medications only are shown in Table 7. Unlike for Monofer and Tardyfer, the models do not perform well in the case of Iroprem.

Table 7: Results for the prediction of transferrin saturation for each medication separately.
Model | Iroprem MAE | Monofer MAE | Tardyfer MAE
LR | 7.68 | 9.4 | 11.84
LGBM | 7.16 | 7.59 | 11.39
XGB | 8.36 | 9.12 | 12.86
CB | 7.2 | 7.87 | 11.44
KR | 7.73 | 8.94 | 12.01
EN | 7.16 | 8.54 | 11.24
BR | 6.8 | 7.83 | 12.02
GB | 7.88 | 8.78 | 11.61
SVR | 6.94 | 7.62 | 11.62
Baseline | 6.49 | 7.75 | 11.82

5 DISCUSSION AND CONCLUSION
We evaluated nine classic machine learning regression models for the prediction of three key parameters associated with anaemia, collected from blood tests six weeks after childbirth: ferritin, haemoglobin, and transferrin saturation. We compared the results with the baseline model, which always predicted the output mean of the training data. We found that the models for ferritin and transferrin saturation had good predictive performance, whereas this was not the case for haemoglobin prediction, as all models were outperformed by the baseline model.

ACKNOWLEDGMENTS
The authors acknowledge the funding from the Slovenian Research Agency (ARRS), Grant (PR-10495) and Basic core funding P2-0209. The dataset was collected as a part of the study with ClinicalTrials.gov registration number NCT03957057.

REFERENCES
[1] Nils Milman. 2011. Postpartum anemia I: definition, prevalence, causes, and consequences. Annals of Hematology, 90, 11, 1247–1253.
[2] Kathryn A. Lee and Mary Ellen Zaffke. 1999. Longitudinal changes in fatigue and energy during pregnancy and the postpartum period. Journal of Obstetric, Gynecologic, & Neonatal Nursing, 28, 2, 183–191.
[3] Kiyoshi Ando et al. 2006. Health-related quality of life among Japanese women with iron-deficiency anemia. Quality of Life Research, 15, 10, 1559–1563.
[4] Nils Milman. 2012. Postpartum anemia II: prevention and treatment. Annals of Hematology, 91, 2, 143–154.
[5] Andreas Greinacher, Konstanze Fendrich, Ralf Brzenska, Volker Kiefel, and Wolfgang Hoffmann. 2011. Implications of demographics on future blood supply: a population-based cross-sectional study. Transfusion, 51, 4, 702–709.
[6] Christian Breymann. 2005. Iron deficiency and anaemia in pregnancy: modern aspects of diagnosis and therapy. European Journal of Obstetrics & Gynecology and Reproductive Biology, 123, S3–S13.
[7] Lisa M. Bodnar, Anna Maria Siega-Riz, William C. Miller, Mary E. Cogswell, and Thad McDonald. 2002. Who should be screened for postpartum anemia? An evaluation of current recommendations. American Journal of Epidemiology, 156, 10, 903–912.
[8] Charles R. Harris, Jarrod K. Millman, Stefan J. van der Walt, Ralf Gommers, Pauli Virtanen, and David Cournapeau. 2020. Array programming with NumPy. Nature, 585, 357–362. doi: https://doi.org/10.1038/s41586-020-2649-2.
[9] F. Pedregosa et al. 2011. Scikit-learn: machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830. http://www.jmlr.org/papers/volume12/pedregosa11a/pedregosa11a.pdf.
[10] Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. LightGBM: a highly efficient gradient boosting decision tree. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17). Curran Associates Inc., Long Beach, California, USA, 3149–3157. isbn: 9781510860964.
[11] Tianqi Chen and Carlos Guestrin. 2016. XGBoost: a scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). ACM, San Francisco, California, USA, 785–794. isbn: 978-1-4503-4232-2. doi: 10.1145/2939672.2939785.
[12] Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, and Andrey Gulin. 2018. CatBoost: unbiased boosting with categorical features. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS'18). Curran Associates Inc., Montréal, Canada, 6639–6649.

Covid symptoms home questionnaire classification and outcome verification by patients

Goran Jakimovski, Faculty of Electrical Engineering and Information Technology, University of “Ss. Cyril and Methodius”, Skopje, Macedonia, goranj@feit.ukim.edu.mk
Dragana Nikolova, Faculty of Electrical Engineering and Information Technology, University of “Ss. Cyril and Methodius”, Skopje, Macedonia, nikolova.dragana98@gmail.com

ABSTRACT
Testing for Covid in a time of pandemic can put a lot of overhead on the medical and testing facilities. Moreover, in a pandemic crisis, people become more hypochondriac and get tested even if the slightest symptom of Covid is detected. This leads to many people, infected and not infected, gathering at the medical facilities, thus increasing the possibility for people who are not infected to get a Covid infection. Our application registers patients and, by using a medical survey, determines whether the patient should get tested for Covid or whether even more severe measures are to be taken. Additionally, our application uses medical test results from patients to determine the success rate of the prediction. The case study has shown that the application has an 89% classification success rate. Using this application, only people with the right symptoms will be advised to get tested, thus lowering the overload placed on the medical facilities and minimizing the spread of the virus.

KEYWORDS
analysis, classification, Covid, survey, symptoms, test cases

1 Introduction
Although world-wide pandemics are not that frequent, the Covid pandemic hit the world fast, with many patients dying and doctors not being able to understand the cause in time. The aftereffect of the pandemic has left many people with health issues, with more and more people becoming hypochondriacs. Technology was and still is used to alleviate the hit from the virus, to help prevent the spread of the coronavirus, and to maintain the current lifestyle as much as possible. On the other side, technology was used to help fight against the virus and return life to its original form.

A lot of research has been done on the Covid virus and the corona outbreak, including image processing, machine learning, and so on. In [1], the authors give a summary of the different machine learning techniques to predict and classify Covid-19 cases, using mathematical models and machine learning to predict Covid-19 cases. The authors in [2] have further used machine learning and image processing to determine the cause of pneumonia in Covid-19 infected patients. They use X-rays and CT images to create software that determines how to classify patients based on pneumonia and Covid-19 images.

The machine learning approach is also used in most papers, but in [3], the authors investigate the best possible options and weight distribution in the ML techniques to get the best results when working with Covid-19 data. They use different approaches to get the best results when combining ML and Covid. Furthermore, [4] again uses CT scans and ML to classify patients as infectious or not, which would be useful to decrease infection spread amongst the population.

Much like in [1], the authors in [5] help other authors with an overview of the ML techniques. Additionally, they offer data sets to help with further investigation. The research done in [1] and [5], coupled with the research in [6], gives authors the means, the knowledge, the data set, and the information on how to proceed with research on Covid and ML. The research in [6] evaluates all the data and the publishing process of papers regarding Covid and ML, and how the publication process changes the initial paper submission. Further analysis is done in [7] on Covid detection from CT images, using a pre-trained data set that can help classify the new data set before training and testing with deep learning and multi-layered convolution algorithms. This way, the data set can be increased, overcoming the persisting ML problem of not having enough data to perform training and testing. The overall analysis of all the research on ML and data sets is concluded by the authors in [8], where they give a detailed analysis of the functions and usage of ML for Covid.

There is a lot of research on Machine Learning/Deep Learning techniques to detect Covid using medical images. Our approach is simpler and uses medical questionnaires and human input to improve the detection of Covid-19 in patients. The architecture of our Covid Medical App system is described in Section 2, whereas the behavior and case scenarios are described in Section 3. Section 4 concludes the paper and gives information about further development.

2 Architecture of the system
The Covid Medical Application is designed to help patients and the health system by classifying patients into six categories. These categories range from the patient not having Covid (or the least suspicion of a Covid infection) to an almost certain Covid infection (requires isolation and medical treatment). The users of the application take a short survey (questionnaire) about their wellbeing and symptoms, and the result of the questionnaire is the classification of the user into one of the categories [9]. The application accesses the survey from an API that is standardized and provided by the InferMedica Medical Platform, implemented and approved by the World Health Organization (WHO). The API contains all sorts of Covid data that can be retrieved and many surveys the users can take; our application utilizes only the API for classification of Covid, which is done based on symptoms and the patient's wellbeing.

Besides all the Covid recommendations and information that are displayed in the application, the users can take the survey and find out, based on their symptoms, in which category they belong, along with the diagnosis and recommendations. This information can also be easily translated and wrapped. The categories are:
• No risk – the patient is the least likely to have Covid Patients that might have higher risk of Covid infection (placed in • Self-monitoring – the patient should continue to monitor that category by the API) can isolate themselves in time to prevent the symptoms but is not likely to have Covid others to be infected. Furthermore, the entire pandemic made many • Call a doctor – there is an infection, but it is not Covid- patients hypochondriacs and suspect Covid symptoms even for a small related cough. Thus, by using this application, if they get classified in no • Quarantine – the patient is advised to quarantine himself Covid infection categories, uninfected patients can avoid going to the from the environment and perform Covid tests hospitals for unnecessary Covid tests, and reducing the possibility to get infected in the testing areas. • Isolation call – the person should isolate themselves from the environment with high probability for Covid infection On the figure below (Figure 2), we can see a part of the survey • Isolation ambulance – the person has high probability for interface and the questions that the users have to answer to be classified Covid infection and should call for ambulance since the in the categories. symptoms are severe. The architecture and the organization of the application is presented on Figure 1. Figure 2 Questions from the survey (multipart) The series of questions can vary from input fields for body temperature measured or blood pressure, to multiple choice questions and Yes/No questions. The requirements from the questionnaire are simple and easily understandable that every patient can answer even if with severe health issues. The interface is adjusted and simplified as to not impose any incorrect information that could lead to a faulty classification. On Figure 3 we can see a list of results that the patient received, as Figure 1 Organization of the application a result of the survey. 
From Figure 3, we can see that the information is presented in different color based on the severity of the classification, followed by a short information summary intended for the classification. The patient can take the questionnaire multiple On Figure 1 we can see that our application is a wrapper around times, and each result is marked and presented to the user with the date the API provided from InferMedica, which first and foremost, provides and time of the questionnaire taken and the result. a human readable survey that patients can take and classify their symptoms into a category. The questionnaire helps patients with symptoms of Covid to determine the best possible action to take, in case they are suspecting Covid infection. Users of the application access it via web link, where users can get Covid-related information, access their profile and take the questionnaire. The questionnaire taken from a patient is packed, formatted and sent to the API, the API returns the result, which is displayed back to the patient. As presented on Figure 1, we can see that the application uses two APIs from InferMedica. The first API is diagnosis endpoint, that we Figure 3 Result of the classification use to obtain the questions to form the questionnaire. These questions are predetermined, can easily be translated into any language, and be On the other hand, medical personnel also have access to these adapted if the questionnaire changes from the endpoint. The second classifications, but only to patients that they have been assigned to. API is the triage endpoint that is used to perform the diagnosis and Based on the outcome of the classification, the medical personnel can classification of the patient. Also, the result returns a short info status schedule an appointment for testing or send an ambulance to the that is presented to the patient with information about how to proceed appointed address. 
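The wrapper described in this section can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the triage-level strings and the `triage_level` response field are assumptions about the Infermedica triage endpoint [9], and `Store`, `record`, `history` and `history_for_staff` are hypothetical names. It illustrates three behaviours from the text: mapping a triage result to one of the six ordered categories, keeping every questionnaire a patient takes together with its date and time, and letting medical personnel see only the patients assigned to them. The HTTP calls to the diagnosis and triage endpoints are deliberately left out, and the example patient data are invented.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import IntEnum


class Category(IntEnum):
    """The six categories from Section 2, ordered from least to most severe."""
    NO_RISK = 1
    SELF_MONITORING = 2
    CALL_DOCTOR = 3
    QUARANTINE = 4
    ISOLATION_CALL = 5
    ISOLATION_AMBULANCE = 6


# ASSUMPTION: triage-level identifiers returned by the Infermedica triage endpoint.
# Check them against the API documentation [9] before relying on this table.
TRIAGE_LEVELS = {
    "no_risk": Category.NO_RISK,
    "self_monitoring": Category.SELF_MONITORING,
    "call_doctor": Category.CALL_DOCTOR,
    "quarantine": Category.QUARANTINE,
    "isolation_call": Category.ISOLATION_CALL,
    "isolation_ambulance": Category.ISOLATION_AMBULANCE,
}


@dataclass
class Result:
    taken_at: datetime
    category: Category


@dataclass
class Store:
    """Hypothetical in-memory stand-in for the application's patient records."""
    results: dict[str, list[Result]] = field(default_factory=dict)  # patient id -> results
    assignments: dict[str, set[str]] = field(default_factory=dict)  # staff id -> patient ids

    def record(self, patient_id: str, triage_response: dict, taken_at: datetime) -> Result:
        # ASSUMPTION: the parsed triage response carries a "triage_level" field.
        result = Result(taken_at, TRIAGE_LEVELS[triage_response["triage_level"]])
        self.results.setdefault(patient_id, []).append(result)
        return result

    def history(self, patient_id: str) -> list[Result]:
        # Every questionnaire the patient has taken, newest first, with date and time.
        return sorted(self.results.get(patient_id, []), key=lambda r: r.taken_at, reverse=True)

    def history_for_staff(self, staff_id: str, patient_id: str) -> list[Result]:
        # Medical personnel may only see the patients assigned to them.
        if patient_id not in self.assignments.get(staff_id, set()):
            raise PermissionError(f"{staff_id} is not assigned to {patient_id}")
        return self.history(patient_id)


# Hypothetical example: one patient takes the questionnaire twice,
# and the assigned doctor views both results.
store = Store(assignments={"dr-1": {"p-1"}})
store.record("p-1", {"triage_level": "self_monitoring"}, datetime(2022, 3, 1, 9, 0))
store.record("p-1", {"triage_level": "quarantine"}, datetime(2022, 3, 4, 18, 30))
latest_first = [r.category.name for r in store.history_for_staff("dr-1", "p-1")]
print(latest_first)  # ['QUARANTINE', 'SELF_MONITORING']
```

Using an ordered `IntEnum` keeps the categories comparable by severity, which is convenient when results are colored or grouped by severity as in Figure 3.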
The panel of the medical personnel is similar to the patient's panel, except that it additionally displays the information and contact details of the patient who took the survey.

3 Evaluation of the system
No medical classification system can guarantee faultless classification, so there is always a chance that a classification is incorrect. Given numerous medical tests and findings, different doctors might give different diagnoses and classifications of a patient's condition. This is even more so in our case, where a questionnaire places a patient into one of six Covid categories: it gives a rough classification as a basic first step of the diagnosis. The questionnaire, as stated before, is taken from Infermedica and was previously issued by the WHO, but it cannot be used with absolute certainty or fully depended upon. That is why, in this section of the paper, we also evaluate the results of the questionnaire.

Our medical application allows users to take a Covid survey based on their symptoms and be classified into categories ranging from high to low likelihood of Covid infection. Alongside the classification, short information is presented on how to proceed given the result and how to minimize further infection of other patients. The survey, as stated before, is intended to keep patients with a low risk of infection from visiting Covid testing places, so that they avoid getting infected there. Also, advising patients with a low possibility of Covid infection not to get tested reduces crowding in medical facilities and Covid test centers, thus reducing the overhead of the medical system. However, patients can still ignore the results from our application and get tested to make sure whether they have Covid.

The case study of the API and our application was conducted with 20 patients who had already been tested for Covid with a Polymerase Chain Reaction (PCR) test in the past. Most of the patients (15 of them) had been tested twice, so the total number of test cases is 35 (20 first tests plus 15 second tests). The patients already had their Covid diagnosis from the PCR test before taking the survey in our application. We then compared the results from the survey with the results from the patients' PCR tests. The results of the case study are presented in Figure 4.

Figure 4 Results from the case study of our application with 35 tests

Figure 4 shows the results from our application (blue bars) and the results from the PCR tests (orange bars). As the results show, the PCR and application bars are mostly the same. The deviations between the PCR and application results are mostly in categories one, two and three. The most common error occurs when the API suggests category one but the PCR shows category three. This error is minor, since the first three categories are linked with low to no infection. The next most frequent error is in the last two categories, when the API suggests category five but the PCR suggests category six, and vice versa. If we put the results of the questionnaire in binary form (the patient has Covid or the patient does not have Covid), the first three categories form the result that the patient does not have Covid, whereas the last three form the result that the patient has Covid. With binary categories, the error between the API and the PCR is close to zero; the small discrepancy lies in the subcategories presented by the questionnaire. Also, while the PCR only indicates whether the patient has Covid, the subcategorization is done based on the patient's hospitalization and the recommendations received from their doctor. If we consider the six categories offered by the API, the overall success rate of the API compared with the PCR tests is 85% accurate prediction and classification. If we consider the binary classification, the success rate of the API increases to 89%.

4 Conclusion
Our application uses a simplified system for online diagnosis of Covid patients, with a questionnaire designed to give an initial diagnosis of the patient. This initial diagnosis is used to tell patients whether they have Covid and to suggest testing and medical care only if necessary, thus reducing the load that patients with low risk or no infection at all place on testing places and medical facilities. The case study in Section 3 shows that the questionnaire is accurate enough to give an initial diagnosis and to determine whether the patient has Covid with 89% accuracy.

For future work, we propose testing the system with patients before they go to the hospital or testing facilities for Covid. Users could update the API results with the results from the medical/test facilities. This could be done per result category, and the system could then present the accuracy of the API result next to the result itself. Thus, users would not only be classified into the categories but would also receive accuracy information provided by other users of the application who were classified and afterwards tested.

References
[1] Swapnarekha, H, Behera, S., Nayak, J., Naik, B., Role of intelligent computing in COVID-19 prognosis: A state-of-the-art review, Chaos, Solitons & Fractals, Volume 138, 2020, ISSN 0960-0779
[2] Bharati, S., Podder, P., Mondal, R. H, Prasath, S., Medical Imaging with Deep Learning for COVID-19 Diagnosis: A Comprehensive Review, arXiv:2107.09602
[3] Mohammed, M., Abdulkareem, K., Al-Waisy, A., Benchmarking Methodology for Selection of Optimal COVID-19 Diagnostic Model Based on Entropy and TOPSIS Methods, IEEE Access, May 2020, 10.1109/ACCESS.2020.2995597
[4] Subhalakshmi, R.T., Appavu, S, Sasikala, S., Deep learning based fusion model for COVID-19 diagnosis and classification using computed tomography images, Concurrent Engineering: Research and Applications 2022, Vol. 30(1) 116–127
[5] Chadaga, K., Prabhu, S., Vivekananda, B., Battling COVID-19 using machine learning: A review, Cogent Engineering, 8:1, 1958666, DOI: 10.1080/23311916.2021.1958666
[6] Jemioło, P., Storman, D., Orzechowski, P., Artificial Intelligence for COVID-19 Detection in Medical Imaging—Diagnostic Measures and Wasting—A Systematic Umbrella Review, J. Clin. Med. 2022, 11, 2054, https://doi.org/10.3390/jcm11072054
[7] Mehboob, F., Rauf, A., Jiang, R. et al., Towards robust diagnosis of COVID-19 using vision self-attention transformer, Sci Rep 12, 8922 (2022), https://doi.org/10.1038/s41598-022-13039-x
[8] Swapnarekha, H., Behera, H., Nayak, J., Naik, B., Role of intelligent computing in COVID-19 prognosis: A state-of-the-art review, Chaos, Solitons & Fractals, Volume 138, 2020, ISSN 0960-0779, https://doi.org/10.1016/j.chaos.2020.109947
[9] Infermedica Medical Platform, Covid-19 survey API, https://developer.infermedica.com/docs/api

Piloting ICT Solutions for Integrated Care

Mitja Luštrek, Department of Intelligent Systems, Jožef Stefan Institute, and Jožef Stefan International Postgraduate School, Ljubljana, Slovenia, mitja.lustrek@ijs.si
Samo Drobne, Faculty of Civil and Geodetic Engineering, University of Ljubljana, Ljubljana, Slovenia, samo.drobne@fgg.uni-lj.si
Sokratis G Papageorgiou, Neurology Department, Aiginition Hospital, National and Kapodistrian Univ. of Athens, Athens, Greece, sokpapa@med.uoa.gr
Roberta Matković, Teaching Institute for Public Health of Split and Dalmatian County, Split, Croatia, roberta.matkovic@nzjz-split.hr
Bojan Blažica, Computer Systems Department, Jožef Stefan Institute, Ljubljana, Slovenia, bojan.blazica@ijs.si
Efthalia Angelopoulou, Neurology Department, Aiginition Hospital, National and Kapodistrian Univ. of Athens, Athens, Greece, angelthal@med.uoa.gr
Miodrag Miljkovic, Marketing Department, Special hospital Merkur, Vrnjacka banja, Serbia, miljkovicdzoni@gmail.com
Pietro Hiram Guzzi, Municipality of Miglierina, Miglierina, Italy, sindaco@comunemiglierina.it

ABSTRACT
The SI4CARE project aims to develop a strategy and action plans to improve health and social care in the Adriatic-Ionian region. It started by surveying the state of affairs in the region, identifying needs and challenges, as well as best practices that can answer them. Based on these, wishes for improvement were formulated. The paper describes the methodology of this process and the key findings. Some of the best practices are being piloted to support the development and monitoring of the policy actions. In the paper, we describe nine pilots that involve pervasive health technology and otherwise strongly leverage ICT to benefit senior users. Most employ wearables and other sensing devices to monitor the users and provide health and care services, or provide telehealth and care through web and mobile technology.

KEYWORDS
Social innovation, integrated care, telehealth, telecare, transnational strategy, action plan

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Information Society 2022, 10–14 October 2022, Ljubljana, Slovenia
© 2022 Copyright held by the owner/author(s).

1 INTRODUCTION
The population of Europe and the rest of the developed world is rapidly aging. In the last 20 years, the old-age dependency ratio in Europe, expressed here as working-age people per senior, decreased from 4 : 1 to 3 : 1, and it is projected to decrease further to 1.75 : 1 by 2050 [1]. This will result in a range of problems, including a lack of people who can support seniors once they can no longer live independently. These problems will have to be tackled from multiple angles: with demographic policies, increases in retirement age, and social and technological innovations that can improve the care of seniors and their quality of life.

The SI4CARE project [2] aims to create a transnational ecosystem for social innovation in integrated care with a focus on ICT. It started by surveying the status quo of health and social care in the Adriatic-Ionian region, identifying needs and challenges, as well as best practices that can answer them. It then formulated wishes and actions for improved health and social care, which will eventually result in a transnational strategy and national/regional action plans.

To gain deeper insight into the benefits of the identified best practices and ways of implementing them, the project started 13 pilots in seven countries. We describe the nine that involve pervasive health technology and otherwise strongly leverage ICT to benefit senior users. Most employ wearables and other devices to monitor the users and provide health and care services, or provide telehealth and care through web and mobile technology.

2 SI4CARE PROJECT: FROM STATUS QUO TO ACTION
The SI4CARE project used a systematic and evidence-based approach to devise a strategy and actions to improve integrated care via social and technological innovation, with the aim of presenting solid arguments to decision makers.

2.1 Status Quo of Health and Social Care
The first step was to survey the status quo (the state of affairs) in health and social care in the Adriatic-Ionian region, comprising Slovenia, Croatia, Bosnia and Herzegovina, Serbia, Montenegro, Greece and Italy. Four key activities were carried out:
• We surveyed the literature, such as statistical reports, national and regional policy documents and legislation.
• We conducted semi-structured interviews with high-level stakeholders, such as highly placed employees at relevant ministries, non-governmental organizations and educational institutions. The interviews included 26 questions on the healthcare system, financial and physical accessibility of healthcare services, future challenges and other topics. In total, 31 stakeholders were interviewed. A qualitative analysis of the answers was performed, focusing on the main points raised by the participants.
• We administered a questionnaire to various people employed in health and social care services. The questionnaire included 29 items on the use of healthcare services by seniors, their accessibility, the ability to obtain information on healthcare, and the status of seniors in society and their social care. A subset of these questions was asked specifically about people with memory impairment or dementia. We received responses from 222 health and social care staff.
• We administered the same questionnaire to seniors. We received responses from 619 people.

Our finding was that, in general, the provision of healthcare services is moderately good, with a lack of human resources cited as a key problem. Rehabilitation was noted to be less available than other services, and people with dementia face more problems than the general elderly population. A significant problem is that seniors are poorly informed about healthcare. Even though healthcare is mostly covered by insurance, many seniors face significant financial problems, mainly due to low pensions. In part, this appears to be because, despite the insurance, they sometimes still need to resort to private services. Waiting times are a common issue, which may explain the use of private services. Physical accessibility is also a major issue – seniors have significant difficulties using public transport. Secondary healthcare for people living in rural areas was also found to be difficult to access.

Seniors have low digital literacy and find anything involving the internet (e.g., booking an appointment) a major problem. High-level stakeholders feel that new technologies have not been successfully integrated into the healthcare system, and the questionnaire respondents feel this even more strongly. Most stakeholders nevertheless believe such technologies are important, which validates the objectives of the SI4CARE project.

2.2 Best Practices
The SI4CARE project identified and documented 115 best practices in social and technological innovation to improve the care and quality of life of seniors, selected based on their effectiveness as demonstrated by experience.

Since SI4CARE emphasizes the use of ICT in integrated care, most of the identified best practices are technology-based. The largest group involves pervasive health technology, such as wearables that monitor users, either to help them manage their health or to provide functions such as fall detection. Some also use sensing integrated in fitness devices or 3D cameras to support rehabilitation. There are also web and mobile platforms that support various activities of interest to seniors (e.g., gardening, cognitive training) and facilitate communication and social inclusion. A few best practices are intended for hospitals and other care organizations (e.g., for management of health records).

Some of the best practices – less relevant to this paper but otherwise just as important – are non-technological. Examples include providing information and training to seniors about health(care), particularly dementia, and digital technology; promotion of social inclusion; and organizing provision of (health)care (e.g., via mobile medical units).

2.3 Wishes to Improve Care and Quality of Life
Based on the analysis of the status quo (Section 2.1) and inspired by the best practices (Section 2.2), the SI4CARE project formulated a number of wishes that – if fulfilled – would leverage social and technological innovation to improve the care and quality of life of seniors. These were developed for each of the involved countries and validated in a focus group involving stakeholders.

Since the analysis of the status quo found a strong need for the introduction of new technologies, and many technology-based best practices were identified, it is not surprising that initiatives aimed at increasing the use of telehealth and telecare comprise the largest group of wishes. They had different foci: rehabilitation (where the current availability is particularly poor), cost-effectiveness (which is a prerequisite for institutional funding), non-pharmacological interventions (which tend to be neglected), applications that do not require institutional support (which are typically inexpensive and non-pharmacological) … Activities to improve the digital skills of seniors were also wished for, as well as better digital infrastructure.

Unlike the best practices, most wishes were not technological. This is perhaps because wishes are about goals, whereas technology in health and social care is often a means of achieving these goals. The non-technological wishes include increases in human resources (which were found to be a key reason for the inadequacies of healthcare provision), an improved overview of the state of care and of solutions for improvement (essentially activities similar to SI4CARE's but put on a more sustainable basis), improvements in home care, training and better policies.

2.4 Transnational Strategy and Action Plans
The preparation of the transnational strategy and national/regional action plans – one for each country involved – is still in progress. The strategy is organized in five pillars:
• Digital transitions are concerned with pervasive health technologies and other ICT-based innovations exemplified by the pilots presented in this paper.
• Digitalization process will support digital transitions by providing the required infrastructure and knowledge.
• Economic and financial implications deal with appropriate funding for healthcare and other aspects of long-term care.
• Governance and policies address sustainable and geographically appropriately distributed provision of care, ensuring its quality and properly trained staff.
• The SI4CARE community will ensure the sustainability of the project via organizations that will exist after the end of the funding period.

The national/regional action plans aim at implementing this strategy in individual countries. Their main components are specific actions, which essentially fulfill the wishes discussed in Section 2.3. These wishes are being validated by stakeholders in events organized in each country, one of which is also taking place at the Information Society 2022 multiconference. Afterwards, the action plans will be presented to high-level decision makers.

3 PILOTS OF ICT SOLUTIONS FOR INTEGRATED CARE

3.1 Mobile Application for Self-management of Heart Failure
Heart failure is a common and debilitating disease among seniors, and a leading cause of hospitalizations. It requires complex management, which is difficult for many seniors. Healthcare institutions provide only periodic checkups and cardiac rehabilitation, the latter not to all who would benefit. Resources to provide more support are hard to come by, so a mobile application to assist self-management is an attractive solution.

The HeartMan application [3] provides a personalized exercise program and nutrition advice, support for measurement of vital parameters, medication reminders, mindfulness exercises intended to improve the patients' mental health and wellbeing, and cognitive behavioral techniques to improve adherence to the application's advice. The first step of the pilot was to make the application easier to deploy and to remove physiological monitoring as an input for its decisions, as this is a barrier from the usability and regulatory perspectives. The user experience was also improved. The ongoing second step is a feasibility study with 20 patients using the application and 10 controls.

The lesson learned so far is that designing an application for heart-failure patients is difficult due to the complex topic and the poor digital and health literacy of this group. Our solution was to guide less advanced users with simple automatic prompts and not require them to do much on their own initiative.

3.2 ICT Solution for Monitoring the Health of Patients after Returning Home
Special Hospital Merkur is a secondary health institution in Serbia specializing in diabetes. Upon discharge, patients often return to bad habits, and diabetes complications occur. In addition, they face problems when they need to see a doctor.

The main aim of the pilot was to investigate the integration of modern communication technology into diabetes treatment to facilitate better coordination between stakeholders. The patients were trained to use the SmartCare mobile application and to input the necessary data (insulin, blood sugar, mass, blood pressure, temperature, etc.). Merkur's medical team had insight into the patient's condition and intervened as needed. In addition, patients were trained to contact doctors for consultations from home.

The combined effect of involving patients in their own health condition and the remote intervention of doctors proved to reduce the risk of diabetes complications. The pilot demonstrated the feasibility of remote treatment in Serbia, which can also lead to significant financial savings. It should be repeated on a larger sample at the national level to provide a basis for the introduction of telemedicine in the health system.

3.3 ICT to Enable Accessibility to Health Systems by the Elderly
In the Italian healthcare system, regional governments are responsible for ensuring the delivery of a health benefits package through a network of health management organizations. There is a remarkable difference among regions, with northern regions providing better services, resulting in migration of patients from south to north. In 2018, healthcare mobility in Calabria amounted to approx. € 310 million. This is particularly relevant for small towns and villages where people suffer from a lack of general medicine and of efficient public transportation to regional hubs.

Due to some recent programs, many rural areas in Calabria have good internet connections. In this pilot, with the help of UCCP del Reventino (a team of physicians), we are evaluating the use of tele-assistance and remote monitoring of chronic patients (elderly people and people affected by dementia).

The developed services are particularly useful for patients who require a re-evaluation of an already known clinical picture, people suffering from rare diseases, and frail people who require constant contact with health facilities. Teleadvice also proved very useful in the context of COVID-19.

3.4 Specialized Outpatient Clinic for Memory, Dementia and Parkinson's Disease
Approximately 20% of the population above the age of 65 is affected by mild cognitive impairment or dementia. As the status quo analysis indicated, these people have limited access to specialized healthcare, and this is more pronounced in remote areas. Greece has many small and isolated islands with a high percentage of elderly inhabitants and understaffed health centers.

The Aeginition Hospital of the National and Kapodistrian University of Athens developed an outpatient clinic pilot through the National Telemedicine Network, in collaboration with the 2nd Regional Healthcare Administration of Piraeus and the Aegean Islands. Through this clinic, patients with cognitive or movement disorders living on remote Aegean islands are examined by a specialized healthcare team (neurologist, psychiatrist and neuropsychologist) via video-conferencing. Based on the questionnaires from 58 telemedicine visits, all stakeholders are highly satisfied with this telemedicine service, mentioning improved care, better health, convenience, and reduced transportation and cost. The low number of cases compared to the available capacity points to the need to better disseminate information about the availability of telemedicine in the area by involving local health professionals and other telemedicine services in Greece.

3.5 Tele-exercise for the Elderly and Patients with Cognitive Disorders/Dementia
Physical activity is a well-established non-pharmaceutical intervention for health improvement in the elderly. It improves mobility, fitness and cognitive function, prevents falls, improves functionality and quality of life, and increases socialization.

The Aeginition Hospital of the National and Kapodistrian University of Athens, in collaboration with the Medical School of Athens, developed a tele-exercise pilot to provide specialized online physical activity programs for the elderly. Small groups of about 10 individuals receive aerobic and resistance training 2–5 times per week, 40 minutes per session, guided in real time via video-conference by specialized healthcare professionals. The elderly participants were trained to use the tablets through which they participate.

All participants report high satisfaction and improved functionality in everyday life. Key lessons learned are that tele-exercise is a feasible and effective non-pharmacological treatment that enhances social interaction, and that effective collaboration between healthcare providers is necessary. The elderly face difficulties in using new technologies, and training is needed.

3.6 Individualized Training Based on Biomechanical Measurements
The importance of physical activity was already discussed in the previous pilot description. The status quo analysis in Slovenia showed that the availability of physical exercise and rehabilitation services is not adequate. Resorting to the private sector may result in lower quality of services, as they might be provided by people without the necessary knowledge and skills.

We prepared a pilot in which training was based on initial screening of the participants by an orthopedist and experienced coaches, followed by biomechanical measurements of the lower extremities. Isometric measurement of peak torque and tensiomyography were used along with a body composition measurement. A total of 24 participants trained 2 times per week for 3 months under two conditions: half of the participants exercised in a gym, the other half online. In the in-person scenario, participants were divided into small groups. The focus was on proper posture and exercise execution; only after the participants had absorbed proper technique was the training intensity increased.

Both conditions were warmly accepted by participants, with the in-person one slightly preferred. Working in small groups enabled not only individual training but also group cohesion, resulting in socialization in the nearby café after exercising.

3.7 Nursing by Monitoring
The pilot carried out in Split, Croatia, was motivated by the well-established issue of inadequate resources to provide quality care to seniors who cannot live independently.

The pilot used monitoring technology that requires minimal interaction with senior users, since they are not familiar with digital technology.

… assessment service interprets movement and activity data from devices in the user's home. The aim is to automatically detect abnormal behaviors that may indicate an emerging disease.

The lesson learned so far is that there is a need for more systematic coordination of the call center with public health care units, doctors, social care workers and emergency units.

3.9 Accessibility to Integrated Long-term Care
In the pilot project we analyzed both spatial accessibility and accessibility of information. Slovenia is a rural country. Older people in rural Slovenia face poor access to public services and especially to health facilities. In terms of spatial accessibility, we identified the locations of buildings where seniors live alone. In 2021, there were 42,344 seniors living alone in houses in Slovenia (27,136 aged 65–79 and 15,208 aged 80 and older).

There are a number of elderly care services advertised online, but the offer is scattered and searching for such information is time-consuming. To avoid these obstacles, we set up a web platform where different providers (formal and informal) are presented in one place. We included all formal providers in Slovenia in the database. We enabled self-registration of service providers and spatial representation of providers via the web.

We highlighted areas with poor accessibility to health and social care services and will present them to local decision-makers and caregivers to improve integrated long-term care and transport for seniors. We will also present our web app to them.

4 CONCLUSION
The paper presented the SI4CARE project and its methodology for bringing social innovation to integrated care. The focus was on the presentation of the pilots that address the identified needs and wishes in the region. The fact that most, nine out of thirteen, of
10 medically non-certified wristbands, the piloting activities within the SI4Care project involve some equipped with LoRaWAN radio, ensure data delivery to large sort of pervasive health technology testifies to the importance of distances without using mobile phones as a gateway. The such technologies also for integrated care. Preliminary results wristbands enable 10-minute acquisition of heart rate, GPS from most pilots show benefits for stakeholders and good location, steps, calories, and wrist temperature, as well as having acceptance. However, digital literacy is a significant barrier, and alarms for low heart rate and falls, and a help button. The data is in some cases also infrastructure, organizational readiness and received by a system called IoT Wallet, which allows future legislation. Pervasive technology clearly cannot be introduced in expansion since it supports adding add more wristbands. isolation, which is why our strategy consists of five pillars, only LoRaWAN technology turned out to provide broad coverage one of which is concerned with pervasive technology. with a relatively low power consumption. ACKNOWLEDGMENTS 3.8 Access to Public Social Services by This paper has been produced with the financial assistance of the Telemedical Monitoring (Click for Life) European Union. The content of the paper is the sole Seniors represent a high percentage of the population of Region responsibility of project partners and can under no circumstances of Central Macedonia (RCM) in Greece (22% are over 65), with be regarded as reflecting the position of the European Union a significant proportion of them living alone (approx. 100,000). and/or ADRION programme authorities. The SI4CARE project They face difficulties in access to public social services, is supported by the Interreg ADRION Programme funded under especially in high-density urban places and remote rural areas. the European Regional Development Fund and IPA II fund. 
The RCM regional authority launched the pilot project 'Click for Life', offering telemedicine/homecare assistance to seniors REFERENCES with a low income living alone. Approx. 3000 users participate [1] Eurostat, 2021. Eurostat Regional Yearbook (2021 edition). DOI: 10.2785/894358 so far. They are provided: (1) 24-hour monitoring via devices [2] SI4CARE – Social Innovation for integrated health CARE of ageing with fall detection and a panic button. The panic button enables population in ADRION Regions. https://si4care.adrioninterreg.eu/ communication with a call center 24 hours/day. (2) Medical [3] M. Luštrek, M. Bohanec, C. Cavero Barca, M. C. Ciancarelli, E. Clays et al., 2021. A personal health system for self-management of congestive history is accessible to relatives and health professionals, and the heart failure (HeartMan): Development, technical evaluation, and proof-users can receive notifications from the relatives. (3) Behavioral of-concept randomized controlled trial. JMIR Med. Inform. 9, 3, e24501. DOI: 10.2196/24501 51 Network Anomaly Detection using Federated Learning for the Internet of Things Ana Cholakoska Bojan Jakimovski Bjarne Pfitzner Ss. Cyril and Methodius University Ss. Cyril and Methodius University Hasso Plattner Institute in Skopje in Skopje Digital Health — Connected Faculty of Electrical Engineering Faculty of Electrical Engineering Healthcare and Information Technologies and Information Technologies Potsdam, Germany Skopje, North Macedonia Skopje, North Macedonia bjarne.pfitzner@hpi.de acholak@feit.ukim.edu.mk kti1562018@feit.ukim.edu.mk Hristijan Gjoreski Bert Arnrich Marija Kalendar Ss. Cyril and Methodius University Hasso Plattner Institute Ss. 
Cyril and Methodius University in Skopje Digital Health — Connected in Skopje Faculty of Electrical Engineering Healthcare Faculty of Electrical Engineering and Information Technologies Potsdam, Germany and Information Technologies Skopje, North Macedonia bert.arnrich@hpi.de Skopje, North Macedonia hristijang@feit.ukim.edu.mk marijaka@feit.ukim.edu.mk
Danijela Efnusheva, Ss. Cyril and Methodius University in Skopje, Faculty of Electrical Engineering and Information Technologies, Skopje, North Macedonia, danijela@feit.ukim.edu.mk

ABSTRACT

The widespread use of IoT devices has contributed greatly to the continuous digitisation and modernisation of areas such as healthcare, facility management, transportation, and the household. These devices allow for real-time mobile sensing and use the sensed input to simplify and automate everyday tasks. However, like all other devices connected to a network, IoT devices are also subject to anomalous behaviour, primarily due to security vulnerabilities or malfunction. Moreover, they have limited resources and can hardly cope with such anomalies and attacks. Early detection of anomalies is therefore of great importance, above all for the proper functioning of the network and the protection of users' personal data. In this paper, deep learning (DL) and federated learning (FL) algorithms are applied to detect anomalies in IoT network traffic. The results show that all the models achieve high accuracy, with the FL models providing slightly worse results than the DL models. However, as the amount of user data increases, the FL-based model is expected to achieve better results over time.

KEYWORDS

federated learning; deep learning; malware; internet of things; anomaly detection

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Information Society 2022, 10–14 October 2022, Ljubljana, Slovenia
© 2022 Copyright held by the owner/author(s).

1 INTRODUCTION

In the last decade, a significant increase in the usage of Internet of Things (IoT) devices has been observed. The ability to connect various kinds of devices from different manufacturers to a network wirelessly and share data has proven beneficial to nearly every domain where this technology is involved, including households, industry, infrastructure, transportation, and healthcare [3]. Additionally, the actions that end users can take are increasing every day. They range from easily changing the ambient parameters of a home or car on the go to remotely and securely controlling a manufacturing process inside a smart factory. Implementing these devices in an ambient assisted living (AAL) setting has proven beneficial both for patients and for medical staff, as it can improve monitoring and medical assistance (if needed), as well as medication dose adjustment [7].

However, the diversity of IoT devices, combined with wireless networking and a slow standardisation process, has led to many issues regarding the privacy and security of data and of the processes based on that data. Cyberattacks on networks composed of IoT devices, and on individual IoT devices performing specific tasks, are becoming more common [8]. By disabling, reconfiguring or reprogramming such devices, attackers can manipulate the network, obtain private data illegally and possibly even induce a life-threatening situation, especially in the e-health domain. It is therefore highly important to detect potential attacks and anomalies that occur in an IoT setting.

This paper examines the detection of anomalies in IoT network traffic by using deep learning and federated learning algorithms. The remainder of this paper is structured as follows. Section 2 gives an overview of the approaches tackling IoT network anomaly detection using deep and federated learning algorithms. Section 3 describes the dataset used and gives an insight into the importance of the features. The experiments and the discussion of the results are presented in Section 4, while Section 5 gives a brief summary and provides further research directions.

2 RELATED WORK

One of the most popular approaches to network anomaly detection is the use of network intrusion detection systems (NIDS). By examining network data flow patterns (signatures), a NIDS can track inconsistencies (also called anomalies) and resolve them in a timely manner. However, directly analysing the behaviour of the IoT devices has proven more beneficial for detecting newer and unknown types of attacks, despite the overall lower detection accuracy and higher computational cost [6].

Machine learning (ML) techniques have had a big impact on the development of NIDS and of malware anomaly detection systems in general. Lin et al. [9] propose a combination of Support Vector Machines (SVMs) and the Artificial Fish Swarm algorithm for IoT botnet detection. In [5], different ML algorithms, including an SVM, were used to evaluate the accuracy of detecting Mirai DDoS attacks. The authors of [16] used Convolutional Neural Networks (CNNs) with binary visualisation to provide fast zero-day malware detection. However, some of the datasets used in these papers contain only network traffic from conventional networks and have little to do with the attacks that target IoT networks. A further issue is that traditional ML techniques increase the security risk, as data has to be moved away from the network and the data source to a powerful system that performs the ML training.

Federated learning (FL) has emerged as a new decentralised way of training models on privately held datasets that cannot or should not be shared for security and privacy reasons. The training setup consists of a central server and several clients. The server facilitates the training, and the clients possess the private data. In each round of federated training, the server randomly selects a subset of clients, which receive the current model parameters. Each selected client then performs local training, keeping its data on-site. The updated model parameters are sent back to the server, where the global model is updated. As opposed to centralised ML or classical decentralised techniques, FL can work with both independent and identically distributed (IID) and non-IID datasets [10].

Several approaches have used this decentralised technique to detect anomalies in IoT networks. The DIoT approach [2] uses federated learning to aggregate profiles of IoT network behaviour. It was evaluated in real-world conditions and reported no false alarms. Saharkhizan et al. [14] used recurrent neural networks with ensemble learning to detect cyberattacks on IoT devices and evaluated the model on a Modbus network traffic dataset. Some approaches even combined FL with a distributed ledger (blockchain) [12, 17] to detect anomalies in networks. In [13], a federated deep learning model for zero-day botnet attacks on IoT devices outperformed traditional decentralised approaches, as well as both localised deep learning (DL) and distributed DL methods. In [15], a novel privacy-by-design FL model based on a stacked long short-term memory (LSTM) model is introduced for anomaly detection in smart buildings. The results showed convergence during training that was twice as fast as with the centralised LSTM.

3 DATASET AND EXPLORATORY DATA ANALYSIS

For this research, we used the publicly available N-BaIoT dataset [11]. The dataset was introduced by Meidan et al. [11] and is distributed through the UCI Machine Learning Repository of the University of California, Irvine, School of Information and Computer Sciences, USA. It addresses the lack of public botnet datasets, especially for the IoT domain. It is composed of real-time network traffic data gathered from nine commercial IoT devices, including a baby monitor, security cameras, a webcam, doorbells, and a thermostat. These devices were infected by the most common families of botnet attacks: Mirai and Bashlite [1].

Figure 1: N-BaIoT dataset distribution by class

The N-BaIoT dataset consists of 7,062,606 entries with 115 features. Each entry belongs either to one of 10 attack categories (gafgyt_combo, gafgyt_junk, gafgyt_scan, gafgyt_tcp, gafgyt_udp, mirai_ack, mirai_scan, mirai_syn, mirai_udp, mirai_udpplain) or to the benign category, which contains the normal traffic flow of the observed devices. Figure 1 shows the distribution of the data used in the experiments. Only a portion of the dataset (509,149 entries) is used for model training in both the DL and the FL experiments. For the DL experiments, this portion is further divided into a training partition and a test partition containing 80% and 20% of the data, respectively, with the class distribution kept intact. For the FL experiments, the data is divided into 50 IID datasets, each with a training subset and a test subset. These represent the 50 clients that take part in the FL process.

Table 1: Most important dataset features
Number  Feature
1       H_L0.01_mean
2       MI_dir_L0.01_mean
3       MI_dir_L0.01_variance
4       H_L0.01_variance
5       H_L0.1_mean

After preprocessing the data, an exploratory analysis was done to identify the features with the greatest influence. The mutual dependence between the features and the class was determined using Mutual Information Gain. As Table 1 shows, the five most important features are H_L0.01_mean, MI_dir_L0.01_mean, MI_dir_L0.01_variance, H_L0.01_variance and H_L0.1_mean.

4 EXPERIMENTS AND DISCUSSION

This paper compares two DL and two FL models for network anomaly detection. The models are able to distinguish anomalous behaviour, i.e., deviations from the normal traffic flow of IoT devices. After training, all models were evaluated to assess their accuracy in detecting anomalies. In the first experiment, a feed-forward neural network with 5 layers (an input layer, 3 hidden layers and an output layer) was used. In the second experiment, a simple feed-forward neural network with one hidden layer was used. In both cases, the output layer has 11 neurons, which represent all the classes in the dataset.

Both models have the same hyperparameters. We used the Adam optimiser with a learning rate of 0.001, which works well for many use cases and models. Since the models perform a multi-class prediction task, we minimised the categorical cross-entropy loss during training. The DL experiments were performed using the TensorFlow framework. The FL experiments were performed using the Flower [4] framework and TensorFlow Federated, with the FedAvg aggregation strategy [10] applied on the server. In the FL experiments, 35 rounds were performed, which corresponds to approximately 35 epochs in the DL experiments.

As previously mentioned, two DL models were trained and tested: the first uses a NN with multiple hidden layers, and the second uses a simple NN. Figures 2 and 3 show that the two models have very similar accuracy. The first model obtained an accuracy of 90.75% on the test data, and the second obtained 90.18%. Furthermore, the confusion matrices of both DL models show that both models make the same mistake: they predict class 4 (gafgyt_scan) as class 5 (gafgyt_tcp).

After 35 rounds of the FL process, the first model obtained an accuracy of 88% (Figure 4), and the second, simpler model obtained an accuracy of 86% (Figure 5). Thus, even though a simpler NN was used, the second model performed similarly to the first in the FL setting. The differences in accuracy between the DL and FL models are also minor (1–5%). Although the DL models performed slightly better, the FL models can also accurately predict anomalies.

Figure 2: Accuracy of the DL model using the five-layer NN
Figure 3: Accuracy of the DL model using the three-layer NN
Figure 4: Accuracy of the FL model using the five-layer NN
Figure 5: Accuracy of the FL model using the three-layer NN

Figures 6 and 7 show the SHAP (SHapley Additive exPlanations) force plots, which show the contribution of each feature to a prediction. The features 69, 25, 75, 87, 56 and 101 (HH_jit_L3_mean, H_L0.1_mean, HH_jit_L0.1_mean, HpHp_L3_weight, HH_L0._covariance and HpHp_L0.1_weight) have the greatest influence on the prediction. The features 69, 25 and 75 have a positive impact on
decision-making (i.e., the prediction), while features 87, 56 and 101 have a negative impact on it. A comparison of Figures 6 and 7 with Table 1 shows that the most important features differ. This is because the SHAP method deals with the model and its output, while Mutual Information Gain deals with the preprocessed data.

Figure 6: SHAP force plot for the DL model using the five-layer NN.
Figure 7: SHAP force plot for the DL model using the three-layer NN.

5 CONCLUSION AND FUTURE WORK

This paper compares DL and FL models for accurate anomaly detection in IoT networks. The FL models distribute the learning process to several clients, thus preserving data privacy and security. Both approaches achieve high accuracy, with the FL models providing results similar to those of the DL models.

Future work will include adding security mechanisms to the FL models and evaluating the trade-off between privacy and accuracy. The models can also be further tested and improved with new, substantial datasets that may combine similar categories of attacks and/or include novel attacks on IoT networks. New federated learning algorithms can also be tested and evaluated on the same datasets and on new ones. This could lead to a novel federated learning algorithm for anomaly detection.

ACKNOWLEDGMENTS

The authors would like to thank Daniel Denkovski and Valentin Rakovic for the useful discussions on the research topic. This work has been supported by the WideHealth project, European Union's Horizon 2020 research and innovation programme under grant agreement No. 952279.

REFERENCES

[1] Abdelmuttlib Ibrahim Abdalla Ahmed. 2020. Systematic Literature Review on IoT-Based Botnet Attack. IEEE Access 8 (12 2020). https://doi.org/10.1109/ACCESS.2020.3039985
[2] Ulrich Matchi Aïvodji, Sébastien Gambs, and Alexandre Martin. 2019. IOTFLA : A Secured and Privacy-Preserving Smart Home Architecture Implementing Federated Learning. In 2019 IEEE Security and Privacy Workshops (SPW). 175–180. https://doi.org/10.1109/SPW.2019.00041
[3] Saurabh Bagchi, Tarek F. Abdelzaher, Ramesh Govindan, Prashant Shenoy, Akanksha Atrey, Pradipta Ghosh, and Ran Xu. 2020. New Frontiers in IoT: Networking, Systems, Reliability, and Security Challenges. IEEE Internet of Things Journal 7, 12 (2020), 11330–11346. https://doi.org/10.1109/JIOT.2020.3007690
[4] Daniel J. Beutel, Taner Topal, Akhil Mathur, Xinchi Qiu, Javier Fernandez-Marques, Yan Gao, Lorenzo Sani, Kwing Hei Li, Titouan Parcollet, Pedro Porto Buarque de Gusmão, and Nicholas D. Lane. 2020. Flower: A Friendly Federated Learning Research Framework. https://doi.org/10.48550/ARXIV.2007.14390
[5] Rohan Doshi, Noah Apthorpe, and Nick Feamster. 2018. Machine Learning DDoS Detection for Consumer Internet of Things Devices. In 2018 IEEE Security and Privacy Workshops (SPW). 29–35. https://doi.org/10.1109/SPW.2018.00013
[6] Satish Kumar, Sunanda Gupta, and Sakshi Arora. 2021. Research Trends in Network-Based Intrusion Detection Systems: A Review. IEEE Access 9 (2021), 157761–157779. https://doi.org/10.1109/ACCESS.2021.3129775
[7] Isabel Laranjo, Joaquim Macedo, and Alexandre Santos. 2012. Internet of Things for Medication Control: Service Implementation and Testing. Elsevier Procedia Technology 5 (10 2012), 777–786. https://doi.org/10.1016/j.protcy.2012.09.086
[8] In Lee. 2020. Internet of Things (IoT) Cybersecurity: Literature Review and IoT Cyber Risk Management. Future Internet 12 (09 2020), 157. https://doi.org/10.3390/fi12090157
[9] Kuan-Cheng Lin, Sih-Yang Chen, and Jason Hung. 2014. Botnet Detection Using Support Vector Machines with Artificial Fish Swarm Algorithm. Journal of Applied Mathematics 2014 (04 2014), 1–9. https://doi.org/10.1155/2014/986428
[10] H. Brendan McMahan, Daniel Ramage, Kunal Talwar, and Li Zhang. 2017. Learning Differentially Private Language Models Without Losing Accuracy. CoRR abs/1710.06963 (2017). arXiv:1710.06963 http://arxiv.org/abs/1710.06963
[11] Yair Meidan, Michael Bohadana, Yael Mathov, Yisroel Mirsky, Asaf Shabtai, Dominik Breitenbacher, and Yuval Elovici. 2018. N-BaIoT—Network-Based Detection of IoT Botnet Attacks Using Deep Autoencoders. IEEE Pervasive Computing 17, 3 (2018), 12–22. https://doi.org/10.1109/MPRV.2018.03367731
[12] Yisroel Mirsky, Tomer Golomb, and Yuval Elovici. 2020. Lightweight collaborative anomaly detection for the IoT using blockchain. J. Parallel and Distrib. Comput. 145 (06 2020). https://doi.org/10.1016/j.jpdc.2020.06.008
[13] Segun I. Popoola, Ruth Ande, Bamidele Adebisi, Guan Gui, Mohammad Hammoudeh, and Olamide Jogunola. 2022. Federated Deep Learning for Zero-Day Botnet Attack Detection in IoT-Edge Devices. IEEE Internet of Things Journal 9, 5 (2022), 3930–3944. https://doi.org/10.1109/JIOT.2021.3100755
[14] Mahdis Saharkhizan, Amin Azmoodeh, Ali Dehghantanha, Kim-Kwang Raymond Choo, and Reza M. Parizi. 2020. An Ensemble of Deep Recurrent Neural Networks for Detecting IoT Cyber Attacks Using Network Traffic. IEEE Internet of Things Journal 7, 9 (2020), 8852–8859. https://doi.org/10.1109/JIOT.2020.2996425
[15] Raed Abdel Sater and A. Ben Hamza. 2021. A Federated Learning Approach to Anomaly Detection in Smart Buildings. ACM Trans. Internet Things 2, 4, Article 28 (aug 2021), 23 pages. https://doi.org/10.1145/3467981
[16] Robert Shire, Stavros Shiaeles, Keltoum Bendiab, Bogdan Ghita, and Nicholas Kolokotronis. 2019. Malware Squid: A Novel IoT Malware Traffic Analysis Framework Using Convolutional Neural Network and Binary Visualisation. In Internet of Things, Smart Spaces, and Next Generation Networks and Systems, Olga Galinina, Sergey Andreev, Sergey Balandin, and Yevgeni Koucheryavy (Eds.). Springer International Publishing, Cham, 65–76.
[17] Devrim Unal, Mohammad Hammoudeh, Muhammad Asif Khan, Abdelrahman Abuarqoub, Gregory Epiphaniou, and Ridha Hamila. 2021. Integration of Federated Machine Learning and Blockchain for the Provision of Secure Big Data Analytics for Internet of Things. Comput. Secur. 109, C (oct 2021), 14. https://doi.org/10.1016/j.cose.2021.102393

Indeks avtorjev / Author index

Altunoğlu Eren, 35
Anders Christoph, 27
Angelopoulou Efthalia, 48
Arnrich Bert, 23, 27, 52
Blažica Bojan, 48
Bolliger Larissa, 31
Bombač Tavčar Lea, 40
Bouça Raquel, 19
Branco Diogo, 19
Cholakoska Ana, 52
Clays Els, 31
Denkovski Daniel, 11
Drobne Samo, 48
Durmaz İncel Özlem, 35
Efnusheva Danijela, 52
Ekmekci Ekrem Yusuf, 35
Ferreira Joaquim, 19
Gjoreski Hristijan, 11, 15, 52
Gjoreski Martin, 15
Gornik Lea, 40
Gradišek Anton, 40
Guerreiro Tiago, 19
Guzzi Pietro Hiram, 48
Hrastič Aleksander, 7
Hrobat Hana, 40
Jakimovski Bojan, 52
Jakimovski Goran, 44
Kalendar Marija, 52
Kiprijanovska Ivana, 15
Kirsten Kristina, 23
Kizhevska Emilija, 11
Kocuvan Primož, 7
Kranjec Matej, 7
Lobo Vítor, 19
Lučovnik Miha, 40
Lukan Junoš, 31
Luštrek Mitja, 31, 48
Matkovic Roberta, 48
Miljkovic Miodrag, 48
Moontaha Sidratul, 27
Nduka Charles, 15
Nikoloski Antonio, 15
Nikolova Dragana, 44
Papageorgiou Sokratis G, 48
Pfitzner Bjarne, 52
Poposki Petar, 15
Saylam Berrenur, 35
Šiško Primož, 31
Srbinoski Viktor
11 Stankoski Simon ........................................................................................................................................................................... 15 Susič David .................................................................................................................................................................................. 40 57 58 Vseprisotne zdravstvene storitve in pametni senzorji Pervasive Health and Smart Sensing Uredniki  Editors: Nina Rescic, Oscar Mayora, Daniel Denkovski Document Outline 02 - Naslovnica - notranja - H - TEMP 03 - Kolofon - H - TEMP 04 - IS2022 - Predgovor - TEMP 05 - IS2022 - Konferencni odbori - TEMP 07 - Kazalo - H Blank Page 09 - Predgovor podkonference - H 10 - Programski odbor podkonference - H 1 - Hrastič Abstract 1 Introduction 2 Related work 3 Methodology 3.1 Data collection 3.2 Algorithm 4 Results 5 Conclusion 2- Srbinoski 3 - Nikoloski 4 - Lobo Abstract 1 Introduction 2 Methods 2.1 Data Collection 2.2 Data Pre-Processing 2.3 Evaluated Models and Features 3 Results and Discussion 3.1 Optimal configurations 3.2 Sensor placement and windows size 3.3 Optimal parameters 3.4 Feature importance 3.5 Limitations 4 Conclusions References 5 - Kirsten Abstract 1 Introduction 2 Background 3 Monitoring System Elements 3.1 Smart Devices and Wearables 3.2 Human Activity Recognition 3.3 Indoor Positioning Systems 3.4 Derived Parameters 4 Exemplary System Overview 4.1 Characteristics Monitoring 4.2 Connected System 5 Challenges and Limitations 6 Conclusion References 6 - Anders Abstract 1 Introduction 2 Experimental Framework 3 Methods 4 Results 5 Conclusion 6 Acknowledgments 7 - Lukan Abstract 1 Introduction 2 Methods 2.1 Data Collection 2.2 Classical Machine Learning Data Analysis 2.3 Variance Partitioning 3 Results 3.1 Machine Learning on Daily Aggregated Data 3.2 Sources of Variability 4 Discussion 5 Conclusions 8 - Saylam Abstract 1 Introduction 2 Related Works 3 Methodology 3.1 