Proceedings of the 33rd International Conference on Information Modelling and Knowledge Bases EJC 2023

Editors: Tatjana Welzer Družovec, Marko Hölbl, Lili Nemec Zlatolas, Saša Kuhar

June 2023

Title: Proceedings of the 33rd International Conference on Information Modelling and Knowledge Bases EJC 2023

Editors: Tatjana Welzer Družovec (University of Maribor, Faculty of Electrical Engineering and Computer Science), Marko Hölbl (University of Maribor, Faculty of Electrical Engineering and Computer Science), Lili Nemec Zlatolas (University of Maribor, Faculty of Electrical Engineering and Computer Science), Saša Kuhar (University of Maribor, Faculty of Electrical Engineering and Computer Science)

Review: Bernhard Thalheim (Christian-Albrechts University of Kiel), Yasushi Kiyoki (Keio University), Xing Chen (Kanagawa Institute of Technology), Marina Tropmann-Frick (University of Applied Sciences Hamburg), Hannu Jaakkola (Tampere University), Naofumi Yoshida (Komazawa University)

Technical editor: Jan Perša (University of Maribor, University Press)

Cover designers: Katja Udir Mišič (University of Maribor), Jan Perša (University of Maribor, University Press)

Cover graphics: Udir Mišič, 2023

Conference: 33rd International Conference on Information Modelling and Knowledge Bases EJC 2023

Location and date: Maribor, Slovenia, 5 – 9 June 2023

Program committee: Yasushi Kiyoki (Keio University), Xing Chen (Kanagawa Institute of Technology), Marina Tropmann-Frick (University of Applied Sciences Hamburg), Bernhard Thalheim (Christian-Albrechts University at Kiel), Hannu Jaakkola (Tampere University)

Organizing committee: Hannu Jaakkola (Tampere University), Xing Chen (Kanagawa Institute of Technology), Tatjana Welzer Družovec (University of Maribor), Marko Hölbl (University of Maribor), Saša Kuhar (University of Maribor), Lili Nemec Zlatolas (University of Maribor), Maja Pušnik (University of Maribor), Luka Hrgarek (University of Maribor), Marko Kompara (University of Maribor), Janez Martin Kričej (University of Maribor), Tomi Perša (University of Maribor)

Editing committee: Marina Tropmann-Frick (University of Applied Sciences Hamburg), Hannu Jaakkola (Tampere University), Naofumi Yoshida (Komazawa University)

Programme Coordination Committee: Naofumi Yoshida (Komazawa University), Tatjana Welzer Družovec (University of Maribor), Tatiana Endrjukaite (Transport and Telecommunication Institute)

Published by: University of Maribor, University Press, Slomškov trg 15, 2000 Maribor, Slovenia, https://press.um.si, zalozba@um.si

Issued by: University of Maribor, Faculty of Electrical Engineering and Computer Science, Koroška cesta 46, 2000 Maribor, Slovenia, https://www.feri.um.si, feri@um.si

Publication type: E-book
Edition: 1st
Available at: http://press.um.si/index.php/ump/catalog/book/785
Published at: Maribor, Slovenia, June 2023

© University of Maribor, University Press / Univerza v Mariboru, Univerzitetna založba

Text © authors, Welzer Družovec, Hölbl, Nemec Zlatolas, Kuhar, 2023

This book is published under a Creative Commons 4.0 International licence (CC BY 4.0). This licence lets others remix, tweak, and build upon your work even for commercial purposes, as long as they credit you and license their new creations under the identical terms. This licence is often compared to "copyleft" free and open source software licences. Any third-party material in this book is published under the book's Creative Commons licence unless indicated otherwise in the credit line to the material.
If you would like to reuse any third-party material not covered by the book's Creative Commons licence, you will need to obtain permission directly from the copyright holder. https://creativecommons.org/licenses/by/4.0/

CIP - Kataložni zapis o publikaciji
Univerzitetna knjižnica Maribor
004(082)
INTERNATIONAL Conference on Information Modelling and Knowledge Bases EJC. Proceedings (33 ; 2023 ; Maribor)
Proceedings of the 33rd International Conference on Information Modelling and Knowledge Bases EJC 2023 [Elektronski vir] : [Maribor, Slovenia, 5 - 9 June] / [editors Tatjana Welzer Družovec ... [et al.]. - 1st ed. - E-publikacija. - Maribor : University of Maribor, University Press, 2023
Način dostopa (URL): https://press.um.si/index.php/ump/catalog/book/785
ISBN 978-961-286-745-4
doi: 10.18690/um.feri.5.2023
1. Welzer-Družovec, Tatjana
COBISS.SI-ID 154141187

ISBN: 978-961-286-745-4 (pdf), 978-961-286-746-1 (USB flash drive)
DOI: https://doi.org/10.18690/um.feri.5.2023
Price: Free copy
For publisher: Prof. Dr. Zdravko Kačič, Rector of University of Maribor
Attribution: Welzer Družovec, T. et al. (2023). Proceedings of the 33rd International Conference on Information Modelling and Knowledge Bases EJC 2023. University of Maribor, University Press, Maribor. doi: 10.18690/um.feri.5.2023

PROCEEDINGS OF THE 33RD INTERNATIONAL CONFERENCE ON INFORMATION MODELLING AND KNOWLEDGE BASES EJC 2023
T. Welzer Družovec et al. (ed.)

Table of Contents

Preface
Tatjana Welzer Družovec, Marko Hölbl, Lili Nemec Zlatolas, Saša Kuhar ... 1

1 KEY NOTE SPEECHES ... 5

P1 Modelology - A New Scientific and Engineering Discipline: The Science and Practice of Models and Modelling
Bernhard Thalheim ... 7

P2 Open Science – a Reality Check Within ATHENA European University
József Györkös ... 9

2 DATA MINING AND DATA ANALYSIS ... 11

1 A Context-based Time Series Analysis and Prediction Method for Public Health Data
Asako Uraki, Yasushi Kiyoki, Koji Murakami, Akira Kano ... 13

2 Towards a Definition of a Responsible Artificial Intelligence
Sabrina Göllner, Marina Tropmann-Frick, Boštjan Brumen ... 37

3 A Time-series Semantic-computing Method for 5D World Map System
Yasushi Kiyoki, Asako Uraki, Shiori Sasaki, Xing Chen ... 75

3 ADVANCED APPLICATIONS ... 97

4 Adaptive Charging and Discharging Strategies for Smart Grid Energy Storage Systems
Alexander Dudko, Tatiana Endrjukaite ... 99

5 Condition Monitoring and Fault Detection of a Laser Oscillator Feedback System
Arne Grünhagen, Annika Eichler, Marina Tropmann-Frick, Görschwin Fey ... 123

6 'Anywhere to Work': a Data Model for Selecting Workplaces According to Intents and Situations
Hitoshi Kumagai, Naoki Ishibashi, Yasushi Kiyoki ... 149

4 ART APPLICATIONS ... 171

7 An Implementation Method of GACA: Global Art Collection Archive
Yosuke Tsuchiya, Naoki Ishibashi ... 173

8 Art Sensorium Project: A System Architecture of Unified Art Collections for Virtual Art Experiences
Naoki Ishibashi, Tsukasa Fukuda, Yosuke Tsuchiya, Yuki Enzaki, Hiroo Iwata ... 191

5 CHALLENGES IMPOSED BY THE SOCIETY ... 205

9 The Change of COVID-19 Coverage in American, German and Japanese Daily Newspapers: A Computer-Assisted Text Analysis and Comparison
Yukiko Sato, Stefan Brückner ... 207

10 How to Incorporate Accessibility to Design Principles for IS Artefacts?
Juho-Pekka Mäkipää, Tero Vartiainen ... 223
11 Thammasat AI City Distributed Platform and Its Validation in Social Distribution and Ambient Lighting
Virach Sornlertlamvanich, Somrudee Deepaisarn, Thatsanee Charoenporn ... 235

12 Digital Modeling the Impact of EU Energy Sector Transformations on the Economic Security of Enterprises
Olena Khadzhynova, Žaneta Simanavičienė, Oleksiy Mints, Kateryna Polupanova ... 259

6 COMMUNICATION AND COLLABORATION ... 271

13 Lessons Learned from Collaborative Prototype Development Between University and Enterprises
Janne Harjamäki, Mika Saari, Mikko Nurminen, Petri Rantanen, Jari Soini, David Hästbacka ... 274

14 A Primitive Action-driven Recognition Method for the Realization of Global Heterogeneous Sign Language Recognition
Takafumi Nakanishi, Ayako Minematsu, Ryotaro Okada, Osamu Hasegawa, Virach Sornlertlamvanich ... 301

15 Algorithm Outline for Sketch Map Drawing from Spatial Data Distilled from Natural Language Descriptions
Marek Menšík, Petr Rapant, Adam Albert ... 317

7 SPATIAL AND TEMPORAL ... 333

16 A Risk-Resilience Calculation Method for Environmental Change and Disaster Analysis with 5D World Map System Visualization
Shiori Sasaki, Yasushi Kiyoki, Amane Hamano ... 335

17 On the Parallel Spaces of Knowledge and Experience Based on the Concept of "Dark-matter"
Xing Chen, Yasushi Kiyoki ... 363

18 A Spatio-Temporal and Categorical Correlation Computing Method for Induction and Deduction Analysis
Yasuhiro Hayashi, Yasushi Kiyoki, Yoshinori Harada, Kazuko Makino, Seigo Kaneoya ... 385

Preface

TATJANA WELZER DRUŽOVEC, MARKO HÖLBL, LILI NEMEC ZLATOLAS, SAŠA KUHAR (EDS.)

Information modelling is becoming increasingly important for researchers, information system designers, and users. Information is becoming increasingly complex; there are more layers of abstraction, and databases and knowledge bases are getting bigger. One of the areas of information modelling is conceptual modelling. This conference's objective is to bring together specialists from various areas of computer science and other disciplines who share a passion for comprehending and resolving issues about information modelling and knowledge bases, as well as putting the findings of research into practice. We also aim to identify and investigate emerging modelling domains and knowledge bases that merit further study. Therefore linguistics, management science, cognitive science, knowledge management, philosophy, and logic are relevant fields. The conference will have three categories of presentations, i.e. full papers, short papers and position papers.

As a continuation of the Scandinavian conference series that had been going on since 1982, the European-Japanese Conferences on Information Modelling and Knowledge Bases (EJC) started in 1988 as a cooperative project between Japan and Finland. Professors Hannu Kangassalo and Hannu Jaakkola from Finland and Professor Ohsuga from Japan initially managed the conferences. The geographical focus first expanded to cover Europe and then further countries.

The 33rd International Conference on Information Modelling and Knowledge Bases (EJC 2023) constitutes a worldwide research forum for exchanging scientific results and experiences achieved in computer science and other related disciplines using innovative methods and progressive approaches.
In this way, a platform has been established, drawing together researchers and practitioners dealing with information modelling and knowledge bases. The main topics of EJC conferences cover a variety of themes; they include, but are not limited to:

1. Conceptual modelling: Modelling and specification languages; Domain-specific conceptual modelling; Concepts, concept theories and ontologies; Conceptual modelling of large and heterogeneous systems; Conceptual modelling of spatial, temporal and biological data; Methods for developing, validating and communicating conceptual models.

2. Knowledge and information modelling and discovery: Knowledge discovery, knowledge representation and knowledge management; Advanced data mining and analysis methods; Conceptions of knowledge and information; Modelling information requirements; Intelligent information systems; Information recognition and information modelling.

3. Linguistic modelling: Models of HCI; Information delivery to users; Intelligent informal querying; Linguistic foundation of information and knowledge; Fuzzy linguistic models; Philosophical and linguistic foundations of conceptual models.

4. Cross-cultural communication and social computing: Cross-cultural support systems; Integration, evolution and migration of systems; Collaborative societies; Multicultural web-based software systems; Intercultural collaboration and support systems; Social computing, behavioural modelling and prediction.

5. Environmental modelling and engineering: Environmental information systems (architecture); Spatial, temporal and observational information systems; Large-scale environmental systems; Collaborative knowledge base systems; Agent concepts and conceptualisation; Hazard prediction, prevention and steering systems.

6. Multimedia data modelling and systems: Modelling multimedia information and knowledge; Content-based multimedia data management; Content-based multimedia retrieval; Privacy and context enhancing technologies; Semantics and pragmatics of multimedia data; Metadata for multimedia information systems.

The careful evaluation gave us the following program: 9 papers have been selected as long papers and 9 as short papers. We have two invited talks in the program and a presentation of Slovenian research projects and results. We thank all colleagues for their support of this issue of the EJC conference, especially the program committee, the organising committee, and the programme coordination team. The conference proceedings, including the presentations revised after the conference, will be published in the series "Frontiers in Artificial Intelligence" by IOS Press (Amsterdam). The books "Information Modelling and Knowledge Bases" are edited by the Editing Committee of the conference. We trust the conference will be productive and fruitful in advancing the research and application of information modelling and knowledge bases.

1 Key note speeches
Modelology - A New Scientific and Engineering Discipline: The Science and Practice of Models and Modelling

BERNHARD THALHEIM
Christian-Albrechts University at Kiel, Germany
bernhard.thalheim@email.uni-kiel.de

Every object and every idea can be used as a model in an application scenario if it becomes useful in the scenario as an instrument in a function. Through this use and function, an object or idea becomes a model, at least for a certain or long time, for the respective model user in its context and environment. Models, therefore, tell something about the use, the function, the scenario and the users without this being explicitly seen in the model. The model-being of an object or idea also explains a lot about the object or idea.

Models are used in many ways in science and technology as well as in all phases of daily life, up to ceremonies, presentations, stencils or a guide. That is why models are universal tools of every human activity: every object and idea can become a model, and models are usually much simpler and focused on concrete use. The functions of models also allow a classification, i.e., an abstract characterisation of the model-being for the respective scenario. Model-being represents a model as a model "of something". Instrument-being is the starting point for classifying models as representational, activity, explanatory, orientational, instructional, perceptual, declarative, socialization, or interactional models, or conceptual or investigative models, and their respective subcategories. An object or idea may be used in more than one capacity at the same time, so that a model may be assigned to different categories at the same time. This raises questions such as the following: when does something become a model, which characteristics distinguish a model, which quality is expected from a model, to what extent can a model be trusted, which properties exclude being a model, to what extent is a model suitable and when not, and what potential and what performance can be expected from a model.

Keywords: modelology, discipline, models, universal tools, Modellkunde

Open Science – a Reality Check Within ATHENA European University

JÓZSEF GYÖRKÖS
University of Maribor, Faculty of Electrical Engineering and Computer Science, Maribor, Slovenia
jozsef.gyorkos@um.si

Open science is a well-recognised and established policy principle of the European Union (2019) as well as UNESCO (2021), and it is implemented by various educational and research institutions and societies. In the above-mentioned resources, open science is defined as a "system change allowing for better science through open and collaborative ways of producing and sharing knowledge and data" and a framework that "contributes to reducing the digital, technological and knowledge divides existing between and within countries." Open access to research publications and research data has even become an obligatory activity as part of national regulation (Scientific Research and Innovation Activities Act, 2022). In the keynote lecture, the fundamental principles and ambitions of the open science policy will be followed by a selection of implementation models and use cases.
The most exposed principles, like FAIR (findable, accessible, interoperable, re-usable) open data, new-generation metrics and even citizen science, will be stressed. An experience-based assessment, stemming from the first implementations of open science principles, will highlight the different, even contradictory, expectations of various stakeholders, including researchers, scholarly communication, research funders, publishers and society in general. In the second part of the lecture, a brief introduction to the ATHENA European University will be given, followed by the implementation models of the open science principles in the – diverse by content but homogeneous by interest – landscape of the ATHENA alliance partner institutions. ATHENA stands for Advanced Technology Higher Education Alliance and joins nine mid-range Europe-wide partner institutions aiming for excellence in joint research and educational projects, enhanced synergies including the research infrastructure, fostering research quality and impact, and, nevertheless, attracting funds to support its programmes. In conclusion, the lecture will demonstrate how open science can serve as a binding policy in alliances with clear goals and a heterogeneous structure regarding the initial mission, size and research capacities in different fields of science.

Keywords: open science, implementation models, controversies, ATHENA European university, capacity building

2 Data Mining and Data Analysis

A CONTEXT-BASED TIME SERIES ANALYSIS AND PREDICTION METHOD FOR PUBLIC HEALTH DATA

ASAKO URAKI,1 YASUSHI KIYOKI,1 KOJI MURAKAMI,2 AKIRA KANO3
1 Keio University, Keio Research Institute at SFC, Tokyo, Japan; aco@sfc.keio.ac.jp, kiyoki@sfc.keio.ac.jp
2 Prevent Science Co., Ltd, Tokyo, Japan; murakami@preventme.co.jp
3 Fujimino Emergency Hospital, Saitama, Japan; kano@koyu-kai.jp

The important process of time series analysis for public health data is to determine target data as semantic discrete values, according to a context, from the continuous phenomena around our circumstances. A phenomenon is expressed as continuous values or discrete data along the time axis, independent of context. The difference of situations in the phenomenon on the time axis expresses one of the key features of time series data, and the differences are reflected in adjacent discrete values. Typically, experts in each field have their own field-specific and practical knowledge to specify an appropriate target part of the data which contains the key features of their intended context in each analysis. That knowledge is often implicit and thus not defined systematically and quantitatively. In this paper, we present a context-based time series analysis and prediction method for public health data. The most essential point of our approach is to express a basis of time series context as the combination of the following 5 elements (1: granularity setting on time axis, 2: feature extraction method, 3: time-window setting, 4: differential computing function, and 5: pivot setting) to determine target data as semantic discrete values, according to the time series context of analysis for public health data.

Keywords: context-based differential computing, time-series analysis, time-series prediction, public health data, big data, AI, cyber-physical system
DOI: https://doi.org/10.18690/um.feri.5.2023.1
ISBN: 978-961-286-745-4

1 Introduction

Analysis and prevention of future situations based on past and current data are important activities for realizing preemptive medicine for human health and the early detection of spreading infectious diseases in nature and societies. As background research, our semantic computing method [9,10,12] and 5D World Map System [1,2,3,4,5,6,7,8,11] are applied to analysis from the viewpoints of personal health situations and the spreading of infectious disease. The important process of time series analysis for public health data is to determine target data as semantic discrete values, according to a context, from the continuous phenomena around our circumstances. A phenomenon is expressed as continuous values or as raw discrete data along the time axis, independent of context. The difference of situations in the phenomenon on the time axis expresses one of the key features of time series data, and the differences are reflected in adjacent discrete values. Typically, experts in each field have their own field-specific and practical knowledge to specify an appropriate target part of the data which contains the key features of their intended time series context for each analysis. However, that knowledge is often implicit and therefore not defined systematically and quantitatively.

In this paper, we present a context-based time series analysis and prediction method for public health data. The most essential point of our approach is to express a basis of time series context as the combination of the following 5 elements (1: granularity setting on time axis, 2: feature extraction method, 3: time-window setting, 4: differential computing function, and 5: pivot setting) to determine target data as semantic discrete values according to the time series context of analysis and prediction for public health data. As our experiment, we realized analysis and prediction by applying actual public health data.

Ordinarily in time-series data analysis and prediction, we determine the target part of data along the time axis with implicit expert knowledge as each context. Our main contribution is to make it possible to explicitly express each context for the determination of the target part of data. We are then able to express a certain context quantitatively, to make contexts compatible, and to share a context quantitatively among different analysis and prediction environments. By changing the context, we can get different results for discussion and comparison between different points of view, and we are able to review the analytical results of phenomena among users and analysts. Our system enables experts to express their knowledge of analysis and prediction in each specific field, and this makes it possible to analyze and discuss interdisciplinarily across different fields. The meaning of our new context definition is to fix the closed world of semantic differential computing on the time axis from the viewpoint of a database system. We can thus normalize the target part of data according to a context, which means we can compare the features extracted from the target data. One of the main features of our method is to get different results by switching the time series context.
Our new concept of expressing a time series context by 5 elements is shown in Figure 1. If we apply our new context definition to prediction, we are able to realize comparison and discussion between several prediction results for the same data by switching the time series context. Therefore, we are not focusing on evaluating the output results against the real data; our method realizes quantitative comparison between different time series contexts. The context definition is quite important for dealing with interdisciplinary phenomena between humans and the providence of nature, such as the fields of health, medication, environment, and culture, which all have a time axis. As our experiment, we realized our system and context definition in the field of public health to analyze and predict infectious disease. The experiments show the prediction feasibility of our method in the field of public health data, its effectiveness in generating results for discussion by switching contexts, and its applicability to express the time series context of expert knowledge for analysis and prediction as a combination of the 5 elements, making that knowledge explicit and quantitative.

In the next section, we present our method by explaining its overview and the definitions of data and functions. In Section 3, we show two experiments applying actual public health data, and we conclude this paper in Section 4.

Figure 1: Concept of the context-based time series analysis and prediction method for public health data. The left side shows context 1 (e.g., for an overall weekly trend during a pandemic lockdown situation), and the right side another context n (e.g., predicting the laboratory throughput capacity for an infectious disease). By switching those contexts, we can get different target data (as semantic discrete values) for analysis, and this yields different prediction results according to the context from the same input data. Source: own.

2 A Context-based Time Series Analysis and Prediction Method for Public Health Data

In this section, we define a context-based time series analysis and prediction method for public health data. First, we give an overview of our method for processing the 5 elements of context for the analysis and prediction of public health data. Then we define the 5 elements that express the basis of a time series context to generate target data as semantic discrete values of analysis and prediction according to the time series context, and we also define the input data, target data, and output data of our method.

2.1 Overview of the Context-based Time Series Analysis and Prediction Method

We realize our method by the following 4 steps to obtain analysis and prediction results. An overview of the proposed method is shown in Figure 2.
Step 1: Input the data (continuous data/raw discrete data) along the time axis (before considering a time series context):
- input time series data for reference (IRD)
- input time series data for prediction (ITD)

Step 2: Define a time series context for analysis and prediction by the combination of the 5 elements:
- granularity setting on time axis (GS)
- feature extraction method (FEM)
- time-window setting (TWS)
- differential computing function (DCF)
- pivot setting on time axis for prediction (PV)

Step 3: Extract the target data and the pivot according to the 5 elements defined in Step 2:
- confirmed target data (CRD)
- time point of the pivot on ITD (PV)

Step 4: Output the prediction result according to the context:
- output prediction data (OPD)

Figure 2: Overview of the context-based time series analysis and prediction method for public health data. We realize our model by the 4 steps to output the prediction result. Source: own.
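Expressed as a data structure, the five context elements of Step 2 form a single record. The following minimal Python sketch is purely illustrative (the paper defines no code; all names and types here are assumptions):

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class TimeSeriesContext:
    """The five context elements of Step 2 (field names follow the
    paper's abbreviations; concrete types are our assumptions)."""
    gs: tuple[str, str]                            # (OG, TG): original/target granularity
    fem: Callable[[Sequence[float]], list[float]]  # feature extraction method
    tws: tuple[str, str]                           # (TS, TE): time-window bounds
    dcf: Callable[[Sequence[float]], list[float]]  # differential computing function
    pv: str                                        # pivot time point on ITD
```

Modelling FEM and DCF as plain functions keeps the record swappable, which mirrors the switching of contexts that the method relies on.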
2.2 Data Structure and Definition of the 5 Elements Expressing a Time Series Context to Determine Semantic Discrete Values as Targets of Analysis and Prediction

Our approach expresses the basis of a time series context for analysis and prediction as the combination of the following 5 elements: granularity setting on time axis (GS), feature extraction method (FEM), time-window setting (TWS), differential computing function (DCF), and pivot setting on time axis for prediction (PV), to determine the appropriate target data as semantic discrete values on the time axis. In the field of time series data analysis, there are many viewpoints for expressing a time series context to fix the target data, not only these elements but also specific criteria of each field. In this paper, we focus only on the analysis and prediction of public health data, and we designed the method by applying the above 5 elements together with expert knowledge that was previously implicit in the mind of each analyst. By applying the above 5 elements, we are therefore able to express a time series context that fixes the target data and the pivot on the time axis for analysis and prediction in the field of public health data. Each element has its own role in fixing the target data, as explained in the following subsections.

Granularity setting on time axis (GS): Setting the granularity on the time axis corresponds to fixing a semantic sampling rate of the target data as semantic discrete values. We have many conceptual numeral systems on the time axis, with sampling sizes according to a context. The granularity setting in time (GS) includes two variables, the original granularity in time (OG) and the target granularity in time (TG); we define the data structure of GS below. In general, as we experience in daily life, there are many kinds of numeral systems on the time axis, such as the base-24 system (24 ticks of conceptual and hierarchical structure between day and hour) and the sexagesimal system (60 ticks of conceptual and hierarchical structure for second, minute, and hour). We also have several conceptual systems on the time axis, such as the hierarchical writing structure of a paper (a hierarchy of phrase, sentence, section, and paper) and the hierarchical structure of music (a hierarchy of note, phrase, section, movement, and a piece of music). Moreover, we can use specific segmentations that fix an important part of the data by applying additional information beyond the data itself, such as heart rate during a workout or not, or blood pressure values during a period of medication or not. We introduced the conceptual tick and the hierarchical system of music data as "grain and tree-structured granularity" for music analysis in our previous research [14]. By switching the system and selecting the level of granularity, we can specify an appropriate grain of data according to a context. Ordinarily in the field of data analysis, the granularity of the input data is used as the granularity of the target data; our model makes it possible to state explicitly the granularity that a context requires when generating target data for each analysis and prediction. If we focus on a larger granularity, the differences between data express a comprehensive feature; if we focus on a smaller granularity, the differences between data express a detailed differential feature of the data.

$GS_a \supset \{OG_b, TG_b\}$ (1)
$(a = 1, \dots, a_m)$, where $a$ is a granularity setting id and $a_m$ is the maximum number of granularity settings; $(b = 1, \dots, b_m)$, where $b$ is a granularity id and $b_m$ is the maximum number of granularities.

Feature extraction method (FEM): Setting the feature extraction method means normalizing the values of the target data within the closed world of a context for analysis and prediction. We define the data structure of FEM below. The feature extraction method is the quantification method that generates the values of the target data on the time axis, such as a ratio out of a maximum number, a specific quantity, or a semantic feature value from a specific viewpoint.

$FEM_c$ (2)
$(c = 1, \dots, c_m)$, where $c$ is a feature extraction method id and $c_m$ is the maximum number of feature extraction methods.

Time-window setting (TWS): Setting the time window means selecting the intended range of the time axis for each context. The time-window setting (TWS) contains two variables, the starting time point (TS) and the ending time point (TE). We define the data structure of TWS below.

$TWS_d \supset \{TS_{tp}, TE_{tp}\}$ (3)
$(d = 1, \dots, d_m)$, where $d$ is a time-window setting id and $d_m$ is the maximum number of time-window settings; $tp$ is a time point on IRD.

Differential computing function (DCF): Setting the differential computing function means switching the type of "ruler" used to calculate the differential feature between adjacent data on the time axis for each context. We define the data structure of DCF below. The differential computing function is a quantification method that calculates the value of the difference of the target data on the time axis, such as a regression curve, a tilt/angle, a slope, a linear trend line, a subtraction, a ratio between subtractions, or a color system for calculating the distance between colors. The choice of the differential computing function is often considered together with the feature extraction method since, in many cases, the two are developed together in a specific research field, as with a metadata generation method and the specific calculation system for that metadata.

$DCF_e$ (4)
$(e = 1, \dots, e_m)$, where $e$ is a differential computing function id and $e_m$ is the maximum number of differential computing functions.
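For illustration, two of the differential computing functions named above, subtraction between adjacent values and the ratio to the starting point of the window (used in Experiment 1 below), could be written as follows. This is a sketch, not code from the paper:

```python
def dcf_subtraction(values):
    """DCF sketch: subtraction between adjacent values on the time axis."""
    return [b - a for a, b in zip(values, values[1:])]

def dcf_ratio_to_start(values):
    """DCF sketch: each value as a ratio to the starting point of the
    time window (e.g. Sunday's confirmed cases in Experiment 1)."""
    base = values[0]
    return [v / base for v in values]

print(dcf_ratio_to_start([1500.0, 4200.0, 3900.0]))  # [1.0, 2.8, 2.6]
```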
Pivot setting (PV): Setting the pivot means fixing an effective timing for starting the prediction by applying the target data of each context. The pivot is a starting time point of prediction that matches the conceptual meaning of the starting time of the target data, such as the same day of the week, the same month of a year, a similarity in supportive data, or a similarity in the target data. We define the data structure of PV below.

$PV_h$ (5)
$(h = 1, \dots, h_m)$, where $h$ is a pivot setting id and $h_m$ is the maximum number of pivot settings; $tp$ is a time point on ITD.

2.3 Data Structure and Definition of Input Data, Target Data, and Output Data of the Context-based Time Series Analysis and Prediction Method

Our input data consist of two data sets: 1) the input reference data (IRD) and 2) the input target data (ITD). IRD is the base data from which the target data are picked up according to a context. ITD is the data for which a prediction is made by applying the target data, also according to the context. We define the data structures of the two input data sets in the following formulas. ITD = IRD is possible when the IRD is also the desired prediction data, and ITD = null is possible when only analysis is performed. We define IRD and ITD as follows.

Input time series data for reference (IRD):
$\{IRD_{[i,t,v,p]}, \dots, IRD_{[i_m,t_m,v_m,p_m]}\}$ (6)
$(i = 1, \dots, i_m)$, where $i_m$ is the maximum number of IRD records; $(t = 1, \dots, t_m)$, where $t_m$ is the maximum number of time points; $(v = 1, \dots, v_m)$, where $v_m$ is the maximum value; $(p = 1, \dots, p_m)$, where $p_m$ is the maximum number of parameters.

Input time series data for prediction (ITD):
$\{ITD_{[n,t,v,p]}, \dots, ITD_{[n_m,t_m,v_m,p_m]}\}$ (7)
$(n = 1, \dots, n_m)$, where $n_m$ is the maximum number of ITD records.

Confirmed target data (CRD): The data structure of the target data is formalized as follows.
$\{CRD_{[n,t_r,v_r,p_r]}, \dots, CRD_{[n_m,t_{rm},v_{rm},p_{rm}]}\}$ (8)
$(t_r = 1, \dots, t_{rm})$, where $t_{rm}$ is the maximum number of time points; $(v_r = 1, \dots, v_{rm})$, where $v_{rm}$ is the maximum value; $(p_r = 1, \dots, p_{rm})$, where $p_{rm}$ is the maximum number of parameters.

Output prediction data (OPD): The data structure of the output data is formalized as follows.
$\{OPD_{[n,t_p,v_p,p_p]}, \dots, OPD_{[n_m,t_{pm},v_{pm},p_{pm}]}\}$ (9)
$(t_p = 1, \dots, t_{pm})$, where $t_{pm}$ is the maximum number of time points; $(v_p = 1, \dots, v_{pm})$, where $v_{pm}$ is the maximum value; $(p_p = 1, \dots, p_{pm})$, where $p_{pm}$ is the maximum number of parameters.
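Read as records, each element of IRD, ITD, CRD and OPD carries an id, a time point, a value and a parameter. A possible plain-Python rendering (a sketch; the placeholder values are purely illustrative, not from the paper's dataset):

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class TSRecord:
    """One element of IRD/ITD/CRD/OPD: id, time point, value, parameter."""
    rec_id: int
    t: date      # time point
    v: float     # observed or computed value
    p: str       # parameter name

# Placeholder values for illustration only:
ird = [
    TSRecord(1, date(2021, 11, 7), 1500.0, "confirmed_cases"),
    TSRecord(2, date(2021, 11, 8), 4200.0, "confirmed_cases"),
]
itd = ird  # ITD = IRD is allowed when IRD is also the desired prediction data
```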
2.4 Functions of the Context-based Time Series Analysis and Prediction Method

Function to determine target data and pivot: We define the function that determines the target data and the pivot as follows.

$f_{\mathrm{determine\_ref\_data}}(GS_a, FEM_c, TWS_d, DCF_e, PV_h, IRD_{[i,t,v,p]}, ITD_{[n,t,v,p]}) \rightarrow \{CRD_{[n,t_r,v_r,p_r]}, \dots, CRD_{[n_m,t_{rm},v_{rm},p_{rm}]}, PV_n\}$ (10)

Function for prediction according to the context: We define the function for prediction as follows.

$f_{\mathrm{extract\_prediction\_data}}(ITD_{[n,t,v,p]}, CRD_{[n,t,v,p]}, PV_n) \rightarrow \{OPD_{[n,t_p,v_p,p_p]}, \dots, OPD_{[n_m,t_{pm},v_{pm},p_{pm}]}\}$ (11)

3 Experiment Study

We realized our experimental studies by applying COVID-19 confirmed-case numbers as an actual phenomenon among public health issues.

Experiment 1: This experiment predicts the number of COVID-19 cases in order to anticipate the tightness of a laboratory's COVID-19 (infectious disease) testing throughput capacity during the coming week in Austria, with single-parameter input data. The point of this experiment is to know on which day of the week we need to expect the maximum processing load. A testing laboratory has the implicit statistic that the number of confirmed cases is mostly synchronized with the days on which testing throughput capacity is tight. To determine the reference data for prediction, the following two contexts were set by an expert in public health data analysis. Time series context 1 reflects the most recent situation for prediction; context 2 reflects the most similar past situation for prediction. By applying the two contexts to determine the reference data for the prediction, we expect to get a different prediction result according to each context. The important knowledge in the context setting of experiment 1 is to focus on the number of confirmed cases on Sundays: most testing sites in Austria are closed on Sundays by business restrictions, as in the other DACH countries, so the confirmed cases on Sundays express only the results from the regional core hospitals that care for patients in relatively severe condition. The expert who set these contexts assumes that the ratio of each day's number to Sunday's number expresses one of the key features of the situation on the following days of the week. Therefore, the starting time point of the TWS and the PV is a Sunday in this experiment.

Input data and time series context 1 for experiment 1: to predict the tightness of a laboratory's COVID-19 testing throughput capacity during the coming week in Austria by reflecting the most recent situation.

- Input time series data for reference (IRD) = the COVID-19 number of confirmed cases in Austria, published by the ECDC [16]
- Input time series data for prediction (ITD) = the same as IRD, the COVID-19 number of confirmed cases in Austria
- Granularity setting (GS): original granularity (OG) = daily; target granularity (TG) = daily
- Feature extraction method (FEM) = actual number of confirmed cases
- Time-window setting (TWS) = the most recent 1 week starting from a Sunday on IRD
- Differential computing function (DCF) = ratio to the starting-point number of cases
- Pivot setting on ITD (PV) = the most recent Sunday on ITD

The confirmed reference data (CRD) for context 1 of experiment 1 and the prediction output (OPD) are shown in Figure 3.

Figure 3: The confirmed reference data (CRD) and the prediction output (OPD) for context 1 of experiment 1. The top left data (blue line) shows the most recent one-week data starting from Sunday, reflecting context 1; the number of confirmed cases on each day is calculated as the ratio between the cases on Sunday and the cases on each day, as shown in the table. The ratios are applied to predict the next one-week numbers of cases, as in the bottom right table. The output prediction data (OPD) is shown at the top right (orange line). Source: own.
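Under context 1, the computation behind Figure 3 reduces to a few lines. The sketch below is an illustration only; the function name and numbers are assumptions, not the paper's code or data:

```python
def predict_week_from_sunday(crd_week, pivot_sunday_cases):
    """Context 1 of experiment 1 as code: the reference week starts on a
    Sunday; each day becomes a ratio to that Sunday (the DCF), and the
    ratios are replayed from the pivot Sunday of ITD to produce the OPD."""
    ratios = [day / crd_week[0] for day in crd_week]
    return [pivot_sunday_cases * r for r in ratios]

# Made-up numbers, purely illustrative:
crd = [1500, 4200, 3900, 3600, 3400, 3100, 2800]   # Sun..Sat reference week
opd = predict_week_from_sunday(crd, pivot_sunday_cases=1800)
```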
Input data and time series context 2 for experiment 1: to predict the tightness of a laboratory's COVID-19 testing throughput capacity during the coming week in Austria by reflecting the most similar past situation (in amount of change and absolute value).

- Input time series data for reference (IRD) = the COVID-19 number of confirmed cases in Austria, published by the ECDC [16]
- Input time series data for prediction (ITD) = the same as IRD, the COVID-19 number of confirmed cases in Austria
- Granularity setting on time axis (GS): original granularity (OG) = daily (for the reference data) and weekly (for the determination of the reference data); target granularity (TG) = daily for prediction
- Feature extraction method (FEM) = actual number of confirmed cases
- Time-window setting (TWS) = the most recent 1 week starting from a Sunday that matched the condition (a combination of slope trend (over two weeks of decreasing trend) and absolute value (the time point right after falling under 5000 cases))
- Differential computing function (DCF) = ratio to the starting-point number of cases
- Pivot setting on ITD (PV) = the most recent Sunday on ITD

The process of determining the confirmed reference data (CRD) for context 2 of experiment 1 is shown in Figure 4, and the confirmed reference data (CRD) and the prediction output (OPD) for context 2 of experiment 1 are shown in Figure 5.

Figure 4: Process of determining the confirmed reference data (CRD) for context 2 of experiment 1. The figure shows the number of confirmed cases on every Sunday (as weekly data); the red squares show the two candidate weeks that matched the condition of the time-window setting (TWS) of context 2. The most recent of the candidate weeks (the right red square) was determined as the reference data (CRD). The green square shows the time point of the pivot (PV) for prediction. Source: own.

Figure 5: The confirmed reference data (CRD) and the prediction output (OPD) for context 2 of experiment 1. The top left data (blue line) shows the one-week data starting from Sunday that matched the condition, reflecting context 2; the number of confirmed cases on each day is calculated as the ratio between the cases on Sunday and the cases on each day, as shown in the table. The ratios are applied to predict the next one-week numbers of cases, as in the bottom right table. The output prediction data (OPD) is shown at the top right (orange line). Source: own.

Figure 6: Comparison between the CRD (confirmed referential data), the OPD (output prediction data), and the actual numbers of confirmed cases. The left-side chart is the result of context 1 of experiment 1 (reflecting the most recent situation), and the right-side chart is the result of context 2 of experiment 1 (reflecting the most similar past situation, in amount of change and absolute value). The dashed line shows the prediction data, the double line shows the CRD, and the solid line shows the actual numbers of cases, which were later published by the ECDC. Comparing the OPD with the actual numbers of cases, context 2 is closer than context 1. Source: own.
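The TWS condition of context 2 (at least two weeks of decreasing Sunday values, then the first Sunday under 5000 cases, as marked in Figure 4) can be made mechanical. A sketch, with the threshold and trend length as hypothetical parameters:

```python
def find_reference_week(sunday_cases, threshold=5000, trend_weeks=2):
    """Sketch of the TWS condition of context 2: return the index of the
    most recent Sunday that follows at least `trend_weeks` weeks of
    decreasing values and is the first point under `threshold` cases."""
    candidates = []
    for i in range(trend_weeks, len(sunday_cases)):
        window = sunday_cases[i - trend_weeks:i + 1]
        decreasing = all(a > b for a, b in zip(window, window[1:]))
        just_crossed = sunday_cases[i] < threshold <= sunday_cases[i - 1]
        if decreasing and just_crossed:
            candidates.append(i)
    return candidates[-1] if candidates else None   # most recent match wins
```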
The comparison between the CRD (confirmed referential data), the OPD (output prediction data), and the actual numbers of confirmed cases is shown in Figure 6. The left-side chart is the result of context 1 of experiment 1, and the right-side chart is the result of context 2. By comparing the OPD with the actual numbers of cases, context 2 is closer than context 1.

The results of experiment 1 show the following:

- prediction feasibility of our method in the field of public health data;
- realized quantitative comparison between different time series contexts;
- effectiveness in generating results for discussion by switching the settings of the 5 elements, so that better settings of a time series context can be reflected in other predictions quantitatively;
- applicability of the combination of the 5 elements to express the time series context of expert knowledge for analysis and prediction, making the knowledge explicit and quantitative.

Experiment 2: Experiment 2 focuses on the switching of two major variants, Alpha and Delta, during the COVID-19 pandemic situation in 2022, with multiparametric input data. This experiment predicts the timing at which a spreading (increasing) variant reaches over a 90% ratio while switching with another variant. To determine the reference data for prediction, the following two contexts for selecting the input time series data for reference (IRD) were set by an expert in public health data analysis: time series context 1 reflects the area with the closer population density, and time series context 2 reflects an area whose population density is not close. By applying those contexts, we selected the COVID-19 confirmed numbers of variant cases in the nationwide data of the United Kingdom and the city-level data of London as the input time series data for reference (IRD), published by the government of the United Kingdom [15]. Those areas had already completely switched their majority from the Alpha variant to the Delta variant, and the Delta variant had reached over a 90% ratio. We also selected the COVID-19 confirmed numbers of variant cases of Saitama, which was still at the halfway point of switching its majority from the Alpha variant to the Delta variant, as the input time series data for prediction (ITD). The important knowledge in the context setting of experiment 2 is to focus on the closer population and the closer population density. The expert in public health data analysis who set these contexts assumes that the rapidness of switching between two major variants corresponds to population density: switching is slower in a higher-density area and quicker in a lower-density area. We expect to get a different prediction result according to each context, and we also expect to obtain a basis for comparing the results in a discussion of the expert's assumption.
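The feature extraction used in the experiment 2 contexts, the share of each variant out of all confirmed variant cases, can be sketched as follows (illustrative code and numbers only, not the paper's implementation):

```python
def fem_variant_share(counts_by_variant):
    """FEM sketch for experiment 2: each variant's weekly share out of
    all confirmed variant cases."""
    total = sum(counts_by_variant.values())
    return {variant: n / total for variant, n in counts_by_variant.items()}

print(fem_variant_share({"alpha": 300, "delta": 700}))  # {'alpha': 0.3, 'delta': 0.7}
```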
In this experiment, we extract the rapidness of switching between the two major variants by applying the rapidness calculation function that we introduced in [13] as the differential computing function (DCF), and we reflect it to predict the rapidness of Saitama (ITD) for the second-half situation. The concept of experiment 2 is shown in Figure 7. The other time series context elements were also set by the expert, from an epidemiological laboratory in Japan for public health data analysis, to express the desired context for this comparison.

Figure 7: Concept of experiment 2, predicting the timing at which a spreading (increasing) variant reaches over a 90% ratio while switching with another variant. The top left shows context 1, applying the London data (IRD) and target data (CRD); the top right shows context 2, applying the United Kingdom data (IRD) and target data (CRD). The bottom shows the time series prediction data of Saitama (ITD). Source: own.

Input data and time series context 1 for experiment 2: to predict the timing of reaching over a 90% ratio of a spreading (increasing) variant while switching with another variant in Saitama, by reflecting London (whose closer feature is population density).

- Input time series data for reference (IRD) = the COVID-19 numbers of confirmed Alpha and Delta variant cases out of all confirmed variant cases in London (which had already switched the majority of the two variants), published by the government of the United Kingdom [15]. The population density of London is 4761 people per square kilometer.
- Input time series data for prediction (ITD) = the COVID-19 numbers of confirmed Alpha and Delta variant cases out of all confirmed variant cases in Saitama (still at the halfway point of switching its majority from the Alpha variant to the Delta variant), collected by a hospital and a testing facility in Saitama. The population density of Saitama is 6127 people per square kilometer.
- Granularity setting on time axis (GS): original granularity (OG) = weekly; target granularity (TG) = weekly
- Feature extraction method (FEM) = the ratio of the confirmed cases of each variant out of all confirmed variant cases
- Time-window setting (TWS) = the period during which the ratios of IRD and ITD lie between the min-max ratios of 10% and 90%
- Differential computing function (DCF) = the area size calculation function [13], the ratio calculation function of the area-A size between IRD and ITD, and the switching-rapidness coefficient table between two variants
- Pivot setting on ITD (PV) = the crossing point (the time point of switching the majority of the two variant cases) on ITD

The input time series data for reference (IRD) of London and the input time series data for prediction (ITD) of Saitama are shown in Figure 8.

Figure 8: The left chart shows the input time series data for reference (IRD) of London; the right chart shows the input time series data for prediction (ITD) of Saitama. The data are time series ratios of confirmed patients of each COVID-19 variant, which express the variant-majority situation of each area. The London data show that this area had already switched the majority of the two variants. The Saitama data show that this area was still at the halfway point of switching its majority from the Alpha variant to the Delta variant. Source: own.
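The area-size computation named in the DCF setting is defined in [13]. As a plausible stand-in under our own assumptions (not the function from [13]), a trapezoidal area between the two variant-share curves over the 10%-90% switching window could be computed like this:

```python
def switching_area(spreading_share, receding_share, lo=0.10, hi=0.90):
    """Trapezoidal area between the two variant-share curves over the
    weeks in which the spreading variant's share lies in [lo, hi];
    a smaller area corresponds to quicker switching."""
    weeks = [i for i, s in enumerate(spreading_share) if lo <= s <= hi]
    area = 0.0
    for i, j in zip(weeks, weeks[1:]):
        gap_i = abs(spreading_share[i] - receding_share[i])
        gap_j = abs(spreading_share[j] - receding_share[j])
        area += (gap_i + gap_j) / 2.0 * (j - i)   # width in weeks
    return area
```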
The determined target data for analysis (CRD) for context 1 of experiment 2 are shown in Figure 9. The sizes of the areas A, B, C and D express the rapidness of the switching: a smaller size shows quicker switching, and a larger size shows slower switching.

Figure 9: Target data for analysis (CRD) of London for experiment 2, obtained by applying the 5 context elements, expressing the rapidness of switching between the two major variants. The sizes of the areas A, B, C and D express the rapidness of the switching: a smaller size shows quicker switching, and a larger size shows slower switching. Source: own.

Input data and time series context 2 for experiment 2: to predict the timing of reaching over a 90% ratio of a spreading (increasing) variant while switching with another variant in Saitama, by reflecting the United Kingdom (whose population density is not closer).

- Input time series data for reference (IRD) = the COVID-19 numbers of confirmed Alpha and Delta variant cases out of all confirmed variant cases in the United Kingdom (which had already switched the majority of the two variants), published by the government of the United Kingdom [15]. The population density of the United Kingdom is 257 people per square kilometer (not closer).
- Input time series data for prediction (ITD) = the COVID-19 numbers of confirmed Alpha and Delta variant cases out of all confirmed variant cases in Saitama (still at the halfway point of switching its majority from the Alpha variant to the Delta variant), collected by a hospital and a testing facility in Saitama. The population density of Saitama is 6127 people per square kilometer.
- Granularity setting on time axis (GS): original granularity (OG) = weekly; target granularity (TG) = weekly
- Feature extraction method (FEM) = the ratio of the confirmed cases of each variant out of all confirmed variant cases
- Time-window setting (TWS) = the period during which the ratios of IRD and ITD lie between the min-max ratios of 10% and 90%
- Differential computing function (DCF) = the area size calculation function [13], the ratio calculation function of the area-A size between IRD and ITD, and the switching-rapidness coefficient table between two variants
- Pivot setting on ITD (PV) = the crossing point (the time point of switching the majority of the two variant cases) on ITD

The input time series data for reference (IRD) of the United Kingdom and the input time series data for prediction (ITD) of Saitama are shown in Figure 10.

Figure 10: The left chart shows the input time series data for reference (IRD) of the United Kingdom; the right chart shows the input time series data for prediction (ITD) of Saitama. The data are time series ratios of confirmed patients of each COVID-19 variant, which express the variant-majority situation of each area. The United Kingdom data show that this area had already switched the majority of the two variants. The Saitama data show that this area was still at the halfway point of switching its majority from the Alpha variant to the Delta variant. Source: own.

The determined target data for analysis (CRD) for context 2 of experiment 2 are shown in Figure 11. The sizes of the areas A, B, C and D express the rapidness of the switching: a smaller size shows quicker switching, and a larger size shows slower switching. The area size analysis of the input time series data for prediction (ITD) of Saitama is shown in Figure 12.
Figure 11: Target data for analysis (CRD) of the whole United Kingdom for experiment 2, obtained by applying the 5 context elements, expressing the rapidness of switching between the two major variants. The sizes of the areas A, B, C and D express the rapidness of the switching: a smaller size shows quicker switching, and a larger size shows slower switching. Source: own.

Figure 12: Area size analysis of the input time series data for prediction (ITD) of Saitama. The size of area A expresses the switching force of the Delta variant against the Alpha variant. The size of area C expresses the endurance force of the Alpha variant against the Delta variant. These area sizes are the differential computing results, and they reflect the features of the Saitama data for the prediction in the next processing step. Source: own.

Output data and comparison between contexts 1 and 2 of experiment 2: By applying the London data and the United Kingdom data for the rapidness ratios between the areas A, B, C and D, we obtain the prediction result for Saitama after the crossing point, as area sizes and as the timing at which the data exceed the 90% ratio. The output prediction data (OPD) of experiment 2 are shown in Figure 13 for both contexts 1 and 2.

The results of experiment 2 show the following:

- prediction feasibility of our method in the field of public health data with multiparametric input data;
- realized quantitative comparison between different time series contexts in different places with different environmental features;
- effectiveness for discussion regarding switching the settings of the 5 elements with field-specific conditions, processing different kinds of granularity on the time axis, and reflecting better settings of a time series context in other predictions quantitatively;
- applicability of field-specific functions of public health data analysis to our presented method.

Figure 13: Result of experiment 2 as the output prediction values (OPD) of Saitama. The bottom left values show the prediction result obtained by applying the London data, and the bottom right values show the prediction result obtained by applying the UK data. The top right values show the actual area size and week, which were later issued for Saitama. Comparing these results, the prediction result obtained by applying the CRD of London is much closer to the actual data than that of the UK. Source: own.
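The prediction step just described can be summarized as a proportionality assumption: the second-half/first-half area ratio observed in the reference area is transferred to Saitama. A hypothetical sketch with placeholder values only:

```python
def predict_second_half_area(ref_first_half, ref_second_half, itd_first_half):
    """Transfer the reference area's second-half/first-half switching-area
    ratio to the prediction data to estimate its unknown second-half area
    (and from it, the timing of exceeding the 90% share)."""
    return itd_first_half * (ref_second_half / ref_first_half)

# Placeholder areas, purely illustrative (not the paper's measurements):
saitama_second_half = predict_second_half_area(
    ref_first_half=1.8, ref_second_half=1.2, itd_first_half=2.0)
```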
4 Conclusion

We have presented a context-based time series analysis and prediction method for public health data. The most essential point of our approach is to express the basis of a context as the combination of the following 5 elements (1: granularity setting on time axis, 2: feature extraction method, 3: time-window setting, 4: differential computing function, and 5: pivot setting), in order to determine target data as semantic discrete values according to the context of analysis for public health data. In our experiments, we realized analysis and prediction by applying public health data. As future work, we will design an appropriate evaluation in this field to express the essence of our method; we will apply our method not only to prediction but also to data mining, analysis and search; and we will extend our method and the system to realize mutual understanding and knowledge sharing on global human-health issues at a world-wide scope.

Acknowledgement

We are grateful to Dr. Shiori Sasaki for essential and helpful discussions of this study.

References

[1] Yasushi Kiyoki, Xing Chen, "A Semantic Associative Computation Method for Automatic Decorative-Multimedia Creation with "Kansei" Information" (Invited Paper), The Sixth Asia-Pacific Conferences on Conceptual Modelling (APCCM 2009), 9 pages, January 20-23, 2009.
[2] Yasushi Kiyoki, Xing Chen, Shiori Sasaki, Chawan Koopipat, "A Globally-Integrated Environmental Analysis and Visualization System with Multi-Spectral & Semantic Computing in "Multi-Dimensional World Map"", Information Modelling and Knowledge Bases XXVIII, pp. 106-122, 2017.
[3] Yasushi Kiyoki and Saeko Ishihara: "A Semantic Search Space Integration Method for Meta-level Knowledge Acquisition from Heterogeneous Databases," Information Modeling and Knowledge Bases (IOS Press), Vol. 14, pp. 86-103, May 2002.
[4] Yasushi Kiyoki, Shiori Sasaki, Nhung Nguyen Trang, Nguyen Thi Ngoc Diep, "Cross-cultural Multimedia Computing with Impression-based Semantic Spaces," Conceptual Modelling and Its Theoretical Foundations, Lecture Notes in Computer Science, Springer, pp. 316-328, March 2012.
[5] Yasushi Kiyoki: "A "Kansei" Multimedia Computing System for Environmental Analysis and Cross-Cultural Communication," 7th IEEE International Conference on Semantic Computing, keynote speech, Sept. 2013.
[6] Shiori Sasaki, Yusuke Takahashi, Yasushi Kiyoki: "The 4D World Map System with Semantic and Spatiotemporal Analyzers," Information Modelling and Knowledge Bases, Vol. XXI, IOS Press, 18 pages, 2010.
[7] Totok Suhardijanto, Yasushi Kiyoki, Ali Ridho Barakbah: "A Term-based Cross-Cultural Computing System for Cultural Semantics Analysis with Phonological-Semantic Vector Spaces," Information Modelling and Knowledge Bases XXIII, pp. 20-38, IOS Press, 2012.
[8] Chalisa Veesommai, Yasushi Kiyoki, Shiori Sasaki and Petchporn Chawakitchareon, "Wide-Area River-Water Quality Analysis and Visualization with 5D World Map System", Information Modelling and Knowledge Bases, Vol. XXVII, pp. 31-41, 2016.
[9] Chalisa Veesommai, Yasushi Kiyoki, "Spatial Dynamics of The Global Water Quality Analysis System with Semantic-Ordering Functions", Information Modelling and Knowledge Bases, Vol. XXIX, 2018.
[10] Yasushi Kiyoki, Asako Uraki, Chalisa Veesommai, "A Seawater-Quality Analysis Semantic-Space in Hawaii-Islands with Multi-Dimensional World Map System", 18th International Electronics Symposium (IES2016), Bali, Indonesia, September 29-30, 2016.
[11] Shiori Sasaki and Yasushi Kiyoki, "Real-time Sensing, Processing and Actuation Functions of 5D World Map System: A Collaborative Knowledge Sharing System for Environmental Analysis", Information Modelling and Knowledge Bases, Vol. XXVIII, IOS Press, pp. 220-239, May 2016.
[12] Shiori Sasaki, Koji Murakami, Yasushi Kiyoki, Asako Uraki: "Global & Geographical Mapping and Visualization Method for Personal/Collective Health Data with 5D World Map System," Information Modelling and Knowledge Bases (IOS Press), Vol. XXXII, pp. 134-149, 2020.
[13] Yasushi Kiyoki, Koji Murakami, Shiori Sasaki, Asako Uraki, "Human-Health-Analysis Semantic Computing & 5D World Map System", Information Modelling and Knowledge Bases (IOS Press), Vol. XXXIII, pp. 141-151, 2022.
[14] A. Ijichi and Y. Kiyoki: "A Kansei Metadata Generation Method for Music Data Dealing with Dramatic Interpretation," Information Modeling and Knowledge Bases, IOS Press, Vol. XVI, pp. 170-182, 2004.
[15] UK Health Security Agency, Variant of Concern Technical Briefing 23. Available at: https://www.gov.uk/government/publications/investigation-of-sars-cov-2-variants-technical-briefings
[16] ECDC European Centre for Disease Prevention and Control, Data on the daily number of new reported COVID-19 cases and deaths by EU/EEA country. Available at: https://www.ecdc.europa.eu/en/publications-data/data-daily-new-cases-covid-19-eueea-country

TOWARDS A DEFINITION OF A RESPONSIBLE ARTIFICIAL INTELLIGENCE

SABRINA GÖLLNER,1 MARINA TROPMANN-FRICK,1 BOŠTJAN BRUMEN2
1 Hamburg University of Applied Sciences, Department of Computer Science, Hamburg, Germany. sabrina.goellner@haw-hamburg.de, marina.tropmann-frick@haw-hamburg.de
2 University of Maribor, Faculty of Electrical Engineering and Computer Science, Maribor, Slovenia. bostjan.brumen@um.si

Our research aims to contribute to the concept of responsible artificial intelligence (AI), a topic under significant discussion in EU politics and further emphasized by recent EU publications. AI, while beneficial, can also be a potential weapon, necessitating responsible use and prevention against misuse or misalignment. Recognizing the critical role of AI research in aiding legislators and machine learning practitioners, our work aims to help prepare for future AI advancements. To the best of our knowledge, we establish the first unified definition of responsible AI. As part of a structured literature review, we clarify the current state of the art in the context of responsible AI. Based on the knowledge gained in the analysis part, we also discuss an approach for developing a future framework for responsible AI. The results demonstrate that responsible AI should be a human-centered approach, encompassing ethical considerations, explainability of models, privacy, security, and trust.

Keywords: structured literature review, artificial intelligence, responsible AI, privacy-preserving AI, explainable AI, ethical AI, trustworthy AI

DOI https://doi.org/10.18690/um.feri.5.2023.2
ISBN 978-961-286-745-4

1 Introduction

Over the years, significant research has been conducted to enhance Artificial Intelligence (AI), which is already widely used in various life and industry sectors. In 2020 and 2021, the European Commission published a series of papers [1,2,3] outlining their strategy for AI. The white paper "A European Approach to Excellence and Trust" from 2020 outlines political strategies to encourage the use of AI while reducing the potential risks associated with certain applications of this technology. This proposal aims to establish a legal framework for trustworthy AI in Europe so that the second objective, building an ecosystem of trust, can be implemented. The framework should fully respect the values and rights of EU citizens. It is repeatedly emphasized that AI should be human-centered and that European values have a high priority. The papers also address challenging issues such as ethics, privacy, explainability, safety, and sustainability. They point out how important security is in the context of AI, and they also briefly present a risk framework with five risk groups for AI systems.
The documents' authors recognize that "[EU] Member States are pointing at the current absence of a common European framework." This indicates that a common EU framework is missing and that this is an important political issue. The document "Communication on Fostering a European Approach to AI" represents a plan of the EU Commission in which numerous efforts are presented that are intended to advance AI in the EU or have already been undertaken. At the beginning, it is stated that the EU wants to promote the development of "human-centric, sustainable, secure, inclusive and trustworthy artificial intelligence (AI) [which] depends on the ability of the European Union". The Commission's goal is to ensure that excellence in the field of AI is promoted. Collaborations with stakeholders, building research capacity, the environment for developers, and funding opportunities are discussed, as well as bringing AI into play for climate and the environment. Part of the discussion on trust led to the question of how to create innovation. It was pointed out that the EU approach should be "human-centered, risk-based, proportionate, and dynamic." The plan also states the intention to develop "cutting-edge, ethical and secure AI, (and) promoting a human-centric approach in the global context". At the end of the document there is an important statement: "The revised plan, therefore, provides a valuable opportunity to strengthen competitiveness, the capacity for innovation, and the responsible use of AI in the EU". The EC has also published the "Proposal for a Regulation laying down harmonized rules on artificial intelligence", which contains, for example, a list of prohibited AI practices and specific regulations for AI systems that pose a high risk to health and safety, as well as some transparency requirements.

It becomes noticeable that the terms used in the mentioned political documents to describe the goal of trustworthy AI keep changing (they are inconsistent) and remain largely undefined. The documents all reflect, on the one hand, the benefits and, on the other hand, the risks of AI from a political perspective. It becomes clear that AI can improve our lives, solve problems in many ways, and bring added value, but it can also be a deadly weapon. At the same time, the papers do not define exactly what trustworthy AI means in concrete terms. Topics and subtopics are addressed, but there is no clear definition of (excellence and) trustworthiness; the documents only indirectly mention some important aspects, e.g., ethical values, transparency, risks to safety, and sustainability goals.

Furthermore, we believe that trust as a goal (as vaguely defined in the documents) is not sufficient for deploying AI. Rather, we need approaches for "responsible AI" which reflect the EU values. Responsible AI should of course also be trustworthy, but that concept covers only a part of the responsibility. Therefore, in this paper, our goal is to determine the state of the art from the scientific perspective and whether there is a general definition of "trustworthy AI". Furthermore, we want to clarify whether or not there is a definition of "responsible AI". The latter should actually be at the core of the political focus if we want to move towards "excellence" in AI.
As a step towards responsible AI, we conduct a structured literature review that aims to provide a clear answer to what it means to develop responsible AI. During our initial analysis, we found that there is a lot of inconsistency in the terminology overall, not only in the political texts. There is also a lot of overlap in the definitions and principles for responsible AI. In addition, content-wise similar expressions exist that further complicate the understanding of responsible AI as a whole. There are already many approaches in the analyzed fields, namely trustworthy, ethical, explainable, privacy-preserving, and secure AI, but there are still many open problems that need to be addressed in the future. To the best of our knowledge, this is the first detailed and structured review regarding responsible AI.

The paper is organized as follows: First, we explain our research methodology, including our research aims and objectives, and the databases and search queries we used. Next, we analyze the existing definitions of responsible AI in the literature, along with related expressions and their definitions. We compare these definitions to determine the essence of responsible AI. We then summarize our key findings within the previously defined scopes of responsible AI, conducting both qualitative and quantitative analyses. In the discussion section, we outline the key points and pillars for developing responsible AI. Finally, we conclude by mentioning the limitations of our work and discussing future research.

2 Research Methodology

In order to address the research questions, we conducted a systematic literature review (SLR) following the guidelines outlined in [4]. The process of performing the structured literature review for our study is explained in detail in the following subsections.

2.1 Research Aims and Objectives

Our research focuses on exploring the different aspects of "Responsible AI", including privacy, explainability, trust, and ethics. Our objectives are to define the term "responsible AI", examine the current state of research in this field, and identify areas that require further investigation. Ultimately, we aim to uncover the challenges, opportunities, and open problems that exist in this area. In summary, we provide the following contributions:

1. We specify a concise definition of "Responsible AI".
2. We analyze the state of the art in the field of "Responsible AI".

2.2 Research Questions Formulation

Based on the aims of the research, we state the following research questions:

− RQ1: What is a general or agreed-on definition of "Responsible AI", and what are the associated terms defining it?
− RQ2: What does "Responsible AI" encompass?

2.3 Databases

In order to get the best results when searching for the relevant studies, we used indexing data sources. These sources enabled a wide search of publications that would otherwise be overlooked. The following databases were searched:

− ACM Digital Library (ACM)
− IEEE Xplore (IEEE)
− SpringerLink (SL)
− Elsevier ScienceDirect (SD)

The reason for selecting these databases was to limit our search to peer-reviewed research papers only.
2.4 Studies Selection

To search for documents, the following search query was used in the different databases: ("Artificial Intelligence" OR "Machine Learning" OR "Deep Learning" OR "Neural Network" OR "AI" OR "ML") AND (Ethic* OR Explain* OR Trust*) AND (Privacy*).

Considering that inconsistent terminology is used for "Artificial Intelligence", the terms "Machine Learning", "Deep Learning" and "Neural Network" were added, which should be considered synonyms. Because many papers already use the abbreviations AI and ML, these were included in the set of synonyms. The terms "Ethic", "Trust", "Explain", and "Privacy" were included with an asterisk (*) so that all extensions of the terms preceding the asterisk are also matched (e.g., explain* matches explainability). The search strings were combined using the Boolean operator OR for inclusiveness and the operator AND for the intersection of the sets of search strings. Each set of search strings was put within parentheses. A minimal sketch of assembling this query string is shown below, after Figure 1.

To focus on state-of-the-art papers, the search was limited to the three-year period from 2020 to 2022, with the search conducted in December 2022. The search results were sorted by relevance to eliminate non-relevant papers, as some search engines lack advanced options. During the screening stage, the authors followed specific guidelines to exclude irrelevant papers. Papers did not pass the screening if:

1. They mention AI in the context of cyber-security, embedded systems, robotics, autonomous driving, the internet of things, or the like.
2. They are not related to the defined terms of responsible AI.
3. They belong to general AI studies.
4. They only consist of an abstract.
5. They are published as posters.

These guidelines were used to greatly decrease the number of full-text papers to be evaluated in subsequent stages, allowing the examiners to focus only on potentially relevant papers. The initial search produced 10,313 papers, of which 4,121 were retrieved from ACM, 1,064 from IEEE, 1,487 from Elsevier ScienceDirect, and 3,641 from SpringerLink. The screening by title, abstract, and keywords removed 6,507 papers. During the eligibility check of the remaining papers, we excluded 77 irrelevant studies and 9 inaccessible papers. We ended up with 254 papers that we included in the qualitative and quantitative analysis (see Figure 1).

Figure 1: Structured review flow chart: the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow chart detailing the records identified and screened, the number of full-text articles retrieved and assessed for eligibility, and the number of studies included in the review. Source: own.
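As an illustration of the query construction described above, the following Python sketch assembles the Boolean search string from its three term groups; the grouping mirrors the query quoted in the text, while the helper names are our own.

```python
# Illustrative sketch: assembling the Boolean search query used for the
# structured literature review from its three term groups.

AI_SYNONYMS = ['"Artificial Intelligence"', '"Machine Learning"',
               '"Deep Learning"', '"Neural Network"', '"AI"', '"ML"']
TOPIC_TERMS = ["Ethic*", "Explain*", "Trust*"]   # asterisk = wildcard
PRIVACY_TERMS = ["Privacy*"]

def or_group(terms):
    """Join terms with OR (inclusiveness) and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"

# AND intersects the three parenthesized groups, as in the review's query.
query = " AND ".join(or_group(g)
                     for g in (AI_SYNONYMS, TOPIC_TERMS, PRIVACY_TERMS))
print(query)
```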
3 Analysis

In this section, we analyze existing definitions of "responsible AI" in the literature. We also examine content-wise similar expressions and their definitions, comparing them and searching for overlaps. As a result, we extract the essence of the analysis to formulate our definition of responsible AI.

3.1 Responsible AI

In this subsection, we answer the first research question: What is a general or agreed-on definition of "Responsible AI", and what are the associated terms defining it?

3.1.1 Terms defining Responsible AI

Upon careful examination of the 254 papers, we found that a mere 5 of them specifically address the definition of "responsible" AI. The papers use the following terms in connection with "responsible AI":

− Fairness, Privacy, Accountability, Transparency and Soundness [5]
− Fairness, Privacy, Accountability, Transparency, Ethics, Security & Safety [6]
− Fairness, Privacy, Accountability, Transparency, Explainability [7]
− Fairness, Accountability, Transparency, and Explainability [8]
− Fairness, Privacy, Sustainability, Inclusiveness, Safety, Social Good, Dignity, Performance, Accountability, Transparency, Human Autonomy, Solidarity [9]

However, after reading all 254 analyzed papers, we strongly believe that the terms included in those definitions can mostly be treated as subterms or ambiguous terms:

− 'Fairness' [5] and 'Accountability' [5,6,7], as well as the terms 'Inclusiveness, Sustainability, Social Good, Dignity, Human Autonomy, Solidarity' [9], are, according to our definition, subterms of Ethics.
− 'Soundness' [5], interpreted as 'Reliability' or 'Stability', is included within Security and Safety.
− Transparency [5,6,7] is often used as a synonym for explainability throughout the literature.

Therefore, we summarize the terms of the above definitions as "Ethics, Trustworthiness, Security, Privacy, and Explainability". However, the terms alone are not enough to get a picture of responsible AI. Therefore, we will analyze and discuss what the five terms "Ethics, Trustworthiness, Security, Privacy, and Explainability" mean in the context of AI and how they depend on each other. During the analysis, we also found content-wise similar expressions for the concept of "responsible AI", which we include in the findings. This topic is dealt with in the next section.

3.1.2 Content-wise similar expressions for Responsible AI

Our analysis uncovered that the terms "Responsible AI", "Ethical AI", and "Trustworthy AI" are frequently used interchangeably. Furthermore, we determined that "Human-Centered AI" holds a similar meaning. Therefore, we treat the terms

− "Trustworthy AI", found in [10,11,12,13,14,15,16], and [17] as cited in [18],
− "Ethical AI", found in [19,20,21,22,23], and [24] as cited in [25],
− "Human-Centered AI", found in [26] as cited in [23],

as content-wise similar expressions for "Responsible AI" hereinafter.

3.2 Collection of definitions

Collecting the definitions of 'responsible AI' and of the content-wise similar expressions from the papers results in the Venn diagram shown in Figure 2. Comparing the definitions in the Venn diagram, we determine the following findings (a small computational sketch of this term grouping and overlap follows the list):

− Across all four sets there is an overlap of 24% of the terms: Explainability, Safety, Fairness, Accountability, Ethics, Security, Privacy, Transparency.
− The terms occurring in the set of definitions for 'trust' occurred only there, which is why this is the second largest set in the diagram. This is because most of the terms actually come from definitions of trustworthy AI.
− There are also 6 empty sets.
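As a small illustration of this grouping step, the sketch below normalizes the raw terms from the five definitions in section 3.1.1 into the generic terms and intersects the resulting sets; the mapping is a condensed rendering of the analysis above, not the authors' tooling.

```python
# Illustrative sketch: normalizing raw definition terms into generic
# terms (per the analysis above) and intersecting the five definitions.

SUBTERM_TO_GENERIC = {
    "Fairness": "Ethics", "Accountability": "Ethics",
    "Inclusiveness": "Ethics", "Sustainability": "Ethics",
    "Social Good": "Ethics", "Dignity": "Ethics",
    "Human Autonomy": "Ethics", "Solidarity": "Ethics",
    "Soundness": "Security", "Safety": "Security",
    "Transparency": "Explainability",
}

definitions = {   # terms per paper, as listed in section 3.1.1
    "[5]": {"Fairness", "Privacy", "Accountability", "Transparency",
            "Soundness"},
    "[6]": {"Fairness", "Privacy", "Accountability", "Transparency",
            "Ethics", "Security", "Safety"},
    "[7]": {"Fairness", "Privacy", "Accountability", "Transparency",
            "Explainability"},
    "[8]": {"Fairness", "Accountability", "Transparency",
            "Explainability"},
    "[9]": {"Fairness", "Privacy", "Sustainability", "Inclusiveness",
            "Safety", "Social Good", "Dignity", "Performance",
            "Accountability", "Transparency", "Human Autonomy",
            "Solidarity"},
}

def normalize(terms):
    """Map every subterm onto its generic term; keep unknown terms as-is."""
    return {SUBTERM_TO_GENERIC.get(t, t) for t in terms}

normalized = {paper: normalize(terms) for paper, terms in definitions.items()}
common = set.intersection(*normalized.values())
print(common)  # generic terms shared by all five listed definitions
```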
To tie in with the summary from the previous section, it should be pointed out once again that the terms 'Explainability, Safety, Fairness, Accountability, Ethics, Security, Privacy, Transparency' can be grouped into generic terms as follows: Ethics, Security, Privacy, and Explainability.

Set A: Solidarity, Performance, Sustainability, Soundness, Inclusiveness
Set B: -
Set C: -
Set D: Equality, Usability, Accuracy under Uncertainty, Assessment, Reliability, Data Control, Data Minimization, Reproducibility, Generalization, User Acceptance
Set E: Social Good
Set F: Human-Centered, Human Control, Human Agency
Set G: -
Set H: Autonomy, Non-Maleficence, Trust
Set I: -
Set J: Human Values, Non-Discrimination
Set K: -
Set L: Compliant with Rules and Laws, Social Robustness
Set M: Human Autonomy, Dignity
Set N: -
Set O: Explainability, Safety, Fairness, Accountability, Ethics, Security, Privacy, Transparency

Figure 2: Venn diagram. Source: own.

We also strongly claim that 'trust/trustworthiness' should be seen as an outcome of a responsible AI system, and we therefore determine that it belongs to the set of requirements. Moreover, each responsible AI should be built in a 'human-centered' manner, which makes this another important subterm. On top of these findings, we specify our definition of responsible AI in order to answer the first research question:

DEFINITION OF RESPONSIBLE AI
Responsible AI is human-centered and ensures users' trust through ethical ways of decision making. The decision-making must be fair, accountable, not biased, with good intentions, non-discriminating, and consistent with societal laws and norms. Responsible AI ensures that automated decisions are explainable to users while always preserving users' privacy through a secure implementation.

Figure 3: Definition of responsible AI. Source: own.

As mentioned in the sections before, the terms defining "responsible AI" result from the analysis of the terms in sections 3.1.1 and 3.1.2. We presented a figure depicting the overlap of the terms of content-wise similar expressions of responsible AI, namely "Ethical AI, Trustworthy AI, and Human-Centered AI", and extracted its main terms. Also, by summarizing the terms Fairness and Accountability into Ethics and by clarifying the synonyms (e.g., explainability instead of transparency), we finally redefined the terms defining "responsible AI" as "Human-centered, Trustworthy, Ethical, Explainable, Privacy(-preserving) and Secure AI".

3.3 Aspects of Responsible AI

After analyzing the literature, we identified in section 3 six categories related to responsible AI: Human-centered, Trustworthy, Ethical, Explainable, Privacy-preserving, and Secure AI. Adhering to these categories will ensure the responsible development and use of AI. To answer the second research question (RQ2), we analyze the state of the art of the topics "Trustworthy, Ethical, Explainable, Privacy-preserving and Secure AI" in the following subsections. We have decided to deal with the topic of 'Human-Centered AI' in a separate paper so as not to go beyond the scope of this work.
To determine the state of the art of the mentioned topics in AI, all 254 papers were assigned to one of the categories "Trustworthy AI, Ethical AI, Explainable AI, Privacy-preserving AI, and Secure AI", based on the prevailing content of the paper with respect to each topic. The detailed analysis of these papers is beyond the scope of the present work and will be presented in our future work. Nevertheless, we highlight their most important features in the following subsections.

3.3.1 Trustworthy AI

A concise statement on trust in AI is as follows: "Trust is an attitude that an agent will behave as expected and can be relied upon to reach its goal. Trust breaks down after an error or a misunderstanding between the agent and the trusting individual. The psychological state of trust in AI is an emergent property of a complex system, usually involving many cycles of design, training, deployment, measurement of performance, regulation, redesign, and retraining." [27]

In summary, Trustworthy AI aims to provide the benefits of AI while addressing scenarios that have significant implications for people and society. To be accepted in society, it is crucial for AI applications to prioritize trust as a key goal and to make every effort to maintain and measure it throughout all stages of development. Despite this importance, achieving trustworthy AI remains a significant challenge, as it has not yet been comprehensively addressed.

3.3.2 Ethical AI

In this section, we outline the discoveries made in the realm of ethical AI. The most fitting explanation of ethics in relation to AI is the one provided in [28]: "AI ethics is the attempt to guide human conduct in the design and use of artificial automata or artificial machines, aka computers, in particular, by rationally formulating and following principles or rules that reflect our basic individual and social commitments and our leading ideals and values [28]."

During our analysis, we noticed that ethical AI often deals with fairness. Fair AI can be understood as "AI systems [which] should not lead to any kind of discrimination against individuals or collectives in relation to race, religion, gender, sexual orientation, disability, ethnicity, origin or any other personal condition. Thus, a fundamental criterion to consider while optimizing the results of an AI system is not only their outputs in terms of error optimization but also how the system deals with those groups." [6]

In any case, the development of ethical artificial intelligence should also be subject to proper oversight within the framework of robust laws and regulations. It is also stated that transparency is widely considered one of the central AI ethical principles [29]. In the state-of-the-art overview of [30], the authors deal with the relations between explanation and AI fairness and find that fair decision-making requires extensive contextual understanding, and that AI explanations help identify the potential variables that drive unfair outcomes. Mostly, transparency and explainability are achieved using so-called explainability (XAI) methods. Therefore, XAI is discussed separately in the following subsection.
3.3.3 Explainable AI

The choices made by AI systems, or by humans utilizing AI, can greatly affect the welfare, liberties, and prospects of those influenced by those choices. That is why AI explainability is a crucial ethical concern. This subsection deals with the analysis of the literature in the field of explainable AI (XAI). We found a definition in [6] which is quite suitable for defining explainable AI: "Given a certain audience, explainability refers to the details and reasons a model gives to make its functioning clear or easy to understand." [6]

Numerous XAI techniques have been extensively discussed in the literature. The authors of [6] as well as [31] give detailed overviews of the known techniques and their strengths and weaknesses; we therefore cover this topic only briefly. First, the models can be distinguished into two different approaches to XAI: intrinsically transparent models, and post-hoc explainability, which targets models that are not readily interpretable by design. These so-called "black-box models" are the more problematic ones, because they are much more difficult to understand. The post-hoc explainability methods can be distinguished further into model-specific and model-agnostic techniques. We can also distinguish generally between data-dependent and data-independent mechanisms for gaining interpretability, as well as between global and local interpretability methods. A minimal sketch of one such technique follows below.

The general public needs more transparency about how ML/AI systems can fail and what is at stake if they fail. Ideally, such systems should clearly communicate the outcomes and focus on the downsides, to help people think about the trade-offs and risks of different choices (for example, the costs associated with different outcomes). In addition to the general public, data scientists and ML practitioners represent another key stakeholder group. The study by [32] investigated the effectiveness and interpretability of two existing tools; the results indicate that data scientists over-trust and misuse interpretability tools.
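As one concrete instance of the post-hoc, model-agnostic, global class of techniques mentioned above, the following sketch implements permutation feature importance by hand; the toy model and data are placeholders, and the sketch illustrates the technique class rather than any specific method from the reviewed papers.

```python
# Illustrative sketch: permutation feature importance, a post-hoc,
# model-agnostic, global interpretability technique.  Works with any
# fitted model exposing predict(); the data here are random placeholders.
import numpy as np

def permutation_importance(model, X, y, n_repeats=10, rng=None):
    """MSE increase when one feature column is shuffled: a large increase
    means the model relies heavily on that feature."""
    rng = rng or np.random.default_rng(0)
    base_error = np.mean((model.predict(X) - y) ** 2)
    importances = []
    for j in range(X.shape[1]):
        errors = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # break the link between feature j and the target
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            errors.append(np.mean((model.predict(X_perm) - y) ** 2))
        importances.append(np.mean(errors) - base_error)
    return np.array(importances)

class ToyModel:
    """Placeholder 'black box': y_hat = 3 * x0, ignoring x1."""
    def predict(self, X):
        return 3 * X[:, 0]

X = np.random.default_rng(1).normal(size=(200, 2))
y = 3 * X[:, 0] + 0.1 * np.random.default_rng(2).normal(size=200)
print(permutation_importance(ToyModel(), X, y))  # feature 0 >> feature 1
```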
There is a "right to explanation" in the context of AI systems that directly affect individuals through their decisions, especially in legal and financial terms, which is one of the themes of the General Data Protection Regulation (GDPR) [33,34]. Therefore, we need to protect data through secure and privacy-preserving AI methods, which are analyzed in the following section.

3.3.4 Privacy-preserving and Secure AI

As previously mentioned, trust in AI depends on privacy and security. However, the success of ML models relies heavily on data, including sensitive information. This has resulted in increasing worries about privacy violations, such as the unlawful use and exposure of private data [35,36]. To ensure complete privacy protection, we require holistic methods that consider the usage of data as well as user activities and transactions [37]. Privacy-preserving and secure AI methods can help mitigate those risks.

We define "Secure AI" as protecting data from malicious threats, which means protecting personal data from any unauthorized third-party access, malicious attacks, and exploitation of data. It is set up to protect personal data using different methods and techniques that ensure data privacy. Data privacy is about using data responsibly: the proper handling, processing, storage, and usage of personal information. It is all about the rights of individuals with respect to their personal information. Therefore, data security is a prerequisite for data privacy.

Although the AI field is undergoing extensive research into privacy and security, achieving flawless privacy preservation and security in AI is currently not possible. Nonetheless, several challenges require addressing to advance further in this area.

Table 1: Quantitative Analysis

Feature of a study | Representation | Percentage | Sources

Trustworthy AI (28/254, 11%)*
Reviews and Surveys | 9/28 | 32% | [11,17,38,13,39,14,40,41,42]
Perceptions of trust | 4/28 | 14% | [43,44,45,27]
Frameworks | 9/28 | 32% | [26,46,47,48,49,15,50,51,52]
Miscellaneous | 6/28 | 21% | [53,54,55,56,16,57]

Ethical AI (85/254, 34%)*
Frameworks | 19/85 | 22% | [35,58,59,7,20,60,29,24,61,62,63,64,65,66,67,68,69,70,71]
Ethical issues | 22/85 | 26% | [72,20,73,74,75,76,77,78,79,80,81,28,82,36,83,84,85,86,87,88,89,90]
Miscellaneous | 33/85 | 39% | [91,19,92,93,94,95,96,22,21,97,98,99,100,101,102,9,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,8]
Reviews and Surveys | 10/85 | 12% | [119,120,121,122,123,124,125,126,127,30]
Tools | 1/85 | 1% | [128]

Explainable AI (46/254, 18%)*
Reviews and Surveys | 10/46 | 22% | [6,31,33,12,129,34,130,131,132,133]
Stakeholders | 7/46 | 15% | [134,135,136,137,32,138,139]
XAI Approaches | 14/46 | 30% | [140,5,141,142,143,144,145,146,147,148,149,150,151,152]
Frameworks | 4/46 | 9% | [153,154,155,156]
Miscellaneous | 11/46 | 24% | [157,158,159,160,161,162,163,164,165,166,167]

Privacy-preserving and Secure AI (95/254, 38%)*
Reviews and Surveys | 10/95 | 10% | [168,169,170,171,172,37,173,174,175,176]
Differential Privacy | 12/95 | 13% | [177,178,179,180,181,182,183,184,185,186,187,188]
Secure Multi-Party Computation | 2/95 | 2% | [189,190]
Homomorphic Encryption | 4/95 | 4% | [142,191,192,193]
Federated Learning | 35/95 | 37% | [194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229]
Hybrid Approaches | 8/95 | 8% | [230,231,232,233,234,235,236,237]
Security Threats | 7/95 | 8% | [238,239,240,241,242,243,244]
Miscellaneous | 16/95 | 17% | [245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260]

*Percentages do not add up to 100% due to rounding.

Within the topic "Privacy-Preserving and Secure AI", most papers belong to "Federated Learning", which was evidently a rapidly emerging research field in the examined time frame (a minimal sketch of the idea follows below). There were also many papers that could not be assigned to any specific category (see "Miscellaneous"), since the topic is very multifaceted.
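To indicate what this dominant category refers to, here is a minimal sketch of federated averaging (FedAvg), the basic federated learning scheme: clients train locally and only model parameters, never raw data, are aggregated. The setup is a generic textbook illustration, not a method from any of the reviewed papers.

```python
# Illustrative sketch: federated averaging (FedAvg) on a linear model.
# Raw data never leave the clients; only parameter vectors are shared.
import numpy as np

def local_update(w, X, y, lr=0.05, epochs=5):
    """One client's local training: a few gradient steps on its own data."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fed_avg(client_data, rounds=20, dim=3):
    w_global = np.zeros(dim)
    for _ in range(rounds):
        # Each client trains locally, starting from the global model.
        local_models = [local_update(w_global, X, y) for X, y in client_data]
        # The server aggregates by size-weighted parameter averaging.
        sizes = np.array([len(y) for _, y in client_data])
        w_global = np.average(local_models, axis=0, weights=sizes)
    return w_global

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
clients = []
for _ in range(4):  # four clients, each with a private local dataset
    X = rng.normal(size=(50, 3))
    clients.append((X, X @ true_w + 0.05 * rng.normal(size=50)))
print(fed_avg(clients))  # should approach true_w
```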
In the topic area of "Ethical AI", the most common category was 'Miscellaneous', since the authors in the ethical AI field handle very different topics. The second most papers could be assigned to the category 'ethical issues', since this is a hot topic in the field of ethics. The rest of the papers dealt with ethical frameworks that try to integrate ethical AI into a development process. Most studies in the field of XAI deal with new XAI approaches that solve different explainability problems with new AI models. There were also a few that presented stakeholder analyses specifically in the context of the explainability of AI models. A few presented miscellaneous topics that could not be assigned to any specific category, or frameworks to integrate explainable AI. In Trustworthy AI, we saw that most papers presented a review or survey on the current state of trustworthy AI in research. There were also papers that presented frameworks specifically for trustworthiness, and papers that reported on how trust is perceived and described by different users.

4 Discussion

Several key points have emerged from the analysis. It has become clear that AI will have an ever-increasing impact on our daily lives, from delivery robots to e-health, smart nutrition, and digital assistants, and the list is growing every day. AI should be viewed as a tool, not as a system that has infinite control over everything. It should therefore not replace humans or make them useless, nor should it lead to humans no longer using their own intelligence and only letting AI decide. We need a system that we can truly call "responsible" AI. The analysis has clearly shown that the elements of ethics, privacy, security, and explainability are the true pillars of responsible AI, which should lead to a basis of trust.

4.1 Pillars of Responsible AI

Here we highlight the most important criteria that a responsible AI should fulfill. These are also the points that developers should consider if they want to develop responsible AI. Therefore, they also form the pillars of the future framework (a condensed sketch of these requirements as a checklist follows the lists below).

Key requirements for Ethical AI are as follows:

− fair: non-biased and non-discriminating in every way,
− accountable: justifying the decisions and actions,
− sustainable: built with long-term consequences in mind, satisfying the Sustainable Development Goals,
− compliant: with robust laws and regulations.

Key requirements for the privacy and security techniques are identified as follows:

− they need to comply with regulations: HIPAA, COPPA, and more recently the GDPR (as does, for example, Federated Learning),
− they need to be complemented by proper organizational processes,
− they must be chosen depending on the tasks to be executed on the data and on the specific transactions a user is executing,
− use hybrid PPML approaches, because they can take advantage of each component, providing an optimal trade-off between ML task performance and privacy overhead,
− use techniques that reduce communication and computational cost (especially in distributed approaches).

Key requirements for Explainable AI are the following:

− human-centered: how users understand and interact with the system plays an important role,
− explanations must be tailored to the users' needs and target group,
− intuitive user interface/experience: the results need to be presented in an understandable visual language,
− explainability is also a feature expressing how well the system does its work (a non-functional requirement),
− the impact of explanations on the decision-making process must be considered.

Key perceptions of trustworthy AI are as follows:

− ensures user data is protected,
− probabilistic accuracy under uncertainty,
− provides an understandable, transparent, explainable reasoning process to the user,
− usability,
− acts "as intended" when facing a given problem,
− perception as fair and useful,
− reliability.
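The following short sketch condenses the pillars above into a machine-readable checklist, the kind of structure that a per-requirement benchmark (as suggested in section 4.2) could build on; the field names are our own illustrative choices, not part of the framework itself.

```python
# Illustrative sketch: the pillars above condensed into a checklist
# structure that a per-requirement benchmark could evaluate against.
RESPONSIBLE_AI_PILLARS = {
    "ethical": ["fair", "accountable", "sustainable", "compliant"],
    "privacy_and_security": ["regulation_compliant", "organizational_processes",
                             "task_appropriate", "hybrid_ppml", "low_overhead"],
    "explainable": ["human_centered", "tailored_explanations",
                    "intuitive_interface", "quality_feature", "decision_impact"],
    "trustworthy": ["data_protected", "accuracy_under_uncertainty",
                    "transparent_reasoning", "usability", "acts_as_intended",
                    "fair_and_useful", "reliable"],
}

def unmet_requirements(assessment):
    """Return every pillar requirement the assessment does not mark as met."""
    return [f"{pillar}:{req}"
            for pillar, reqs in RESPONSIBLE_AI_PILLARS.items()
            for req in reqs
            if not assessment.get(pillar, {}).get(req, False)]

# Toy assessment of a hypothetical system: only two requirements met.
assessment = {"ethical": {"fair": True}, "trustworthy": {"reliable": True}}
print(len(unmet_requirements(assessment)), "requirements still unmet")
```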
Therefore, we define responsible AI as an interdisciplinary and dynamic process: it goes beyond technology and includes laws (compliance and regulations) and societal standards such as ethics guidelines and the Sustainable Development Goals. Figure 4 shows that on the one hand there are the social/ethical requirements (pillars) and on the other hand the technical requirements (pillars). All of them depend on each other. If the technical and the ethical side are satisfied, user trust is maintained. Trust can be seen as the perception of the users of AI.

The ethics pillar includes "sub-modules" such as accountability, fairness, sustainability, and compliance. These are essential to ensure that AI meets ethical standards. Furthermore, the explainability methods must respect privacy, meaning they must not have so much access to a model that they cause a privacy breach. Privacy depends on security, because security is a prerequisite for it.

Every "responsible system" requires humans to care for it. These individuals must handle the system responsibly, conducting maintenance work and regularly checking metrics to ensure that their responsibilities are fulfilled. To achieve this, special metrics are used as a continuous check. This makes responsible AI a joint effort between the system side and the developer side.

Figure 4: Pillars of the Responsible AI framework. Source: own.

In section 3.3, the concept of human-centered AI is highlighted as a crucial aspect of responsible AI. It is closely linked to the "human-in-the-loop" approach, which emphasizes the importance of human involvement in the development and use of AI. This approach allows for the detection and correction of errors and for retraining of the system throughout its lifespan, ensuring that AI is designed and utilized for the benefit of humans. Therefore, responsible AI is interdisciplinary, and it is not static but a dynamic process that needs to be taken care of during the whole system lifecycle.

4.2 Trade-offs

Fulfilling all aspects comes with trade-offs, as discussed for example in [16], and often comes at the cost of data privacy: for example, methods that make a model more robust against attacks, or methods that try to explain a model's behavior, could leak some information. Managing AI systems that are simultaneously accurate, fair, private, robust, and explainable is a challenging task. To begin, we suggest creating a benchmark for each requirement, which will determine the extent to which each requirement is met.

5 Research Limitations

Our study aims to provide a thorough and detailed analysis of the available literature on responsible AI from various journals. However, we encountered limitations in accessing some journals that were not freely available, despite the extensive access provided by our institutions. Despite our best efforts, accessibility remained an issue. It is also possible that some relevant research publications were not included in the databases we used for our search. Furthermore, our study only included the most recent state-of-the-art research, which may have caused us to miss some older but still relevant developments. Another limitation of the presented work is the missing in-depth analysis of the papers reviewed.
Due to paper length constraints, we have omitted a detailed overview of each reviewed paper's contributions in the subsections of section 3.3.

6 Conclusion

The field of AI is rapidly evolving, and a legal framework is necessary to ensure responsible practices. However, the terms "trustworthy AI" and "responsible AI" lack clear definitions, making it difficult to establish efficient regulations. Instead of focusing solely on trust, regulations for responsible AI must be defined. As a leading authority in setting standards, such as the GDPR, the EU should be informed and prepared for upcoming research and legal regulations. This research provides an important contribution to the concept of responsible AI, being the first to address it comprehensively through a structured literature review and to present an overarching definition. The review analyzed 254 recent high-quality works on the topic and included a qualitative analysis of the papers covered. We have defined the concept of "responsible AI" and conducted a thorough analysis of its key components: human-centered design, trustworthy development, ethical considerations, explainability, privacy preservation, and security. By prioritizing these aspects, we can ensure the responsible development and use of AI products and establish legal frameworks to regulate their use. In the discussion section, we proposed a framework for responsible AI based on the insights gained from our analysis. In future research, we plan to analyze individual papers to determine their contributions to responsible AI and to explore topics such as human-centered AI and "human-in-the-loop" approaches. We also aim to develop benchmarking methods for responsible AI and to establish a holistic framework to guide responsible AI development.

References

A complete list of 260 references is available at https://drive.google.com/file/d/1Fm-9hKkrY_YAzS02TWec2L3lIqgPSmqm/view?usp=sharing, or by scanning the QR code below.

Figure 5: QR code with the list of references. Source: own.

[1] European Commission. White Paper on Artificial Intelligence: A European approach to excellence and trust. European Commission; 2020. Available from: https://digital-strategy.ec.europa.eu/en/library/communication-fostering-european-approach-artificial-intelligence.
[2] European Commission. Coordinated Plan on Artificial Intelligence 2021 Review. European Commission; 2021. Available from: https://digital-strategy.ec.europa.eu/en/library/coordinated-plan-artificial-intelligence-2021-review.
[3] European Commission. Proposal for a Regulation of the European Parliament and of the Council laying down harmonised rules on artificial intelligence (Artificial Intelligence Act) and amending certain Union legislative acts. European Commission; 2021. Available from: https://eur-lex.europa.eu/legal-content/EN/TXT/?qid=1623335154975&uri=CELEX%3A52021PC0206.
[4] Kitchenham B, Brereton OP, Budgen D, Turner M, Bailey J, Linkman S. Systematic literature reviews in software engineering – A systematic literature review. Information and Software Technology. 2009;51:7-15.
[5] Maree C, Modal JE, Omlin CW. Towards Responsible AI for Financial Transactions. In: 2020 IEEE Symposium Series on Computational Intelligence (SSCI); 2020. p. 16-21.
[6] Alejandro Barredo Arrieta, Natalia Díaz-Rodríguez, Javier Del Ser, Adrien Bennetot, Siham Tabik, Alberto Barbado, et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion. 2020;58:82-115. Available from: https://www.sciencedirect.com/science/article/pii/S1566253519308103.
[7] Eitel-Porter R. Beyond the promise: implementing ethical AI. AI and Ethics. 2021;1(1):73-80.
[8] Werder K, Ramesh B, Zhang RS. Establishing Data Provenance for Responsible Artificial Intelligence Systems. ACM Transactions on Management Information Systems. 2022 Jun;13(2):1-23. Available from: https://dl.acm.org/doi/10.1145/3503488.
[9] Jakesch M, Buçinca Z, Amershi S, Olteanu A. How Different Groups Prioritize Ethical Values for Responsible AI. In: 2022 ACM Conference on Fairness, Accountability, and Transparency. Seoul, Republic of Korea: ACM; 2022. p. 310-23. Available from: https://dl.acm.org/doi/10.1145/3531146.3533097.
[10] High-Level Expert Group on Artificial Intelligence. Ethics guidelines for trustworthy AI. European Commission; 2019. Available from: https://digital-strategy.ec.europa.eu/en/policies/expert-group-ai.
[11] Jain S, Luthra M, Sharma S, Fatima M. Trustworthiness of Artificial Intelligence. In: 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS); 2020. p. 907-12.
[12] Sheth A, Gaur M, Roy K, Faldu K. Knowledge-Intensive Language Understanding for Explainable AI. IEEE Internet Computing. 2021;25(5):19-24.
[13] Wing JM. Trustworthy AI. Commun ACM. 2021;64(10):64-71.
[14] Zhang T, Qin Y, Li Q. Trusted Artificial Intelligence: Technique Requirements and Best Practices. In: 2021 International Conference on Cyberworlds (CW); 2021. p. 303-6. ISSN: 2642-3596.
[15] Li B, Qi P, Liu B, Di S, Liu J, Pei J, et al. Trustworthy AI: From Principles to Practices. ACM Computing Surveys. 2022 Aug:3555803. Available from: https://dl.acm.org/doi/10.1145/3555803.
[16] Strobel M, Shokri R. Data Privacy and Trustworthy Machine Learning. IEEE Security & Privacy. 2022 Sep;20(5):44-9. Available from: https://ieeexplore.ieee.org/document/9802763/.
[17] Kumar A, Braud T, Tarkoma S, Hui P. Trustworthy AI in the Age of Pervasive Computing and Big Data. In: 2020 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops); 2020. p. 1-6.
[18] Floridi L, Taddeo M. What is data ethics? Philosophical Transactions of The Royal Society A: Mathematical, Physical and Engineering Sciences. 2016;374:20160360.
[19] Hickok M. Lessons learned from AI ethics principles for future actions. AI and Ethics. 2021;1(1):41-7.
[20] Loi M, Heitz C, Christen M. A Comparative Assessment and Synthesis of Twenty Ethics Codes on AI and Big Data. In: 2020 7th Swiss Conference on Data Science (SDS); 2020. p. 41-6.
[21] Morley J, Elhalal A, Garcia F, Kinsey L, Mökander J, Floridi L. Ethics as a Service: A Pragmatic Operationalisation of AI Ethics. Minds and Machines. 2021.
[22] Ibánez JC, Olmeda MV. Operationalising AI ethics: how are companies bridging the gap between practice and principles? An exploratory study. AI & SOCIETY. 2021.
[23] Fjeld J, Achten N, Hilligoss H, Nagy A, Srikumar M. Principled artificial intelligence: Mapping consensus in ethical and rights-based approaches to principles for AI. Berkman Klein Center Research Publication. 2020;(2020-1).
[24] Milossi M, Alexandropoulou-Egyptiadou E, Psannis KE. AI Ethics: Algorithmic Determinism or Self-Determination? The GPDR Approach. IEEE Access. 2021;9:58455-66.
[25] Floridi L, Cowls J, Beltrametti M, Chatila R, Chazerand P, Dignum V, et al. AI4People—an ethical framework for a good AI society: opportunities, risks, principles, and recommendations. Minds and Machines. 2018;28(4):689-707.
[26] Shneiderman B. Bridging the Gap Between Ethics and Practice: Guidelines for Reliable, Safe, and Trustworthy Human-Centered AI Systems. ACM Trans Interact Intell Syst. 2020;10(4).
[27] Middleton SE, Letouzé E, Hossaini A, Chapman A. Trust, regulation, and human-in-the-loop AI: within the European region. Communications of the ACM. 2022 Apr;65(4):64-8. Available from: https://dl.acm.org/doi/10.1145/3511597.
[28] Hanna R, Kazim E. Philosophical foundations for digital ethics and AI Ethics: a dignitarian approach. AI and Ethics. 2021.
[29] Ville Vakkuri, Kai-Kristian Kemell, Marianna Jantunen, Erika Halme, Pekka Abrahamsson. ECCOLA — A method for implementing ethically aligned AI systems. Journal of Systems and Software. 2021;182:111067. Available from: https://www.sciencedirect.com/science/article/pii/S0164121221001643.
[30] Zhou J, Chen F, Holzinger A. Towards Explainability for AI Fairness. In: Holzinger A, Goebel R, Fong R, Moon T, Müller KR, Samek W, editors. xxAI - Beyond Explainable AI. vol. 13200. Cham: Springer International Publishing; 2022. p. 375-86. Series Title: Lecture Notes in Computer Science. Available from: https://link.springer.com/10.1007/978-3-031-04083-2_18.
[31] Burkart N, Huber MF. A Survey on the Explainability of Supervised Machine Learning. J Artif Int Res. 2021;70:245-317.
[32] Kaur H, Nori H, Jenkins S, Caruana R, Wallach H, Wortman Vaughan J. Interpreting Interpretability: Understanding Data Scientists' Use of Interpretability Tools for Machine Learning. In: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. CHI '20. New York, NY, USA: Association for Computing Machinery; 2020. p. 1-14.
[33] Choraś M, Pawlicki M, Puchalski D, Kozik R. Machine Learning – The Results Are Not the only Thing that Matters! What About Security, Explainability and Fairness? In: Krzhizhanovskaya VV, Závodszky G, Lees MH, Dongarra JJ, Sloot PMA, Brissos S, et al., editors. Computational Science – ICCS 2020. vol. 12140. Cham: Springer International Publishing; 2020. p. 615-28.
[34] Vellido A. The importance of interpretability and visualization in machine learning for applications in medicine and health care. Neural Computing and Applications. 2020;32(24):18069-83.
[35] Cheng L, Varshney KR, Liu H. Socially Responsible AI Algorithms: Issues, Purposes, and Challenges. J Artif Int Res. 2021;71:1137-81.
[36] Abolfazlian K. Trustworthy AI Needs Unbiased Dictators! In: Maglogiannis I, Iliadis L, Pimenidis E, editors. Artificial Intelligence Applications and Innovations. Cham: Springer International Publishing; 2020. p. 15-23.
[37] Bertino E. Privacy in the Era of 5G, IoT, Big Data and Machine Learning. In: 2020 Second IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications (TPS-ISA); 2020. p. 134-7.
[38] Singh R, Vatsa M, Ratha N. Trustworthy AI. In: 8th ACM IKDD CODS and 26th COMAD. CODS COMAD 2021. New York, NY, USA: Association for Computing Machinery; 2021. p. 449-53.
[39] Beckert B. The European way of doing Artificial Intelligence: The state of play implementing Trustworthy AI.
In: 2021 60th FITCE Communication Days Congress for ICT Professionals: Industrial Data – Cloud, Low Latency and Privacy (FITCE); 2021. p. 1-8.
[40] Kaur D, Uslu S, Rittichier KJ, Durresi A. Trustworthy Artificial Intelligence: A Review. ACM Computing Surveys. 2023 Mar;55(2):1-38. Available from: https://dl.acm.org/doi/10.1145/3491209.
[41] Yang G, Ye Q, Xia J. Unbox the black-box for the medical explainable AI via multi-modal and multicentre data fusion: A mini-review, two showcases and beyond. Information Fusion. 2022 Jan;77:29-52. Available from: https://linkinghub.elsevier.com/retrieve/pii/S1566253521001597.
[42] Gittens A, Yener B, Yung M. An Adversarial Perspective on Accuracy, Robustness, Fairness, and Privacy: Multilateral-Tradeoffs in Trustworthy ML. IEEE Access. 2022:1-1. Available from: https://ieeexplore.ieee.org/document/9933776/.
[43] Araujo T, Helberger N, Kruikemeier S, de Vreese CH. In AI we trust? Perceptions about automated decision-making by artificial intelligence. AI & SOCIETY. 2020;35(3):611-23.
[44] Knowles B, Richards JT. The Sanction of Authority: Promoting Public Trust in AI. In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. FAccT '21. New York, NY, USA: Association for Computing Machinery; 2021. p. 262-71.
[45] Lee MK, Rich K. Who Is Included in Human Perceptions of AI?: Trust and Perceived Fairness around Healthcare AI and Cultural Mistrust. In: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. New York, NY, USA: Association for Computing Machinery; 2021.
[46] Toreini E, Aitken M, Coopamootoo K, Elliott K, Zelaya CG, van Moorsel A. The Relationship between Trust in AI and Trustworthy Machine Learning Technologies. In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. FAT* '20. New York, NY, USA: Association for Computing Machinery; 2020. p. 272-83.
[47] Wang J, Moulden A. AI Trust Score: A User-Centered Approach to Building, Designing, and Measuring the Success of Intelligent Workplace Features. In: Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems. CHI EA '21. New York, NY, USA: Association for Computing Machinery; 2021.
[48] Liao QV, Sundar SS. Designing for Responsible Trust in AI Systems: A Communication Perspective. In: 2022 ACM Conference on Fairness, Accountability, and Transparency. Seoul, Republic of Korea: ACM; 2022. p. 1257-68. Available from: https://dl.acm.org/doi/10.1145/3531146.3533182.
[49] Seshia SA, Sadigh D, Sastry SS. Toward verified artificial intelligence. Communications of the ACM. 2022 Jul;65(7):46-55. Available from: https://dl.acm.org/doi/10.1145/3503914.
[50] Banerjee S, Alsop P, Jones L, Cardinal RN. Patient and public involvement to build trust in artificial intelligence: A framework, tools, and case studies. Patterns. 2022 Jun;3(6):100506. Available from: https://linkinghub.elsevier.com/retrieve/pii/S2666389922000988.
[51] Thuraisingham B. Trustworthy Machine Learning. IEEE Intelligent Systems. 2022 Jan;37(1):21-4. Available from: https://ieeexplore.ieee.org/document/9756264/.
[52] Choung H, David P, Ross A. Trust and ethics in AI. AI & SOCIETY. 2022 May. Available from: https://link.springer.com/10.1007/s00146-022-01473-4.
[53] Jacovi A, Marasović A, Miller T, Goldberg Y. Formalizing Trust in Artificial Intelligence: Prerequisites, Causes and Goals of Human Trust in AI.
In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. FAccT '21. New York, NY, USA: Association for Computing Machinery; 2021. p. 624-35.
[54] Peng Hu, Yaobin Lu, Yeming (Yale) Gong. Dual humanness and trust in conversational AI: A person-centered approach. Computers in Human Behavior. 2021;119:106727. Available from: https://www.sciencedirect.com/science/article/pii/S0747563221000492.
[55] Holzinger A, Dehmer M, Emmert-Streib F, Cucchiara R, Augenstein I, Ser JD, et al. Information fusion as an integrative cross-cutting enabler to achieve robust, explainable, and trustworthy medical artificial intelligence. Information Fusion. 2022 Mar;79:263-78. Available from: https://linkinghub.elsevier.com/retrieve/pii/S1566253521002050.
[56] Allahabadi H, Amann J, Balot I, Beretta A, Binkley C, Bozenhard J, et al. Assessing Trustworthy AI in times of COVID-19: Deep Learning for predicting a multi-regional score conveying the degree of lung compromise in COVID-19 patients. IEEE. 2022:32.
[57] Utomo S, John A, Rouniyar A, Hsu HC, Hsiung PA. Federated Trustworthy AI Architecture for Smart Cities. In: 2022 IEEE International Smart Cities Conference (ISC2). Pafos, Cyprus: IEEE; 2022. p. 1-7. Available from: https://ieeexplore.ieee.org/document/9922069/.
[58] Benjamins R. A choices framework for the responsible use of AI. AI and Ethics. 2021;1(1):49-53.
[59] Bourgais A, Ibnouhsein I. Ethics-by-design: the next frontier of industrialization. AI and Ethics. 2021.
[60] Peters D, Vold K, Robinson D, Calvo RA. Responsible AI—Two Frameworks for Ethical Design Practice. IEEE Transactions on Technology and Society. 2020;1(1):34-47.
[61] Contractor D, McDuff D, Haines JK, Lee J, Hines C, Hecht B, et al. Behavioral Use Licensing for Responsible AI. In: 2022 ACM Conference on Fairness, Accountability, and Transparency. Seoul, Republic of Korea: ACM; 2022. p. 778-88. Available from: https://dl.acm.org/doi/10.1145/3531146.3533143.
[62] Joisten K, Thiemer N, Renner T, Janssen A, Scheffler A. Focusing on the Ethical Challenges of Data Breaches and Applications. In: 2022 IEEE International Conference on Assured Autonomy (ICAA). Fajardo, PR, USA: IEEE; 2022. p. 74-82. Available from: https://ieeexplore.ieee.org/document/9763591/.
[63] Bruschi D, Diomede N. A framework for assessing AI ethics with applications to cybersecurity. AI and Ethics. 2022 May. Available from: https://link.springer.com/10.1007/s43681-022-00162-8.
[64] Vyhmeister E, Castane G, Östberg PO, Thevenin S. A responsible AI framework: pipeline contextualisation. AI and Ethics. 2022 Apr. Available from: https://link.springer.com/10.1007/s43681-022-00154-8.
[65] Belenguer L. AI bias: exploring discriminatory algorithmic decision-making models and the application of possible machine-centric solutions adapted from the pharmaceutical industry. AI and Ethics. 2022 Feb. Available from: https://link.springer.com/10.1007/s43681-022-00138-8.
[66] Svetlova E. AI ethics and systemic risks in finance. AI and Ethics. 2022 Nov;2(4):713-25. Available from: https://link.springer.com/10.1007/s43681-021-00129-1.
[67] Li J, Chignell M. FMEA-AI: AI fairness impact assessment using failure mode and effects analysis. AI and Ethics. 2022 Mar. Available from: https://link.springer.com/10.1007/s43681-022-00145-9.
[68] Georgieva I, Lazo C, Timan T, van Veenstra AF.
From AI ethics principles to data science practice: a reflection and a gap analysis based on recent frameworks and practical experience. AI and Ethics. 2022 Jan. Available from: https://link.springer.com/10.1007/s43681-021-00127-3. [69] Kumar S, Choudhury S. Normative ethics, human rights, and artificial intelligence. AI and Ethics. 2022 May. Available from: https://link.springer.com/10.1007/s43681-022-00170-8. [70] Solanki P, Grundy J, Hussain W. Operationalising ethics in artificial intelligence for healthcare: a framework for AI developers. AI and Ethics. 2022 Jul. Available from: https://link.springer. com/10.1007/s43681-022-00195-z. [71] Krijger J, Thuis T, de Ruiter M, Ligthart E, Broekman I. The AI ethics maturity model: a holistic approach to advancing ethical data science in organizations. AI and Ethics. 2022 Oct. Available from: https://link.springer.com/10.1007/s43681-022-00228-7. [72] Ayling J, Chapman A. Putting AI ethics to work: are the tools fit for purpose? AI and Ethics. 2021. [73] Maclure J. AI, Explainability and Public Reason: The Argument from the Limitations of the Human Mind. Minds and Machines. 2021;31(3):421-38. [74] Gambelin O. Brave: what it means to be an AI Ethicist. AI and Ethics. 2021;1(1):87-91. [75] Xiaoling P. Discussion on Ethical Dilemma Caused by Artificial Intel igence and Countermeasures. In: 2021 IEEE Asia-Pacific Conference on Image Processing, Electronics and Computers (IPEC); 2021. p. 453-7. 62 PROCEEDINGS OF THE 33RD INTERNATIONAL CONFERENCE ON INFORMATION MODELLING AND KNOWLEDGE BASES EJC 2023 [76] Gil KS. Ethical dilemmas // Ethical dilemmas: Ned Ludd and the ethical machine. AI & SOCIETY. 2021;36(3):669-76. [77] Stahl BC. Ethical Issues of AI. In: Stahl BC, editor. Artificial Intel igence for a Better Future: An Ecosystem Perspective on the Ethics of AI and Emerging Digital Technologies. Cham: Springer International Publishing; 2021. p. 35-53. [78] Mulligan C, Elaluf-Calderwood S. AI ethics: A framework for measuring embodied carbon in AI systems. AI and Ethics. 2022 Aug;2(3):363-75. Available from: https://link.springer.com/ 10.1007/s43681-021-00071-2. [79] Rochel J, Evéquoz F. Getting into the engine room: a blueprint to investigate the shadowy steps of AI ethics. AI & SOCIETY. 2020. [80] Charles D Raab. Information privacy, impact assessment, and the place of ethics*. Computer Law & Security Review. 2020;37:105404. Available from: https://www.sciencedirect.com/science/article/pii/S0267364920300091. [81] Stahl BC, Antoniou J, Ryan M, Macnish K, Jiya T. Organisational responses to the ethical issues of artificial intelligence. AI & SOCIETY. 2021. [82] Sætra HS, Coeckelbergh M, Danaher J. The AI ethicist’s dilemma: fighting Big Tech by supporting Big Tech. AI and Ethics. 2021 Dec. Available from: https://doi.org/10.1007/s43681-021-00123-7. [83] Petrozzino C. Who pays for ethical debt in AI? AI and Ethics. 2021. [84] Weinberg L. Rethinking Fairness: An Interdisciplinary Survey of Critiques of Hegemonic ML Fairness Approaches. Journal of Artificial Intelligence Research. 2022 May;74:75-109. Available from: https://jair.org/index.php/jair/article/view/13196. [85] Cooper AF, Moss E, Laufer B, Nissenbaum H. Accountability in an Algorithmic Society: Relationality, Responsibility, and Robustness in Machine Learning. In: 2022 ACM Conference on Fairness, Accountability, and Transparency. Seoul Republic of Korea: ACM; 2022. p. 864-76. Available from: https://dl.acm.org/doi/10.1145/3531146.3533150. 
[86] Vakkuri V, Kemel KK, Tolvanen J, Jantunen M, Halme E, Abrahamsson P. How Do Software Companies Deal with Artificial Intelligence Ethics? A Gap Analysis. In: The International Conference on Evaluation and Assessment in Software Engineering 2022. Gothenburg Sweden: ACM; 2022. p. 100-9. Available from: https://dl.acm.org/doi/10.1145/3530019.3530030. [87] Wal er RR, Wal er RL. Assembled Bias: Beyond Transparent Algorithmic Bias. Minds and Machines. 2022 Sep;32(3):533-62. Available from: https://link.springer.com/10.1007/ s11023-022-09605-x. [88] Hagendorff T. Blind spots in AI ethics. AI and Ethics. 2022 Nov;2(4):851-67. Available from: https://link.springer.com/10.1007/s43681-021-00122-8. [89] Bickley SJ, Torgler B. Cognitive architectures for artificial intel igence ethics. AI & SOCIETY. 2022 Jun. Available from: https://link.springer.com/10.1007/s00146-022-01452-9. [90] Munn L. The uselessness of AI ethics. AI and Ethics. 2022 Aug. Available from: https://link. springer.com/10.1007/s43681-022-00209-w. [91] Hagendorff T. The Ethics of AI Ethics: An Evaluation of Guidelines. Minds and Machines. 2020;30(1):99-120. [92] Kiemde SMA, Kora AD. Towards an ethics of AI in Africa: rule of education. AI and Ethics. 2021. [93] Zhou J, Chen F, Berry A, Reed M, Zhang S, Savage S. A Survey on Ethical Principles of AI and Implementations. In: 2020 IEEE Symposium Series on Computational Intelligence (SSCI); 2020. p. 3010-7. [94] Prunkl C, Whittlestone J. Beyond Near- and Long-Term: Towards a Clearer Account of Research Priorities in AI Ethics and Society. In: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society. AIES ’20. New York, NY, USA: Association for Computing Machinery; 2020. p. 138-43. S. Gollner, M. Tropmann-Frick, B. Brumen: Towards a Definition of a Responsible Artificial Intelligence 63. [95] Zhang B, Anderljung M, Kahn L, Dreksler N, Horowitz MC, Dafoe A. Ethics and Governance of Artificial Intel igence: Evidence from a Survey of Machine Learning Researchers. J Artif Int Res. 2021;71:591-666. [96] Forbes K. Opening the path to ethics in artificial intelligence. AI and Ethics. 2021. [97] Tartaglione E, Grangetto M. A non-Discriminatory Approach to Ethical Deep Learning. In: 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom); 2020. p. 943-50. [98] Forsyth S, Dalton B, Foster EH, Walsh B, Smilack J, Yeh T. Imagine a More Ethical AI: Using Stories to Develop Teens’ Awareness and Understanding of Artificial Intel igence and its Societal Impacts. In: 2021 Conference on Research in Equitable and Sustained Participation in Engineering, Computing, and Technology (RESPECT); 2021. p. 1-2. [99] Madaio M, Egede L, Subramonyam H, Wortman Vaughan J, Wal ach H. Assessing the Fairness of AI Systems: AI Practitioners’ Processes, Chal enges, and Needs for Support. Proceedings of the ACM on Human-Computer Interaction. 2022 Mar;6(CSCW1):1-26. Available from: https://dl.acm.org/doi/10.1145/3512899. [100] Tolmeijer S, Christen M, Kandul S, Kneer M, Bernstein A. Capable but Amoral? Comparing AI and Human Expert Col aboration in Ethical Decision Making. In: CHI Conference on Human Factors in Computing Systems. New Orleans LA USA: ACM; 2022. p. 1-17. Available from: https://dl.acm.org/doi/10.1145/3491102.3517732. [101] Boyd K. Designing Up with Value-Sensitive Design: Building a Field Guide for Ethical ML Development. In: 2022 ACM Conference on Fairness, Accountability, and Transparency. Seoul Republic of Korea: ACM; 2022. p. 2069-82. 
Available from: https://dl.acm.org/doi/10.1145/3531146. 3534626. [102] Chien I, Deliu N, Turner R, Wel er A, Vil ar S, Kilbertus N. Multi-disciplinary fairness considerations in machine learning for clinical trials. In: 2022 ACM Conference on Fairness, Accountability, and Transparency. Seoul Republic of Korea: ACM; 2022. p. 906-24. Available from: https://dl.acm.org/doi/10.1145/3531146.3533154. [103] Lu Q, Zhu L, Xu X, Whittle J, Douglas D, Sanderson C. Software engineering for responsible AI: an empirical study and operationalised patterns. In: Proceedings of the 44th International Conference on Software Engineering: Software Engineering in Practice. Pittsburgh Pennsylvania: ACM; 2022. p. 241-2. Available from: https://dl.acm.org/doi/10.1145/3510457.3513063. [104] Rubeis G. iHealth: The ethics of artificial intel igence and big data in mental healthcare. Internet Interventions. 2022 Apr;28:100518. Available from: https://linkinghub.elsevier.com/retrieve/pii/S2214782922000252. [105] Valentine L, D’Alfonso S, Lederman R. Recommender systems for mental health apps: advantages and ethical challenges. AI & SOCIETY. 2022 Jan. Available from: https://link.springer.com/10.1007/s00146-021-01322-w. [106] Persson E, Hedlund M. The future of AI in our hands? To what extent are we as individuals moral y responsible for guiding the development of AI in a desirable direction? AI and Ethics. 2022 Nov;2(4):683-95. Available from: https://link.springer.com/10.1007/s43681-021-00125-5. [107] Nakao Y, Stumpf S, Ahmed S, Naseer A, Strappel i L. Toward Involving End-users in Interactive Human-in-the-loop AI Fairness. ACM Transactions on Interactive Intelligent Systems. 2022 Sep;12(3):1-30. Available from: https://dl.acm.org/doi/10.1145/3514258. [108] Fabris A, Messina S, Silvel o G, Susto GA. Algorithmic fairness datasets: the story so far. Data Mining and Knowledge Discovery. 2022 Sep. Available from: https://link.springer.com/10.1007/s10618-022-00854-z. [109] Bélisle-Pipon JC. Artificial intelligence ethics has a black box problem. AI and Society. 2022:16. [110] Haüßermann JJ, Lütge C. Community-in-the-loop: towards pluralistic value creation in AI, or—why AI needs business ethics. AI and Ethics. 2022 May;2(2):341-62. Available from: https://link. springer.com/10.1007/s43681-021-00047-2. 64 PROCEEDINGS OF THE 33RD INTERNATIONAL CONFERENCE ON INFORMATION MODELLING AND KNOWLEDGE BASES EJC 2023 [111] Fung P, Etienne H. Confucius, cyberpunk and Mr. Science: comparing AI ethics principles between China and the EU. AI and Ethics. 2022 Jun. Available from: https://link.springer.com/10.1007/s43681-022-00180-6. [112] Starke G, Schmidt B, De Clercq E, Elger BS. Explainability as fig leaf? An exploration of experts’ ethical expectations towards machine learning in psychiatry. AI and Ethics. 2022 Jun. Available from: https://link.springer.com/10.1007/s43681-022-00177-1. [113] Stahl BC. From computer ethics and the ethics of AI towards an ethics of digital ecosystems. AI and Ethics. 2022 Feb;2(1):65-77. Available from: https://link.springer.com/10.1007/ s43681-021-00080-1. [114] Brusseau J. From the ground truth up: doing AI ethics from practice to principles. AI & SOCIETY. 2022 Jan. Available from: https://link.springer.com/10.1007/s00146-021-01336-4. [115] Anderson MM, Fort K. From the ground up: developing a practical ethical methodology for integrating AI into industry. AI & SOCIETY. 2022 Jul. Available from: https://link.springer.com/10.1007/s00146-022-01531-x. [116] Ramanayake R. Immune moral models? 
Pro-social rule breaking as a moral enhancement approach for ethical AI. AI & SOCIETY. 2022:13. [117] Hunkenschroer AL, Kriebitz A. Is AI recruiting (un)ethical? A human rights perspective on the use of AI for hiring. AI and Ethics. 2022 Jul. Available from: https://link.springer.com/10.1007/s43681-022-00166-4. [118] Jacobs M, Simon J. Reexamining computer ethics in light of AI systems and AI regulation. AI and Ethics. 2022 Oct. Available from: https://link.springer.com/10.1007/s43681-022-00229-6. [119] Stahl BC, Rodrigues R, Santiago N, Macnish K. A European Agency for Artificial Intel igence: Protecting fundamental rights and ethical values. Computer Law & Security Review. 2022 Jul;45:105661. Available from: https://linkinghub.elsevier.com/retrieve/pii/S0267364922000097. [120] Rasheed K, Qayyum A, Ghaly M, Al-Fuqaha A, Razi A, Qadir J. Explainable, trustworthy, and ethical machine learning for healthcare: A survey. Computers in Biology and Medicine. 2022 Oct;149:106043. Available from: https://linkinghub.elsevier.com/retrieve/pii/S0010482522007569. [121] Huang C, Zhang Z, Mao B, Yao X. An Overview of Artificial Intel igence Ethics. IEEE Transactions on Artificial Intelligence. 2022:1-21. Available from: https://ieeexplore.ieee.org/document/9844014/. [122] Lin H, Zhang Y, Chen X, Zhai R, Kuai Z. Artificial Intelligence Ethical in Environmental Protection. In: 2022 International Seminar on Computer Science and Engineering Technology (SCSET). Indianapolis, IN, USA: IEEE; 2022. p. 137-40. Available from: https://ieeexplore.ieee.org/document/9700880/. [123] Petersen E, Potdevin Y, Mohammadi E, Zidowitz S, Breyer S, Nowotka D, et al. Responsible and Regulatory Conform Machine Learning for Medicine: A Survey of Chal enges and Solutions. IEEE Access. 2022;10:58375-418. Available from: https://ieeexplore.ieee.org/document/9783196/. [124] Benefo EO, Tingler A, White M, Cover J, Torres L, Broussard C, et al. Ethical, legal, social, and economic (ELSE) implications of artificial intel igence at a global level: a scientometrics approach. AI and Ethics. 2022 Jan. Available from: https://link.springer.com/10.1007/ s43681-021-00124-6. [125] Karimian G, Petelos E, Evers SMAA. The ethical issues of the application of artificial intel igence in healthcare: a systematic scoping review. AI and Ethics. 2022 Mar. Available from: https://link. springer.com/10.1007/s43681-021-00131-7. [126] Attard-Frost B. The ethics of AI business practices: a review of 47 AI ethics guidelines. AI and Ethics. 2022:18. [127] Tsamados A, Aggarwal N, Cowls J, Morley J, Roberts H, Taddeo M, et al. The ethics of algorithms: key problems and solutions. AI & SOCIETY. 2022 Mar;37(1):215-30. Available from: https:// link.springer.com/10.1007/s00146-021-01154-8. S. Gollner, M. Tropmann-Frick, B. Brumen: Towards a Definition of a Responsible Artificial Intelligence 65. [128] Wang A, Liu A, Zhang R, Kleiman A, Kim L, Zhao D, et al. REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets. International Journal of Computer Vision. 2022 Jul;130(7):1790- 810. Available from: https://link.springer.com/10.1007/s11263-022-01625-5. [129] Sun L, Li Z, Zhang Y, Liu Y, Lou S, Zhou Z. Capturing the Trends, Applications, Issues, and Potential Strategies of Designing Transparent AI Agents. In: Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems. CHI EA ’21. New York, NY, USA: Association for Computing Machinery; 2021. [130] Giulia Vilone, Luca Longo. 
Notions of explainability and evaluation approaches for explainable artificial intelligence. Information Fusion. 2021;76:89-106. Available from: https://www. sciencedirect.com/science/article/pii/S1566253521001093. [131] Saleem R, Yuan B, Kurugol u F, Anjum A, Liu L. Explaining deep neural networks: A survey on the global interpretation methods. Neurocomputing. 2022 Nov;513:165-80. Available from: https://linkinghub.elsevier.com/retrieve/pii/S0925231222012218. [132] Saraswat D, Bhattacharya P, Verma A, Prasad VK, Tanwar S, Sharma G, et al. Explainable AI for Healthcare 5.0: Opportunities and Challenges. IEEE Access. 2022;10:84486-517. Available from: https://ieeexplore.ieee.org/document/9852458/. [133] Minh D, Wang HX, Li YF, Nguyen TN. Explainable artificial intelligence: a comprehensive review. Artificial Intelligence Review. 2022 Jun;55(5):3503-68. Available from: https://link.springer.com/10.1007/s10462-021-10088-y. [134] Brennen A. What Do People Real y Want When They Say They Want Explainable AI? We Asked 60 Stakeholders. In: Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems. CHI EA ’20. New York, NY, USA: Association for Computing Machinery; 2020. p. 1-7. [135] Ehsan U, Liao QV, Muller M, Riedl MO, Weisz JD. Expanding Explainability: Towards Social Transparency in AI Systems. In: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. New York, NY, USA: Association for Computing Machinery; 2021. [136] Ehsan U, Wintersberger P, Liao QV, Mara M, Streit M, Wachter S, et al. Operationalizing Human-Centered Perspectives in Explainable AI. In: Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems. CHI EA ’21. New York, NY, USA: Association for Computing Machinery; 2021. [137] Jesus S, Belém C, Balayan V, Bento J, Saleiro P, Bizarro P, et al. How Can I Choose an Explainer? An Application-Grounded Evaluation of Post-Hoc Explanations. In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. FAccT ’21. New York, NY, USA: Association for Computing Machinery; 2021. p. 805-15. [138] Suresh H, Gomez SR, Nam KK, Satyanarayan A. Beyond Expertise and Roles: A Framework to Characterize the Stakeholders of Interpretable Machine Learning and Their Needs. In: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. New York, NY, USA: Association for Computing Machinery; 2021. [139] Maltbie N, Niu N, van Doren M, Johnson R. XAI Tools in the Public Sector: A Case Study on Predicting Combined Sewer Overflows. In: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ESEC/FSE 2021. New York, NY, USA: Association for Computing Machinery; 2021. p. 1032-44. [140] Alexandre Heuillet, Fabien Couthouis, Natalia Díaz-Rodríguez. Explainability in deep reinforcement learning. Knowledge-Based Systems. 2021;214:106685. Available from: https://www.sciencedirect.com/science/article/pii/S0950705120308145. [141] Sokol K, Flach P. One Explanation Does Not Fit All. KI - Künstliche Intelligenz. 2020;34(2):235-50. [142] Yuan L, Shen G. A Training Scheme of Deep Neural Networks on Encrypted Data. In: Proceedings of the 2020 International Conference on Cyberspace Innovation of Advanced Technologies. CIAT 2020. New York, NY, USA: Association for Computing Machinery; 2020. p. 490-5. 
66 PROCEEDINGS OF THE 33RD INTERNATIONAL CONFERENCE ON INFORMATION MODELLING AND KNOWLEDGE BASES EJC 2023 [143] Zytek A, Liu D, Vaithianathan R, Veeramachaneni K. Sibyl: Explaining Machine Learning Models for High-Stakes Decision Making. In: Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems. CHI EA ’21. New York, NY, USA: Association for Computing Machinery; 2021. . [144] Zhang W, Dimiccoli M, Lim BY. Debiased-CAM to mitigate image perturbations with faithful visual explanations of machine learning. In: CHI Conference on Human Factors in Computing Systems. New Orleans LA USA: ACM; 2022. p. 1-32. Available from: https://dl.acm.org/doi/10.1145/3491102.3517522. [145] Golder A, Bhat A, Raychowdhury A. Exploration into the Explainability of Neural Network Models for Power Side-Channel Analysis. In: Proceedings of the Great Lakes Symposium on VLSI 2022. Irvine CA USA: ACM; 2022. p. 59-64. Available from: https://dl.acm.org/doi/10.1145/3526241.3530346. [146] Sun J, Liao QV, Muller M, Agarwal M, Houde S, Talamadupula K, et al. Investigating Explainability of Generative AI for Code through Scenario-based Design. In: 27th International Conference on Intelligent User Interfaces. Helsinki Finland: ACM; 2022. p. 212-28. Available from: https://dl.acm.org/doi/10.1145/3490099.3511119. [147] Terziyan V, Vitko O. Explainable AI for Industry 4.0: Semantic Representation of Deep Learning Models. Procedia Computer Science. 2022;200:216-26. Available from: https://linkinghub.elsevier.com/retrieve/pii/S1877050922002290. [148] Tiddi I, Schlobach S. Knowledge graphs as tools for explainable machine learning: A survey. Artificial Intelligence. 2022 Jan;302:103627. Available from: https://linkinghub.elsevier.com/ retrieve/pii/S0004370221001788. [149] Bacciu D, Numeroso D. Explaining Deep Graph Networks via Input Perturbation. IEEE Transactions on Neural Networks and Learning Systems. 2022:1-12. Available from: https://ieeexplore.ieee.org/document/9761788/. [150] Mery D, Morris B. On Black-Box Explanation for Face Verification. In: 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). Waikoloa, HI, USA: IEEE; 2022. p. 1194-203. Available from: https://ieeexplore.ieee.org/document/9706895/. [151] Haffar R, Sánchez D, Domingo-Ferrer J. Explaining predictions and attacks in federated learning via random forests. Applied Intelligence. 2022 Apr. Available from: https://link.springer.com/10.1007/s10489-022-03435-1. [152] Rožanec JM, Fortuna B, Mladenić D. Knowledge graph-based rich and confidentiality preserving Explainable Artificial Intelligence (XAI). Information Fusion. 2022 May;81:91-102. Available from: https://linkinghub.elsevier.com/retrieve/pii/S1566253521002414. [153] Mohseni S, Zarei N, Ragan ED. A Multidisciplinary Survey and Framework for Design and Evaluation of Explainable AI Systems. ACM Trans Interact Intell Syst. 2021;11(3–4). [154] Sharma S, Henderson J, Ghosh J. CERTIFAI: A Common Framework to Provide Explanations and Analyse the Fairness and Robustness of Black-Box Models. In: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society. AIES ’20. New York, NY, USA: Association for Computing Machinery; 2020. p. 166-72. [155] Sokol K, Flach P. Explainability Fact Sheets: A Framework for Systematic Assessment of Explainable Approaches. In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. FAT* ’20. New York, NY, USA: Association for Computing Machinery; 2020. p. 56-67. 
[156] Nazaretsky T, Cukurova M, Alexandron G. An Instrument for Measuring Teachers’ Trust in AI-Based Educational Technology. In: LAK22: 12th International Learning Analytics and Knowledge Conference. Online USA: ACM; 2022. p. 56-66. Available from: https://dl.acm.org/doi/10.1145/3506860.3506866. [157] Hailemariam Y, Yazdinejad A, Parizi RM, Srivastava G, Dehghantanha A. An Empirical Evaluation of AI Deep Explainable Tools. In: 2020 IEEE Globecom Workshops (GC Wkshps); 2020. p. 1-6. [158] Colaner N. Is explainable artificial intelligence intrinsically valuable? AI & SOCIETY. 2021. S. Gollner, M. Tropmann-Frick, B. Brumen: Towards a Definition of a Responsible Artificial Intelligence 67. [159] Patel N, Shokri R, Zick Y. Model Explanations with Differential Privacy. In: 2022 ACM Conference on Fairness, Accountability, and Transparency. Seoul Republic of Korea: ACM; 2022. p. 1895-904. Available from: https://dl.acm.org/doi/10.1145/3531146.3533235. [160] Tsiakas K, Murray-Rust D. Using human-in-the-loop and explainable AI to envisage new future work practices. In: The15th International Conference on PErvasive Technologies Related to Assistive Environments. Corfu Greece: ACM; 2022. p. 588-94. Available from: https://dl.acm.org/doi/10.1145/3529190.3534779. [161] Combi C, Amico B, Bel azzi R, Holzinger A, Moore JH, Zitnik M, et al. A manifesto on explainability for artificial intelligence in medicine. Artificial Intelligence in Medicine. 2022 Nov;133:102423. Available from: https://linkinghub.elsevier.com/retrieve/pii/S0933365722001750. [162] Watson M, Shiekh Hasan BA, Moubayed NA. Agree to Disagree: When Deep Learning Models With Identical Architectures Produce Distinct Explanations. In: 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). Waikoloa, HI, USA: IEEE; 2022. p. 1524-33. Available from: https://ieeexplore.ieee.org/document/9706847/. [163] Fel T, Vigouroux D, Cadene R, Serre T. How Good is your Explanation? Algorithmic Stability Measures to Assess the Quality of Explanations for Deep Neural Networks. In: 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). Waikoloa, HI, USA: IEEE; 2022. p. 1565-75. Available from: https://ieeexplore.ieee.org/document/9706798/. [164] Hu B, Vasu B, Hoogs A. X-MIR: EXplainable Medical Image Retrieval. In: 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). Waikoloa, HI, USA: IEEE; 2022. p. 1544-54. Available from: https://ieeexplore.ieee.org/document/9706900/. [165] Padovan PH, Martins CM, Reed C. Black is the new orange: how to determine AI liability. Artificial Intelligence and Law. 2022 Jan. Available from: https://link.springer.com/10.1007/ s10506-022-09308-9. [166] Ratti E, Graves M. Explainable machine learning practices: opening another black box for reliable medical AI. AI and Ethics. 2022 Feb. Available from: https://link.springer.com/10.1007/s43681-022-00141-z. [167] Storey VC, Lukyanenko R, Maass W, Parsons J. Explainable AI. Communications of the ACM. 2022 Apr;65(4):27-9. Available from: https://dl.acm.org/doi/10.1145/3490699. [168] Boulemtafes A, Derhab A, Chal al Y. A review of privacy-preserving techniques for deep learning. Neurocomputing. 2020;384:21-45. Available from: https://www.sciencedirect.com/science/article/pii/S0925231219316431. [169] Chen H, Hussain SU, Boemer F, Stapf E, Sadeghi AR, Koushanfar F, et al. Developing Privacy-preserving AI Systems: The Lessons learned. In: 2020 57th ACM/IEEE Design Automation Conference (DAC); 2020. p. 1-4. 
[170] Mercier D, Lucieri A, Munir M, Dengel A, Sheraz A. Evaluating Privacy-Preserving Machine Learning in Critical Infrastructures: A Case Study on Time-Series Classification. IEEE Transactions on Industrial Informatics. 2021:1-1. Conference Name: IEEE Transactions on Industrial Informatics. [171] Biswas S, Khare N, Agrawal P, Jain P. Machine learning concepts for correlated Big Data privacy. Journal of Big Data. 2021 Dec;8(1):157. Available from: https://doi.org/10.1186/ s40537-021-00530-x. [172] Chang H, Shokri R. On the Privacy Risks of Algorithmic Fairness. In: 2021 IEEE European Symposium on Security and Privacy (EuroS P); 2021. p. 292-303. [173] Sergey Zapechnikov. Privacy-Preserving Machine Learning as a Tool for Secure Personalized Information Services. Procedia Computer Science. 2020;169:393-9. Available from: https://www.sciencedirect.com/science/article/pii/S1877050920303598. [174] Liu B, Ding M, Shaham S, Rahayu W, Farokhi F, Lin Z. When Machine Learning Meets Privacy: A Survey and Outlook. ACM Computing Surveys. 2021;54(2). 68 PROCEEDINGS OF THE 33RD INTERNATIONAL CONFERENCE ON INFORMATION MODELLING AND KNOWLEDGE BASES EJC 2023 [175] Sousa S, Kern R. How to keep text private? A systematic review of deep learning methods for privacy-preserving natural language processing. Artificial Intelligence Review. 2022 May. Available from: https://link.springer.com/10.1007/s10462-022-10204-6. [176] Zhang G, Liu B, Zhu T, Zhou A, Zhou W. Visual privacy attacks and defenses in deep learning: a survey. Artificial Intelligence Review. 2022 Aug;55(6):4347-401. Available from: https://link.springer.com/10.1007/s10462-021-10123-y. [177] Harikumar H, Rana S, Gupta S, Nguyen T, Kaimal R, Venkatesh S. Prescriptive analytics with differential privacy. International Journal of Data Science and Analytics. 2021. [178] Suriyakumar VM, Papernot N, Goldenberg A, Ghassemi M. Chasing Your Long Tails: Differentially Private Prediction in Health Care Settings. In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. FAccT ’21. New York, NY, USA: Association for Computing Machinery; 2021. p. 723-34. [179] Zhu Y, Yu X, Chandraker M, Wang YX. Private-kNN: Practical Differential Privacy for Computer Vision. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2020. p. 11851-9. [180] Guevara M, Desfontaines D, Waldo J, Coatta T. Differential Privacy: The Pursuit of Protections by Default. Commun ACM. 2021;64(2):36-43. [181] Ding X, Chen L, Zhou P, Jiang W, Jin H. Differentially Private Deep Learning with Iterative Gradient Descent Optimization. ACM/IMS Transactions on Data Science. 2021 Nov;2(4):1-27. Available from: https://dl.acm.org/doi/10.1145/3491254. [182] Alishahi M, Moghtadaiee V, Navidan H. Add noise to remove noise: Local differential privacy for feature selection. Computers & Security. 2022 Dec;123:102934. Available from: https://linkinghub.elsevier.com/retrieve/pii/S0167404822003261. [183] Lal AK, Karthikeyan S. Deep Learning Classification of Fetal Cardiotocography Data with Differential Privacy. In: 2022 International Conference on Connected Systems & Intel igence (CSI). Trivandrum, India: IEEE; 2022. p. 1-5. Available from: https://ieeexplore.ieee.org/document/9924087/. [184] Hassanpour A, Moradikia M, Yang B, Abdelhadi A, Busch C, Fierrez J. Differential Privacy Preservation in Robust Continual Learning. IEEE Access. 2022;10:24273-87. Available from: https://ieeexplore.ieee.org/document/9721905/. [185] Gupta R, Singh AK. 
A Differential Approach for Data and Classification Service-Based Privacy-Preserving Machine Learning Model in Cloud Environment. New Generation Computing. 2022 Jul. Available from: https://link.springer.com/10.1007/s00354-022-00185-z. [186] Liu J, Li X, Wei Q, Liu S, Liu Z, Wang J. A two-phase random forest with differential privacy. Applied Intelligence. 2022 Oct. Available from: https://link.springer.com/10.1007/ s10489-022-04119-6. [187] Zhao JZ, Wang XW, Mao KM, Huang CX, Su YK, Li YC. Correlated Differential Privacy of Multiparty Data Release in Machine Learning. Journal of Computer Science and Technology. 2022 Feb;37(1):231-51. Available from: https://link.springer.com/10.1007/s11390-021-1754-5. [188] Arcolezi HH, Couchot JF, Renaud D, Al Bouna B, Xiao X. Differential y private multivariate time series forecasting of aggregated human mobility with deep learning: Input or gradient perturbation? Neural Computing and Applications. 2022 Aug;34(16):13355-69. Available from: https://link. springer.com/10.1007/s00521-022-07393-0. [189] Anh-Tu Tran, The-Dung Luong, Jessada Karnjana, Van-Nam Huynh. An efficient approach for privacy preserving decentralized deep learning models based on secure multi-party computation. Neurocomputing. 2021;422:245-62. Available from: https://www.sciencedirect.com/science/article/pii/S0925231220315095. [190] Wang Q, Feng C, Xu Y, Zhong H, Sheng VS. A novel privacy-preserving speech recognition framework using bidirectional LSTM. Journal of Cloud Computing. 2020;9(1):36. [191] Park S, Byun J, Lee J. Privacy-Preserving Fair Learning of Support Vector Machine with Homomorphic Encryption. In: Proceedings of the ACM Web Conference 2022. Virtual Event, Lyon France: ACM; 2022. p. 3572-83. Available from: https://dl.acm.org/doi/10.1145/3485447.3512252. S. Gollner, M. Tropmann-Frick, B. Brumen: Towards a Definition of a Responsible Artificial Intelligence 69. [192] Liu C, Jiang ZL, Zhao X, Chen Q, Fang J, He D, et al. Efficient and Privacy-Preserving Logistic Regression Scheme based on Leveled Ful y Homomorphic Encryption. In: IEEE INFOCOM 2022 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). New York, NY, USA: IEEE; 2022. p. 1-6. Available from: https://ieeexplore.ieee.org/document/9797933/. [193] Byun J, Park S, Choi Y, Lee J. Efficient homomorphic encryption framework for privacy-preserving regression. Applied Intelligence. 2022 Aug. Available from: https://link.springer.com/10.1007/s10489-022-04015-z. [194] Can YS, Ersoy C. Privacy-Preserving Federated Deep Learning for Wearable IoT-Based Biomedical Monitoring. ACM Trans Internet Technol. 2021;21(1). [195] Chen L, Zhang W, Xu L, Zeng X, Lu Q, Zhao H, et al. A Federated Paral el Data Platform for Trustworthy AI. In: 2021 IEEE 1st International Conference on Digital Twins and Paral el Intelligence (DTPI); 2021. p. 344-7. [196] Diddee H, Kansra B. CrossPriv: User Privacy Preservation Model for Cross-Silo Federated Software. In: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering. ASE ’20. New York, NY, USA: Association for Computing Machinery; 2020. p. 1370-2. [197] Fereidooni H, Marchal S, Miettinen M, Mirhoseini A, Möl ering H, Nguyen TD, et al. SAFELearn: Secure Aggregation for private FEderated Learning. In: 2021 IEEE Security and Privacy Workshops (SPW); 2021. p. 56-62. [198] Divya Jatain, Vikram Singh, Naveen Dahiya. A contemplative perspective on federated machine learning: Taxonomy, threats & vulnerability assessment and chal enges. 
Journal of King Saud University - Computer and Information Sciences. 2021. Available from: https://www.sciencedirect.com/science/article/pii/S1319157821001312. [199] Viraaji Mothukuri, Reza M Parizi, Seyedamin Pouriyeh, Yan Huang, Ali Dehghantanha, Gautam Srivastava. A survey on security and privacy of federated learning. Future Generation Computer Systems. 2021;115:619-40. Available from: https://www.sciencedirect.com/science/article/pii/ S0167739X20329848. [200] Shayan M, Fung C, Yoon CJM, Beschastnikh I. Biscotti: A Blockchain System for Private and Secure Federated Learning. IEEE Transactions on Paral el and Distributed Systems. 2021;32(7):1513-25. [201] Yang M, He Y, Qiao J. Federated Learning-Based Privacy-Preserving and Security: Survey. In: 2021 Computing, Communications and IoT Applications (ComComAp); 2021. p. 312-7. [202] Gong Q, Ruan H, Chen Y, Su X. CloudyFL: a cloudlet-based federated learning framework for sensing user behavior using wearable devices. In: Proceedings of the 6th International Workshop on Embedded and Mobile Deep Learning. Portland Oregon: ACM; 2022. p. 13-8. Available from: https://dl.acm.org/doi/10.1145/3539491.3539592. [203] Kalloori S, Klingler S. Cross-silo federated learning based decision trees. In: Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing. Virtual Event: ACM; 2022. p. 1117-24. Available from: https://dl.acm.org/doi/10.1145/3477314.3507149. [204] Zhao J, Zhu H, Wang F, Lu R, Liu Z, Li H. PVD-FL: A Privacy-Preserving and Verifiable Decentralized Federated Learning Framework. IEEE Transactions on Information Forensics and Security. 2022;17:2059-73. Available from: https://ieeexplore.ieee.org/document/9777682/. [205] Beilharz J, Pfitzner B, Schmid R, Geppert P, Arnrich B, Polze A. Implicit model specialization through dag-based decentralized federated learning. In: Proceedings of the 22nd International Middleware Conference. Middleware ’21. New York, NY, USA: Association for Computing Machinery; 2021. p. 310-22. Available from: https://doi.org/10.1145/3464298.3493403. [206] Hao M, Li H, Xu G, Chen H, Zhang T. Efficient, Private and Robust Federated Learning. In: Annual Computer Security Applications Conference. ACSAC. New York, NY, USA: Association for Computing Machinery; 2021. p. 45-60. Available from: https://doi.org/10.1145/3485832.3488014. 70 PROCEEDINGS OF THE 33RD INTERNATIONAL CONFERENCE ON INFORMATION MODELLING AND KNOWLEDGE BASES EJC 2023 [207] Li KH, de Gusmão PPB, Beutel DJ, Lane ND. Secure aggregation for federated learning in flower. In: Proceedings of the 2nd ACM International Workshop on Distributed Machine Learning. DistributedML ’21. New York, NY, USA: Association for Computing Machinery; 2021. p. 8-14. Available from: https://doi.org/10.1145/3488659.3493776. [208] Xu R, Baracaldo N, Zhou Y, Anwar A, Joshi J, Ludwig H. FedV: Privacy-Preserving Federated Learning over Vertical y Partitioned Data. In: Proceedings of the 14th ACM Workshop on Artificial Intel igence and Security. AISec ’21. New York, NY, USA: Association for Computing Machinery; 2021. p. 181-92. Available from: https://doi.org/10.1145/3474369.3486872. [209] Xu T, Zhu K, Andrzejak A, Zhang L. Distributed Learning in Trusted Execution Environment: A Case Study of Federated Learning in SGX. In: 2021 7th IEEE International Conference on Network Intelligence and Digital Content (IC-NIDC); 2021. p. 450-4. ISSN: 2575-4955. [210] Chai Z, Chen Y, Anwar A, Zhao L, Cheng Y, Rangwala H. 
FedAT: a high-performance and communication-efficient federated learning system with asynchronous tiers. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. SC ’21. New York, NY, USA: Association for Computing Machinery; 2021. p. 1-16. Available from: https://doi.org/10.1145/3458817.3476211. [211] Cho H, Mathur A, Kawsar F. Device or User: Rethinking Federated Learning in Personal-Scale Multi-Device Environments. In: Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems. SenSys ’21. New York, NY, USA: Association for Computing Machinery; 2021. p. 446-52. Available from: https://doi.org/10.1145/3485730.3493449. [212] Li Y, Hu G, Liu X, Ying Z. Cross the Chasm: Scalable Privacy-Preserving Federated Learning against Poisoning Attack. In: 2021 18th International Conference on Privacy, Security and Trust (PST); 2021. p. 1-5. [213] Zhang K, Yiu SM, Hui LCK. A Light-Weight Crowdsourcing Aggregation in Privacy-Preserving Federated Learning System. In: 2020 International Joint Conference on Neural Networks (IJCNN); 2020. p. 1-8. ISSN: 2161-4407. [214] Li S, Ngai E, Ye F, Voigt T. Auto-weighted Robust Federated Learning with Corrupted Data Sources. ACM Transactions on Intelligent Systems and Technology. 2022 Oct;13(5):1-20. Available from: https://dl.acm.org/doi/10.1145/3517821. [215] Bonawitz K, Kairouz P, Mcmahan B, Ramage D. Federated learning and privacy. Communications of the ACM. 2022 Apr;65(4):90-7. Available from: https://dl.acm.org/doi/10.1145/3500240. [216] Antunes RS, Andre´ da Costa C, Küderle A, Yari IA, Eskofier B. Federated Learning for Healthcare: Systematic Review and Architecture Proposal. ACM Transactions on Intel igent Systems and Technology. 2022 Aug;13(4):1-23. Available from: https://dl.acm.org/doi/10.1145/3501813. [217] Nguyen DC, Pham QV, Pathirana PN, Ding M, Seneviratne A, Lin Z, et al. Federated Learning for Smart Healthcare: A Survey. ACM Computing Surveys. 2023 Apr;55(3):1-37. Available from: https://dl.acm.org/doi/10.1145/3501296. [218] Zhu S, Qi Q, Zhuang Z, Wang J, Sun H, Liao J. FedNKD: A Dependable Federated Learning Using Fine-tuned Random Noise and Knowledge Distil ation. In: Proceedings of the 2022 International Conference on Multimedia Retrieval. Newark NJ USA: ACM; 2022. p. 185-93. Available from: https://dl.acm.org/doi/10.1145/3512527.3531372. [219] Wang Z, Yan B, Dong A. Blockchain Empowered Federated Learning for Data Sharing Incentive Mechanism. Procedia Computer Science. 2022;202:348-53. Available from: https://linkinghub.elsevier.com/retrieve/pii/S1877050922005816. [220] Wang N, Xiao Y, Chen Y, Hu Y, Lou W, Hou YT. FLARE: Defending Federated Learning against Model Poisoning Attacks via Latent Space Representations. In: Proceedings of the 2022 ACM on Asia Conference on Computer and Communications Security. Nagasaki Japan: ACM; 2022. p. 946-58. Available from: https://dl.acm.org/doi/10.1145/3488932.3517395. S. Gollner, M. Tropmann-Frick, B. Brumen: Towards a Definition of a Responsible Artificial Intelligence 71. [221] Giuseppi A, Manfredi S, Menegatti D, Pietrabissa A, Poli C. Decentralized Federated Learning for Nonintrusive Load Monitoring in Smart Energy Communities. In: 2022 30th Mediterranean Conference on Control and Automation (MED). Vouliagmeni, Greece: IEEE; 2022. p. 312-7. Available from: https://ieeexplore.ieee.org/document/9837291/. [222] Lo SK, Liu Y, Lu Q, Wang C, Xu X, Paik HY, et al. 
Towards Trustworthy AI: Blockchain-based Architecture Design for Accountability and Fairness of Federated Learning Systems. IEEE Internet of Things Journal. 2022:1-1. Available from: https://ieeexplore.ieee.org/document/9686048/. [223] Gholami A, Torkzaban N, Baras JS. Trusted Decentralized Federated Learning. IEEE. 2022:6. [224] Yang Z, Shi Y, Zhou Y, Wang Z, Yang K. Trustworthy Federated Learning via Blockchain. IEEE Internet of Things Journal. 2022:1-1. Available from: https://ieeexplore.ieee.org/document/ 9866512/. [225] Abou El Houda Z, Hafid AS, Khoukhi L, Brik B. When Col aborative Federated Learning Meets Blockchain to Preserve Privacy in Healthcare. IEEE Transactions on Network Science and Engineering. 2022:1-11. Available from: https://ieeexplore.ieee.org/document/9906419/. [226] Chowdhury A, Kassem H, Padoy N, Umeton R, Karargyris A. A Review of Medical Federated Learning: Applications in Oncology and Cancer Research. In: Crimi A, Bakas S, editors. Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. Cham: Springer International Publishing; 2022. p. 3-24. [227] Sav S, Bossuat JP, Troncoso-Pastoriza JR, Claassen M, Hubaux JP. Privacy-preserving federated neural network learning for disease-associated cell classification. Patterns. 2022 May;3(5):100487. Available from: https://linkinghub.elsevier.com/retrieve/pii/S2666389922000721. [228] Ma Z, Ma J, Miao Y, Li Y, Deng RH. ShieldFL: Mitigating Model Poisoning Attacks in Privacy-Preserving Federated Learning. IEEE Transactions on Information Forensics and Security. 2022;17:1639-54. Available from: https://ieeexplore.ieee.org/document/9762272/. [229] Li J, Yan T, Ren P. VFL-R: a novel framework for multi-party in vertical federated learning. Applied Intelligence. 2022 Sep. Available from: https://link.springer.com/10.1007/s10489-022-04111-0. [230] Chuanxin Z, Yi S, Degang W. Federated Learning with Gaussian Differential Privacy. In: Proceedings of the 2020 2nd International Conference on Robotics, Intelligent Control and Artificial Intelligence. RICAI 2020. New York, NY, USA: Association for Computing Machinery; 2020. p. 296-301. [231] Grivet Sébert A, Pinot R, Zuber M, Gouy-Pail er C, Sirdey R. SPEED: secure, PrivatE, and efficient deep learning. Machine Learning. 2021;110(4):675-94. [232] Jarin I, Eshete B. PRICURE: Privacy-Preserving Collaborative Inference in a Multi-Party Setting. In: Proceedings of the 2021 ACM Workshop on Security and Privacy Analytics. IWSPA ’21. New York, NY, USA: Association for Computing Machinery; 2021. p. 25-35. [233] Owusu-Agyemeng K, Qin Z, Xiong H, Liu Y, Zhuang T, Qin Z. MSDP: multi-scheme privacy-preserving deep learning via differential privacy. Personal and Ubiquitous Computing. 2021. [234] Nuria Rodríguez-Barroso, Goran Stipcich, Daniel Jiménez-López, José Antonio Ruiz-Millán, Eugenio Martínez-Cámara, Gerardo González-Seco, et al. Federated Learning and Differential Privacy: Software tools analysis, the Sherpa.ai FL framework and methodological guidelines for preserving data privacy. Information Fusion. 2020;64:270-92. Available from: https://www.sciencedirect.com/science/article/pii/S1566253520303213. [235] Wibawa F, Catak FO, Kuzlu M, Sarp S, Cali U. Homomorphic Encryption and Federated Learning based Privacy-Preserving CNN Training: COVID-19 Detection Use-Case. In: EICC 2022: Proccedings of the European Interdisciplinary Cybersecurity Conference. Barcelona Spain: ACM; 2022. p. 85-90. Available from: https://dl.acm.org/doi/10.1145/3528580.3532845. [236] Feng X, Chen L. 
Data Privacy Protection Sharing Strategy Based on Consortium Blockchain and Federated Learning. In: 2022 International Conference on Artificial Intelligence and 72 PROCEEDINGS OF THE 33RD INTERNATIONAL CONFERENCE ON INFORMATION MODELLING AND KNOWLEDGE BASES EJC 2023 Computer Information Technology (AICIT). Yichang, China: IEEE; 2022. p. 1-4. Available from: https://ieeexplore.ieee.org/document/9930188/. [237] Tan AZ, Yu H, Cui L, Yang Q. Towards Personalized Federated Learning. IEEE Transactions on Neural Networks and Learning Systems. 2022:1-17. Available from: https://ieeexplore.ieee.org/document/9743558/. [238] Rahimian S, Orekondy T, Fritz M. Differential Privacy Defenses and Sampling Attacks for Membership Inference. In: Proceedings of the 14th ACM Workshop on Artificial Intel igence and Security. AISec ’21. New York, NY, USA: Association for Computing Machinery; 2021. p. 193-202. Available from: https://doi.org/10.1145/3474369.3486876. [239] Ha T, Dang TK, Le H, Truong TA. Security and Privacy Issues in Deep Learning: A Brief Review. SN Computer Science. 2020;1(5):253. [240] Joos S, Van hamme T, Preuveneers D, Joosen W. Adversarial Robustness is Not Enough: Practical Limitations for Securing Facial Authentication. In: Proceedings of the 2022 ACM on International Workshop on Security and Privacy Analytics. Baltimore MD USA: ACM; 2022. p. 2-12. Available from: https://dl.acm.org/doi/10.1145/3510548.3519369. [241] Jankovic A, Mayer R. An Empirical Evaluation of Adversarial Examples Defences, Combinations and Robustness Scores. In: Proceedings of the 2022 ACM on International Workshop on Security and Privacy Analytics. Baltimore MD USA: ACM; 2022. p. 86-92. Available from: https://dl.acm.org/doi/10.1145/3510548.3519370. [242] Brown H, Lee K, Mireshghal ah F, Shokri R, Tramér F. What Does it Mean for a Language Model to Preserve Privacy? In: 2022 ACM Conference on Fairness, Accountability, and Transparency. Seoul Republic of Korea: ACM; 2022. p. 2280-92. Available from: https://dl.acm.org/doi/10.1145/3531146.3534642. [243] Muhr T, Zhang W. Privacy-Preserving Detection of Poisoning Attacks in Federated Learning. In: 2022 19th Annual International Conference on Privacy, Security & Trust (PST). Fredericton, NB, Canada: IEEE; 2022. p. 1-10. Available from: https://ieeexplore.ieee.org/document/9851993/. [244] Giordano M, Maddalena L, Manzo M, Guarracino MR. Adversarial attacks on graph-level embedding methods: a case study. Annals of Mathematics and Artificial Intel igence. 2022 Oct. Available from: https://link.springer.com/10.1007/s10472-022-09811-4. [245] Agarwal A, Chattopadhyay P, Wang L. Privacy preservation through facial de-identification with simultaneous emotion preservation. Signal, Image and Video Processing. 2020. [246] Aminifar A, Rabbi F, Pun KI, Lamo Y. Privacy Preserving Distributed Extremely Randomized Trees. In: Proceedings of the 36th Annual ACM Symposium on Applied Computing. SAC ’21. New York, NY, USA: Association for Computing Machinery; 2021. p. 1102-5. [247] Anastasi a Girka, Vagan Terziyan, Mari a Gavriushenko, Andri Gontarenko. Anonymization as homeomorphic data space transformation for privacy-preserving deep learning. Procedia Computer Science. 2021;180:867-76. Available from: https://www.sciencedirect.com/science/article/pii/S1877050921003914. [248] He Q, Yang W, Chen B, Geng Y, Huang L. TransNet: Training Privacy-Preserving Neural Network over Transformed Layer. Proc VLDB Endow. 2020;13(12):1849-62. [249] Zhou T, Shen J, He D, Vijayakumar P, Kumar N. 
Human-in-the-Loop-Aided Privacy-Preserving Scheme for Smart Healthcare. IEEE Transactions on Emerging Topics in Computational Intelligence. 2020:1-10. [250] Goldsteen A, Ezov G, Shmelkin R, Moffie M, Farkash A. Data minimization for GDPR compliance in machine learning models. AI and Ethics. 2021. [251] Boenisch F, Battis V, Buchmann N, Poikela M. “I Never Thought About Securing My Machine Learning Systems”: A Study of Security and Privacy Awareness of Machine Learning Practitioners. In: Mensch Und Computer 2021. MuC ’21. New York, NY, USA: Association for Computing Machinery; 2021. p. 520-46. [252] Abuadbba S, Kim K, Kim M, Thapa C, Camtepe SA, Gao Y, et al. Can We Use Split Learning on 1D CNN Models for Privacy Preserving Training? In: Proceedings of the 15th ACM Asia S. Gollner, M. Tropmann-Frick, B. Brumen: Towards a Definition of a Responsible Artificial Intelligence 73. Conference on Computer and Communications Security. ASIA CCS ’20. New York, NY, USA: Association for Computing Machinery; 2020. p. 305-18. [253] Ghamry ME, Halim ITA, Bahaa-Eldin AM. Secular: A Decentralized Blockchain-based Data Privacy-preserving Model Training Platform. In: 2021 International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC); 2021. p. 357-63. [254] Zhou T, Shen J, He D, Vijayakumar P, Kumar N. Human-in-the-Loop-Aided Privacy-Preserving Scheme for Smart Healthcare. IEEE Transactions on Emerging Topics in Computational Intelligence, title=Human-in-the-Loop-Aided Privacy-Preserving Scheme for Smart Healthcare. 2020 Jan:1-10. [255] Bai Y, Fan M, Li Y, Xie C. Privacy Risk Assessment of Training Data in Machine Learning. In: ICC 2022 - IEEE International Conference on Communications. Seoul, Korea, Republic of: IEEE; 2022. p. 1015-5. Available from: https://ieeexplore.ieee.org/document/9839062/. [256] Abbasi W, Mori P, Saracino A, Frascol a V. Privacy vs Accuracy Trade-Off in Privacy Aware Face Recognition in Smart Systems. In: 2022 IEEE Symposium on Computers and Communications (ISCC). Rhodes, Greece: IEEE; 2022. p. 1-8. Available from: https://ieeexplore.ieee.org/document/ 9912465/. [257] Montenegro H, Silva W, Gaudio A, Fredrikson M, Smailagic A, Cardoso JS. Privacy-Preserving Case-Based Explanations: Enabling Visual Interpretability by Protecting Privacy. IEEE Access. 2022;10:28333-47. Available from: https://ieeexplore.ieee.org/document/9729808/. [258] Mao Q, Chen Y, Duan P, Zhang B, Hong Z, Wang B. Privacy-Preserving Classification Scheme Based on Support Vector Machine. IEEE Systems Journal. 2022:1-11. Available from: https://ieeexplore.ieee.org/document/9732431/. [259] Harichandana BSS, Agarwal V, Ghosh S, Ramena G, Kumar S, Raja BRK. PrivPAS: A real time Privacy-Preserving AI System and applied ethics. In: 2022 IEEE 16th International Conference on Semantic Computing (ICSC). Laguna Hil s, CA, USA: IEEE; 2022. p. 9-16. Available from: https://ieeexplore.ieee.org/document/9736272/. [260] Tian H, Zeng C, Ren Z, Chai D, Zhang J, Chen K, et al. Sphinx: Enabling Privacy-Preserving Online Learning over the Cloud. In: 2022 IEEE Symposium on Security and Privacy (SP). San Francisco, CA, USA: IEEE; 2022. p. 2487-501. Available from: https://ieeexplore.ieee.org/document/ 9833648/. 
A TIME-SERIES SEMANTIC-COMPUTING METHOD FOR 5D WORLD MAP SYSTEM

YASUSHI KIYOKI,1,2,3 ASAKO URAKI,1,2 SHIORI SASAKI,1,2,3 YUKIO CHEN4

1 Keio University, Graduate School of Media and Governance, Kanagawa, Japan
kiyoki@sfc.keio.ac.jp, aco@sfc.keio.ac.jp, ssasaki@musashino-u.ac.jp
2 Keio University, SFC Research Institute, Kanagawa, Japan
3 Musashino University, Graduate School of Data Science, Tokyo, Japan
4 Kanagawa Institute of Technology, Department of Information & Computer Sciences, Kanagawa, Japan
chen@ic.kanagawa-it.ac.jp

“Semantic space creation” and “distance-computing” are basic functions for realizing semantic computing for the memorization, retrieval, analysis, integration and visualization of environmental phenomena. We have introduced the “SPA-based (Sensing, Processing and Actuation) Multi-dimensional Semantic Computing Method” for realizing a global environmental system, the “5-Dimensional World Map System”. This method is important for designing new environmental systems with cyber-physical space integration that detect environmental phenomena occurring in a physical space (real space). The method maps those phenomena to a multi-dimensional semantic space, performs semantic computing, and actuates the semantic-computing results to the physical space with visualizations expressing environmental phenomena, causalities and influences. As an actual system of this method, the 5D World Map System is currently utilized globally as a Global Environmental Semantic Computing System in SDG14, United-Nations-ESCAP (https://sdghelpdesk.unescap.org/toolboxes). It is significant to memorize those situations and to compute environmental change in various aspects and contexts, in order to discover actual phenomena occurring in the nature of our planet.

Keywords: cyber & physical space integration, SPA-function, spatio-temporal computing, semantic computing, world map-based visualization, warning message propagation

DOI https://doi.org/10.18690/um.feri.5.2023.3
ISBN 978-961-286-745-4

1 Introduction

We have introduced the architecture of a global environmental system, the “5-Dimensional World Map System” [3,4,6,10], to realize environmental knowledge memorization, sharing, retrieval, integration and visualization with semantic computing. The basic space of this system consists of temporal (1st dimension), spatial (2nd, 3rd and 4th dimensions) and semantic (5th dimension) dimensions, representing a large-scale and multi-dimensional semantic space. This space memorizes and recalls various environmental knowledge expressed in multimedia information resources with temporal, spatial and semantic correlation computing functions, and realizes a 5D World Map for dynamically creating temporal-spatial and semantic multiple views.
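To make the basic space concrete, the following is a minimal Python sketch of how one multimedia information resource could be represented as a point in the 5D space. The names (`FiveDPoint`, `media_uri`) and the field choices are illustrative assumptions, not the authors' implementation:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FiveDPoint:
    """One multimedia information resource mapped into the 5D space:
    1st dimension: time; 2nd-4th dimensions: spatial position;
    5th dimension: a semantic feature vector."""
    t: float                # temporal dimension (e.g., seconds since epoch)
    lat: float              # spatial dimensions
    lon: float
    alt: float
    semantics: List[float]  # semantic dimension: feature vector of the resource
    media_uri: str          # the memorized multimedia resource (image, text, ...)

# Example: a memorized water-quality observation with its semantic features
p = FiveDPoint(t=1686000000.0, lat=46.55, lon=15.64, alt=270.0,
               semantics=[0.8, 0.1, 0.3], media_uri="sensor://station-42/ph")
```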
We also introduce the concept of “SPA (Sensing, Processing and Analytical Actuation Functions)” for realizing a global environmental system, and apply it to our 5-Dimensional World Map System. This concept is effective and advantageous for designing environmental systems with cyber-physical integration: such systems detect environmental phenomena as real data resources in a physical space (real space), map them to the cyber space for analytical and semantic computing, and actuate the analytically computed results to the real space with visualization expressing environmental phenomena, causalities and influences. Semantic computing [1,2,5] is an important and promising approach to semantic analysis of various environmental phenomena and changes in the real world. This paper presents a new concept of a “Time-series-Analytical Semantic-Space and Computing for environmental phenomena” for realizing global environmental analysis [8,9,10,11,13,14]. This space and computing method are based on semantic-space creation with time analysis for analyzing and interpreting environmental phenomena and changes occurring in the world. We focus on semantic interpretations of time-series data, as an experimental study for creating a “Time-Series Analysis Semantic-Space for the environment.”

2 Global Environmental Analysis with Semantic Computing

We have introduced the “5D World Map System” with spatio-temporal and semantic computing in SPA, as the architecture of a multi-visualized and dynamic knowledge representation system [3,4,6,10], applied to environmental analysis and semantic computing. The basic space of this system consists of temporal (1st dimension), spatial (2nd, 3rd and 4th dimensions) and semantic (5th dimension) dimensions, representing a large-scale and multi-dimensional semantic space. This space memorizes and recalls various multimedia information resources with temporal, spatial and semantic correlation computing functions, and realizes a 5D World Map for dynamically creating temporal-spatial and semantic multiple views applied to various “environmental multimedia information resources.”

2.1 Semantic Computing in 5D World Map System

We have presented the dynamic evaluation and mapping functions for multiple views of temporal-spatial metrics, and integrate the results of semantic evaluation to analyze environmental multimedia information resources [3,4,6,10]. Our semantic computing system realizes interpretations of the “semantics” and “impressions” of environmental phenomena with multimedia information resources, according to “contexts” [1,2,5]. The main feature of this system is to create world-wide global maps and views of environmental situations expressed in multimedia information resources (image, sound, text and video) dynamically, according to the user’s viewpoints. Spatially, temporally, semantically and impressionably evaluated and analyzed environmental multimedia information resources are mapped onto a 5D time-series multi-geographical space. The basic concept of the 5D World Map System is shown in Figures 1 and 2. The 5D World Map System applied to environmental multimedia computing visualizes world-wide and global relations among different areas and times in environmental aspects, by using dynamic mapping functions with temporal, spatial, semantic and impression-based computations [3,4,6,10,11,13].
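As an illustration of this context-dependent evaluation, the sketch below shows how the same resources can be ranked differently under different contexts. The names (`rank_by_context`) and the use of cosine correlation over context-selected semantic dimensions are assumptions for illustration, not the system's exact metric:

```python
import math
from typing import Dict, List, Sequence

def project(vec: Sequence[float], context: Sequence[int]) -> List[float]:
    """Select the semantic dimensions named by the context (subspace projection)."""
    return [vec[i] for i in context]

def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine correlation between two projected semantic vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_by_context(query: Sequence[float],
                    resources: Dict[str, Sequence[float]],
                    context: Sequence[int]) -> List[str]:
    """Rank multimedia resources by semantic correlation within one context."""
    q = project(query, context)
    return sorted(resources,
                  key=lambda r: correlation(q, project(resources[r], context)),
                  reverse=True)

# The same data ranked under two different contexts (dimension selections)
resources = {"img-1": [0.9, 0.1, 0.0, 0.4], "img-2": [0.2, 0.8, 0.7, 0.1]}
print(rank_by_context([1.0, 0.0, 0.0, 0.5], resources, context=[0, 3]))
print(rank_by_context([1.0, 0.0, 0.0, 0.5], resources, context=[1, 2]))
```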
Figure 1: 5D World Map System for world-wide semantic computing for Global Environmental Analysis. Source: own. (The figure shows spatio-temporal multimedia and sensor data being mapped through multi-dimensional semantic/feature spaces, with correlation and differential calculation and time-series analysis, onto chronologically-ordered 5D World Maps.)

2.2 SPA: Sensing, Processing and Analytical Actuation Functions in 5D World Map

“SPA” is a fundamental concept for realizing environmental systems with the three basic functions of “Sensing, Processing and Analytical Actuation” for physical-cyber integration. “SPA” is effective and advantageous for detecting environmental phenomena as real data resources in a physical space (real space), mapping them to the cyber space for analytical and semantic computing, and actuating the analytically computed results to the real space by visualization expressing environmental phenomena with causalities and influences. This concept is applied to our semantic computing in the 5D World Map System, as shown in Figures 1, 2 and 3. An important application of the semantic computing system is “Global Environment-Analysis” for making appropriate and urgent solutions to global environmental changes, in terms of both short- and long-term changes. The “six functional pillars” are essentially important, together with “environmental knowledge-base creation”, for sharing, analyzing and visualizing various environmental phenomena and changes in the real world. The 5D World Map System realizes cyber-physical space integration, as shown in Figure 1: it detects environmental phenomena with real data resources in a physical space (real space), maps them to the cyber space to build knowledge bases and perform analytical computing, and actuates the computed results to the real space with visualization for expressing environmental phenomena, causalities and influences. The 5D World Map System and its applications create a new analytical circumstance with the SPA concept (Sensing, Processing and Analytical Actuation) for sharing, analyzing and visualizing natural and social environmental aspects. This system realizes “environmental analysis and situation-recognition”, which will be essential for finding solutions to global environmental issues. The 5D World Map System collects and facilitates many environmental information resources, such as characteristics of ocean species, disasters, water quality and deforestation.

3 A Time-series Semantic Computing Method for Global Environmental Analysis

We introduce the concept of a “time-series-context”, a context on time-series in semantic computing on a multi-dimensional space. The “time-series-context” is a data structure that specifies the dimensional projection (selection of dimensions) to be applied in “time-series semantic computing.”

− One of the most important processes of multi-dimensional & time-series semantic computing is to define the semantic “time-series-context”.
− It is essential to compare two different time-series on semantic features, expressing a time-series-context, for realizing semantic interpretations and predictions of natural environmental phenomena.

We define a multi-dimensional & time-series semantic computing method for time-series data on a time axis with the definition of a time-series-context.

Figure 2: A Time-series Semantic-Computing Space for 5D World Map System. Source: own.

3.1 Basic data structures and operations

The basic data structures for time-series semantic computing are defined as follows:

− Space: multi-dimensional semantic space with time-axis
− Basic elements: point-series --> time-series (point-series along a time-series for expressing a phenomenon)
− Time-series-context: "time-series grain" & "time-interval"

3.2 Semantic computing process

To define a time-series-context, we express the semantic meaning of temporal difference and its interpretations according to the time-series-context.

− By switching among N time-series-contexts for the same data, we can obtain N different semantic meanings of temporal difference, one in each context.
− The definition of the time-series-context by the 3 steps corresponds to setting the closed world on the time axis.

Step 1: Define a semantic viewpoint to fix target axes, which correspond to multiple parameters as the semantic feature combination, reflecting expert knowledge and viewpoint.
Step 2: Define a semantic viewpoint to fix target time-series data to calculate semantic distances.

3.3 Semantic computing functions

In this section, we express the data structures and functions for "time-series stream-creation". The following 6 basic functions are defined to express the query-time-series, which creates a time-series query expression as a new time-series stream:

(1) Time-series data structures
To realize "time-series stream-creation" (creating a time-series stream), the following basic settings on time-series data structures are defined:
(1-1) "time-granularity (granularity in time)" setting,
(1-2) "time-interval" setting,
(1-3) "time-grains-combination" setting,
(1-4) "time-series-context" setting.

(2) Time-series stream
A time-series stream is defined with a basic-atomic-time-element, which is expressed as:
(2-1) basic-atomic-time-element form: (time-i, (value-i-1, value-i-2, ---, value-i-m)).
(2-2) Time-series stream expression: by combining basic-atomic-time-elements, any time-series stream is expressed and created (time-grain setting, time-interval setting, time-grains-combination).

Time-series stream expressions (Time-series-semantic-integration-method):

Temporal-Atomic-element (time-series), as the time-grain setting:
((time-1, (value-1-1, value-1-2, ---, value-1-m)), (time-2, (value-2-1, value-2-2, ---, value-2-m)), ---, (time-n, (value-n-1, value-n-2, ---, value-n-m)))

Atom-1: ((t-1,t-2,t-3)(v-1-i,v-2-i,v-3-i)) (The "i" is fixed with a "time-series-context".)
Atom-2: ((t-1,t-2,t-3)(v-1-j,v-2-j,v-3-j))
Atom-3: ((t-1,t-2,t-3)(v-1-k,v-2-k,v-3-k))

(2-3) Time-series-semantic-integration (a time-series stream is expressed and created in the following basic structures, with time-grains-combination for the time-interval setting):

(2-3-1) Vertical integration for a time-series stream:
(((t-1,t-2,t-3)(v-1-i,v-2-i,v-3-i)), ((t-1,t-2,t-3)(v-1-j,v-2-j,v-3-j)), ((t-1,t-2,t-3)(v-1-k,v-2-k,v-3-k)))

(2-3-2) Horizontal integration for a time-series stream:
((t-1,t-2,t-3) ((v-1-i,v-1-j,v-1-k), (v-2-i,v-2-j,v-2-k), (v-3-i,v-3-j,v-3-k)))

(2-3-3) Time-series stream integration:
(((t-1,t-2,t-3), (t-4,t-5,t-6)) (((v-1-i,v-1-j,v-1-k), (v-2-i,v-2-j,v-2-k), (v-3-i,v-3-j,v-3-k)), ((v-4-i,v-4-j,v-4-k), (v-5-i,v-5-j,v-5-k), (v-6-i,v-6-j,v-6-k))))

(3) Geographical-time-series form
A geographical time-series stream, as a time-series stream, is defined with a basic-geo-atomic-time-element, which is expressed as:

(3-1) basic-geo-atomic-time-element form:
S1(place1) (time-1, (value-1-1, value-1-2, ---, value-1-m)) (time-2, (value-2-1, value-2-2, ---, value-2-m)) (time-3, (value-3-1, value-3-2, ---, value-3-m))
S2(place2) (time-4, (value-4-1, value-4-2, ---, value-4-m)) (time-5, (value-5-1, value-5-2, ---, value-5-m)) (time-6, (value-6-1, value-6-2, ---, value-6-m))
S3(place3) (time-7, (value-7-1, value-7-2, ---, value-7-m)) (time-8, (value-8-1, value-8-2, ---, value-8-m)) (time-9, (value-9-1, value-9-2, ---, value-9-m))

(4) Time-series-stream-comparison (semantic-distance computing between two time-series streams):
(4-a) distance of features (time-series-context features) between different timings in the same time-series,
(4-b) distance of features (time-series-context features) between different phenomena in different time-series,
(4-c) distance of features (time-series-context features) between different places (geographical places) in the same phenomenon in different time-series.

The basic distance function between two time-series streams is defined in the following form:
Timeseries-streams-distance((t1, t2, ---, tn)(y1, y2, ---, yn), (t1', t2', ---, tn')(y1', y2', ---, yn')).

(4-1) Timeseries-streams-distance as the sum of each parameter's distances:
Timeseries-streams-distance-1((t1, t2, ---, tn)(y1, y2, ---, yn), (t1', t2', ---, tn')(y1', y2', ---, yn')) => Σ(i=0, n) |yi - yi'|

(4-2) Timeseries-streams-distance as distance-in-time-interval-normalization:
Timeseries-streams-distance-2((t1, t2, ---, tn)(y1, y2, ---, yn), (t1', t2', ---, tn')(y1', y2', ---, yn')) => distance-in-time-interval-normalization((t1, t2, ---, tn)(y1, y2, ---, yn), (t1', t2', ---, tn')(y1', y2', ---, yn'))

(4-3) Timeseries-streams-distance as distance-in-start-time-normalization:
Timeseries-streams-distance-3((t1, t2, ---, tn)(y1, y2, ---, yn), (t1', t2', ---, tn')(y1', y2', ---, yn')) => distance-in-start-time-normalization((t1, t2, ---, tn)(y1, y2, ---, yn), (t1', t2', ---, tn')(y1', y2', ---, yn'))

(5) Geographical phenomenon-distance function form, defined for time-series comparison between different places:
Geographical-timeseries-streams-distance((S1(place1), ((time-1, (value-1-1, value-1-2, ---, value-1-m)), (time-2, (value-2-1, value-2-2, ---, value-2-m)), (time-3, (value-3-1, value-3-2, ---, value-3-m)))), (S2(place2), ((time-4, (value-4-1, value-4-2, ---, value-4-m)), (time-5, (value-5-1, value-5-2, ---, value-5-m)), (time-6, (value-6-1, value-6-2, ---, value-6-m)))))

Figure 3: The concept of a difference comparison with Time-series Semantic-Computing with 5D World Map System. Source: own.

3.4 Time-series semantic computing for phenomenon-prediction with time-interval ratio

We present a ratio computing method for phenomenon-prediction with the time-interval ratio between peak values. This method predicts the timing of a future peak with differential computing, according to the "time-series-context". The basic data structure of the "time-series-context" is defined by 6 key elements:

− Original time-granularity (granularity in time) (OG)
− Target timing-granularity (TG)
− Feature extraction method (FEM)
− Focusing time-interval (FW) (time-interval)
− Differential computing method (DCM)
− Pivot extraction method on ITD (PEM)

Two time-series data sets (IRD, ITD) consist of times and their corresponding values, expressed in the target granularity, in two different places, F and J. The ratio computing method is applied to three peaks of the corresponding values existing in IRD (FT1, FT2, FT3) in place F, and three peaks in ITD (JT1, JT2, "JT3 (target for estimation)") in place J. Then, JT3 is computed as an estimated timing in the following process, as the timing when the situation corresponding to the third peak will occur in the future in place J. The important feature of this method is to define the differential computing function as ratio computing between the timings of the peaks in time-series data to estimate the future peak.

Input data for analysis (IRD) = time-series sequence of (parameter-value, time-point) for expressing the situation with the selected parameter in place F:
− the number of confirmed values in place F.

Input data for prediction (ITD) = time-series sequence of (parameter-value, time-point) for expressing the situation with the selected parameter in place J:
− the number of confirmed values in place J.

As for the 6 key elements to express the time-series-context, we applied the following settings. The meaning of the 6 key element settings is to determine the common situation in two different time-series, as a "time-series-context".
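To make the structures above concrete, the following Python sketch restates the time-series stream, the six-element time-series-context and the basic distance function (4-1). This is our own hypothetical rendering (names such as TimeSeriesStream and streams_distance_1 are not from the paper), written under the assumption that a context selects one value dimension as the projection for comparison.

```python
# Hypothetical rendering of the Section 3.3 structures; the names below
# (TimeSeriesStream, TimeSeriesContext, streams_distance_1) are ours.
from dataclasses import dataclass

@dataclass
class TimeSeriesStream:
    """((time-1, (value-1-1, ...)), ---, (time-n, (value-n-1, ...))) kept as
    parallel lists of time points and value tuples."""
    times: list[float]
    values: list[tuple[float, ...]]

@dataclass
class TimeSeriesContext:
    """The 6 key elements of a time-series-context."""
    original_granularity: str   # OG, e.g. "daily"
    target_granularity: str     # TG, e.g. "peaks"
    feature_extraction: str     # FEM, e.g. "time point of peaks"
    focusing_interval: str      # FW, e.g. "first peak to last peak"
    differential_method: str    # DCM, e.g. "ratio"
    pivot_extraction: str       # PEM, e.g. "most recent peaks"

def streams_distance_1(a: TimeSeriesStream, b: TimeSeriesStream, dim: int = 0) -> float:
    """Timeseries-streams-distance-1 (4-1): the sum of |y_i - y_i'| over the
    value dimension selected by the time-series-context's projection."""
    return sum(abs(va[dim] - vb[dim]) for va, vb in zip(a.values, b.values))
```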
The 6 key elements are expressed as a "time-series-context" for phenomenon-prediction with the time-interval ratio between peak values. The "time-series-context" definition is set as follows:

− Original time-granularity (granularity in time) = daily
− Target timing-granularity (TG) = peaks
− Feature extraction method (FEM) = time point of peaks
− Focusing time-interval (FW) = from the first peak to the last peak among 3 or more peaks that matched the condition
− Differential computing method (DCM) = ratio computing function for the number of days between the selected adjacent peaks, by applying average, difference, and other functions
− Pivot extraction method on ITD (PEM) = the most recent 2 or more peaks that matched the condition

The prediction process with the differential computing method (DCM) is defined as the ratio computing function for computing the number of days between the selected adjacent peaks, by applying average, difference, and other operations. The basic data structure for the prediction with the differential computing method is expressed as:

FT1-3: time points at the 3 peak timings (FT1, FT2, FT3), corresponding to the top-three maximum parameter-values in the time-series sequence in place F.
JT1-3: time points at the 3 peak timings (JT1, JT2, "JT3 (target for estimation)"), corresponding to the top-three maximum parameter-values in the time-series sequence in place J.

The process for the prediction with the differential computing method is expressed as:

(Step-1) selection of 3 peaks (FT1, FT2, FT3) in the time series in IRD,
(Step-2-1) ratio computing in time-interval: (FT3-FT2)/(FT2-FT1),
(Step-2-2) average computing in time-interval: average((FT3-FT2), (FT2-FT1)),
(Step-2-3) differential computing in time-interval: (FT3-FT2) => JT3-JT2 = (FT3-FT2),
(Step-3) (JT3-JT2)/(JT2-JT1) = (FT3-FT2)/(FT2-FT1) => JT3 is computed as an estimated time of peak-3 in this ratio computing if (Step-2-1) is applied.

Then, JT3 is obtained as the prediction result of the next peak timing, occurring in the future.

4 Time-series semantic computing in 5D World Map System

We have integrated the time-series semantic computing method into the 5D World Map System. The following cases are example targets of semantic computing in the 5D World Map System with the geographical time-series stream defined with a basic-geo-atomic-time-element in Section 3: time-series semantic computing for Global Environmental Analysis.

4.1 Case I: Earthquake analysis with time-series semantic computing on 5D World Map System

This case shows the analysis of the depth of earthquakes which occurred around the world during the periods from Aug. 23rd to Aug. 28th, 2014, and from Jan. 7th to Jan. 13th, 2023. The target data is acquired from USGS Earthquake Feeds. Figure 4 shows the visualization results of the time-series change of the geographical distribution of the depth values of significant earthquakes with magnitude over 2.5 in one week of August 2014. From the results, we can observe intuitively that there is a point where deep earthquakes happened through the whole period (e.g. Alaska), and there is an emergent timing at which deep earthquakes happened in Fiji (2014/08/25) and consequently in Japan (2014/08/26).
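Such a sequence of peaks in one place followed by peaks in another is exactly what the ratio computation of Section 3.4 targets. As a concrete illustration of Step-1 to Step-3, the following minimal Python sketch (our own hypothetical code, using only the ratio variant of Step-2-1) estimates the third peak time JT3 in a target place J from the peak spacing observed in a reference place F.

```python
# Minimal sketch of the ratio-based prediction (Step-1 to Step-3), using only
# the ratio variant (Step-2-1); hypothetical code, not the authors' program.
def predict_third_peak(f_peaks, j_peaks):
    """f_peaks: the 3 peak times (FT1, FT2, FT3) in reference place F;
    j_peaks: the 2 most recent peak times (JT1, JT2) in target place J.
    Returns the estimated JT3 from (JT3-JT2)/(JT2-JT1) = (FT3-FT2)/(FT2-FT1)."""
    ft1, ft2, ft3 = f_peaks
    jt1, jt2 = j_peaks
    ratio = (ft3 - ft2) / (ft2 - ft1)   # Step-2-1: interval ratio in place F
    return jt2 + ratio * (jt2 - jt1)    # Step-3: project the next peak in place J

# Illustrative numbers only: peaks in F on days 10, 40, 100 give a ratio of 2.0;
# peaks in J on days 5 and 35 then predict JT3 on day 95.
print(predict_third_peak([10, 40, 100], [5, 35]))  # -> 95.0
```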
Also, Figure 5 shows the visualization results of the depth values of significant earthquakes with magnitude over 2.5 in one week of January 2023. We can observe that there is a point where deep earthquakes happened through the whole period (e.g. New Zealand, Jan. 7th, 9th, 10th and 13th). In this case, the time-series query expression as a new time-series stream will be created from the depth value of earthquakes by the 6 basic functions defined in Section 3, to express the query-time-series-stream for time-series semantic computing. In this case, the time-interval is set to 1 week, and the time-granularity is set to 1 day.

Figure 4: Time-series change of the geographical distribution of the depth values of significant earthquakes with magnitude over 2.5, which occurred around the world during the period from Aug. 23rd to Aug. 28th, 2014. Source: own.

The following is an example of query creation and time-series-context settings for the analysis of significant earthquakes and prediction with time-series semantic computing.

Input data for analysis (IRD) = time-series values of earthquake depth at two points:
− depth value of earthquakes in Alaska
− depth value of earthquakes in New Zealand

Figure 5: Time-series change of the geographical distribution of the depth values of significant earthquakes with magnitude over 2.5, which occurred around the world during the period from Jan. 7th to Jan. 13th, 2023. Source: own.

Input data for prediction (ITD) = time-series values of earthquake depth at a target point:
− depth value of earthquakes in Japan

As for the 6 key elements to express the time-series-context, we can apply the following settings:

− Original time-granularity (OG) = daily
− Target timing-granularity (TG) = peaks of earthquake depth
− Feature extraction method (FEM) = time point of peaks
− Focusing time-interval (FW) = from the first peak to the last peak among 3 or more peaks that matched the condition
− Differential computing method (DCM) = ratio computing function for the number of days between the selected adjacent peaks, by applying average, difference, and other functions
− Pivot extraction method on ITD (PEM) = the most recent 2 or more peaks that matched the condition

4.2 Case II: Deforestation analysis with time-series semantic computing on 5D World Map System

This case shows the analysis of time-series difference extraction on deforestation in Mae Wong National Park in Thailand (Figure 6) during the period from 2003 to 2013, where deforestation previously happened heavily and large-scale restoration is currently deployed. Figure 7 shows examples of the original Landsat 7 and 8 satellite images of this area.

Figure 6: Four important national parks to analyze deforestation in Thailand (Huai Nam Dang, Phu Kradueng, Mae Wong and Thaplan National Parks). Source: own.

Figure 7: Examples of original Landsat satellite images of Mae Wong (2003, Jan.; 2004, Feb.; 2005, Feb.). Source: own.

Figure 8 shows the result images of pre-processing for the Landsat satellite images of Mae Wong from 2003 to 2013.
To detect the change in the same season (the dry season, with less cloud in Thailand), the images of January to February are collected. The pre-processing of the original images starts with rotation, continues via position adjustment and trimming (calibration), and ends with color-contrast adjustment.

Figure 8: The pre-processed Landsat satellite images of the Mae Wong area in Thailand from 2003 to 2013 (January or February of each year). Source: own.

Figure 9 shows the difference extraction results. The number of color clusters was set to 4. The focused colors are bright green and dark green, which represent forest area. Retreated parts are represented in orange, and advanced parts are represented in blue. The results show that the retreated area of green increased strongly from 2003 to 2004, but not so much from 2004 to 2005. From 2005 to 2007, an advanced area is observed, but from 2007 the retreated area can be observed again. From 2007 to 2013, the retreated area gradually decreased. These results might show a success of the forest restoration policy and activities, though deeper analysis with forest specialists is needed. We need to examine the details to judge whether these results mean the speed of deforestation is being reduced year by year.

As shown in Figure 10 and Figure 11, the original satellite images with geo-information and the difference-images can be mapped onto the 5D World Map System and visualized with other data, such as sensor data of weather and statistical data about deforestation around the world. This visualization enables users to understand the complicated relations among various elements of environmental phenomena intuitively.

Figure 9: Results of difference extraction and difference-image creation for the year pairs from 2003->2004 to 2012->2013, and 2003->2013. (The number of clusters is 4. The focused colors are bright green and dark green, which represent forest area. Retreated parts are shown in orange, and advanced parts are shown in blue.) Source: own.

Figure 10: Mapping of an original satellite image with geo-information converted as a KML file. Source: own.

Figure 11: Mapping of the difference-images extracted by SIDE onto the Mae Wong area. Source: own.

In this case, to analyze and predict the time-series stream of deforestation and restoration (recovery) of forest in national parks, the time-series query expression as a new time-series stream will be created from the area size of deforestation (retreated area and advanced area) by the 6 basic functions defined in Section 3 to express the query-time-series-stream. In this case, the time-interval is set to 10 years, and the time-granularity in the time-series-stream is set to 1 year. The following is an example of query creation and time-series-context settings for the analysis of deforestation and prediction with time-series semantic computing.
Input data for analysis (IRD) = time-series values of the area size of deforestation in Mae Wong in Thailand:
− area size of deforestation in Mae Wong National Park

Input data for prediction (ITD) = time-series values of the area size of deforestation at target points (other national parks in Thailand):
− area size of deforestation in Huai Nam Dang National Park
− area size of deforestation in Phu Kradueng National Park
− area size of deforestation in Thaplan National Park

As for the 6 key elements to express the time-series-context, we can apply the following settings:

− Original time-granularity (OG) = daily
− Target timing-granularity (TG) = peaks of deforestation
− Feature extraction method (FEM) = time point of peaks
− Focusing time-interval (FW) = from the first peak to the last peak among 3 or more peaks that matched the condition
− Differential computing method (DCM) = ratio computing function for the number of days between the selected adjacent peaks, by applying average, difference, and other functions
− Pivot extraction method on ITD (PEM) = the most recent 2 or more peaks that matched the condition

5 Conclusion

We have presented the new concept of "Time-series Semantic Computing" for realizing global and temporal environmental analysis. The main feature of this system is to realize semantic time-series analysis in a multiple-dimensional semantic space. This space is created for dynamically computing semantic relations between time-series data resources in different places and times. We have applied this method to time-series data resources in the 5D World Map. This system realizes a remote, interactive and real-time environmental research exchange among multiple and different remote spots in different areas. We have created a semantic space for time-series analysis of environmental phenomena with multiple-dimensional axes along the time-axis. As the first step, this space is expandable to multiple spots to analyze and compare their time-series data in the global scope for environmental phenomena. We mapped them onto the 5-Dimensional World Map System to make time-series semantic interpretations in those spots, as an international collaborative platform for environmental analysis, to realize spatiotemporal and semantic interpretations.

As our future work, we will extend the "Time-series Semantic Computing" realized on the 5-Dimensional World Map System to an international and collaborative research and education system for realizing mutual understanding and global knowledge-sharing on environmental issues in the world-wide scope.

Acknowledgement

We would like to thank the "5D World Map System Project" members for their significant discussions and experimental studies. We also greatly appreciate Dr. Petchporn Chawakitchareon and Dr. Sompop Rungsupa for their active and collaborative research activities in environmental analysis.

References

[1] Yasushi Kiyoki and Saeko Ishihara: "A Semantic Search Space Integration Method for Meta-level Knowledge Acquisition from Heterogeneous Databases," Information Modeling and Knowledge Bases (IOS Press), Vol. 14, pp. 86-103, May 2002.
[2] Yasushi Kiyoki, Xing Chen: "A Semantic Associative Computation Method for Automatic Decorative-Multimedia Creation with "Kansei" Information" (Invited Paper), The Sixth Asia-Pacific Conference on Conceptual Modelling (APCCM 2009), 9 pages, January 20-23, 2009.
[3] Yasushi Kiyoki, Xing Chen, Shiori Sasaki, Chawan Koopipat: "A Globally-Integrated Environmental Analysis and Visualization System with Multi-Spectral & Semantic Computing in "Multi-Dimensional World Map"," Information Modelling and Knowledge Bases XXVIII, pp. 106-122, 2017.
[4] Yasushi Kiyoki, Shiori Sasaki, Nhung Nguyen Trang, Nguyen Thi Ngoc Diep: "Cross-cultural Multimedia Computing with Impression-based Semantic Spaces," Conceptual Modelling and Its Theoretical Foundations, Lecture Notes in Computer Science, Springer, pp. 316-328, March 2012.
[5] Yasushi Kiyoki: "A "Kansei" Multimedia Computing System for Environmental Analysis and Cross-Cultural Communication," 7th IEEE International Conference on Semantic Computing, keynote speech, Sept. 2013.
[6] Shiori Sasaki, Yusuke Takahashi, Yasushi Kiyoki: "The 4D World Map System with Semantic and Spatiotemporal Analyzers," Information Modelling and Knowledge Bases, Vol. XXI, IOS Press, 18 pages, 2010.
[7] Totok Suhardijanto, Yasushi Kiyoki, Ali Ridho Barakbah: "A Term-based Cross-Cultural Computing System for Cultural Semantics Analysis with Phonological-Semantic Vector Spaces," Information Modelling and Knowledge Bases XXIII, pp. 20-38, IOS Press, 2012.
[8] Chalisa Veesommai, Yasushi Kiyoki, Shiori Sasaki, Petchporn Chawakitchareon: "Wide-Area River-Water Quality Analysis and Visualization with 5D World Map System," Information Modelling and Knowledge Bases, Vol. XXVII, pp. 31-41, 2016.
[9] Chalisa Veesommai, Yasushi Kiyoki: "Spatial Dynamics of The Global Water Quality Analysis System with Semantic-Ordering Functions," Information Modelling and Knowledge Bases, Vol. XXIX, 2018.
[10] Yasushi Kiyoki, Asako Uraki, Chalisa Veesommai: "A Seawater-Quality Analysis Semantic-Space in Hawaii-Islands with Multi-Dimensional World Map System," 18th International Electronics Symposium (IES2016), Bali, Indonesia, September 29-30, 2016.
[11] Yasushi Kiyoki, Petchporn Chawakitchareon, Sompop Rungsupa, Xing Chen, Kittiya Samlansin: "A Global & Environmental Coral Analysis System with SPA-Based Semantic Computing for Integrating and Visualizing Ocean-Phenomena with "5-Dimensional World-Map"," Information Modelling and Knowledge Bases XXXII, Frontiers in Artificial Intelligence and Applications 333, IOS Press, pp. 76-91, Dec 2020.
[12] Shiori Sasaki, Yasushi Kiyoki: "Real-time Sensing, Processing and Actuation Functions of 5D World Map System: A Collaborative Knowledge Sharing System for Environmental Analysis," Information Modelling and Knowledge Bases, Vol. XXVIII, IOS Press, pp. 220-239, May 2016.
[13] Yasushi Kiyoki, Xing Chen, Chalisa Veesommai, Shiori Sasaki, Asako Uraki, Chawan Koopipat, Petchporn Chawakitchareon, Aran Hansuebsai: "An Environmental-Semantic Computing System for Coral-Analysis in Water-Quality and Multi-Spectral Image Spaces with "Multi-Dimensional World Map"," Information Modelling and Knowledge Bases, Vol. XXVIII, 20 pages, March 2018.
[14] Sompop Rungsupa, Petchporn Chawakitchareon, Aran Hansuebsai, Shiori Sasaki, Yasushi Kiyoki: "Photographic Assessment of Coral Stress: Effect of Low Salinity to Acropora sp., Goniopora sp. and Pavona sp.
at Sichang Island, Thailand," Information Modelling and Knowledge Bases, Vol. XXVIII, 20 pages, March 2018.

3 Advanced Applications

ADAPTIVE CHARGING AND DISCHARGING STRATEGIES FOR SMART GRID ENERGY STORAGE SYSTEMS

ALEXANDER DUDKO,1 TATIANA ENDRJUKAITE2
1 Keio University, Graduate School of Media and Governance, Kanagawa, Japan, aleksandrsdudko@gmail.com
2 Transport and Telecommunication Institute, Research Department, Riga, Latvia, tatianaendrjukaite@gmail.com

The current state of energy generation and consumption in the world, where many countries rely on fossil fuels to meet their energy demands, poses significant challenges in terms of energy security and environmental degradation. To address these challenges, the world is shifting towards renewable energy sources (RES), which are not only environmentally sustainable but also have the potential to reduce dependence on fossil fuels. However, the intermittency and seasonality of RES raise new challenges that must be addressed. To overcome these challenges, energy storage systems (ESS) are becoming increasingly important in ensuring stability in the energy mix and meeting the demands of the electrical grid. This paper introduces charging and discharging strategies of ESS, and presents an important application in terms of occupants' behavior and appliances, to maximize battery usage and reshape power plant energy consumption, thereby making the energy system more efficient and sustainable.

Keywords: adaptive charging, energy storage systems, smart grid, energy, renewable energy sources, simulation, occupants' behavior model

DOI https://doi.org/10.18690/um.feri.5.2023.4
ISBN 978-961-286-745-4

1 Introduction

Many countries in the world rely heavily on oil, coal, and natural gas to meet their energy demands [1]. However, this reliance on fossil fuels presents two significant challenges from the perspective of energy security. Firstly, the import of fossil resources is subject to significant price swings, which can negatively impact the stability of a country's economy and make it difficult to predict the situation in regions that import these resources. Secondly, the use of fossil fuels leads to environmental pollution and the emission of greenhouse gases, such as carbon dioxide (CO2), which contribute to the global warming problem.

Growing energy demands and fossil fuel depletion, together with COVID-19, the after-pandemic times, and the global energy crisis in 2021-2022, force the modern world to switch energy generation from fossil sources to renewable energy sources (RES) [2, 3]. Renewable energy has great potential to reduce prices and dependence on fossil fuels in the short and long term. On the other side, the intermittency and seasonality of renewable energy make RES hard to use. Moreover, costs for new photovoltaic (PV) panels and wind installations have increased [4]. Some regions and countries are starting to introduce RES, but the intermittency of renewable energy is still covered by peak power plants burning oil and natural gas. To fully switch to renewables, the generated energy has to be stored and used when renewable resources are not available. In such a case it is hard to overestimate the importance of energy storage systems (ESS) for the modern world and smart grid (SG) systems.
Typically, a private house connected to the utility power line through a battery would have a solar panel array installed on the roof or elsewhere on the property. The solar panels convert sunlight into direct current (DC) electricity, which is then fed into an inverter. The inverter converts the DC electricity into alternating current (AC) electricity, which is compatible with the utility power grid. The AC electricity generated by the solar panels is then sent to a battery storage system, where it is stored for later use. The battery storage system is connected to the utility power grid, allowing the house to draw electricity from the grid when the battery is depleted and to feed excess electricity back into the grid when the battery is charged. A control system is used to manage the flow of electricity between the battery, the solar panels, and the utility grid. This control system monitors the electricity demand of the house and adjusts the flow of electricity accordingly.

On the other side, an energy storage system can have different goals [5]. For example, an energy storage can be used to make renewable energy output more stable, for bringing it into a combined energy mix for large regions. An ESS can be used for energy shifting, to make the electricity availability supply curve meet the electricity demand curve. Another use case is voltage and frequency regulation, which is a typical issue in Smart Grids with many distributed renewable sources, such as PV installed locally at residential houses and residential wind turbines, which feed energy back to the grid. A charging and discharging strategy can be optimized to solve a specific goal: maximize battery usage to reduce power plant (fossil fuel) energy consumption; decide when to charge and when to discharge based on statistical data and probabilities; or, for example, charge when the grid frequency goes up and discharge when the frequency goes down.

This paper introduces adaptive charging and discharging strategies based on energy availability data and energy demand data. We propose a model which controls battery use based on consumption demand and a selected charging/discharging strategy represented in the form of a function of the battery's internal state. In a very simple case the battery is always used, or a threshold value is defined. A more advanced case takes into account the energy storage efficiency factor, capacity, charging and discharging speeds, and other characteristics.

This paper is organized as follows: related work is presented in Section 2. Section 3 describes charging and discharging strategies. Experiment results and discussions are presented in Section 4. Section 5 gives conclusions and discusses future work.

2 Related work

Over the past 20 years, researchers have been searching for new and alternative energy systems. There have been various attempts to develop concepts for distributed energy generation from renewable sources, as well as new designs for energy distribution and storage. Some researchers aim to make use of existing infrastructure and make the transition to a new system as seamless as possible, while others believe that starting from scratch with the option of integrating into the existing electrical grid is the best approach.

Rikiya Abe et al.
first presented the idea of a digital grid in 2011 (IEEE Transactions on Smart Grid). They proposed a new concept of a grid in which the electrical grid is split into cells connected by electrical devices that control the energy exchange between the cells. However, they presented the design of the proposed system with very little detail, lacking operation examples or simulations. The research also lacks an analysis and integration overview of the penetration of RES into the digital grid [6].

The idea of grid digitalization grew very fast in some smaller projects, such as the Open Energy System research on Okinawa Island in Japan [7]. There, a direct current (DC) based Open Energy System (DCOES) joint research project investigated a DC-based bottom-up system that generates, stores, and shares electrical energy. Annette Werth et al., in "Evaluation of centralized and distributed microgrid topologies and comparison to Open Energy Systems" (IEEE International Conference on Environment and Electrical Engineering 2015), examined microgrid topologies that combine solar panels and batteries for a community of 20 residential houses [8]. They consider a system with centralized PV and batteries that distributes energy to the 20 homes, and they also consider 20 standalone homes with roof-top PV and batteries.

The virtual synchronous generator has gained significant interest as it operates similarly to a synchronous generator, making it a viable option for connecting distributed generation to the main power grid. However, the power and frequency output of a virtual synchronous generator can be unstable during significant power fluctuations in the distributed generation system. Fei Wang et al. studied how changes in parameters affect the active power and frequency of a virtual synchronous generator. They analyzed the impact of power fluctuations and developed a small-signal model to understand the dynamic behavior. Based on their research, they proposed a new adaptive control strategy and confirmed its effectiveness through experiments [9].

Hui Guo et al. published their research (IEEE Transactions on Industrial Informatics 2019) on real-time transactions on the basis of bidding information. The real-time transaction was implemented to track the origin and destination of power transmission, as well as the amount and timing of power flow. To minimize losses from conversion and transmission, a minimum-loss routing method was chosen for the transaction of power. The proposed optimization algorithm for selecting the minimum-loss routing and managing congestion was confirmed through simulation results [10].

Lijun Zhang et al. are researching a novel matrix converter-based topology to be applied in smart transformers, based on the concept of multiple modularity. The conventional smart transformer topology, which uses H-bridge modules and DC electrolytic capacitors, was replaced with a new design that utilizes two matrix modules for greater flexibility in AC-AC structures. This new design was thoroughly analyzed, including a detailed examination of the impact of switching sequences on capacitor voltages. To validate the proposed topology and its analysis, simulations and hardware-in-the-loop experiments were conducted [11].

K. Chaudhari et al.
proposed a hybrid optimization algorithm for energy storage management, which shifts its mode of operation between deterministic and rule-based approaches depending on the electricity price band allocation. A cost degradation model for the energy storage system (ESS) and the levelized cost of photovoltaic (PV) power were applied to electric vehicle (EV) charging stations. The algorithm was divided into three parts: classifying real-time electricity prices into different categories, determining the real-time PV power from solar radiation data, and optimizing the operating cost of the EV charging station that combines PV and ESS to minimize expenses [12].

Another battery energy storage system, based on a direct method to control the power converter for fast compensation of grid voltage instability without an energy management system, has been proposed by D.-J. Kim et al. A new approach for improving the power quality at an electric vehicle charging station (EVCS) was developed. This method uses a model predictive voltage control scheme that is based on a disturbance observer, and it operates without the need for communication infrastructure. The proposed controller takes into account parameter uncertainties and uses a systematic design procedure that includes stability analysis. The performance of the controller was verified through tests using a simulation testbed that was designed to closely resemble an actual EVCS [13].

Daniel Kucevic et al. suggest a system for managing multiple battery energy storage systems located at electric vehicle charging stations within a distribution grid. The method involves linear optimization and time-series modeling, with the goal of reducing peak power levels. A simulation tool was created to combine a power flow model with a battery energy storage system model to better understand the impact of storage systems on the distribution grid [14].

Other research was done on occupant behavior data collection. It introduces a dataset of electricity usage in residential homes in Uruguay that was collected by the Uruguayan electricity company (UTE) and studied by Universidad de la República. The purpose of the dataset is to analyze consumer behavior and uncover patterns of energy consumption that can be used to improve electricity services. The dataset is publicly accessible and stored in a public repository. It consists of three subsets that cover total household consumption, electric water heater consumption, and energy consumption by appliance, with sample intervals from 1 to 15 minutes. The total household consumption subset includes the total aggregated consumption of 110,953 households distributed over the 19 departments of Uruguay. On average, each household was monitored for 539.2 days, and each day counts 95.2 records [15].

Salvatore Carlucci et al., in their work on modeling occupant behavior in buildings, review approaches, methods and key findings related to occupants' presence and actions (OPA) modeling in buildings. A comprehensive collection of research papers on the subject has been assembled and analyzed using bibliometric techniques. The initial review uncovered over 750 studies, with 278 selected for further analysis. These publications give a comprehensive overview of the progress and evolution of OPA modeling methods. The methods in the chosen literature have been divided into three categories: rule-based models, stochastic OPA
The methods in the chosen literature have been divided into three categories: rule-based models, stochastic OPA A. Dudko, T. Endrjukaite: Adaptive Charging and Discharging Strategies for Smart Grid Energy Storage Systems 105. modeling, and data-driven methods for modeling functions related to occupancy and the actions of occupants. [16]. Moreover, currently renewable energy generators have instal ation limitations in the modern power grid due to both technical and policy reasons, which further complicate the penetration of RES to energy mix models. Authors in [17] discuss the chal enges of renewable energy penetration on power system flexibility. 3 Adaptive charging and discharging approach Electrical energy storage in batteries is becoming a crucial thing in our life. We use batteries in mobile phones, watches, laptops, and headphones. Batteries have been used in cars for decades, but with the popularization of electric vehicles (EV) it has become especial y important. Now we see more and more in-house appliances which benefit from utilizing batteries, such as electric toothbrush, audio speakers, hair trimmer, cordless vacuum cleaner, and so on. There are a number of benefits that can be achieved for an appliance when there is a battery, such as: − less dependent on energy availability, for example when a solar panel was used to charge a device, the device can be further used when the sun no longer available; − in case of short power outage times a device with a battery is stil powered and ready for use; − high power consumption of a device can be replaced with a lower charging power spread over longer time, so the power line to utility can be lower; − the device becomes free from socket and wires, so it can be carried along with the user inside and outside of the house. Even though batteries possess several advantages when utilized in appliances, it is important to acknowledge that there are also disadvantages and limitations associated with the usage of batteries as wel : − battery capacity is limited, so in some cases there may be insufficient amount of energy to complete a desired activity, other kinds of devices might be 106 PROCEEDINGS OF THE 33RD INTERNATIONAL CONFERENCE ON INFORMATION MODELLING AND KNOWLEDGE BASES EJC 2023 using too much energy that those can hardly be powered with a battery of reasonable size; − batteries degrade over time, their capacity goes down, and may even have to be replaced in order to keep using the device; − device with a battery is typical y more complex and more expensive compared to similar analogues without a battery; − devices with batteries usual y are not able to share energy between each other, such that a mobile phone cannot be recharged from a cordless dust sucker; − batteries have to be recharged from time to time to keep working, so the freedom from wires and sockets is limited. In this work we consider a standalone energy storage device consisting of a battery and a Control Unit (CU). It can be plugged between the utility main line and any device. The layout for a customer can be set up as a single shared battery for the entire household as shown on Figure 1(a) or an individual battery per every appliance device as shown on Figure 1(b). a) b) Figure 1: Setup layout options: a) one shared battery per customer, and b) individual battery per device. Source: own. There are advantages and disadvantages to both setup layouts. On one hand, instal ing a single large battery for a house looks reasonable and simpler. 
But on the other side, every device has its own usage pattern, power, and priority. So, it would make more sense to setup individual battery per appliance with proper characteristics matching the device electricity usage pattern. A. Dudko, T. Endrjukaite: Adaptive Charging and Discharging Strategies for Smart Grid Energy Storage Systems 107. Going further, combination of both options probably gives another alternative setup layout, where individual batteries work for targeted devices and a shared battery for the entire consumption of the customer. Also, when there is a shared energy storage system for a private house, there is also typically a private micro-generator instal ed in the system, such as a solar PV panel. Figure 2 shows a more realistic setup. Figure 2: Mixed setup layout of batteries and use of private energy generation source, such as PV. Source: own. In a simple case we have a battery which is being used until ful y discharges, and then the remaining part of the demand has to be covered from the utility line. This scenario is shown in Figure 3. Scenario, where battery is ful y used until it gets total y discharged and the remaining time is covered by utility power line. This approach is good enough when battery capacity is large enough and periods of electricity demand by the consumer devices are short, so that battery capacity is larger than amount of electrical energy consumed during one session of device usage. 120 consumer demand 100 u�lity demand 80 ba�ery charge/discharge r, W 60 Powe 40 20 0 0 5 10 15 20 25 30 35 40 Time Figure 3: Scenario, where battery is ful y used until it gets total y discharged and the remaining time is covered by utility power line. Source: own. 108 PROCEEDINGS OF THE 33RD INTERNATIONAL CONFERENCE ON INFORMATION MODELLING AND KNOWLEDGE BASES EJC 2023 In the scenario described in the Figure 3, the consumer device was fully supplied with the energy it needed. However, after the described case the battery is total y discharged, and it would not be possible to use it for the next session of consumer device usage. Battery charging has to happen along the way so that the overal process tends to optimize the battery usage. Batteries, charging and discharging can happen in various ways. In this paper we do not touch the details of the technical differences of various types of batteries, as wel as we do not target the differences in technical requirements of charging specific kinds of batteries. In this paper a battery is conceptual y viewed as an device for the storage of electrical energy that operates through the process of charging and discharging. The amount of stored capacity of a battery is increased through the charging process, which al ows a predetermined amount of energy to be accumulated. This stored energy can be recovered upon demand through the discharge process. The fundamental properties of a battery are defined by a set of characteristic parameters: − capacity, − maximum charge power, − maximum discharge power, − storage efficiency. The approach of determining when the battery has to take energy for charging, when the energy has to be given back, and by with what power, depending on consumer demand, battery internal state, as wel as other possible factors is cal ed charging and discharging strategy. We assume that energy demand can be covered not only from one of two sources, but also as an arbitrary mix of two of these sources. 
For example, 65% of energy is taken from a battery and the remaining 35% from utility line at any point of time. This can be achieved through various methods, such as by means of transformers, AC/DC converters, smart energy routers [18, 19], pulse-width modulation (PWM), etc. The efficiency and choice of the mix method is outside of the scope of this paper. A. Dudko, T. Endrjukaite: Adaptive Charging and Discharging Strategies for Smart Grid Energy Storage Systems 109. One of the options for taking the internal state of the battery into account is to discharge battery proportional y to the state of charge. In that case, when the battery is charged to 100% the consumption demand is ful y covered by the battery. When at time t battery has 85% of its charge then only 85% of load power at time t will be covered by the battery and the other 15% should be taken from the utility. That way, battery use decreases exponential y over time which makes battery being used longer although covering only part of the demand power. At the same time, utility demand power is lower and increases gradual y rather than as a step when the device is turned on. This case is displayed in Figure 4. This approach can be cal ed discharge strategy S which is based on battery state of charge (SoC), and no charging was involved. 120 consumer demand 100 u�lity demand 80 ba�ery charge/discharge , W 60 er 40 Pow 20 0 0 5 10 15 20 25 30 35 40 -20 Time Figure 4: Scenario with proportional discharge strategy. Source: own. Calculation formulas for battery charge power PBC and battery discharge power PBD are as shown in (1.1) and (1.2). Where PD is a consumer demand power, PCMAX is maximum charge power, PDMAX is maximum discharge power, FC and FD are charge and discharge strategy functions based on battery internal state. 𝑃𝑃𝐵𝐵𝐵𝐵 = 𝐹𝐹𝐵𝐵(𝐺𝐺𝑐𝑐𝐷𝐷) × 𝑃𝑃𝐵𝐵𝐶𝐶𝐶𝐶𝐶𝐶 (1.1) 𝑃𝑃𝐵𝐵𝐵𝐵 = 𝑎𝑎𝑔𝑔𝑔𝑔(𝑃𝑃𝐵𝐵 × 𝐹𝐹𝐵𝐵(𝐺𝐺𝑐𝑐𝐷𝐷), 𝑃𝑃𝐵𝐵𝐶𝐶𝐶𝐶𝐶𝐶) (1.2) That way we get battery power PB as a difference between the discharge and charge power as shown in (2). When PB is greater than zero, battery is discharging, when it is below zero it is charging. 110 PROCEEDINGS OF THE 33RD INTERNATIONAL CONFERENCE ON INFORMATION MODELLING AND KNOWLEDGE BASES EJC 2023 𝑃𝑃𝐵𝐵 = 𝑃𝑃𝐵𝐵𝐵𝐵– 𝑃𝑃𝐵𝐵𝐵𝐵 (2) Available capacity CAV would be defined as differential equation as shown in (3.1) and which can be expanded in (3.2). 𝑜𝑜𝐷𝐷𝐶𝐶𝐴𝐴 𝑜𝑜𝑔𝑔 = – 𝑃𝑃𝐵𝐵 = 𝑃𝑃𝐵𝐵𝐵𝐵 – 𝑃𝑃𝐵𝐵𝐵𝐵 (3.1) 𝑜𝑜𝐷𝐷𝐶𝐶𝐴𝐴 (3.2) 𝑜𝑜𝑔𝑔 = 𝐹𝐹𝐵𝐵(𝐺𝐺𝑐𝑐𝐷𝐷) × 𝑃𝑃𝐵𝐵𝐶𝐶𝐶𝐶𝐶𝐶 – 𝑎𝑎𝑔𝑔𝑔𝑔(𝑃𝑃𝐵𝐵 × 𝐹𝐹𝐵𝐵(𝐺𝐺𝑐𝑐𝐷𝐷), 𝑃𝑃𝐵𝐵𝐶𝐶𝐶𝐶𝐶𝐶) Utility demand power PU in turn is defined as a remaining power that is required to cover the consumer demand and battery charge, taking into account charge efficiency EC, as shown in equation (4). 𝑃𝑃 𝑃𝑃 𝐵𝐵 𝑈𝑈 = 𝑃𝑃𝐵𝐵– 𝐹𝐹𝐵𝐵 (4) Charging and discharging strategies functions are defined as multiplier in range between 0 and 1. In the simplest case, these functions may always return 1 which would mean that the battery charges at the maximum possible power, as wel as discharges at the maximum power according to demanded load. 
Table 1: Charging and discharging strategies

Strategy   Function                        Charge              Discharge
S1         f(x) = x                        CS1 - Proportional  DS1 - Proportional
S2         f(x) = x^k                      CS2 - Optimistic    DS2 - Wasteful
S3         f(x) = x^(1/k)                  CS3 - Greedy        DS3 - Economical
S4         f(x) = 1/(1 + e^(-k(x-0.5)))    CS4 - Balanced      DS4 - Balanced
S5         f(x) = 1                        CS5 - Full charge   DS5 - No use
S6         f(x) = 0                        CS6 - No charge     DS6 - Full use

In this paper we have taken 6 strategies for both charging and discharging to compare, so the overall available battery capacity over time depends on the battery state of charge (SoC), the demand power, the maximum charge power and the maximum discharge power. The strategies are described in Table 1. The functions are represented in Figure 5.

Figure 5: Charge and discharge strategies based on battery state of charge; charge power relative to the maximum charge power and discharge power relative to the demand, both plotted against depth of discharge (DoD). Source: own.

4 Experiments and Discussions

Residences depend on electricity constantly, because many appliances work simultaneously, such as the refrigerator, electrical heating, air conditioning, microwave, and so on. When an electrical issue arises, for example a power failure caused by a storm, a tripped breaker, or any other problem with electricity in the circuit, an understanding of how the electrical system operates can be valuable in resolving the problem and restoring the power.

Experiments on Strategies. The comparison of the strategies is performed on a reference demand signal, which is a 100 W consumption over 16 seconds. The battery capacity is set to 0.25 Wh. The comparison results of the strategy experiments CS1-CS6 and DS1-DS6 are presented in Figure 6. In these experiments we can see that all cases with strategy DS5 end up with the battery not involved in the process. A very similar situation can be seen with the charging strategy CS5: at the beginning of the demand the battery tries to cover it, but then battery usage is quickly reduced by the aggressive charge strategy, and in some cases fading oscillations even appear at the front. Strategy CS6DS6 is identical to full use of the battery until it is completely discharged, after which the energy source is switched to the utility line. All other strategies use the battery and recharge it in different ways.

Figure 6: Charge and discharge strategies comparison. Source: own.

Strategies where the discharging is DS3 and where the charging is CS3 mostly preserve the energy in the battery, and it can be seen that the utility demand quickly rises to the power of the customer demand. In many cases we see that the utility demand starts to rise immediately when the consumer demand increases. However, there are several strategies, such as CS2DS2, CS4DS2, CS2DS4 and CS2DS6, where the utility demand stays close to zero for some time, so the device tries to cover the customer demand only by utilizing the battery.

Experiments on Capacity. In these experiments the system behavior was evaluated for various battery capacity sizes. Figure 7 shows the result diagrams for the strategies CS1DS6 and CS4DS2 over the battery capacities 0.25 Wh, 0.5 Wh, 1 Wh, 2 Wh, and 4 Wh.
We can see that when the capacity of the battery gets bigger, the strategies in both cases tend to use only battery power.

Figure 7: Comparison of strategies over battery capacity; the panels show the strategies CS1DS6 and CS4DS2 for capacities of 0.25, 0.5, 1, 2, and 4 Wh. Source: own.

The main difference between the strategies plays a role when the consumer demand drains the battery significantly. The given 100 W consumption over 16 seconds makes the total energy of the consumption demand data equal to 0.44 Wh. So, in the case of strategy CS4DS2, when the battery capacity is twice as large as the consumed energy of the customer, the power is almost fully covered by the battery.

Experiments on Power Limits. In these experiments, the system behavior was evaluated for reduced charge power and reduced discharge power options. In Figure 8 the diagrams for the strategy CS4DS2 are shown. We can see that when only the charge power is limited, it affects the result insignificantly. But when both the charge power and the discharge power are limited, the overall result becomes different from the initial setup. The case when PCMAX = 20 W and PDMAX = 20 W shows that when the limits are below the demand power and the battery capacity is sufficient, the system behaves as a mechanism for reducing the utility line power by a given margin.

Figure 8: Comparison of strategy behaviors with limitations of maximum charge and discharge power (a grid of runs over PCMAX and PDMAX values of 100 W, 60 W, and 20 W). Source: own.

Experiments on Demand Shape. During the experiments on demand shape, the system behavior was evaluated on various kinds of consumer demand signal types. Figure 9 shows the diagrams for the strategies CS1DS6 and CS4DS2 with a battery capacity of 0.25 Wh and charge/discharge limits of 100 W, which is above the demand power. The types of input signals included a one-time session of demand, a periodic load, an increasing load as a step, and a decreasing load as a step. In the considered cases strategy CS1DS6 provides a smoother utility demand power, because it utilizes the battery more intensively. The scenario of decreasing demand is quite a typical case, where a device needs more power right after startup but then the demand decreases. Both strategies shown in the comparison results covered the front peak very well.
Experiments on Demand Shape. During the experiments on demand shape, the system behavior was evaluated on various kinds of consumer demand signal types. Figure 9 shows the diagrams for the strategies CS1DS6 and CS4DS2 with a battery capacity of 0.25 Wh and charge/discharge limits of 100 W, which is above the demand power. The types of input signals included a one-time session of demand, a periodic load, a load increasing as a step, and a load decreasing as a step. In the considered cases strategy CS1DS6 provides a smoother utility demand power, because it utilizes the battery more intensively. The scenario of decreasing demand is quite a typical case, when a device needs more power right after startup but then the demand decreases. Both strategies shown in the comparison results covered the front peak very well.

Figure 9: Comparison of strategy behaviors on various types of demand signal (one-time, periodic, increasing, decreasing, and long-term demand with occasional increase). Source: own.

Experiments on Uruguay Dataset. In these experiments we used data collected during research on the behavior of electricity users. This research presented a dataset of electricity consumption in residential homes in Uruguay, collected by the Uruguayan electricity company (UTE) and analyzed by Universidad de la República [15, 20]. The goal of the dataset is to study occupants' behavior and discover patterns of energy consumption that can improve the electricity service. The dataset is open to the public and consists of three parts, which focus on overall household consumption, consumption by electric water heater, and energy usage by appliance, with time intervals ranging from 1 to 15 minutes.

Figure 10: Load dataset example of one household (customer id 170004, date 2019-09-11). Source: own.

The utility company is responsible for ensuring that electricity is supplied to residential properties. This includes maintaining the power lines up to the attachment point, which is referred to as the load side. From this point on, the responsibility for ensuring that the electrical system functions properly falls on the homeowner. This includes addressing issues such as circuit overload, which can cause power outages. It is important for occupants to understand their responsibilities regarding the electrical system in their home in order to ensure that it operates safely and effectively. A circuit overload means that too many high-powered appliances are operating on the same electric circuit, for example a hair dryer, air conditioner, washing machine or tumble dryer, electric kettle, etc. An overloaded circuit means that occupants are using more electricity than the circuit is designed for. In such a case the electrical system in a residence will experience a shutdown due to a circuit breaker in the service panel being triggered. Circuit breakers are a reliable solution for preventing electrical fires caused by overloads, but the safest approach is to manage electricity usage so as to avoid overloads in the first place. By taking proactive steps to control electricity usage and avoid overloading the system, homeowners can help ensure the safety and stability of their electrical system.

In this paper we used the Disaggregated Energy Consumption by appliance dataset, which consists of two relevant parts: the total aggregated consumption records of nine households in Montevideo, and the disaggregated consumption of a set of appliances in each household (e.g., lamps, fridges, air conditioners, etc.). The sampling interval is one minute, and the date range of the consumption records is from 27th August 2019 to 16th September 2019. Appliances vary by customer and typically include a dehumidifier, tumble dryer, microwave, electric oven, electric air heater, washing machine, fridge, electric water heater, and air conditioner.
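Working with the disaggregated dataset typically amounts to grouping consumption records by customer and appliance. The sketch below is illustrative only: the file name and column names are assumptions, not the actual ECD-UY schema (see [15, 20] for the dataset documentation).

```python
import pandas as pd

# Hypothetical flat export of the disaggregated part of ECD-UY.
df = pd.read_csv("ecd_uy_disaggregated.csv", parse_dates=["timestamp"])
home = df[df["customer_id"] == 170004].set_index("timestamp")

# Per-appliance demand at the native 1-minute sampling (cf. Figure 11).
by_appliance = home.pivot(columns="appliance", values="power_w")

# Aggregated household load and its daily peak (cf. Figure 10).
total_w = by_appliance.sum(axis=1)
daily_peak_w = total_w.resample("1D").max()
```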
The breakdown of the load demand data for customer id 170004 for the entire day of 2019-09-11 is shown in Figure 11.

Figure 11: Breakdown of the load demand data by appliance (electric water heating, fridge, microwave, tumble dryer, and washing machine). Source: own.

Figure 12: Electric water heating appliance demand with 250 Wh battery. Source: own.

Figure 13: Fridge appliance demand with 300 Wh battery. Source: own.

Figure 14: Microwave appliance demand with 500 Wh battery. Source: own.

Figure 15: Tumble dryer appliance demand with 300 Wh battery. Source: own.

Figure 16: Washing machine appliance demand with 1000 Wh battery. Source: own.

Every appliance had its own individual battery, whose capacity was chosen individually based on the appliance demand. The batteries in the presented experiments make up a total of 2350 Wh. The same amount can instead be used as a single shared battery, or can be shared between appliances in a different split in a mixed layout. The details of the considered cases are presented in Table 2.

Table 2: Battery capacities split by appliances and by experiment cases (capacities in Wh)

Battery      Appliance               Case 1   Case 2   Case 3
Individual   Electric water heating  0        250      250
Individual   Fridge                  0        300      0
Individual   Microwave               0        500      500
Individual   Tumble dryer            0        300      0
Individual   Washing machine         0        1000     600
Shared       Entire household        2350     0        1000
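The three layouts of Table 2 can be written down directly as configuration dictionaries; the key names below are illustrative, but the capacities are the ones from the table, and in every case the installed total is the same 2350 Wh.

```python
cases = {
    "case1_shared":     {"shared": 2350},
    "case2_individual": {"water_heater": 250, "fridge": 300, "microwave": 500,
                         "tumble_dryer": 300, "washing_machine": 1000},
    "case3_mixed":      {"water_heater": 250, "microwave": 500,
                         "washing_machine": 600, "shared": 1000},
}
# Every layout distributes the same total storage differently.
assert all(sum(split.values()) == 2350 for split in cases.values())
```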
The comparison of results is shown in Figure 17. We can see that all cases give relatively similar output. Although many short-term load periods are smoothed out, in Case 2, where only individual batteries were involved, the water heater battery capacity was not sufficient to avoid medium peaks. Those medium peaks are especially visible in the period between 14:00 and 16:00. Longer-term periods remain almost the same as in the original customer demand.

Figure 17: Comparison of results for shared battery, individual batteries, and mixed batteries layouts (customer demand and utility demand for Cases 1-3 over one day). Source: own.

Various kinds of appliances have significantly different levels of load power; however, the importance and priority of these appliances for the customer also differ greatly. Therefore, the mixed layout of batteries, as in Case 3, becomes a good choice, and the battery sizes are subject to configuration per customer individually, based on the appliances and their usage patterns.

In this set of experiments the overall peak was significantly lower for all cases with batteries than for the consumer demand. The comparison is given in Table 3. We can see that with the use of batteries the peak power decreased to approximately 60%. This is a significant value, which contributes to reducing utility power line and generation peaks, which are in general more expensive.

Table 3: Peak power comparison for experiment cases 1-3

Scenario                                         Peak power, W   Relative
Customer demand (All appliances in total)        4213.00         100.0%
Utility demand (Case 1 - Shared battery)         2817.06         66.9%
Utility demand (Case 2 - Individual batteries)   2484.16         59.0%
Utility demand (Case 3 - Mixed batteries)        2518.11         59.8%

5 Conclusions and future work

Energy generation and consumption in the world face significant challenges in terms of energy security and environmental degradation. Renewable energy sources help to address these challenges significantly and have therefore become very widespread in recent years. However, to fully switch to renewables, the generated energy has to be stored and used when renewable resources are not available. In this light it is hard to overestimate the importance of energy storage systems for the modern world and Smart Grid systems.

This paper introduces an adaptive charging and discharging approach with various strategies based on energy availability and energy demand. We propose a model which controls battery use based on the consumption demand and a selected charging/discharging strategy represented in the form of a function of the battery's internal state. In the model we take into account the battery's total capacity, the amount of energy available in the battery at a given time, the charging strategy, the discharging strategy, the energy storage efficiency factor, and the maximum charging and discharging power. Six strategies have been defined, which can be applied for both charging and discharging.

The experiments present a comparison of adaptive energy storage system behavior depending on various setups of strategies, battery capacity, demand load signal, and power limits. The disaggregated energy consumption by appliance dataset of nine households in Montevideo, Uruguay, was used in this paper. Three battery setup layouts for a household were compared: individual batteries per appliance, a single shared battery for the entire household, and a mixed approach.

In future work we plan to extend the strategies to take into account the utility cost of electricity, which varies during the day. We plan to include private microgeneration options as well as to explore in more detail the optimal battery capacity split between appliances.

Acknowledgement

This work was financially supported by the specific support objective activity 1.1.1.2 "Post-doctoral Research Aid" (Project id. N. 1.1.1.2/16/I/001) of the Republic of Latvia, funded by the European Regional Development Fund, within Tatiana Endrjukaite's research project No. 1.1.1.2/VIAA/1/16/095 "Integrated Model for Energy Generation, Distribution and Management".

References

[1] Mayer, A.
Fossil fuel dependence and energy insecurity. Energy, Sustainability and Society, 12, 27 (2022). https://doi.org/10.1186/s13705-022-00353-5
[2] World Energy Outlook 2022, IEA, Paris. https://www.iea.org/reports/world-energy-outlook-2022, License: CC BY 4.0 (report); CC BY NC SA 4.0 (Annex A), 2022.
[3] Ozili, Peterson K. and Ozen, Ercan. Global Energy Crisis: Impact on the Global Economy (January 2, 2023). Available at SSRN: https://ssrn.com/abstract=4309828 or http://dx.doi.org/10.2139/ssrn.4309828
[4] Renewable Energy Market Update - May 2022, IEA, Paris. https://www.iea.org/reports/renewable-energy-market-update-may-2022, License: CC BY 4.0, 2022.
[5] H. Ibrahim, A. Ilinca, J. Perron. Energy storage systems – Characteristics and comparisons. Renewable and Sustainable Energy Reviews 2008; 12(5): 1221-1250.
[6] R. Abe, H. Taoka, D. McQuilkin. Digital grid: communicative electrical grids of the future. IEEE Transactions on Smart Grid 2011; 2(2): 399-410.
[7] Okinawa Institute of Science and Technology, OIST Open Energy System Project, https://www.oist.jp/news-center/news/2015/2/17/energy-starts-home
[8] A. Werth, N. Kitamura, I. Matsumoto and K. Tanaka, "Evaluation of centralized and distributed microgrid topologies and comparison to Open Energy Systems (OES)," 2015 IEEE 15th International Conference on Environment and Electrical Engineering (EEEIC), Rome, Italy, 2015, pp. 492-497, doi: 10.1109/EEEIC.2015.7165211.
[9] F. Wang, L. Zhang, X. Feng and H. Guo, "An Adaptive Control Strategy for Virtual Synchronous Generator," IEEE Transactions on Industry Applications, vol. 54, no. 5, pp. 5124-5133, Sept.-Oct. 2018, doi: 10.1109/TIA.2018.2859384.
[10] H. Guo, F. Wang, L. Li, L. Zhang and J. Luo, "A Minimum Loss Routing Algorithm Based on Real-Time Transaction in Energy Internet," IEEE Transactions on Industrial Informatics, vol. 15, no. 12, pp. 6446-6456, Dec. 2019, doi: 10.1109/TII.2019.2904188.
[11] Lijun Zhang, Alexandre Bento, Guilherme Paraíso, Pedro Costa, Sónia Ferreira Pinto, José Fernando Silva, Fei Wang. Multiple modularity topology for smart transformers based on matrix converters. IET Electric Power Applications, pp. 926-940, 2022. doi: 10.1049/elp2.12200
[12] K. Chaudhari, A. Ukil, K. N. Kumar, U. Manandhar and S. K. Kollimalla, "Hybrid Optimization for Economic Deployment of ESS in PV-Integrated EV Charging Stations," IEEE Transactions on Industrial Informatics, vol. 14, no. 1, pp. 106-116, Jan. 2018, doi: 10.1109/TII.2017.2713481.
[13] D.-J. Kim, B. Kim, C. Yoon, N.-D. Nguyen and Y. I. Lee, "Disturbance Observer-Based Model Predictive Voltage Control for Electric-Vehicle Charging Station in Distribution Networks," IEEE Transactions on Smart Grid, vol. 14, no. 1, pp. 545-558, Jan. 2023, doi: 10.1109/TSG.2022.3187120.
[14] Daniel Kucevic, Stefan Englberger, Anurag Sharma, Anupam Trivedi, Benedikt Tepe, Birgit Schachler, Holger Hesse, Dipti Srinivasan, Andreas Jossen. Reducing grid peak load through the coordinated control of battery energy storage systems located at electric vehicle charging parks. Applied Energy, Volume 295, 2021, 116936, ISSN 0306-2619, https://doi.org/10.1016/j.apenergy.2021.116936
[15] Chavat, J., Nesmachnow, S., Graneri, J. et al. ECD-UY, detailed household electricity consumption dataset of Uruguay. Sci Data 9, 21 (2022). https://doi.org/10.1038/s41597-022-01122-x
[16] Salvatore Carlucci, Marilena De Simone, Steven K. Firth, Mikkel B.
Kjærgaard, Romana Markovic, Mohammad Saiedur Rahaman, Masab Khalid Annaqeeb, Silvia Biandrate, Anooshmita Das, Jakub Wladyslaw Dziedzic, Gianmarco Fajilla, Matteo Favero, Martina Ferrando, Jakob Hahn, Mengjie Han, Yuzhen Peng, Flora Salim, Arno Schlüter, Christoph van Treeck. Modeling occupant behavior in buildings. Building and Environment, Volume 174, 2020, 106768, ISSN 0360-1323, https://doi.org/10.1016/j.buildenv.2020.106768
[17] Semich Impram, Secil Varbak Nese, Bülent Oral. Challenges of renewable energy penetration on power system flexibility: A survey. Energy Strategy Reviews, Volume 31, 2020, 100539, ISSN 2211-467X, https://doi.org/10.1016/j.esr.2020.100539
[18] Alexander Dudko, Tatiana Endrjukaite, and Leon R. Roose. 2020. Open Routed Energy Distribution Network based on a Concept of Energy Router in Smart Grid. In Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services (iiWAS2019). Association for Computing Machinery, New York, NY, USA, 483-491. https://doi.org/10.1145/3366030.3366036
[19] T. Endrjukaite, A. Dudko, and L. Roose. 2019. Energy Exchange Model in Routed Energy Distribution Network. In Proc. of the 6th ACM International Conference BuildSys '19. Association for Computing Machinery, New York, NY, USA, 393-394. https://doi.org/10.1145/3360322.3361017
[20] Chavat, Juan Pablo; Nesmachnow, Sergio; Graneri, Jorge; Alvez, Gustavo (2022): ECD-UY: Detailed household electricity consumption dataset of Uruguay. figshare. Collection. https://doi.org/10.6084/m9.figshare.c.5428608.v1

CONDITION MONITORING AND FAULT DETECTION OF A LASER OSCILLATOR FEEDBACK SYSTEM

ARNE GRÜNHAGEN,1,2,3 ANNIKA EICHLER,2,3 MARINA TROPMANN-FRICK,1 GÖRSCHWIN FEY2

1 Hamburg University of Applied Sciences, HAW, Hamburg, Germany
arne.gruenhagen@desy.de, marina.tropmann-frick@haw-hamburg.de
2 Hamburg University of Technology, TUHH, Hamburg, Germany
arne.gruenhagen@desy.de, annika.eichler@desy.de, goerschwin.fey@tuhh.de
3 Deutsches Elektronen-Synchrotron DESY, Hamburg, Germany
arne.gruenhagen@desy.de, annika.eichler@desy.de

The successful operation of industrial plants like the European X-Ray Free Electron Laser relies on the correct functioning of many dynamic systems that operate in a closed loop with controllers. In this paper, we present how data-based machine learning methods can monitor and detect disturbances of such dynamic systems based on the controller output signal. We implement four feature extraction methods based on statistics from the time domain, statistics from the frequency domain, characteristics of spectral peaks, and the autoencoder latent space representation of the frequency domain. These extracted features require no system understanding and can easily be transferred to other dynamic systems. We systematically compare the performance of 19 state-of-the-art fault detection methods to decide which combination of feature extraction and fault detection is most appropriate to model the condition of an actively controlled phase-locked laser oscillator. Our experimental evaluation shows that especially clustering algorithms are very well suited for detecting disturbed system conditions.

Keywords: fault detection, feature extraction, autoencoder, clustering,
outlier detection

DOI https://doi.org/10.18690/um.feri.5.2023.5
ISBN 978-961-286-745-4

1 Introduction and Motivation

The European X-ray Free-Electron Laser (EuXFEL) [1] is a large-scale linear particle accelerator located in Hamburg, Germany. A 1.3 GHz Radio Frequency (RF) Master Oscillator (MO) is used to synchronize various components of the accelerator by distributing the RF signal as a timing reference source. Since this electrical distribution via coaxial cables is heavily influenced by the environment (e.g. humidity, temperature, electromagnetic fields), an optical synchronization system is installed that is less vulnerable to these environmental condition changes [2]. This optical synchronization system provides ultra-stable reference timing information to the accelerator components and the experimental setups, with an integrated timing jitter in the range of a few femtoseconds. The main component of this optical synchronization system is a mode-locked pulsed laser oscillator that is phase-locked to the MO, delivering an ultra-stable optical reference used to locally resynchronize RF sources, to lock optical laser systems, and to diagnose the arrival time of the electron beam at various locations for fast beam-based feedbacks.

Not only does the laser not produce a completely noise-free signal, but the emitted signal is also influenced by environmental disturbances (i.e., electrical, acoustical, mechanical, and optical), resulting in amplitude and phase fluctuations. To synchronize the laser oscillator to the MO, the relative phase error between a harmonic of the laser pulse repetition rate and the MO reference is determined and fed to a Proportional-Integral (PI) controller in a feedback loop. This controller acts on the laser oscillator cavity length to lock the laser oscillator repetition rate to the 1.3 GHz MO frequency with a loop bandwidth in the order of 1 kHz to 10 kHz [3]. Since the controller compensates for disturbances, the controller output signal is an ideal data source for detecting potential disturbances that increase the integrated timing jitter and therefore decrease the synchronization performance.

The aim of this work is to detect changes in the controller output signal which may indicate environmental disturbances, disturbances in the MO reference, or disturbances in the internal detection chain. This goal is achieved by realizing the fault detection pipeline depicted in Figure 1. In the data preparation step, we extract the power spectral density (PSD) from the controller output signal using Welch's method [4], such that the fault detection can be based on both data from the time domain and data from the frequency domain. In the feature engineering step, we implemented three different methods to extract meaningful features to fit several fault detection models. These phases are explained in detail in Sections 3 and 4.

In the following, we summarize related work in Section 2. Then we describe the data preparation and feature engineering steps in Section 3. Section 4 gives a brief overview of the methods selected for fault detection, and Section 5 gives a detailed overview of the experimental validation of the proposed fault detection pipeline. We conclude this work by highlighting specific findings and giving plans for future work in Section 6.
Figure 1: Fault detection pipeline. Source: own.

2 Related Work

Despite extensive literature about fault detection and anomaly detection in the area of manufacturing systems [5,6,7,8,9], only a few publications address fault detection of dynamic systems in closed-loop control. In particular, literature on data-based fault detection is very rare.

The authors of [10] use linear transfer functions to represent the actively controlled system under review and its controller. These models build the core of their fault diagnosis, since they evaluate the discrepancy between the physical system output and the model output, and the discrepancy between the physical controller output and the model output. These discrepancy measures are used as an anomaly score. Also, the authors of [11] use mathematical models to describe a physical system and compare the model behavior with the behavior of the controlled system. Based on the difference between the system output and the model output, faulty system conditions are identified. In [12], the authors address control-loop data from a real system. They implement different fault detection mechanisms for different fault types, namely an oscillation detection based on an autocorrelation function, the detection of sluggishly tuned loops using the so-called idle index, quantization detection, and a saturation detection method. Again, their approach requires a deep system understanding.

Feature extraction for different industrial sectors is addressed by many publications. The authors of [13,14,15] each extract different basic statistics from the time domain, like the mean, the maximum, the minimum, the root mean square, or the entropy. In [16,17], the authors analyze frequency-domain vibration signals and decide on the system's health condition based on the values of domain-relevant frequency components. In [18], the authors calculate both statistics from the time domain and statistics from the frequency domain as features for standard fault detection methods. The authors of [19] developed a bi-directional long short-term memory neural network that works directly on time series data as a fault diagnosis mechanism. They compare its results with standard models fitted on time- and frequency-related statistics. Also, the authors of [20] use neural networks in the form of a relational autoencoder to extract high-level features.

We conclude that most of the related work addressing dynamic systems uses control theory models and therefore requires a deep understanding of the control theory behind the system. Existing publications using data-based fault detection methods do not address controller data.

3 Data Engineering

In this section, we describe what kind of data is used and how the data is processed for building meaningful models that can describe the condition of laser oscillators. Figure 2 shows a simplified version of the laser oscillator control loop. The input e(t) to the PI controller is the difference between the reference signal r(t), which in the case of the laser oscillator is the phase of the reference signal provided by the electrical timing information coming from the MO, and the phase of the signal generated by the laser oscillator y(t), affected by environmental disturbances d(t).
The output u(t) of the PI controller feeding into the laser oscillator is a voltage that affects the cavity length of the laser oscillator, thereby adjusting the phase of the laser signal. This outgoing signal, also called the feedback signal, contains information about the disturbances that the PI controller is processing and is therefore a valuable source of information for fault detection.

Figure 2: Overview of the laser oscillator control scheme. Source: own.

K(s) is the controller transfer function
G(s) is the laser oscillator transfer function
r(t) is the reference signal which the system output should follow
y(t) is the laser oscillator output
n(t) is the noise added by the measurement
ym(t) is the laser oscillator output with the added measurement noise
e(t) is the difference between ym(t) and r(t), and the input to the controller
u(t) is the controller output
d(t) is the disturbance acting on the laser oscillator output

3.1 Time and Frequency Domain

The controller's output signal contains values in the range from 0 to 1 and is measured with a sampling rate of 0.32 MHz. To check what kind of disturbances affect the system, the operators of the optical synchronization system mainly study the PSD estimation.

Figure 3: Example of a time series signal and PSD during normal operation. Source: own.

Figure 3 shows examples of the feedback signal in the time domain and the respective PSD in the frequency domain during healthy operation. The time series signal is an oscillating signal containing the changes to the cavity length of the laser oscillator. Due to the oscillating nature of the feedback signal in the time domain, single data points cannot reflect the entire state of the system, and it is therefore mandatory to look at a series of data points. For our calculations, each series contains 30000 data points, which is equivalent to 0.1 s. We calculate the PSDs using Welch's method [4]. Welch's method divides the time series data into overlapping segments, computes a modified periodogram for each segment, and averages the periodograms into the resulting PSD. Our PSD calculation uses Hanning windows containing 10000 data points with an overlap of 5000 data points. As a result, each PSD consists of 5000 data points.
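The PSD computation described above maps directly onto scipy.signal.welch. The following minimal sketch uses the stated parameters (0.32 MHz sampling, 30000-sample frames, Hann windows of 10000 samples with 5000 samples of overlap); the random placeholder signal and the function name are ours.

```python
import numpy as np
from scipy.signal import welch

FS = 0.32e6        # controller output sampling rate, 0.32 MHz
FRAME = 30_000     # one frame of 30000 samples, roughly 0.1 s

def frame_psd(frame):
    """PSD of one controller-output frame via Welch's method [4]."""
    f, pxx = welch(frame, fs=FS, window="hann",
                   nperseg=10_000, noverlap=5_000)
    return f, pxx

f, pxx = frame_psd(np.random.default_rng(0).normal(size=FRAME))
```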
The shape of the PSD and its peaks at certain frequencies are characteristic of the current state of the system. For example, the increased power at 400 Hz comes from a mechanical disturbance of the laser oscillator, and the peak at 60000 Hz originates from the piezo resonant frequency (see Figure 3).

In either case, whether considering the time-domain signals or the PSD in the frequency domain, we work with a series of data points, also called frames. Depending on the frame size, the fault detection algorithms may have to work with high-dimensional data, which can lead to poor fault detection performance. For this reason, we use several feature engineering techniques to reduce the dimensionality of the input data. In the following, we describe three feature engineering techniques applied to the data.

3.2 Statistical Feature Extraction

We use the tsfresh Python package [21] to calculate a set of statistics from the data frames. Table 1 gives an overview of the extracted statistics, a short description, and, if applicable, the corresponding parameter choices. This statistical feature extraction is applied to the time frames and to the PSDs. In both cases, the resulting dataset contains 34 values for the time-domain frame or PSD, respectively.

Table 1: Summary of extracted statistics from a data series x

maximum: maximum of x
minimum: minimum of x
mean: mean µ of x
standard deviation: standard deviation of x
variation coefficient: standard deviation divided by the mean
variance: variance σ² of x
skewness: skewness of x, determined with the adjusted Fisher-Pearson standardized moment coefficient G1
kurtosis: kurtosis of x, determined with the adjusted Fisher-Pearson standardized moment coefficient G2
root mean square: root mean square of x
quantile: the value that is greater than the q-th proportion of the values in x, for q in [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
autocorrelation for lags: R(l) = (1 / ((n - l) σ²)) Σ_{t=1}^{n-l} (x_t - µ)(x_{t+l} - µ), where n denotes the length of x, for lags l in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
linear trend attributes: different attributes of the linear regression of x (pvalue, rvalue, intercept, slope, stderr)
absolute energy: sum over squared values of x

3.3 Peak Feature Extraction

The peak feature extraction addresses spectral data from the frequency domain. Peaks within a power spectrum are special characteristics, since they show how much the PI controller was correcting and at which frequency. This behavior provides insights into possible disturbances. We define a potential peak as every point in a series of data points whose value is higher than both of its neighboring data points. We filter out the irrelevant peaks, i.e. peaks due to noise, by specifying that a peak should have a minimum prominence of 45 dBm and a minimum value of -105 dBm. The prominence of a peak is defined as the vertical distance to the highest valley. The minimum height and the minimum prominence are specific to the controller output signal and were learned from experimental evaluations. Based on this peak detection, we implemented three feature extraction algorithms that identify peak-related characteristics.
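This peak selection can be sketched with scipy.signal.find_peaks, which supports both thresholds directly; note that scipy's prominence definition is a close but not identical stand-in for the "distance to the highest valley" used above, so this is an approximation of the described procedure.

```python
import numpy as np
from scipy.signal import find_peaks

def spectral_peaks(psd_dbm):
    """Indices and prominences of relevant peaks in a PSD given in dBm,
    using the stated thresholds: height >= -105 dBm, prominence >= 45 dBm."""
    idx, props = find_peaks(psd_dbm, height=-105.0, prominence=45.0)
    order = np.argsort(props["prominences"])[::-1]   # most prominent first
    return idx[order], props["prominences"][order]
```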
3.3.1 Number of Peaks per Area

We divided the frequency range into smaller regions, each covering 5000 Hz, and count the number of peaks per region. With a maximum frequency of 160000 Hz, we have a total number of 32 features.

3.3.2 Characteristics of the Most Prominent Peaks

We identified the five most prominent peaks, from which we extract the prominence, the height, the width, and the frequency. While the prominence, the height, and the width are numerical values, the frequency is a categorical value, because a higher frequency does not imply a worse or better system condition. Therefore, we again divided the frequency range into regions of 5000 Hz, and for each region we count the number of prominent peaks.

3.3.3 Peak Healthyness

This feature extraction method gives each peak extracted following our criteria a score between 0 and 1 that determines whether the peak belongs to a healthy or unhealthy operation. For that, we acquired controller output data during healthy operation and extracted all peaks following our criteria from the PSDs. Based on these peaks, we assigned each frequency f a healthyness score healthyness(f) = (# peaks at f) / (# observed PSDs). The resulting distribution of healthy peaks is depicted in Figure 4. In the feature extraction step, we identify the ten most prominent peaks, and for each peak we take the healthyness score from the previously determined distribution as a feature.

Figure 4: Probability distribution of frequencies having a healthy peak. Source: own.

3.4 Autoencoder Latent Space

We use a feedforward AutoEncoder (AE) [22] trained on PSDs from healthy and disturbed operations. Using the AE's encoder, we transform a complete PSD into the AE latent space vector, which is used as a feature vector for the fault detection methods. The basic structure of the AE is shown in Table 2. The AE consists only of fully connected layers, and each layer, except for the output layer, is followed by the leakyReLU activation function.

Table 2: Overview of the autoencoder

Layer:      input  1. encoding  2. encoding  latent space  1. decoding  2. decoding  output
Dimension:  5000   500          100          10            100          500          5000

4 Selected Models

In this section, we describe what kind of algorithms we use to model the behavior of the laser oscillator based on the controller's output signal. The purpose of these algorithms is to automatically decide whether the laser oscillator is currently disturbed or not. We divided the fault detection algorithms into the classes: clustering algorithms, outlier detection algorithms, and other algorithms that are based on neither clustering nor outlier detection.

4.1 Clustering Algorithms

Clustering algorithms aim to group data samples into classes with similar elements. Clustering requires the concept of a metric, which may differ from algorithm to algorithm [23]. For the purpose of fault detection, we assume that similar data samples belong to the same class. We use the following clustering algorithms:

− Clustering-based local outlier factor (CBLOF) [24]
− K-means clustering [25]
− Balanced iterative reducing and clustering using hierarchies (BIRCH) [26]
− Gaussian mixture model (GMM) [27]

4.2 Outlier Detection Algorithms

Outlier detection algorithms aim to identify rare items or events that differ significantly from the rest of the dataset [28]. Assuming that faulty data samples can be classified as outliers compared to healthy data samples, we use the following outlier detection algorithms:

− Local outlier factor (LOF) [29]
− Angle-based outlier detection (ABOD) [30]
− Connectivity-based outlier detection (COF) [31]
− Isolation-based outlier detection (IOF) [32]
− K-nearest neighbor detection (KNN) [33]
− Copula-based outlier detector (COPOD) [34]
− Empirical cumulative distribution outlier detection (ECOD) [35]
− Linear model deviation-based outlier detection (LMDD) [36]
− One-class support vector machine (OCSVM) [37]
− Stochastic outlier selection (SOS) [38]

4.3 Other Algorithms

In addition to the clustering algorithms and outlier detection algorithms, we use the following algorithms to detect a disturbed system:

− Kernel density estimation (KDE) [39]
− Kernel principal component analysis (KPCA) [40]
− Minimum covariance determinant (MCD) [41]
− Principal component analysis (PCA) [42]
− Sampling [43]

In addition to the algorithms fitted on the feature dataset, we trained an AE with the structure shown in Table 2 on PSDs belonging to healthy system operation.
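A network with the layer sizes of Table 2 can be sketched as follows. The paper does not state the framework or training details, so PyTorch and the exact module layout are our assumptions; only the dimensions (5000-500-100-10-100-500-5000) and the leakyReLU placement follow the text.

```python
import torch.nn as nn

class PSDAutoencoder(nn.Module):
    """Fully connected AE with the layer sizes of Table 2; every layer
    except the output layer is followed by a LeakyReLU activation."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(5000, 500), nn.LeakyReLU(),
            nn.Linear(500, 100), nn.LeakyReLU(),
            nn.Linear(100, 10), nn.LeakyReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(10, 100), nn.LeakyReLU(),
            nn.Linear(100, 500), nn.LeakyReLU(),
            nn.Linear(500, 5000),                 # no activation on the output
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```

The encoder half yields the 10-dimensional latent feature vector, while the full network serves as the reconstruction-based fault detector described next.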
This AE therefore only learns to reconstruct PSDs belonging to a healthy operation. The AE fault detector uses a threshold on the reconstruction loss, which is realized as the mean squared error (MSE) between the input PSD and the reconstructed PSD at the output layer. The fault detection is based on the assumption that PSDs belonging to healthy system operation have a low MSE, while PSDs belonging to poor system conditions have a high MSE.

5 Experimental Evaluation

The experiments were performed using the Python libraries tsfresh [21], PyOD [44], and Scikit-learn [45]. The runtimes were measured on a Windows 11 operating system running Python 3.9 with an Intel(R) Core(TM) i7-1185G7 @ 3.00 GHz processor and 16 GB of RAM.

5.1 Dataset Summary

To evaluate the feature extraction techniques and fault detection algorithms, we generated disturbances at different frequencies by playing tones of single frequencies. The tones were played through a surface speaker mounted directly on the optical table next to the laser oscillator, one after the other, at the same power. For evaluating the combinations of feature extraction and fault detection, we recorded fitting data and validation data under the same conditions, as summarized in Table 3. From both the time frames and the PSDs, we extracted the features as described in Section 3 and normalized the extracted features using Z-normalization [46]. The number of features per data frame depends on the feature extraction method and is shown in Table 4. The peak characteristic feature extraction leads to the highest dimension and the AE latent space feature extractor to the lowest.

Figure 5 shows the four fitting datasets of the four different feature extraction methods (AE latent space, statistics from the time domain, statistics from PSDs, and peak features).

Table 3: Summary of fitting dataset and validation dataset

Condition              Fitting data (# Frames / Portion)   Validation data (# Frames / Portion)
no disturbance         4208 / 60.49%                       231 / 7.97%
0.5 kHz disturbance    305 / 4.38%                         296 / 8.97%
1.0 kHz disturbance    305 / 4.38%                         296 / 8.97%
1.5 kHz disturbance    306 / 4.4%                          296 / 10.22%
2.0 kHz disturbance    305 / 4.38%                         296 / 10.22%
2.5 kHz disturbance    305 / 4.38%                         296 / 10.22%
3.0 kHz disturbance    305 / 4.38%                         231 / 10.25%
3.5 kHz disturbance    306 / 4.4%                          296 / 10.22%
4.0 kHz disturbance    306 / 4.4%                          296 / 10.22%
4.5 kHz disturbance    306 / 4.4%                          296 / 10.22%

Table 4: Number of features per data frame

Feature extraction method     Number of extracted features
statistics (time)             34
statistics (PSD)              34
peak characteristics (PSD)    94
AE latent space               10

Figure 5: Feature visualization by t-SNE. Source: own.
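A two-dimensional embedding like the one in Figure 5 can be produced with scikit-learn's TSNE; the random feature matrix below is only a stand-in for one of the four feature datasets.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 34))    # stand-in: frames x features
labels = rng.integers(0, 10, size=500)   # stand-in: condition of each frame

emb = TSNE(n_components=2, random_state=0).fit_transform(features)
# Scattering emb[:, 0] against emb[:, 1], colored by `labels`,
# gives a plot like Figure 5.
```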
To represent the multidimensional feature datasets in a two-dimensional space, we used t-distributed Stochastic Neighbor Embedding (t-SNE) [47]. The visualization of the data is intended to provide a basis for evaluating the algorithms in Section 5.4. It can be seen that the data points recorded under the same disturbance, or no disturbance, are clustered for all feature extraction methods. However, the clusters on the AE latent space dataset are significantly closer together, sometimes even overlapping, than the clusters on the other feature datasets. Since we are analyzing fault detection methods in this work, it is noteworthy that only on the AE latent space dataset is there an overlap between disturbed data points and undisturbed data points. In particular, the data from the 1.5 kHz disturbance have a strong overlap with the undisturbed data points. Using statistics from time series, statistics from PSDs, or peak features, there are only overlapping clusters between data points of different disturbance types. Furthermore, it is noticeable that the undisturbed data points based on time statistics form two separate clusters rather than one.

5.2 Algorithm Parameters

Most of the selected algorithms have controllable parameters that influence different aspects of the algorithm. A summary of all the parameters used is given in Tables 5, 6, and 7.

Table 5: Clustering algorithms' parameters

BIRCH: threshold 0.2, 0.4, ..., 3.8, 4.0; branching factor 20, 40, 60, 80, 100; # clusters 2, 3, 4, 5, 6, 7, 8, 9, 10, 15
CBLOF: alpha 0.5, 0.6, 0.7, 0.9; beta 1.5, 2, 5, 7, 10, 15
GMM: # components 2, 3, 4, 5, 6, 7, 8, 9, 10, 15
K-means: # clusters 2, 3, 4, 5, 6, 7, 8, 9, 10, 15

Table 6: Outlier detection algorithms' parameters

ABOD: # nearest neighbors 5, 10, ..., 95, 100
COF: # nearest neighbors 5, 10, ..., 95, 100
COPOD: -
ECOD: -
IOF: -
KNN: # nearest neighbors 5, 10, ..., 95, 100
LMDD: -
LOF: # nearest neighbors 5, 10, ..., 95, 100
OCSVM: -
SOS: perplexity 5, 10, ..., 95, 100

Table 7: Other algorithms' parameters

KDE: bandwidth 0.4, 0.6, ..., 2.0, 2.2
KPCA: # components 1, 2, ..., 10
MCD: -
PCA: # components 1, 2, ..., 10
Sampling: -

5.3 Performance Criteria

We repeated the experiment under similar environmental conditions to verify the quality of the algorithms (see Table 3). The validation data consist of 2896 data frames, each covering 0.1 s. Therefore, live fault detection requires a maximum inference duration of 289.6 s on the whole validation dataset. To evaluate the system state at each point in time, it is necessary that the fault detection algorithms also operate at such a speed. We evaluate this criterion by measuring the time each algorithm takes to classify the validation data samples, and determine the inference speed by dividing the measured duration by the number of frames. Additionally, we measured the time each algorithm needs to be fitted.

To evaluate the feature extraction methods and algorithms qualitatively, we use the area under the receiver operating characteristic (AUROC) [48] as a performance metric. The AUROC score is defined as the area underneath the ROC curve and ranges between 0 and 1, where a score of 1 implies a perfect predictor, a score of 0 implies that the predictor always gives wrong predictions, and a score of 0.5 indicates that the predictor makes random guesses. We calculate the AUROC scores on both the fitting dataset and the validation dataset. The AUROC metric does not provide information on which of the disturbances are classified correctly and which of them are misclassified. Therefore, we also calculate the classification accuracy TruePredictions(condition) for each condition, either a disturbed frequency or undisturbed, respectively.
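These criteria can be measured with a few lines around any PyOD detector; the synthetic data below replaces the real feature datasets, and KNN stands in for the full set of 19 methods.

```python
import time
import numpy as np
from pyod.models.knn import KNN
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X_fit = rng.normal(size=(600, 10))     # stand-in for a fitting feature set
X_val = rng.normal(size=(300, 10))     # stand-in for the validation features
y_val = rng.integers(0, 2, size=300)   # 0 = undisturbed, 1 = disturbed

det = KNN(n_neighbors=5)
t0 = time.perf_counter()
det.fit(X_fit)
fit_s = time.perf_counter() - t0                       # fitting duration

t0 = time.perf_counter()
scores = det.decision_function(X_val)                  # higher = more anomalous
infer_s_per_frame = (time.perf_counter() - t0) / len(X_val)

auroc = roc_auc_score(y_val, scores)                   # AUROC on validation data
```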
We repeated the process of fitting and evaluation ten times with different random seeds and determined the mean value for each metric. In summary, we determined the mean values of the following metrics for each combination of feature extraction method and fault detection algorithm:

− Fitting duration (fitting dataset)
− Inference duration (validation dataset)
− AUROC score (fitting dataset)
− AUROC score (validation dataset)
− Condition-specific accuracies (validation dataset)

5.4 Results

In this section, we describe the results of the algorithms applied to the experimental data. Combining the feature extraction methods and the different parameter choices, we built 3084 models on the different feature datasets (AE latent space, statistics from time series signals, statistics from PSDs, peak characteristics).

The fitting durations of all algorithms, related to the feature extraction method and the choice of parameters, are depicted in Figure 6. The fitting durations only include the fitting of the algorithms and not the transformation of the recorded data into the features. All clustering algorithms require very little time to be fitted, for all feature extraction methods and all parameter choices. Among the outlier detection algorithms, the LMDD algorithm and ABOD have by far the longest fitting durations. Among the other algorithms, KPCA needs the longest time to be fitted for all feature extraction methods. It is noticeable that KPCA using the AE latent space features takes more than 100 seconds longer to be fitted than with the other feature extraction methods.

Figure 6: Fitting durations. Source: own.

Figure 7 shows the duration needed by the algorithms to classify the validation data samples. The inference duration consists of both the feature extraction part and the algorithmic classification part. The feature extraction methods are based on very efficient signal processing algorithms, such as fast Fourier transforms or basic statistical calculations. Therefore, feature extraction has a small impact on the overall inference duration. The maximum allowed inference duration is 289.6 s. This criterion is fulfilled by all algorithms for all feature extraction methods and all parameter settings. All clustering algorithms perform particularly well, followed by the other algorithms and the outlier detection algorithms. For all feature extraction methods, ABOD has the worst inference duration when many nearest neighbors are used. The LMDD algorithm has the second-highest inference duration.

In the following, we describe the ability of the algorithms to classify disturbed data samples as disturbed and non-disturbed data samples as normal. For both the fitting dataset and the validation dataset, we manually assigned a label to each data sample, either undisturbed or disturbed. The AUROC score is based on the manually assigned reference and the labels assigned by the fault detection algorithms.

Figure 7: Inference durations. Source: own.

5.4.1 Clustering Algorithms

The AUROC results and the condition-specific accuracies of the clustering algorithms with respect to the feature extraction methods are depicted in Figure 8.
In general, features from the PSDs (statistics and peak features) form a good basis for clustering algorithms to reliably identify disturbed laser oscillator feedback systems, since all clustering algorithms except the GMM achieve very good AUROC scores and high accuracies for all conditions. The GMM algorithm does not achieve satisfactory results for any combination of parameter setting and feature extraction method. It is noticeable that the condition-specific accuracies obtained by GMM show that the GMM algorithm classifies all data samples as disturbed. From the results of the CBLOF algorithm, it can be seen that the fault detection quality depends strongly on the choice of input parameter. At an alpha of 0.5, the best AUROC scores of 1.0 are obtained on the fitting and validation datasets regardless of the choice of cluster number and beta, for all feature extraction methods. BIRCH and the K-means algorithm achieve perfect results on the validation dataset for features from PSDs and the correct parameter choice. The very good results of the clustering algorithms can be explained by the structure of the examined data. As the t-SNE embeddings of the dataset already indicate (see Figure 5), the data measured under similar conditions are positioned in cluster-like structures, especially using the PSD statistics and the peak features.

Figure 8: Results of clustering algorithms. Source: own.

5.4.2 Outlier Detection Algorithms

The AUROC scores and the condition-dependent accuracies of the outlier detection algorithms are shown in Figure 9. In general, it can be seen that no combination of outlier detection algorithm and feature extraction method achieves a perfect AUROC score of 1.0. It is also noticeable that the choice of parameters for the outlier detection algorithms has no great influence on the result, because the maximum AUROC scores hardly differ from the minimum AUROC scores per algorithm. In contrast to the clustering algorithms, the outlier detection algorithms achieve very poor AUROC scores on the feature datasets that use PSDs as a basis. Among all outlier detection algorithms, KNN achieves the highest AUROC score of 0.9148, using the AE latent space as features. The corresponding condition-specific accuracies show that the data recorded under no excitation are correctly classified with an accuracy of 0.8788. The accuracies with which KNN detects an excited system from the controller data are all higher than 0.9.

Figure 9: Results of outlier detection algorithms. Source: own.

5.4.3 Other Algorithms

The AUROC scores and the condition-specific accuracies of the other algorithms, depending on the feature extraction methods used, are shown in Figure 10. It is noticeable that all algorithms that were fitted with PSD statistics are predictors that classify all validation data as disturbed. This implies that the algorithms cannot generalize the fault detection learned on the PSD statistics fitting dataset, because not all data samples from the fitting dataset are classified as disturbed. Furthermore, it can be seen that, similar to the outlier detection algorithms, none of the other algorithms achieves a perfect AUROC score of 1.0 on the validation dataset.
Figure 10: Results of other algorithms. Source: own.

For all feature extraction methods, KPCA achieves the highest AUROC scores, with the highest value of 0.94 being achieved with the AE latent space as the feature. The number of principal components leading to the highest AUROC scores differs between the respective feature extraction methods. Therefore, there exists no single correct choice of principal components such that KPCA can describe the fault detection behavior for all feature datasets. The second-highest AUROC score on the validation dataset is also achieved on the AE latent space, by the MCD algorithm.

In addition, the results of the AE trained on non-disturbed PSDs are summarized in Table 8. The AE fault detector achieves perfect AUROC scores of 1.0 on both the fitting and validation datasets.

Table 8: Autoencoder fault detector results

Training duration   Inference duration   AUROC (fitting)   AUROC (validation)
220.8 s             0.133498 s           1.0               1.0

5.4.4 Summary

Table 9 gives an overview of the algorithms and their parameter configurations that achieve an AUROC score higher than 0.95 on the validation dataset. If an algorithm achieves such an AUROC score with multiple parameter combinations, we selected the parameter combination that gives the best AUROC score and the lowest inference duration. The AE works directly on the PSDs; therefore, no prior feature extraction is required. Among the algorithms that require prior feature extraction, only the clustering algorithms K-means and CBLOF achieve very good AUROC scores on all validation datasets. Additionally, BIRCH achieves very good AUROC scores on the validation dataset using either the AE latent space, PSD statistics, or peak characteristics. Furthermore, it stands out that no algorithm fitted with the AE latent space achieves a perfect AUROC score. The best algorithms that do not belong to the clustering algorithms are KPCA, with an AUROC score of 0.9368, and KDE, with an AUROC score of 0.9436, both using the AE latent space as feature. As described in Section 5.1, the feature datasets each form clusters according to the type of disturbance, which explains why clustering algorithms in particular work so well. The overlap of disturbed and undisturbed data on the AE latent space dataset (see Figure 5) also accounts for the fact that none of the selected algorithms achieves a perfect AUROC score when using the AE latent space as feature input.

Table 9: Best fault detection results on the validation dataset

Feature                Algorithm  Parameters                              AUROC   Inference duration in s
-                      AE         -                                       1.0     0.1335
AE latent space        BIRCH      threshold: 2.4, branching factor: 80    0.9726  0.007
AE latent space        K-means    # clusters: 4                           0.9559  0.0014
AE latent space        CBLOF      # clusters: 10, alpha: 0.5, beta: 5     0.9667  0.0018
statistics (time)      CBLOF      # clusters: 9, alpha: 0.6, beta: 1.5    1.0     0.002
statistics (time)      K-means    # clusters: 4                           1.0     0.002
statistics (PSD)       BIRCH      threshold: 3.8, branching factor: 20    1.0     0.0019
statistics (PSD)       CBLOF      # clusters: 2, alpha: 0.5, beta: 1.5    1.0     0.0018
statistics (PSD)       K-means    # clusters: 5                           1.0     0.0019
peak characteristics   BIRCH      threshold: 4.0, branching factor: 40    1.0     0.1606
peak characteristics   CBLOF      # clusters: 2, alpha: 0.7, beta: 1.5    1.0     0.0018
peak characteristics   K-means    # clusters: 6                           1.0     0.0036
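The selection rule behind Table 9 (keep configurations above the 0.95 AUROC bar, then prefer the best AUROC and, among ties, the lowest inference duration) is a one-liner once the results are collected; the two records below are taken from Table 9 for illustration.

```python
results = [
    {"feature": "statistics (PSD)", "algo": "K-means",
     "params": {"n_clusters": 5}, "auroc": 1.0, "infer_s": 0.0019},
    {"feature": "AE latent space", "algo": "BIRCH",
     "params": {"threshold": 2.4, "branching_factor": 80},
     "auroc": 0.9726, "infer_s": 0.007},
]

# Filter by the AUROC bar, then rank by (best AUROC, lowest inference time).
good = [r for r in results if r["auroc"] > 0.95]
best = min(good, key=lambda r: (-r["auroc"], r["infer_s"]))
```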
6 Conclusion

In this paper, we investigated the ability of data-based fault detection algorithms, in combination with four feature extraction methods, to model the condition of an actively controlled phase-locked laser oscillator, and determined the best methods and parameters for detecting disturbances that affect the healthy operation of the synchronization system. The fault detection methods were validated experimentally by disturbing the system acoustically. We evaluated the classification performance for each combination of feature extraction, fault detection method, and algorithm-specific parameters, using the fitting duration, inference duration, and AUROC scores as quality measures.

From the classification results, we can conclude that very good prediction results can be obtained without deep system expertise. Comparing the prediction results of the different types of algorithms, we notice that clustering algorithms achieve the best results regardless of the feature extraction method. Moreover, there is no combination of an algorithm not belonging to the clustering algorithms and a feature extraction method that achieves a perfect AUROC score on the validation dataset. Additionally, there is no combination of a fault detection algorithm and the AE latent space as a feature extractor that achieves a perfect AUROC score on the validation dataset. With an AUROC score of 1.0 and an inference duration of 0.0018 s when applied to the validation dataset, the combination of CBLOF and peak characteristics or the combination of CBLOF and statistics from PSDs achieves the best results. However, we would like to draw particular attention to the performance of the AE fault detector, as it does not require prior feature extraction and can thus be applied directly to any dynamic system controlled in a closed loop. In addition, its inference time on the validation dataset is below the maximum acceptable threshold for real-time fault detection.

The experimental evaluation used in this work is based on the excitation of different frequencies at the same level by a surface loudspeaker. For future work, we plan to investigate what minimum interference intensity must be present for a fault detection algorithm to be effective, and to extend the fault detection mechanism by specifying the exact type of fault, rather than just a binary classification of healthy or disturbed. We also want to extend the fault detection mechanism to a predictive maintenance module that can predict when the next faulty operating point will occur.

Acknowledgement

We acknowledge the support by DASHH (Data Science in Hamburg - HELMHOLTZ Graduate School for the Structure of Matter) with the Grant-No. HIDSS-0002.

References

[1] Sobolev E, Zolotarev S, Giewekemeyer K, Bielecki J, Okamoto K, Reddy HKN, et al. Megahertz single-particle imaging at the European XFEL. Communications Physics. 2020;3(1).
[2] Schulz S, Czwalinna M, Felber M, Fenner M, Gerth C, Kozak T, et al., editors. Few-Femtosecond Facility-Wide Synchronization of the European XFEL. JACoW Publishing, Geneva, Switzerland; 2019.
[3] Heuer M. Identification and control of the laser-based synchronization system for the European X-ray Free Electron Laser [doctoral thesis]. Technische Universität Hamburg-Harburg; 2018.
Available from: http://tubdok.tub.tuhh.de/handle/11420/1706.
[4] Welch P. The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms. IEEE Transactions on Audio and Electroacoustics. 1967;15(2):70-3.
[5] Zheng H, Wang R, Yin J, Li Y, Lu H, Xu M. A new intelligent fault identification method based on transfer locality preserving projection for actual diagnosis scenario of rotating machinery. Mechanical Systems and Signal Processing. 2020;135:106344. Available from: https://www.sciencedirect.com/science/article/pii/S0888327019305655.
[6] Siahpour S, Li X, Lee J. Deep learning-based cross-sensor domain adaptation for fault diagnosis of electro-mechanical actuators. International Journal of Dynamics and Control. 2020;8(4):1054-62.
[7] Bagheri M, Zollanvari A, Nezhivenko S. Transformer Fault Condition Prognosis Using Vibration Signals Over Cloud Environment. IEEE Access. 2018;6:9862-74.
[8] Gamal M, Donkol A, Shaban A, Costantino F, Di G, Patriarca R. Anomalies detection in smart manufacturing using machine learning and deep learning algorithms. In: Proceedings of the International Conference on Industrial Engineering and Operations Management, Rome, Italy; 2021. p. 1611-22.
[9] Lopez F, Saez M, Shao Y, Balta EC, Moyne J, Mao ZM, et al. Categorization of Anomalies in Smart Manufacturing Systems to Support the Selection of Detection Mechanisms. IEEE Robotics and Automation Letters. 2017;2(4):1885-92.
[10] Quevedo J, Puig V, Escobet T. Model Fault Detection of Feedback Systems: How and Why to Use the Output of the PID Controller? IFAC Proceedings Volumes. 2000;33(4):319-24. IFAC Workshop on Digital Control: Past, Present and Future of PID Control, Terrassa, Spain, 5-7 April 2000. Available from: https://www.sciencedirect.com/science/article/pii/S1474667017382630.
[11] Puig V, Quevedo J. Fault-tolerant PID controllers using a passive robust fault diagnosis approach. Control Engineering Practice. 2001;9(11):1221-34. PID Control. Available from: https://www.sciencedirect.com/science/article/pii/S0967066101000685.
[12] Bauer M, Auret L, Bacci di Capaci R, Horch A, Thornhill NF. Industrial PID Control Loop Data Repository and Comparison of Fault Detection Methods. Industrial & Engineering Chemistry Research. 2019;58(26):11430-9. Available from: https://doi.org/10.1021/acs.iecr.8b06354.
[13] Wang Q, Liu J, Wei B, Chen W, Xu S. Investigating the Construction, Training, and Verification Methods of k-Means Clustering Fault Recognition Model for Rotating Machinery. IEEE Access. 2020;8:196515-28.
[14] Duong BP, Kim JM. Non-Mutually Exclusive Deep Neural Network Classifier for Combined Modes of Bearing Fault Diagnosis. Sensors. 2018;18(4). Available from: https://www.mdpi.com/1424-8220/18/4/1129.
[15] Kim D, Heo TY. Anomaly Detection with Feature Extraction Based on Machine Learning Using Hydraulic System IoT Sensor Data. Sensors. 2022;22(7). Available from: https://www.mdpi.com/1424-8220/22/7/2479.
[16] Li H, yun Xiao D. Fault diagnosis based on power spectral density basis transform. Journal of Vibration and Control. 2015;21(12):2416-33. Available from: https://doi.org/10.1177/1077546313487242.
[17] Wang Z, Yang J, Li H, Zhen D, Xu Y, Gu F. Fault Identification of Broken Rotor Bars in Induction Motors Using an Improved Cyclic Modulation Spectral Analysis. Energies. 2019;12(17).
'ANYWHERE TO WORK' A DATA MODEL FOR SELECTING WORKPLACES ACCORDING TO INTENTS AND SITUATIONS

HITOSHI KUMAGAI, NAOKI ISHIBASHI, YASUSHI KIYOKI
Musashino University, Graduate School of Data Science, Tokyo, Japan
g2251001@stu.musashino-u.ac.jp, n-ishi@musashino-u.ac.jp, y-kiyoki@musashino-u.ac.jp

In this paper, we proposed a new workplace data model and its calculation method. The method was designed to calculate an appropriate workplace according to the intents (activities) and situations of a worker. The data model was designed as a semantic space with three knowledge bases: 'Activity-affecting', 'Place-determining', and 'Activity and Place'.
Experiments were conducted to show the different results depending on activities and the contexts of the workplace, and demonstrated the feasibility of the proposed data model and calculation method.

Keywords: workplace, activity based working, hybrid work, semantic computing, information modeling

DOI https://doi.org/10.18690/um.feri.5.2023.6
ISBN 978-961-286-745-4

1 Introduction

1.1 'Workplace': research definition

The present study focuses on the 'workplace', also called the 'office'. Recently, however, the term 'workplace' has often been used with a wider meaning, as any place to work. Typical users of workplaces are 'knowledge workers'. Drucker [1] coined this term and defined it as 'high-level workers who apply theoretical and analytical knowledge, acquired through formal training, to develop products and services'. For knowledge creation, Nonaka [2] developed the 'SECI model' and divided it into four dimensions; each dimension was called 'Ba', which means 'place' in Japanese. Nonaka notes that knowledge creation is a spiral through the 'Ba' with some human interactions. 'Ba' does not necessarily mean a physical place, although each 'Ba' can be connected to a certain workplace (Figure 1). The number of knowledge workers has increased, and a research firm estimated that there are more than one billion of them [3]. Hence, knowledge workers are the key players in economic society, and the preparation of the workplace becomes more important.

Figure 1: SECI model [2] applied to workplaces. Source: own.

Here we define several terms used in this study as follows:

− Workplace: Places where 'workers' are working, which includes the conventional 'centre office', the 'home office' (work from home), and the '3rd place'.
− Workers: People who are knowledge workers, but not limited to these; this includes people whose jobs are information processing and who have no essential reason to use any particular physical place.
− Centre office: Physical workplaces (offices) of the organisations of the 'workers'.
− Home office: The home of the worker, from where they can work.
− The 3rd place: An alternative workplace besides the centre and home offices, such as a shared service office, café, library, or anywhere else to work.
− Functional spaces: Components of physical workplaces, such as desks (workstations), open communication spaces, meeting rooms, phone booths, and others.
− Workplace services: Services that are provided to the workers in workplaces, such as reception, beverages, canteens, and others.
− Workplace settings: Features of a workplace, comprising a set of 'functional spaces' and 'workplace services'.

1.2 Recent workplace problems

In the three years since the emergence of COVID-19, workplace circumstances have changed drastically. The term 'hybrid work' has become common, referring to the combination of working at the centre office and remotely, particularly from home. Although the movement for flexible working from anywhere appeared 20 years ago, as mentioned in Chapter 2, it had been adopted by only a few advanced technology companies. However, during the COVID-19 pandemic, many workers were forced to work from home, with many organisations rapidly introducing remote communication tools, and workers having to acquire remote communication literacy faster than in the previous decade.
However, whether workers can work from anywhere or should come to centre offices still remains controversial. Some GAFA executives have called for employees to return to the centre office over their resistance, despite the fact that their companies appear to be better able than most to utilise IT tools for remote working [4]. The hybrid work model, which is a compromise or mixture of working from the centre office and from any place, seems to be the new normal for workplaces. Workers have now become more flexible about where to work; however, this means that they must select a more appropriate workplace for their productivity in complex situations. In addition, facility managers, who are responsible for planning, implementing, and maintaining the workplace of an organisation, have more difficulty in planning the size or the settings of the workplace (Figure 2).

Figure 2: Problems of hybrid work. Source: own.

1.3 Research journey and scope of the proposal of this study

Investment in a new workplace (physical centre office) is immense. Therefore, improving workplaces using a 'trial and error' approach is difficult. The current planning of physical workplaces has been a conceptual approach: some experienced and knowledgeable designers define a concept for a new workplace with a small study of the current work situations of the organisation. Although such a study can predict the volume of each functional facility in the current settings, it cannot predict changes in a new setting. For example, the concept might state that 'The workers should communicate casually in open spaces rather than talk formally in a meeting room' and recommend that the client prepare some open communication spaces. However, if the current setting of the client does not have such spaces, the study cannot estimate the number of workers who will use such an open communication space. In other words, the explanatory variables of conventional mathematical methods, such as operations research, might not be available in current workplace planning practice.

In addition, we must address the challenge of treating the multiple and complex contexts of the workplace to solve the problems of the hybrid work situation mentioned in the previous section. Although collecting multiple and complex data is still difficult, many sensors, including social sensors, are emerging, and these will help us to collect the data in the near future. Therefore, a data model that describes the behaviours and preferred workplaces of the workers must be constructed. An indication for the future of this model is the digital twin of self-driving cars: data collection is no longer conducted in the real world but in digital twins, where virtual drivers drive virtual cars in virtual towns. The future objective of this research is to establish a workplace digital twin, where virtual workers work in a virtual workplace setting, which can predict the comfort and productivity of the workers. This study is the first step of the entire journey towards a workplace digital twin and proposes a data model with which a worker can find an appropriate place to work in complex situations.

2 Discussion and research

2.1 Discussions in workplace

Over the last 20 years, workplace-setting trends have changed slightly.
As knowledge workers have become the core human capital of an economic society, some people, particularly executives of advanced technology companies, believe that the workers must be more communicative for the knowledge creation spiral reported by Nonaka et al. [2]. However, knowledge workers must transform tacit knowledge into explicit knowledge; as a result, workers must concentrate to create knowledge. Therefore, knowledge workers must engage in contradictory activities, such as communication and concentration.

In 2004, a Dutch consultant, Veldhoen [5], coined the term 'activity based working (ABW)'. The company he established, Veldhoen + Company, notes that 'ABW creates a space that is specifically designed to meet the physical and virtual needs of individuals and teams' [6]. The ABW concept has become popular among facility managers, particularly in Northern Europe, Australia, and Japan. The COVID-19 pandemic accelerated this movement; however, the situation has become more complicated with hybrid work. Workers and facility managers have obtained more options regarding work location. Thus, data models and calculation methods are required that allow workers to select a workplace.

2.2 Research for data model of intent of people based on situation

Workers and workplace settings may vary, and a single type does not seem to be present in the open world. Thus, the workplace data model should be treated under a closed-world assumption. Research conducted by Yokoyama et al. [7] proposes an 'information-ranking method' for facilities and services based on the dynamic contexts (intent/situation) of train passengers with a semantic space model. They presented a method that calculates the appropriate facility or service in a complex situation by using a semantic space model. The setting of their study was similar to that of this study, in which a place is selected based on the dynamic and static contexts of a person. We assumed that the semantic space model could be applied to workplace data modeling. If the contexts of the workplace could be defined, we could calculate the behaviours of the workers.

2.3 Proposed data model and calculation method of 'Anywhere to work'

2.4 Data model aim

The aim of the data model proposed in this study is to calculate appropriate workplaces based on the context of the workplace and the intentions and situations of a worker, using knowledge bases. In this study, as the first step, we aimed to calculate a single appropriate workplace for a worker in a given set of situations; in the future, we will aim to calculate the work journey of the workers. Therefore, in this study, we set the workplace as the objective variable and the other parameters for the context of the workplace as the explanatory variables.

2.5 Approach

The process through which workers select their workplace must be determined. The ABW concept recommends that workers select an appropriate place depending on their 'activity', such as solo work, casual communication, or official meetings. Therefore, 'activity', one of the dynamic intents of a worker, can be the primary context of a workplace. Traditionally, in workplace planning, facility managers use their knowledge to correlate the activities of the workers with functional places.
If workers had to work daily only at their centre office, this primary correlation could be sufficient. However, more complex contexts have recently emerged in the hybrid work situation. In this study, we organised the contexts of the workplace into 'Dynamic/Static' and 'Intention/Situation' categories, based on the study by Yokoyama et al. [7]. We then divided the contexts of the workplace into 'Personal/Interpersonal' and 'Environmental (Place-oriented/General)'. This scheme made it easier to identify contexts relevant to the determination of workplaces by the workers; however, the manner in which a worker decides on a place to work in these contexts remains complicated. Finally, we found another axis: the 'Activity-affecting' and the 'Place-determining' contexts (Figure 3).

− Activity-affecting contexts: Affect the productivity of the intent ('activity') or the motivation of a worker for doing an activity, such as the psychological safety level, the attendees (who will be) in the centre office, or the indoor quality (such as temperature and humidity).
− Place-determining contexts: Affect directly the determination of a worker for a place, such as the weather, the access (commuting) to the centre office, or the area of the centre office.

Figure 3: Example of the contexts. Source: own.

2.6 'Tri-knowledge-base with personal context vectors model': Proposed data model concept

In this study, we set the possible 'Workplace' options into a vector y (Table 1 lists all symbols of the proposed model). The model is calculated such that the more appropriate the workplace yi, the larger its value in a given set of workplace contexts. We set the possible activity options into a vector x. When a worker wants to do xi (an activity), the value of xi is set to '1' and all other items xj are set to '0'. If we could define the correlation between activities and places in a matrix M, we could calculate y = Mx. The findings mentioned in the previous section identified 'Activity' as the primary context, and the 'Activity-affecting' and 'Place-determining' contexts as complementary contexts, allowing us to describe the relationships among the contexts of the workplace as three correlations. Consequently, we defined each correlation as a knowledge base.

Primary knowledge base
− Map: Correlation between 'Activity and Place'

Complementary knowledge bases
− Ma: Correlation between 'Activity' and the 'Activity-affecting' contexts
− Mp: Correlation between 'Place' and the 'Place-determining' contexts

We set the complementary contexts of the workplace as a vector ca for the 'Activity-affecting' context and cp for the 'Place-determining' context. Subsequently, we adopted the result of the calculation x' = cai Ma x as the adjusted value of x and, in the same way, y' = cpi Mp as the adjusted value of y. Then, we formulated:

y = y' Map x' = (cpi Mp) Map (cai Ma x)

These correlations might differ depending on the worker; however, preparing a knowledge base for each worker would require significant effort. To simplify this problem, we adopted the personal context vector v (vai for cai, vpi for cpi). It weighs the extent to which each complementary workplace context affects the result of choosing a place. For a worker, weighting the personal context vectors for each context of the workplace (ca, cp) is easier. Therefore, cai vai is applied instead of cai and, similarly, cpi vpi instead of cpi. Consequently, we calculate the proper workplace y as follows (Figure 4):

y = {(cpi vpi) Mp} Map {(cai vai) Ma x}

Figure 4: Proposed data model structure 'Tri-knowledge-base with personal context vectors model'. Source: own.
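To make the calculation concrete, the following is a minimal NumPy sketch of this pipeline under toy dimensions and illustrative values; the variable names, the elementwise reading of the products, and the average-based normalisation (see Section 3.3) are our interpretation of the formula, not code from the paper.

```python
import numpy as np

# Toy dimensions: 3 places, 2 activities, 2 activity-affecting contexts,
# 2 place-determining contexts. All values are illustrative only.
M_ap = np.array([[1.0, 0.2],        # places x activities ('Activity and Place')
                 [0.4, 0.8],
                 [0.0, 1.0]])
M_a = np.array([[0.4, 0.6],         # activity-affecting contexts x activities
                [1.0, 0.2]])
M_p = np.array([[1.0, 0.4],         # places x place-determining contexts
                [0.6, 0.8],
                [0.2, 1.0]])

x   = np.array([1.0, 0.0])          # intended activity (one-hot)
c_a = np.array([1.0, 0.0])          # activity-affecting context values in [0, 1]
c_p = np.array([0.0, 1.0])          # place-determining context values in [0, 1]
v_a = np.array([0.5, 1.0])          # personal weights for c_a
v_p = np.array([1.0, 1.0])          # personal weights for c_p

# x' = (c_a v_a) M_a, averaged over the weighted contexts, then applied
# elementwise to the one-hot activity vector x.
x_adj = ((c_a * v_a) @ M_a) / len(c_a) * x

# y' = (c_p v_p) M_p, averaged over the weighted place-determining contexts.
y_adj = (M_p @ (c_p * v_p)) / len(c_p)

# y = {(c_p v_p) M_p} Map {(c_a v_a) M_a x}, with the outer product read as
# an elementwise combination of the two place-score vectors.
y = y_adj * (M_ap @ x_adj)
print(y)  # one score per workplace; the largest value is the proposed place
```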
Table 1: Definitions of the proposed data model symbols

Symbol | Definition | Explanation | Example
y | Workplace: objective variable (vector) | The more appropriate the workplace yi, the larger its value in a given context | y1: home office, y2: 3rd place, y3: meeting room in centre office, ...
x | Activity: primary explanatory variable (vector) | Activity which a worker is intent on doing | x1: solo work with high concentration, x2: solo work with low concentration, x3: casual communication, ...
cai | Activity-affecting context (vector) | Affects the productivity of the intent ('activity') or the motivation of a worker for an 'activity' | ca1: psychological safety level, ca2: attendances in the centre office, ca3: temperature (indoor), ...
cpi | Place-determining context (vector) | Affects directly the determination of a worker for a place | cp1: weather, cp2: access to the centre office, ...
Map | Primary knowledge base (matrix) | Correlation between 'Activity and Place'; the larger, the more related | y1: home office to x1: solo work with low concentration = 1.0; y3: meeting room in centre office to x3: formal communication = 0.4; ...
Ma | Complementary knowledge base (matrix) | Correlation between 'Activity' and the 'Activity-affecting' contexts; the larger, the more related | x1: solo work with low concentration to ca2: attendances in the centre office = 0.4; x3: casual communication to ca1: psychological safety level = 1.0; ...
Mp | Complementary knowledge base (matrix) | Correlation between 'Place' and the 'Place-determining' contexts; the larger, the more related | y1: home office to cp1: weather = 1.0; y3: meeting room in centre office to cp2: access to the centre office = 0.6; ...
vai | Context vector for the Activity-affecting contexts | Each worker weighs the 'Activity-affecting' contexts | va1 = 0.5, then x3 to ca1 is adjusted as 1.0*0.5 = 0.5
vpi | Context vector for the Place-determining contexts | Each worker weighs the 'Place-determining' contexts | vp1 = 0, then y1 to cp1 is adjusted as 1.0*0 = 0

3 Prototype system implementation

3.1 Assumed applicable area

In this study, we conducted calculations using sample data to confirm the working of the model. We assumed a simple organisation in Tokyo, Japan, with simple workplace settings in the summer season.

3.2 Functional overview: 3 types / 4 modules

We designed a prototype system based on the tri-knowledge-base model. In practice, with the assumed contexts of the workplace, two types of 'Activity-affecting' contexts were observed: one type was not dependent on any place, while the other was dependent on the centre office. Therefore, we divided the 'Activity-affecting' context calculation into two. As a result, the system had four modules of three types (Figure 5).

Figure 5: Modules of the prototype system. Source: own.

− Module 1 (1-1, 1-2): 'Activity-affecting' context calculation
The first module calculated x', the 'Activity-affecting' context, and was divided into two sub-modules.
o Module 1-1: General (not place dependent)
This module calculated the 'General (not place dependent)' context. We defined the parameters of the context as cag and the knowledge base as Mag.
The result xg' was normalised and applied to the final calculation for the 'Activity and Place, general (not place dependent)' knowledge base Mapg in Module 3.

o Module 1-2: Place dependent
In this experimental setting, some 'Activity-affecting' contexts depended on the place 'centre office'. We defined the parameters of these contexts as cao and the knowledge base as Mao. The result xo was normalised over these contexts, added to xg', and then normalised to xo'. The result xo' was applied only to the calculation for the 'centre office'-dependent 'Activity and Place' knowledge base Mapo in Module 3. Other intermediate calculations like xo might occur for other place-dependent workplace contexts.

− Module 2: 'Place-determining' context calculation
Module 2 calculated the 'Place-determining' context. We defined cp as the context parameter and Mp as the knowledge base. The result y' was normalised.

− Module 3: 'Activity and Place'
The final module calculated 'Activity and Place' with the primary knowledge base Map. In this experiment, the knowledge base was divided into 'not place dependent' (Mapg) and 'centre office dependent' (Mapo) parts, which were applied to the results of Module 1 (1-1, 1-2) and Module 2.

− Personal context vectors
We defined one personal context vector item per parameter of the complementary contexts of the workplace, ca and cp. The context vector vi was set in advance by each worker, who was the system user. In addition, we set different context vectors for different options of a parameter if it could vary from person to person. For example, 'indoor temperature' was basically set to 22 to 28 °C as the comfortable range. However, the feeling of indoor temperature might vary depending on the person. Therefore, we divided the range into three bands, 22-24/24-26/26-28, and applied the same correlation to the knowledge base. If a worker felt uncomfortable in the 22-24 band, the person could set a lower weight on their context vector, such as 0.5 or 0. Thus, personal preferences could be included in the personal context vector.

3.3 Parameter, correlation, and normalisation range

In this experiment, all complementary contexts (cai and cpi) were defined from '0' to '1'. If several options were available for a parameter, such as very good/good/neutral/bad/very bad for the 'psychological safety level', they were divided into exclusive options; only the value of the selected option became '1' and the rest were set to '0'. In addition, we set the range of the correlations in the knowledge bases from zero to one. Therefore, each product of one parameter (a context of the workplace) and a correlation (a knowledge base entry) fell within the range of 0 to 1. Finally, we normalised the matrix product by the average, dividing the matrix product by the number of parameters. Consequently, the objective variable yi fell within the range of 0 to 1.
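The following small sketch illustrates this encoding and normalisation convention; it is our reading of Section 3.3, with illustrative names, not code from the paper.

```python
import numpy as np

def encode_exclusive(n_options: int, selected: int) -> np.ndarray:
    """Exclusive options: only the selected option of a parameter is '1'."""
    v = np.zeros(n_options)
    v[selected] = 1.0
    return v

def normalise_by_average(product: np.ndarray, n_params: int) -> np.ndarray:
    """Divide a matrix product by the number of context parameters so that
    each component stays in [0, 1] when values and correlations are in [0, 1]."""
    return product / n_params

# Example: 'psychological safety level' with five exclusive options,
# of which 'good' (index 1) is selected.
safety = encode_exclusive(5, 1)
print(safety)  # [0. 1. 0. 0. 0.]
```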
4 Experiment

4.1 Experimental context parameters and knowledge bases

For the experiment, we defined the workplace options, activity options, complementary workplace context parameters, and personal context vectors as shown in Table 2, and the knowledge bases as shown in Figures 6-9.

Table 2: The parameters of the experiment

Symbol | Definition | Options | Actual value | Context vector
y | Workplace: the objective variable (vector) | y1: home office; live alone or separate room / y2: home office; live with family / y3: 3rd place; café or library / y4: 3rd place; shared open office / y5: 3rd place; rental booth / y6: centre office; booth / y7: centre office; open desk / y8: centre office; open communication small / y9: centre office; open communication large / y10: centre office; meeting room | Not applied | Results can vary depending on the context
x | Activity: the primary explanatory variable (vector) | x1: solo work; high concentration / x2: solo work; low concentration / x3: co-work / x4: casual communication / x5: formal communication | exclusive options | Not applied

'Activity-affecting' contexts ca, general (not place dependent) cag:
cag1 | Job type | cag11: administration / cag12: coordinator / cag13: business planning / cag14: R&D / cag15: sales | percentage (total 100%) | one for all options
cag2 | Psychological safety level | cag21: very good / cag22: good / cag23: neutral / cag24: bad / cag25: very bad | exclusive options | one for all options

'Activity-affecting' contexts, place ('centre office') dependent cao:
cao1 | Attendances | cao1: preferable people are there | 0/1 | applied one
cao2 | Attendances | cao2: disliked people are not there | 0/1 | applied one
cao3 | Attendances | cao3: team member(s) are there | 0/1 | applied one
cao4 | Indoor quality; temperature | cao42: 22-24 °C / cao43: 24-26 °C / cao44: 26-28 °C | exclusive options | one for each option
cao5 | Indoor quality; humidity | cao52: 35-45% / cao53: 45-55% / cao54: 55-65% | exclusive options | one for each option
cao6 | Indoor quality; CO2 (ppm) | cao6: ppm | 1 - ([actual ppm] - 1000)/1500 | applied one
cao7 | Indoor quality; brightness on desktop | cao71: less than 300 lx / cao72: 300-600 lx / cao73: over 600 lx | exclusive options | one for each option
cao8 | Refreshment | cao8: drink | 0/1 | applied one
cao9 | Refreshment | cao9: snack | 0/1 | applied one
cao10 | Refreshment | cao10: meal | 0/1 | applied one

'Place-determining' contexts cp:
cp1 | Weather: rain chance forecast at 21:00 the previous day | cp11: 0% / cp12: 10-40% / cp13: 50% / cp14: 60-90% / cp15: 100% | exclusive options | one for all options
cp2 | Area of the office | cp21: central 3 wards of Tokyo / cp22: central 5 wards / cp23: designated big cities / cp24: others | exclusive options | one for all options
cp3 | Commuting time | cp31: within 30 mins / cp32: 30-60 mins / cp33: 50% / cp34: 60-120 mins / cp35: over 120 mins | exclusive options | one for all options

Figure 6: Knowledge base Map: 'Activity and Place'. Rows: workplaces; columns: x1: solo work, high concentration / x2: solo work, low concentration / x3: co-work / x4: casual communication / x5: formal communication.

y1: home office; live alone or separate room | 1.0 | 0.8 | 0.0 | 0.2 | 0.8
y2: home office; live with family | 0.6 | 0.6 | 0.0 | 0.2 | 0.2
y3: 3rd place; café or library | 0.6 | 0.6 | 0.0 | 0.0 | 0.0
y4: 3rd place; shared open office | 0.4 | 0.6 | 0.4 | 0.2 | 0.6
y5: 3rd place; rental booth | 0.6 | 0.4 | 0.0 | 0.0 | 0.4
y6: centre office; booth | 0.8 | 0.2 | 0.0 | 0.0 | 0.0
y7: centre office; open desk | 0.6 | 0.8 | 0.6 | 0.4 | 0.2
y8: centre office; open communication small | 0.4 | 0.6 | 1.0 | 0.8 | 0.2
y9: centre office; open communication large | 0.0 | 0.8 | 0.8 | 1.0 | 0.4
y10: centre office; meeting room | 0.0 | 0.0 | 0.2 | 0.4 | 1.0

Source: own.
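To make the primary knowledge base concrete, the matrix in Figure 6 can be written out directly and multiplied by a one-hot activity vector; the values below are taken from Figure 6, while the code itself is only an illustration of how the knowledge base is used.

```python
import numpy as np

# The 'Activity and Place' knowledge base Map from Figure 6
# (rows y1..y10, columns x1..x5).
M_ap = np.array([
    [1.0, 0.8, 0.0, 0.2, 0.8],
    [0.6, 0.6, 0.0, 0.2, 0.2],
    [0.6, 0.6, 0.0, 0.0, 0.0],
    [0.4, 0.6, 0.4, 0.2, 0.6],
    [0.6, 0.4, 0.0, 0.0, 0.4],
    [0.8, 0.2, 0.0, 0.0, 0.0],
    [0.6, 0.8, 0.6, 0.4, 0.2],
    [0.4, 0.6, 1.0, 0.8, 0.2],
    [0.0, 0.8, 0.8, 1.0, 0.4],
    [0.0, 0.0, 0.2, 0.4, 1.0],
])
x = np.array([0, 0, 1, 0, 0])  # intent: x3 (co-work), one-hot
print(M_ap @ x)                # primary place scores; y8 scores highest (1.0)
```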
Figure 7: Knowledge base Mag: 'Activity-affecting', general (not place dependent). Rows: contexts; columns as in Figure 6 (x1-x5).

Job type
cag11: administration | 0.4 | 0.6 | 0.4 | 0.4 | 0.8
cag12: coordinator | 0.4 | 0.6 | 0.6 | 0.4 | 1.0
cag13: business planning | 1.0 | 0.8 | 0.8 | 1.0 | 0.4
cag14: R&D | 1.0 | 0.8 | 1.0 | 1.0 | 0.4
cag15: sales | 0.6 | 1.0 | 0.6 | 0.8 | 0.6
Psychological safety level
cag21: very good | 0.4 | 0.6 | 0.8 | 1.0 | 0.6
cag22: good | 0.4 | 0.6 | 0.6 | 0.8 | 0.6
cag23: neutral | 0.4 | 0.4 | 0.4 | 0.4 | 0.4
cag24: bad | 0.2 | 0.2 | 0.2 | 0.2 | 0.4
cag25: very bad | 0.2 | 0.2 | 0.2 | 0.0 | 0.2

Source: own.

Figure 8: Knowledge base Mao: 'Activity-affecting', place ('centre office') dependent. Columns as in Figure 6 (x1-x5).

Attendances
cao1: preferable people are there | 0.4 | 0.8 | 0.8 | 1.0 | 0.6
cao2: disliked people are not there | 0.0 | 0.6 | 0.4 | 1.0 | 0.2
cao3: team member(s) are there | 0.4 | 0.6 | 0.6 | 0.6 | 0.8
Indoor quality; temperature
cao42: 22-24 °C | 0.8 | 0.8 | 0.6 | 0.8 | 0.4
cao43: 24-26 °C | 0.8 | 0.8 | 0.6 | 0.8 | 0.4
cao44: 26-28 °C | 0.8 | 0.8 | 0.6 | 0.8 | 0.4
Indoor quality; humidity
cao52: 35-45% | 0.8 | 0.8 | 0.6 | 0.8 | 0.4
cao53: 45-55% | 0.8 | 0.8 | 0.6 | 0.8 | 0.4
cao54: 55-65% | 0.8 | 0.8 | 0.6 | 0.8 | 0.4
Indoor quality; CO2 (ppm)
cao6: 1 - ([actual ppm] - 1000)/1500 | 1.0 | 0.8 | 0.6 | 0.6 | 0.6
Indoor quality; brightness on desktop
cao71: less than 300 lx | 0.2 | 0.2 | 0.2 | 0.6 | 0.0
cao72: 300-600 lx | 0.8 | 0.8 | 0.6 | 0.8 | 0.8
cao73: over 600 lx | 0.2 | 0.2 | 0.4 | 0.2 | 0.6
Refreshment
cao8: drink | 0.6 | 0.8 | 0.6 | 1.0 | 0.2
cao9: snack | 0.4 | 0.4 | 0.4 | 1.0 | 0.0
cao10: meal | 0.4 | 0.4 | 0.4 | 0.8 | 0.0

Source: own.

Figure 9: Knowledge base Mp: 'Place-determining'. Columns: rain chance forecast at 21:00 the previous day (0% / 10-40% / 50% / 60-90% / 100%); area of the office (central 3 wards of Tokyo / central 5 wards / designated big cities / others); commuting time (within 30 mins / 30-60 mins / 60-120 mins / over 120 mins).

y1: home office; live alone or separate room | 0.4 0.4 0.6 0.8 1.0 | 0.2 0.6 0.8 1.0 | 0.4 0.6 0.8 1.0
y2: home office; live with family | 0.4 0.4 0.6 0.8 1.0 | 0.0 0.4 0.6 0.8 | 0.4 0.6 0.8 1.0
y3: 3rd place; café or library | 0.4 0.4 0.4 0.2 0.2 | 0.2 0.2 0.4 0.6 | 0.0 0.2 0.4 0.6
y4: 3rd place; shared open office | 0.4 0.4 0.4 0.2 0.2 | 0.2 0.4 0.6 0.8 | 0.2 0.4 0.6 0.8
y5: 3rd place; rental booth | 0.4 0.4 0.4 0.2 0.2 | 0.2 0.4 0.6 0.8 | 0.2 0.4 0.6 0.8
y6: centre office; booth | 0.5 0.5 0.5 0.3 0.0 | 0.8 0.5 0.3 0.0 | 0.8 0.6 0.2 0.0
y7: centre office; open desk | 0.5 0.5 0.5 0.3 0.0 | 0.8 0.5 0.3 0.0 | 0.8 0.6 0.4 0.2
y8: centre office; open communication small | 0.5 0.5 0.5 0.3 0.0 | 0.8 0.5 0.3 0.0 | 0.8 0.8 0.6 0.4
y9: centre office; open communication large | 0.5 0.5 0.5 0.3 0.0 | 0.8 0.5 0.3 0.0 | 1.0 0.8 0.6 0.4
y10: centre office; meeting room | 0.5 0.5 0.5 0.3 0.0 | 0.8 0.5 0.3 0.0 | 0.6 0.4 0.2 0.0

Source: own.

4.2 Visualization of Results

We prepared sample data that can show the features of the model. The system plots the two results in a line graph: the dashed line describes the results of the primary knowledge base for 'Activity and Place', and the solid line describes the results with the complementary contexts of the workplace. A place with a higher value is preferable to the other places.
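A hedged sketch of this visualisation is given below; the place labels and scores are toy values, not the paper's experimental data.

```python
import matplotlib.pyplot as plt

# Dashed line: primary 'Activity and Place' scores (Map x).
# Solid line: scores with the complementary workplace contexts.
places = ["y1", "y2", "y3", "y4", "y5"]
primary = [0.8, 0.5, 0.3, 0.4, 0.2]         # illustrative values only
with_context = [0.6, 0.55, 0.2, 0.45, 0.1]  # illustrative values only

plt.plot(places, primary, "k--", label="primary knowledge base")
plt.plot(places, with_context, "k-", label="with complementary contexts")
plt.ylabel("preference score")
plt.legend()
plt.show()
```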
4.3 Experiment

4.4 Results for different activities

First, we created three sample datasets, setting the activities differently but keeping all other complementary contexts of the workplace the same (Figure 10). The shapes of the results for both the primary knowledge base (dashed line) and with the contexts of the workplace (solid line) were similar. However, some points with the contexts of the workplace (circles in the graphs) differed from the primary points. This indicates that the contexts of the workplace have a distinct effect.

Figure 10: Results with complementary contexts differ from the primary knowledge base. Source: own.

4.4.1 Results for different complementary contexts

Second, we prepared three sample datasets and set either a different 'Activity-affecting' context or a different 'Place-determining' context for the same activity (Figure 11). Both types of workplace contexts generated different preferences.

Figure 11: Results for different complementary contexts. Source: own.

4.4.2 Results for different personal context vectors

Finally, we prepared three sample datasets and set the same activity and complementary contexts of the workplace, but with different personal context vectors (see Figure 12). The results did not differ much from each other, but the rank of preferability changed slightly (circles in the graphs).

Figure 12: Results for different context vectors. Source: own.

5 Conclusions and further scope

Herein, we proposed a data model and calculation method with three knowledge bases and the contexts of the workplace, and showed the possibility of selecting appropriate workplaces. With the complex contexts of the workplace, the system afforded results different from those obtained with only the 'Activity and Place' knowledge base, which has been used in traditional workplace planning. However, several practical issues remain unresolved.

5.1 Is there sufficient context?

Some readers of this paper may state that they use different contexts to decide on a workplace. In particular, 'activities', as the primary context, must be well modeled. In this study, we prioritised the method of calculating in a complex workplace context; however, we must define the activity model for more practical situations. Furthermore, we believe that the contexts of the workplace and the knowledge bases might differ between organisational and workplace settings. In our prototype system, we manually set up the contexts of the workplace and the knowledge bases. Therefore, the system must be improved so that contexts and knowledge bases can be established easily.

5.2 Are the 'Activity-affecting' and 'Place-determining' contexts related to each other?

Here, we have assumed that the 'Activity-affecting' and 'Place-determining' contexts are related to each other. Therefore, the result x' of the 'Activity-affecting' context is multiplied by the result y' of the 'Place-determining' context, as y = y' Map x'. If there is no relationship between 'Activity-affecting' and 'Place-determining', we can instead add y' to Map x', as y = y' + Map x'. This additive formula means that the 'Place-determining' context has less relative effect as the result x' of the 'Activity-affecting' context becomes larger. We aim to investigate this relationship by applying the model to actual settings in the future.
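The two combination rules can be compared on toy values as follows; the vectors are illustrative place scores, not data from the paper.

```python
import numpy as np

y_place = np.array([0.8, 0.4, 0.2])    # y': 'Place-determining' result
act_place = np.array([0.5, 0.9, 0.1])  # Map x': activity-driven place scores

y_related = y_place * act_place        # y = y' Map x' (contexts related)
y_independent = y_place + act_place    # y = y' + Map x' (contexts independent)
print(y_related, y_independent)
```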
5.3 How should the value be normalised?

Here, we used the average to normalise the results. Although a strategy for normalisation is currently unavailable, we aim to investigate the normalisation method in the next step.

5.4 How can future predictions of the contexts of the workplace be collected?

Some workplace contexts involve future predictions, such as the attendance of other people or the indoor quality (temperature/humidity) of tomorrow. Such workplace contexts cannot be collected by any sensor and must be predicted using two types of methods:

1) Using other prepared information. For example, for the attendance context, due to COVID-19, some organisations adopted 'office access control': the organisations ask the workers to book in advance to come to the office. Such a system can provide the future attendance of a person.
2) Alternatively, the system may make inferences using knowledge bases and past data.

6 Future possibility and next steps

6.1 Future possibility of the proposed model

The calculations in this study were conducted manually and individually using the prototype system. If implemented as a real-time online system in an actual setting, the model can serve as a personal assistance tool for workers. This tool, connected to schedule-organising applications, can make workers more productive and comfortable in complex workplace contexts. If the system can handle multiple data simultaneously, facility managers can use it as a simulator to plan workplace settings. A facility manager can set several options of functional spaces and workplace services, and then simulate the occupancy rate of the virtual centre office and estimate the excess or deficiency.

6.2 Next step of the research

In the next step, we plan to prepare a more practical system and apply it to an actual setting. Subsequently, we aim to evaluate the functionality of the model, the contexts of the workplace, and the knowledge bases. However, the study encounters a challenge: we need to collect more dynamic intent (activity) data and the feelings of workers. Currently, we can collect such intent data only from a few questionnaires, yet we desire more continuous and extensive data to improve this model. Therefore, we aim to develop service applications, such as the personal assistance tool mentioned in the previous section.

References

[1] Peter F. DRUCKER (1959), Landmarks of Tomorrow, Harper Colophon Books
[2] Ikujiro NONAKA (1990), Management of Knowledge Creation, Tokyo: Nihon Keizai Shinbun-sha
[3] Craig ROTH (2019, December 11), 2019: When We Exceeded 1 Billion Knowledge Workers, Gartner Blog Network, https://blogs.gartner.com/craig-roth/2019/12/11/2019-exceeded-1-billion-knowledge-workers/#:~:text=At%20some%20point%20this%20year,many%20knowledge%20workers%20in%20history, Retrieved December 21st 2022
[4] Jack KELLY (2021, April 1), Google Wants Workers To Return To The Office Ahead Of Schedule: This Looks Like A Blow To The Remote-Work Trend, Forbes, https://www.forbes.com/sites/jackkelly/2021/04/01/google-wants-workers-to-return-to-the-office-ahead-of-schedule-this-looks-like-a-blow-to-the-remote-work-trend/?sh=43dec3c11575, Retrieved December 21st 2022
[5] Erik VELDHOEN (2004), The Art of Working, Academic Service
[6] Veldhoen + Company, Rethink the way we work: Activity Based Working, https://www.veldhoencompany.com/en/activity-based-working/, Retrieved December 21st 2022
[7] Yokoyama, M., Kiyoki, Y., & Mita, T. (2019), A Correlation Computing Method for Integrating Passengers and Services in Semantic Anticipation, Information Modelling and Knowledge Bases XXX, 312, 435, IOS Press

4 Art Applications

AN IMPLEMENTATION METHOD OF GACA: GLOBAL ART COLLECTION ARCHIVE

YOSUKE TSUCHIYA,1,2 NAOKI ISHIBASHI1
1 Musashino University, Graduate School of Data Science, Tokyo, Japan
g2251002@stu.musashino-u.ac.jp, n-ishi@musashino-u.ac.jp
2 IT System Group, Ishibashi Foundation, Tokyo, Japan
g2251002@stu.musashino-u.ac.jp

In this paper, an implementation method of GACA, the Global Art Collection Archive, is proposed. Each museum maintains its own archive of art collections. GACA dynamically integrates the collection data of artworks in each museum archive and provides them via a REST API. GACA works as an integrated data platform for various kinds of viewing environments for artworks, such as virtual reality, physical exhibitions, smartphone applications, and so on. It allows users not only to view artworks, but also to experience the creativity of artworks through seeing, feeling, and knowing them, inspiring a new era of creation.

Keywords: GACA, museum database, open data, multidatabase system, Art Sensorium Project

DOI https://doi.org/10.18690/um.feri.5.2023.7
ISBN 978-961-286-745-4

1 Introduction

In recent years, many museums have put great effort into establishing digital archives of their art collections. Some museums, such as the Metropolitan Museum of Art in New York [1], the Paris Musées [2], and the Louvre Museum [3], provide their archives of art collections as open data. These archives contain various types of information (such as information about the collections and exhibitions) in various media formats (such as text, images, video, audio, and 3D models). The archives of these museums consist of different media and different genres of artworks, and a multidatabase system approach [4,5,6,7] seems applicable to integrating such heterogeneous archives. Artizon Cloud [8] is a multidatabase system that integrates various archives of art collection data and enables them to be used inside the museum and in public areas while properly handling issues such as copyrights.
By integrating collection data, new types of art experiences could be implemented in both physical and virtual spaces. The Art Sensorium Project [9], as shown in Figure 1, was launched at Musashino University with a focus on two key technologies: 1) a multidatabase system architecture to integrate multiple art collections, and 2) virtual space design and implementation for the Data Sensorium [10]. In particular, personalized art exhibitions, where artworks are selected from a museum archive and displayed based on the viewer's tastes and viewing tendencies, could be implemented [11].

Figure 1: A System Architecture of Art Sensorium Project. Source: own.

The primary focus of the Art Sensorium Project is the design and implementation of a multidatabase system that integrates art collections. Therefore, in this paper, an implementation method of GACA, the Global Art Collection Archive, is proposed. Each museum creates its own archives of art collections; GACA integrates these archives and provides appropriate data for use in virtual reality, the Data Sensorium, personal fabrication [12], and other applications. The structure of the paper is as follows: Section 2 reviews related research and highlights the challenges that need to be addressed. Section 3 outlines the GACA architecture, designed to integrate an arbitrary number of museum archives. Section 4 details the implementation of GACA, focusing on its integration with three specific museum archives. Section 5 presents two experiments conducted to verify the system's functionality and effectiveness.

2 Related Research

There are several existing integrated archives of art collections, including Japan Search [13] and the Heritage Connector [14] in the UK. These archives receive government subsidies and contain art collections from museums within their respective countries. Google Arts & Culture [15] is another example of an integrated archive. It provides art collections and connects them to other web services provided by Google, such as geo-locations and augmented reality. These existing integrated archives of art collections have issues in terms of data heterogeneity and collection coverage.

The first issue is data heterogeneity, which can manifest in two ways: heterogeneity of the data structure and heterogeneity of the data notation. In terms of data structure, the art collection data of each museum are typically organized according to its own rules. This results in missing data items (e.g., an item that exists in one museum but not in another) and non-uniform data types (e.g., the serial ID of a work may be an integer in one museum but a string in another). The other concern is the heterogeneity of data notation, such as the language notation of the artist's name, the unit of artwork size, and so on. For example, the same artist name may be listed in multiple languages in one museum, while in another museum it is listed only in the local language. Additionally, the size of a work may be listed in centimeters in one museum, but in inches in another.

The second issue is collection coverage. Covering art collections from all over the world is a challenging task. To achieve this, an environment should be created where art collection data from each museum can be easily shared online.
Japan Search and the Heritage Connector are government-subsidized integrated collection archives that provide collection data from museum archives via API linkage or CSV. However, their coverage is limited to data from domestic museums. Google Arts & Culture is a cloud service provided by Google. Museums can choose to manually register their collections with Google Arts & Culture, which can then be linked to other Google services such as Google Maps and AR. However, not all museums provide all of their collection data to Google Arts & Culture (although some, such as the MET, do provide API integration). Many museums register their collections with Google Arts & Culture for public relations purposes, such as showcasing famous painters. This means that less famous, but still significant and important, artworks are not accessible.

Our goal is to create a viewing environment that allows users not only to see famous artworks, but also to experience the creativity of artworks through seeing, feeling, and knowing them, inspiring a new era of creation. To achieve this, GACA is designed as a data platform that integrates the various archives of art collection data maintained by each museum, making them accessible through a range of devices and applications (as shown in Figure 2). To facilitate the seamless integration and use of these data within the devices and applications, GACA is implemented as a multidatabase system that addresses the previously mentioned issues of heterogeneity and collection coverage.

Figure 2: Idea of GACA. Source: own.

3 GACA Architecture

GACA is designed as shown in Figure 3 to address the issues with existing integrated archives, namely data heterogeneity and collection coverage. Specifically, GACA consists of five main components: the Multidatabase Engine for Integrated Art Archive, the Integrated Art Archive, the Data Converters, the Dictionary Connectors, and the Integrated API.

Figure 3: A System Architecture of GACA. Source: own.

3.1 Multidatabase Engine for Integrated Art Archive

The Multidatabase Engine for Integrated Art Archive serves as the core of GACA. It incorporates various curatorial functions over the data in the Integrated Art Archive, utilizing methods such as image recognition, machine learning, spatial and temporal operations, and so on. The Multidatabase Engine sends requests based on those methods to the Integrated Art Archive and receives the integrated art collection data of each museum as a response from the archive.

3.2 Integrated Art Archive

The Integrated Art Archive serves as the central location where the art collection data from the various museum archives are stored.
As described below, the Data Converters are used to store the art collection data of each museum, converting it to a common schema. The Dictionary Connector adds a unique key to those art collection data to connect them to the dictionary. The Integrated Art Archive utilizes these unique keys to provide the integrated art collection data in response to requests from the Multidatabase Engine.

3.3 Data Converters

The Data Converters convert the data schema of the art collections in each museum archive into a common data schema. Each museum builds its own archive, so each museum has a different schema for its collection data. For example, in one museum archive, artwork and artist information are maintained in separate tables, each assigned a unique ID. In contrast, another museum archive maintains artworks and artists in the same table but does not assign a unique ID to artists. Additionally, each museum retains different types of media data. For instance, regarding multimedia data in a collection, one museum may only have images, while another may also manage audio and video in addition to images.

Figure 4: A Data Flow in GACA. Source: own.

To address this heterogeneity among museum archives, GACA converts the collection data schema of each archive into a common collection data schema and stores it in the GACA Integrated Archive, as shown in Figure 4. The Data Converters generate three tables for managing collection data: a table of artworks, a table of artists, and a table of correspondence between artworks and artists. This allows the collection data from each archive to be combined at the meta-level, providing a comprehensive overview of the collections in the GACA Integrated Archive.

3.4 Dictionary Connectors

The Dictionary Connectors obtain reference information (dictionary information) and assign unique keys for data retrieval in GACA. Collection data contains a variety of information. For example, artist information includes the artist's name, place of birth, date of birth, and date of death. Artwork information includes the name of the work, year of creation, place of creation, materials, techniques, and dimensions. While this information is useful for searching collections, the notation and units differ between the museums' archives. For example, names may be written in the native language of the museum, and the locations and units of measurement for the dimensions of works may vary (e.g., inches or centimeters). These differences make it difficult to search and compare information across different museums' archives.

To address the issue of heterogeneity of data notation, the Dictionary Connector connects to relevant dictionary databases, including artist notations, dimensions, time, and location data. The Dictionary Connector also generates a unique key for the art collection data from the dictionary database. It assigns this unique key to the collection data of each archive, connecting it to the dictionary data. This enables cross-search and information extraction within GACA. By implementing the Dictionary Connector and the Data Converter, GACA is able to integrate each museum archive dynamically, without selecting and limiting the collection data. Museum archives that store huge amounts of collection data can also easily be integrated into GACA by implementing a data converter. As a result, GACA addresses the issue of collection coverage.
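As an illustration of the three-table common schema, the following is a minimal sketch; the column names are illustrative assumptions, not taken from the paper, and SQLite stands in here only for brevity (the prototype in Section 4.5 uses PostgreSQL).

```python
import sqlite3

# Hedged sketch of the common schema generated by the Data Converters;
# column names are illustrative, not documented in the paper.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE artworks (
    artwork_id TEXT PRIMARY KEY,  -- museum-local unique ID (e.g. MET Object ID)
    museum     TEXT NOT NULL,
    title      TEXT
);
CREATE TABLE artists (
    artist_id  TEXT PRIMARY KEY,  -- Wikidata Entity ID where available
    name       TEXT
);
CREATE TABLE artwork_artist (     -- correspondence between artworks and artists
    artwork_id TEXT REFERENCES artworks(artwork_id),
    artist_id  TEXT REFERENCES artists(artist_id)
);
""")
```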
3.5 Integrated API

The Integrated API works as the interface between the Multidatabase Engine for Integrated Art Archive and various types of applications, such as a virtual museum, the Data Sensorium, and so on. The Integrated API is designed as a RESTful API implementing token authentication. When the Integrated API receives a request with a search key via GET or POST, the API sends the search key to the Multidatabase Engine and receives the matching collection data. The Integrated API returns the collection data to the application, formatted in a supported data format such as JSON, CSV, or XML.

4 An Implementation of GACA

A GACA prototype system was implemented as shown in Figure 5. It is connected to three museum archives: the MET Collection, the Paris Musées Collection, and the Artizon Cloud. These archives are independently implemented, and some of them offer open data with a REST API. In addition, GACA is connected to an artist dictionary on Wikidata [16], which includes notations of artist names in five languages: English, French, Chinese, Korean, and Japanese. As a result, GACA has integrated approximately 800,000 art collection items from the three archives and provides access to the collections via a REST API with artist names in the five languages.

Figure 5: An Implementation of GACA. Source: own.

4.1 MET Data Converter

As shown in Figure 4, the role of the data converters is to generate three tables with a common schema. The Metropolitan Museum of Art (MET) in New York City has approximately 480,000 artworks, documents, and other materials available as open data. This data can be accessed through a REST API or CSV. However, the MET's collection data is organized into a single table with both artist information and artwork information. Each artwork is given a local unique ID (Object ID), but the artists do not have IDs. On the other hand, each artist's information includes a Wikidata URL, which contains an Entity ID that uniquely identifies the artist in Wikidata. To convert this data structure, the Data Converter for the MET Collection follows these steps (a code sketch follows the list):

1. Separate the data about the artworks and the artists from the collection CSV file.
2. Create a table of artworks using the Object ID of the artworks as a key.
3. Extract the Entity IDs from the Wikidata URLs of the artists.
4. Create a table of artist data using the Entity ID extracted in step 3 as a key.
5. Create a table of correspondence between the Entity ID of the artist and the Object ID of the artwork.
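A minimal pandas sketch of these steps is given below; the column names ("Object ID", "Title", "Artist Display Name", "Artist Wikidata URL") follow the published MET open-access CSV but are assumptions to be verified against the data, and the file name is illustrative.

```python
import pandas as pd

# Hedged sketch of the MET Data Converter; multiple artists per work are
# ignored here for brevity (only the first Wikidata Entity ID is matched).
met = pd.read_csv("MetObjects.csv", low_memory=False)

# Step 3: extract the Wikidata Entity ID (e.g. "Q123") from the artist URL.
met["entity_id"] = met["Artist Wikidata URL"].str.extract(r"(Q\d+)", expand=False)

# Step 2: table of artworks keyed by Object ID.
artworks = met[["Object ID", "Title"]].drop_duplicates("Object ID")

# Step 4: table of artists keyed by Entity ID.
artists = (met[["entity_id", "Artist Display Name"]]
           .dropna(subset=["entity_id"])
           .drop_duplicates("entity_id"))

# Step 5: correspondence table between artist Entity IDs and Object IDs.
artwork_artist = met[["Object ID", "entity_id"]].dropna(subset=["entity_id"])
```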
4.2 Paris Musées Data Converter

Paris Musées is a public organization that oversees 14 museums in the city of Paris and has made the collections of approximately 360,000 items housed in these institutions available as open data. The collection data is shared among the 14 museums and can be accessed via a JSON-formatted API. In contrast to the MET API, which is RESTful, a GraphQL query must be generated to retrieve the collection data from Paris Musées. Each artwork and artist in the collection has a unique ID, and each artwork is also associated with the ID of the museum that owns it. To convert this data structure, the Data Converter for Paris Musées follows these steps:

1. Submit a GraphQL query to obtain information on the 14 museums in the Paris Musées network and retrieve the data in JSON format.
2. Using the ID of each museum as a key, submit a GraphQL query to retrieve the collection data stored in each museum and obtain the data in JSON format.
3. Separate the data related to artworks and artists from the data obtained in step 2.
4. Create an artwork table using the unique ID of each artwork as the key.
5. Create an artist table using the unique ID of each artist as the key.
6. Create a table of correspondences between the unique ID of each artwork and the unique ID of each artist.

4.3 Artizon Cloud Data Converter

The Bridgestone Museum of Art, founded in 1952, was reopened in 2020 as the Artizon Museum[17]. Artizon Cloud is a multidatabase system that contains various data archives related to artworks owned by the Artizon Museum. These archives include basic collection information, evidential documents, multimedia (including images and sound), text, and event archives. Artizon Cloud controls the scope of its collection offerings through three layers (Private Zone, Museum Zone, Public Zone) and rights relations. The Artizon Cloud Data Converter converts the data structure of the collection data published in the Public Zone of Artizon Cloud in the following steps:

1. Separate the data about artworks and the data about artists from the Artizon Cloud art collection.
2. Create a table of artworks using the artwork IDs as keys.
3. Create a table of artists using the artist IDs as keys.
4. Create a table of correspondences between the artwork IDs and artist IDs.

4.4 Artist Dictionary Connector

In this prototype system, Wikidata was utilized as the reference dictionary for artist notation. Wikidata is a collaborative, open database that is compiled and normalized by volunteers. It is freely available to the public and has gained a reputation for credibility, receiving the Open Data Publisher Award in 2014. Additionally, its open nature has made it a central hub for datasets from various institutions, including libraries and museums.

Figure 6 shows the process of creating the artist notation dictionary data from Wikidata. The MET already uses Wikidata URLs as references for artist information in its collection data, and the MET Data Converter employs the Wikidata Entity ID as the artist ID for the MET. Thus, the Dictionary Connector first uses this Entity ID to obtain the notations of the relevant artist in the five languages and creates a basic dictionary table. It then searches the basic dictionary table for the artist notations of Paris Musées and Artizon Cloud, querying Wikidata for any artist notations that do not match. These notations are obtained in the five languages and added to the basic dictionary table. Finally, the Dictionary Connector searches the respective artwork-artist tables generated by the Data Converters and assigns the corresponding artist's Wikidata Entity ID. This connects the artist information in each museum collection with the information in the notation dictionary, using the Wikidata Entity ID as the key (a sketch of the per-artist notation lookup is given below).

Figure 6: A Workflow of Dictionary Connector. Source: own.
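As a concrete illustration of the first step, the sketch below fetches an artist's notations in the five target languages from Wikidata, given an Entity ID already known from the MET data. The wbgetentities call is the standard Wikidata API; the surrounding dictionary-table handling is omitted, and the example Entity ID is given for orientation only.

```python
import requests

LANGS = ["en", "fr", "zh", "ko", "ja"]

def fetch_notations(entity_id: str) -> dict:
    """Return {language: artist name or None} for one Wikidata Entity ID."""
    resp = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={"action": "wbgetentities", "ids": entity_id,
                "props": "labels", "languages": "|".join(LANGS),
                "format": "json"},
        timeout=30)
    labels = resp.json()["entities"][entity_id]["labels"]
    # A language with no label in Wikidata yields None; this is exactly the
    # dictionary gap discussed for the Lenoir notations in Section 5.1.
    return {lang: labels.get(lang, {}).get("value") for lang in LANGS}

# Example: fetch_notations("Q39931")  # Pierre-Auguste Renoir
```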
4.5 Integrated Art Archive

The Integrated Art Archive is implemented with a relational database management system (PostgreSQL version 12.3). As shown in Figure 3, art collection data converted to the common data schema by the Data Converters is stored in the artwork and artist tables. In addition, the Wikidata Entity ID of the artist is added to the artwork-artist table. The Integrated Art Archive receives SQL queries from the Multidatabase Engine and returns the result sets.

4.6 Multidatabase Engine for the Integrated Art Archive

As a prototype of the multidatabase engine for the Integrated Art Archive, a method to search art collections by artist names in different languages was implemented. It retrieves the art collection data from the Integrated Art Archive in the following steps:

1. Receive all or part of an artist's name from the Integrated API. The artist's name can be in any of the five languages (Japanese, English, French, Chinese, and Korean) obtained from the Dictionary Connector.
2. Search for the received artist name in the Artist Dictionary to obtain the artist's Wikidata Entity ID and the notations of the artist's name in the five languages.
3. Submit an SQL query to the Integrated Art Archive using the artist's Wikidata Entity ID obtained in step 2 as a key to retrieve the relevant artist's collection data.
4. Receive the result set (integrated art collection data) from the Integrated Art Archive and return it to the Integrated API.

4.7 Integrated API

In this prototype, the Integrated API was implemented using Python 3.6.8 and the Flask web application framework. As shown in Figure 7, the API provides endpoints that accept requests via the GET or POST methods and returns the corresponding data in JSON format. At this point, two endpoints have been implemented: 1) an endpoint for artist names, which returns a set consisting of the Wikidata Entity ID and the artist's name in five languages from the Artist Dictionary, and 2) an endpoint for Wikidata Entity IDs, which returns a list of art collection data from the Multidatabase Engine for the Integrated Art Archive. To ensure security, the Integrated API was implemented with token authentication using JWT (JSON Web Token); only authorized applications can access it. In addition to the API, a website provides a form for submitting keys to the endpoints and displays the results. The website allows searches of the integrated collection data by artist name and artwork name, as well as searches of the collection data of each individual museum. A minimal sketch of the two endpoints is given below.

Figure 7: An Example of JSON Data Returned from the Integrated API. Source: own.
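The sketch below outlines the two endpoints with Flask and PyJWT. The route names and the in-memory stand-ins for the Artist Dictionary and the Integrated Art Archive are illustrative assumptions; the paper fixes only the behaviour: token-authenticated GET/POST requests answered with JSON.

```python
import jwt  # PyJWT
from flask import Flask, jsonify, request

app = Flask(__name__)
SECRET = "replace-me"

# In-memory stand-ins for the Artist Dictionary and the Integrated Art
# Archive; in the prototype these are PostgreSQL tables queried via SQL.
ARTIST_DICTIONARY = {"Renoir": {"entity_id": "Q39931",
                                "labels": {"en": "Renoir", "ja": "ルノワール"}}}
ARCHIVE = {"Q39931": [{"title": "...", "archive": "MET"}]}

def authorized(req) -> bool:
    auth = req.headers.get("Authorization", "")
    token = auth[len("Bearer "):] if auth.startswith("Bearer ") else ""
    try:
        jwt.decode(token, SECRET, algorithms=["HS256"])
        return True
    except jwt.InvalidTokenError:
        return False

@app.route("/artists", methods=["GET", "POST"])
def artist_endpoint():
    # Endpoint 1: artist name -> Wikidata Entity ID + five-language notations.
    if not authorized(request):
        return jsonify({"error": "unauthorized"}), 401
    return jsonify(ARTIST_DICTIONARY.get(request.values.get("name", ""), {}))

@app.route("/collections", methods=["GET", "POST"])
def collection_endpoint():
    # Endpoint 2: Wikidata Entity ID -> integrated art collection data.
    if not authorized(request):
        return jsonify({"error": "unauthorized"}), 401
    return jsonify(ARCHIVE.get(request.values.get("entity_id", ""), []))
```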
5 Experiments

To confirm that the system functions as intended, two experiments were conducted: one to test the integrated search capabilities over the collections, and the other to test the operation of the Integrated API.

5.1 Experiment on Integrated Collection Search

An experiment was conducted to test the integrated search functionality of the system. The experiment confirmed that the following two points work as expected: 1) searches for artists can be performed in five languages, and 2) the works of the corresponding artists are retrieved from all three archives and displayed in a single list. For example, the name of Pierre-Auguste Renoir (1841-1919) is rendered as "Renoir" in English and French, "雷諾瓦" in Chinese, "르누아르" in Korean, and "ルノワール" in Japanese. Searches for "Renoir" in each language all returned the expected results. Figure 8 shows the results of the artist search with each "Renoir" notation, demonstrating that Pierre-Auguste Renoir can be found in each language. Figure 9 shows some of the results, with works from the Metropolitan Museum of Art, Paris Musées, and Artizon Cloud displayed in the same list.

During the implementation of the system, several issues became apparent. One issue is that keys that are not registered in the dictionary database cannot be retrieved. In this case, the artist notation dictionary was created from Wikidata, but if the notation for the relevant artist in a particular language was not present in Wikidata, it would not appear in the search results. As shown in Figure 7, "ルノワール" in Japanese corresponds to both "Renoir" and "Lenoir" in English, resulting in the retrieval of Albert Lenoir (1801-1891) and Alfred Lenoir (1850-1920). However, the Chinese ("阿爾伯特-勒努瓦", "阿尔弗雷德-勒努瓦") and Korean ("알버트 르누아르", "알프레드 르누아르") notations for these two artists are not currently present in Wikidata, so they could not be registered in the dictionary when it was created. Therefore, searches for "勒努瓦" in Chinese or "알프레드" in Korean do not retrieve these two artists.

Figure 8: A Result of Artist Search in Different Languages. Source: own.

Additionally, artworks by unknown artists cannot be retrieved. The Metropolitan Museum of Art collection has approximately 15,000 works, and the Paris Musées collection 86,000 works, with an unknown artist (works with an anonymous or unknown artist name). Such works, for example those with a blank artist name or "Anonyme", are not retrieved by the artist search. On the other hand, an artwork carries more information than just its artist name, such as the date of creation, location, size, and more. There is also information derived from image analysis of the artwork. Connecting separate dictionaries corresponding to this information and using them to search the collection will be needed. Currently, there are 800,000 artworks subject to search, and connecting multiple dictionaries will be necessary to retrieve the artworks that users are seeking from this large collection.

Figure 9: A Result of Integrated Search by Artist Name. Source: own.

5.2 Experiment on the Integrated API

As a test of the Integrated API, we created a simple artwork viewer in Python that displays images of the artworks matching an artist search. The application displays images of the artist's artworks according to the following procedure (a sketch follows the list):

1. When the application is launched, the user sends a username and password via POST to obtain an access token.
2. The user enters the Entity ID of the artist in the form at the top of the application.
3. The application accesses the API with the entered Entity ID and obtains the JSON data of the list of works.
4. The application extracts the image URLs of the artworks from the JSON data.
5. The application accesses the image URLs extracted in step 4 and downloads the images.
6. Once all images have been downloaded, the application displays them.
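A sketch of the viewer's data flow is shown below. The base URL, endpoint paths, field names, and credentials are illustrative assumptions; only the overall flow (token via POST, query by Entity ID, image download and display) comes from the paper, and images are simply saved to files here rather than rendered.

```python
import requests

API = "https://example.org/gaca"  # hypothetical base URL

# Step 1: obtain an access token with username and password via POST.
token = requests.post(f"{API}/auth",
                      data={"username": "user", "password": "pass"},
                      timeout=30).json()["token"]
headers = {"Authorization": f"Bearer {token}"}

# Steps 2-3: request the list of works for the entered artist Entity ID.
works = requests.get(f"{API}/collections",
                     params={"entity_id": "Q39931"},
                     headers=headers, timeout=30).json()

# Steps 4-6: extract each image URL, download it, then display (here: save).
for i, work in enumerate(works):
    url = work.get("image_url")
    if not url:  # only public-domain images carry an image URL
        continue
    with open(f"artwork_{i}.jpg", "wb") as f:
        f.write(requests.get(url, timeout=30).content)
```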
Figure 10 shows the artworks of Auguste Renoir displayed in the simple image viewer we created. Note that the API only includes images that are in the public domain, and only the corresponding artworks are displayed in the viewer.¹

Figure 10: Artwork Viewer from the Integrated API. Source: screenshot, own, 2023.

¹ Images are cited from the Met Collection: https://www.metmuseum.org/art/the-collection, and the Paris Musées Collection: https://www.parismuseescollections.paris.fr/en (2022).

One issue with the Integrated API is the time required to download images when the list of works returned from the API becomes large. GACA maintains only the URLs of the images in each collection data record, so the images must be accessed and downloaded from the source archives each time. Network or system problems at individual archives may therefore delay the downloading and display of images. Media data, such as artwork images, are the data users most desire, and long image loading times are particularly undesirable when deploying applications such as virtual museums. To address this issue, individual archives could provide media data in different sizes, such as small, normal, and maximum, allowing applications to select the appropriate size based on the intended use; the Integrated API should then be able to provide the corresponding images. As an alternative measure, media data converted to a smaller size could be maintained as cache data in each application.

6 Conclusion

The Global Art Collection Archive (GACA) is proposed as a method for globally integrating the art collections of various museums and providing access to them through the GACA API. By connecting the individual museum archives to dictionaries such as artist names, GACA enables the dynamic curation, integration, and use of these art collections in a variety of applications, including virtual reality, Data Sensorium, and digital fabrication. However, there are still challenges to be addressed in the future. These include finding ways to extract specific artworks from the large volume of collection data, and storing and providing media data of artworks in appropriate sizes. The most significant future effort is to create a viewing environment based on GACA that allows users to experience the creativity of artworks. We envision this environment not just as a place to view artworks, but as a space where users can fully engage with and be inspired by the artworks through seeing, feeling, and knowing them. We hope that this challenging effort will open a new era of creativity and innovation.

Acknowledgment

Without the collaborative research partnership between Musashino University and the Ishibashi Foundation, this research would not exist.
We would like to express our utmost gratitude to Hiroshi Ishibashi, Taiji Nishijima, Kazunori Yamauchi, Shoji Kometani and Tomohiro Kawasaki of the Ishibashi Foundation for their great cooperation and support throughout the project period.

References

[1] The Metropolitan Museum of Art: The Met Collection, available via WWW, https://www.metmuseum.org/art/collection (2022).
[2] Paris Musées: Les collections en ligne des musées de la Ville de Paris, available via WWW, https://www.parismuseescollections.paris.fr/ (2022).
[3] Musée du Louvre: Atlas database of exhibits, available via WWW, http://cartelen.louvre.fr/ (2021).
[4] Kitagawa, T. and Kiyoki, Y.: "The mathematical model of meaning and its application to multidatabase systems," Proc. 3rd IEEE Int. Workshop on Research Issues on Data Engineering: Interoperability in Multidatabase Systems, pp.130–135 (1993).
[5] Kiyoki, Y. and Kitagawa, T.: "A metadatabase system supporting interoperability in multidatabases", Information Modeling and Knowledge Bases, Vol.5, pp.287–298 (1993).
[6] Kiyoki, Y., Kitagawa, T. and Hitomi, Y.: "A fundamental framework for realizing semantic interoperability in a multidatabase environment", Journal of Integrated Computer-Aided Engineering, Vol.2, No.1, pp.3–20 (1995).
[7] Kiyoki, Y., Hosokawa, Y. and Ishibashi, N.: "A Metadatabase System Architecture for Integrating Heterogeneous Databases with Temporal and Spatial Operations," Advanced Database Research and Development Series Vol. 10, Advances in Multimedia and Databases for the New Century, A Swiss/Japanese Perspective, pp.158–165, World Scientific Publishing (1999).
[8] Ishibashi, N.: "Artizon Cloud: A Multidatabase System Architecture for an Art Museum," Information Modelling and Knowledge Bases XXXIII, pp.323–331 (2022).
[9] Ishibashi, N., Fukuda, T., Tsuchiya, Y., Enzaki, Y. and Iwata, H.: "Art Sensorium Project: A System Architecture of Unified Art Collections for Virtual Art Experiences," 33rd International Conference on Information Modelling and Knowledge Bases EJC 2023 (2023) (submitted).
[10] Iwata, H., Sasaki, S., Ishibashi, N., Sornlertlamvanich, V., Enzaki, Y. and Kiyoki, Y.: "Data Sensorium: Spatial Immersive Displays for Atmospheric Sense of Place," Information Modelling and Knowledge Bases XXXIV, IOS Press, pp.247–257 (2022).
[11] Fukuda, T. and Ishibashi, N.: "Virtual Art Exhibition System: An Implementation Method for Creating an Experiential Museum System in a Virtual Space", Information Modelling and Knowledge Bases XXXIV, IOS Press, pp.38–47 (2022).
[12] Gershenfeld, N., Gershenfeld, A. and Cutcher-Gershenfeld, J.: "Designing Reality: How to Survive and Thrive in the Third Digital Revolution", Basic Books (2017).
[13] Japan Search, available via WWW, https://jpsearch.go.jp/ (2022).
[14] Heritage Connector, available via WWW, https://www.sciencemuseumgroup.org.uk/project/heritage-connector/ (2022).
[15] Google Arts & Culture, available via WWW, https://artsandculture.google.com/ (2022).
[16] Wikidata, available via WWW, https://www.wikidata.org/ (2022).
[17] Artizon Museum, available via WWW, https://www.artizon.museum (2022).
ART SENSORIUM PROJECT: A SYSTEM ARCHITECTURE OF UNIFIED ART COLLECTIONS FOR VIRTUAL ART EXPERIENCES

NAOKI ISHIBASHI,1,2 TSUKASA FUKUDA,2 YOSUKE TSUCHIYA,1 YUKI ENZAKI,1 HIROO IWATA3
1 Musashino University, Faculty of Data Science, Tokyo, Japan, n-ishi@musashino-u.ac.jp, g2251002@stu.musashino-u.ac.jp, enzaki@musashino-u.ac.jp
2 Musashino University, Graduate School of Data Science, Tokyo, Japan, n-ishi@musashino-u.ac.jp, g2150002@stu.musashino-u.ac.jp
3 Tsukuba University, Faculty of Engineering, Information and Systems, Ibaraki, Japan, iwata@kz.tsukuba.ac.jp

This paper introduces the Art Sensorium Project, founded at the Asia AI Institute of Musashino University. The main target of the project is to design and implement a system architecture of unified art collections for virtual art experiences. To provide art experiences, a projection-based VR system called Data Sensorium is used to stage art materials in a real-sized form. Furthermore, a system architecture of a multidatabase system for heterogeneous art collection archives is presented, so that a set of integrated art data can be applied to Data Sensorium for newly generated art experiences.

Keywords: museum systems, multidatabase systems, virtual reality, multimedia databases, immersive image, projection-based VR

DOI https://doi.org/10.18690/um.feri.5.2023.8
ISBN 978-961-286-745-4

1 Introduction

The term virtual museum has been widely discussed for a long time. One definition of virtual museums is as follows: "a collection of digitally recorded images, sound files, text documents and other data of historical, scientific, or cultural interest that are accessed through electronic media[1]." It can include various digital archives, databases, applications, digital gadgets and so on, so applying digital technologies to the area of art seems to match the definition. Many technologies can be applied to the design and implementation of virtual museums. In [2], seven technologies are mentioned as useful for implementing virtual museums: 1) high-resolution images, 2) Web3D, 3) virtual reality, 4) augmented reality, 5) mixed reality, 6) haptics, and 7) handheld devices. In recent years, many museums have worked to construct digital archives of their art collections, such as the Louvre Museum[3]. In addition, some museums have published their digital data archives as open data[4,5]. These open data are provided through Web APIs, so many kinds of digital innovation are expected in the area of art. As a commercial activity, Google Arts & Culture[6] is a widely used example that presents masterpieces of art museums in the form of mobile applications or on-screen virtual reality. Governmental activities have also been very active recently and globally. The United Kingdom has launched a national project to establish a national collection with digital technologies, which also aims to establish innovation using data on cultural heritage[7]. In Japan, public services have been established such as Cultural Heritage Online[8], which integrates information on cultural heritage across many museums in Japan, and Art Platform Japan[9], which provides information on contemporary Japanese artworks. Services like [6,8] provide access to the masterpieces of museums by integration, but a system framework to stage any artwork by integrating various digital archives has not been proposed.
Museums, in general, provide art exhibitions designed with the knowledge, experience and inspiration of curators to provide art experiences for visitors. Actual museums provide exhibitions common to all visitors, since exhibitions are real and static. However, a virtual museum in a virtual reality environment could provide a personal exhibition with dynamic curation according to the visitor's preferences. The primary purpose of the art experience in this research is to stimulate the user's intellectual curiosity, appeal to his/her emotions, and provoke an emotional response through the provision of various art data. Furthermore, the art experience includes cross-cultural exchange, such as the visualization of different subjectivities through an environment that brings people into contact with art from all over the world, and inspiration for the creation of new art. In this paper, we introduce the Art Sensorium Project, which dynamically integrates multiple art collection archives to stage art experiences in Data Sensorium.

2 Data Sensorium

Data Sensorium is a conceptual framework for systems providing physical experiences of content stored in databases[10]. Data Sensorium consists of a spatial immersive display in the form of a room-like display, various sensors that detect the behaviour of users, and mechanical subsystems that provide haptics. A prototype system of Data Sensorium was implemented with four 120-inch screens and corresponding projectors, and the Torus Treadmill[11], as shown in Fig.1. The Torus Treadmill is a locomotion interface that creates the sense of walking.

Figure 1: Data Sensorium. Source: own.

3 Art Sensorium Project

The Art Sensorium Project was started to stage art experiences in Data Sensorium. Fig.2 shows early sketches of the project, studying the expected applications of Data Sensorium in the area of art, as follows:

A: Data Sensorium as a Database User Interface
Visitors are expected to interactively search art collections to explore artworks, for example according to an artist, a museum, a motif, etc., in Data Sensorium.

B: Reproduction Environment for Past Exhibitions
As mentioned above, art exhibitions are the intellectual product of curators staging actual artworks in a specific space; however, the exhibition disappears when it finishes. Virtual reality, especially Data Sensorium, could be a candidate technology for restoring any past exhibition.

C: Virtual Museum with Dynamic Curation
Functionalities for dynamic curation are essential to automatically generate art exhibitions, and also very challenging. Knowledge base approaches such as [12,13] and machine learning approaches are currently under discussion to realize dynamic curation.

D: Environment for Remote Participation in Exhibitions
Data Sensorium could be used as a remote controller for a robot with an omnidirectional camera, and such a combination could make it possible to remotely attend an actual art exhibition from within Data Sensorium.

Figure 2: Early Sketches of Art Sensorium Project. Source: own.

The current collaboration scheme is shown in Fig.3.
So far, a data set of the Artizon Museum[14] is connected using Artizon Cloud[15], as well as the open data of The Metropolitan Museum of Art[4] and Paris Musées[5]. Prototype systems of Data Sensorium have already been implemented at Musashino University and Thammasat University, and a connection to the Empowerment Studio of Tsukuba University is also under discussion.

Figure 3: A Collaboration Scheme of Art Sensorium Project. Source: own.

4 A System Architecture for Art Sensorium Project

The system architecture of Art Sensorium is composed of two essential parts. Firstly, the art data of each museum are integrated in a multidatabase system, as shown in Fig.4. Secondly, Data Sensorium applications receive the integrated data to stage virtual exhibitions.

Figure 4: A System Structure of Art Sensorium Project. Source: own.

4.1 A Multidatabase System for Art Sensorium Project

The system architecture of the multidatabase system is shown in Fig.5. There are many approaches to designing and implementing multidatabase systems[16,17,18,19]. However, the meta-level system approach[20,21,22,23] seems applicable to the Art Sensorium Project for the following reasons:

1. Flexibility in solving the heterogeneity of local database structures and their access methods is a top priority, and a simple architecture for implementing the multidatabase system is very important.
2. Solving heterogeneity in data formats and languages comes as a second issue, and flexibility is again very important for solving heterogeneity among various museums.
3. Semantic computing to realize dynamic curation will be a critical issue to come, and the meta-level system approaches have been observed to be a good solution[12,13,24].

To meet these requirements, local data archives are connected to the multidatabase engine through corresponding data converters. Heterogeneity in data formats, such as artist names, is resolved using dictionaries, and the data are stored in the integrated archive as shown in Fig.5. An implementation method of the multidatabase system is described in [25].

Figure 5: A Multidatabase System Architecture of Art Sensorium Project. Source: own.

4.2 Data Sensorium Applications

The design and implementation of Data Sensorium applications involves two key aspects:

1. Design and implementation of gallery floors
2. Curation functions to stage artworks on the floors of item 1

Two prototype applications have been implemented as Data Sensorium applications. Dynamic generation of the gallery floor is quite challenging, so these prototype systems use static gallery floors. However, the artwork data are delivered through the multidatabase engine, so artworks are dynamically staged in Data Sensorium. All these applications are implemented with Unity[26].

4.2.1 Reproduction Environment of Past Exhibitions

As a reproduction environment for a past exhibition, the floor layout of the exhibition "Inaugural Exhibition Emerging Artscape: The State of Our Collection", held 18 January to 31 March 2020 at the Artizon Museum[14], was virtually reproduced as shown in Fig.6. A list of the artworks corresponding to each wall is stored in the multidatabase system, and the URLs of the artwork images are transmitted to each wall, as shown in Fig.7; a sketch of this data flow is given below.

Figure 6: An Example Floor of a Data Sensorium Application. Source: own.
Figure 7: Representing a Past Exhibition in the Data Sensorium. Source: own.
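As a conceptual illustration of that data flow, the following sketch pulls the per-wall artwork lists from the multidatabase engine over HTTP. The applications themselves are implemented in Unity; the endpoint and field names here are hypothetical assumptions, not the project's actual interface.

```python
import requests

ENGINE = "https://example.org/art-sensorium"  # hypothetical base URL

def load_exhibition(exhibition_id: str) -> dict:
    """Return {wall_id: [image URLs]} for every wall of a past exhibition."""
    walls = requests.get(f"{ENGINE}/exhibitions/{exhibition_id}/walls",
                         timeout=30).json()
    return {w["wall_id"]: [a["image_url"] for a in w["artworks"]]
            for w in walls}

# Each wall object in the Unity scene would then texture its planes
# from the returned URLs.
```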
4.2.2 A Virtual Museum with Dynamic Curation

As a prototype application of a virtual museum with dynamic curation, a 10 m x 10 m cube-shaped gallery was constructed in Unity, and two planes on each wall are assigned to stage artworks, as shown in Fig.8. The prototype system of Data Sensorium consists of the spatial immersive display and the Torus Treadmill and has no other sensors. Therefore, invisible spheres have been set around each plane for an artwork to detect whether a user is close to that artwork, as shown in Fig.9.

Figure 8: A Virtual Exhibition Room. Source: own.
Figure 9: Invisible Spheres to Detect a Visitor. Source: own.

Once a user enters the virtual gallery, 8 artworks are randomly selected and staged from the set of artworks staged in the actual Artizon Museum at the same time. The user then looks at each artwork, and the artwork of greatest interest is identified from the time spent inside each invisible sphere. When the user reloads the gallery, 8 new artworks related to that most interesting artwork are selected, as shown in Fig.10. Example screenshots of the gallery are shown in Fig.11, and more details of the implementation method for creating the gallery are presented in [27]. A sketch of this selection loop follows below.

Figure 10: A Transition of Virtual Exhibition Rooms. Source: own.
Figure 11: An Example of a Virtual Exhibition. Source: own.
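The selection loop can be summarized as below. This is a schematic restatement in Python of logic that is actually implemented in Unity, and related_works stands in for the relatedness computation, which the paper leaves open.

```python
import random

def select_initial(candidates):
    """Stage 8 randomly selected artworks for the first room."""
    return random.sample(candidates, k=min(8, len(candidates)))

def next_room(dwell_seconds, related_works):
    """Reseed the room from works related to the longest-viewed artwork.

    dwell_seconds: {artwork_id: seconds spent inside its invisible sphere}
    related_works: hypothetical function, artwork_id -> list of related ids
    """
    most_interesting = max(dwell_seconds, key=dwell_seconds.get)
    pool = related_works(most_interesting)
    return random.sample(pool, k=min(8, len(pool)))
```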
5 Conclusion

In this paper, the Art Sensorium Project was introduced. The main target of the project is to design and implement a system architecture of unified art collections for virtual art experiences. The system architecture of the multidatabase system that integrates various digital art archives was proposed, and Data Sensorium applications were presented. As future issues, there are many possible strategies for dynamic curation. The knowledge of curators could be treated as knowledge bases for generating an exhibition, or physical/logical perspectives of artworks could be computed to generate an exhibition through machine learning. Sensing techniques for a visitor's emotions or interests are also an issue. Above all, a system architecture that provides the capability to implement such a variety of strategies for dynamic curation is strongly needed as a collaboration framework.

Acknowledgment

This research was founded at the Asia AI Institute of Musashino University, and supported by Musashino University, JSPS KAKENHI Grant Number JP22511707, the Consortium for Advanced Service Implementation Industry-Government-Academia of the Tokyo Metropolitan Government, and the Artizon Museum. We would like to express our sincere gratitude to all the organizations above.

References

[1] Britannica, The Editors of Encyclopaedia: "virtual museum", Encyclopedia Britannica, https://www.britannica.com/topic/virtual-museum. Accessed 26 January 2023.
[2] Styliani, S., Fotis, L., Kostas, K. and Petros, P.: "Virtual museums, a survey and some issues for consideration", Journal of Cultural Heritage, Vol.10, No.4, pp.520–528 (2009).
[3] Musée du Louvre: Atlas database of exhibits, available via WWW, http://cartelen.louvre.fr/. Accessed 26 January 2023.
[4] The Metropolitan Museum of Art: The Met Collection, available via WWW, https://www.metmuseum.org/art/collection. Accessed 26 January 2023.
[5] Paris Musées: Les collections en ligne des musées de la Ville de Paris, available via WWW, https://www.parismuseescollections.paris.fr/. Accessed 26 January 2023.
[6] Google LLC: "Google Arts & Culture", available via WWW, https://artsandculture.google.com. Accessed 26 January 2023.
[7] Arts and Humanities Research Council: "Towards a National Collection", available via WWW, https://www.nationalcollection.org.uk (2023).
[8] The Agency for Cultural Affairs: "Cultural Heritage Online", available via WWW, https://bunka.nii.ac.jp. Accessed 26 January 2023.
[9] The Bunka-cho Art Platform Japan Project: "Art Platform Japan", available via WWW, https://artplatform.go.jp. Accessed 26 January 2023.
[10] Iwata, H., Sasaki, S., Ishibashi, N., Sornlertlamvanich, V., Enzaki, Y. and Kiyoki, Y.: "Data Sensorium: Spatial Immersive Displays for Atmospheric Sense of Place", Information Modelling and Knowledge Bases XXXIV, pp.247–257 (2023).
[11] Iwata, H.: "The Torus Treadmill: Realizing Locomotion in VEs", IEEE Computer Graphics and Applications, Vol.19, No.6, pp.30–35 (1999).
[12] Kiyoki, Y., Sasaki, S., Nhung Nguyen Trang and Nguyen Thi Ngoc Diep: "Cross-cultural Multimedia Computing with Impression-based Semantic Spaces", Conceptual Modelling and Its Theoretical Foundations, Lecture Notes in Computer Science, Springer, pp.316–328 (2012).
[13] Itabashi, Y., Sasaki, S. and Kiyoki, Y.: "An explorative cultural-image analyzer for detection, visualization, and comparison of historical-color trends", Information Modeling and Knowledge Bases XXVI, IOS Press, pp.152–171 (2014).
[14] Artizon Museum: Artizon Museum, available via WWW, https://www.artizon.museum/en/. Accessed 26 January 2023.
[15] Ishibashi, N.: "Artizon Cloud: A Multidatabase System Architecture for an Art Museum", Information Modelling and Knowledge Bases XXXIII, IOS Press, pp.323–331 (2022).
[16] Batini, C., Lenzerini, M. and Navathe, S.B.: "A comparative analysis of methodologies for database schema integration", ACM Computing Surveys, Vol.18, No.4, pp.324–364 (1986).
[17] Litwin, W., Mark, L. and Roussopoulos, N.: "Interoperability of Multiple Autonomous Databases", ACM Computing Surveys, Vol.22, No.3, pp.267–293 (1990).
[18] Sheth, A.P. and Larson, J.A.: "Federated database systems for managing distributed, heterogeneous, and autonomous databases," ACM Computing Surveys, Vol.22, No.3, Special issue on heterogeneous databases, pp.183–236 (1990).
[19] Zhang, J.: "Classifying approaches to semantic heterogeneity in multidatabase systems," Proceedings of the 1992 Conference of the Centre for Advanced Studies on Collaborative Research - Volume 2, pp.153–173 (1992).
[20] Kitagawa, T. and Kiyoki, Y.: "The mathematical model of meaning and its application to multidatabase systems," Proc. 3rd IEEE Int. Workshop on Research Issues on Data Engineering: Interoperability in Multidatabase Systems, pp.130–135 (1993).
[21] Kiyoki, Y. and Kitagawa, T.: "A metadatabase system supporting interoperability in multidatabases", Information Modeling and Knowledge Bases, Vol.5, pp.287–298 (1993).
[22] Kiyoki, Y., Kitagawa, T. and Hitomi, Y.: "A fundamental framework for realizing semantic interoperability in a multidatabase environment", Journal of Integrated Computer-Aided Engineering, Vol.2, No.1, pp.3–20 (1995).
[23] Kiyoki, Y., Hosokawa, Y. and Ishibashi, N.: "A Metadatabase System Architecture for Integrating Heterogeneous Databases with Temporal and Spatial Operations," Advanced Database Research and Development Series Vol. 10, Advances in Multimedia and Databases for the New Century, A Swiss/Japanese Perspective, pp.158–165, World Scientific Publishing (1999).
[24] Sasaki, S., Takahashi, Y. and Kiyoki, Y.: "The 4D World Map System with Semantic and Spatiotemporal Analyzers," Information Modelling and Knowledge Bases, Vol.XXI, IOS Press, pp.1–18 (2010).
[25] Tsuchiya, Y. and Ishibashi, N.: "An Implementation Method of GACA: Global Art Collection Archive", 33rd International Conference on Information Modelling and Knowledge Bases EJC 2023 (2023) (submitted).
[26] Unity: available via WWW, https://unity.com/. Accessed 26 January 2023.
[27] Fukuda, T. and Ishibashi, N.: "Virtual Art Exhibition System: An Implementation Method for Creating an Experiential Museum System in a Virtual Space", Information Modelling and Knowledge Bases XXXIV, pp.38–47 (2023).

5 Challenges Imposed by the Society

THE CHANGE OF COVID-19 COVERAGE IN AMERICAN, GERMAN AND JAPANESE DAILY NEWSPAPERS: A COMPUTER-ASSISTED TEXT ANALYSIS AND COMPARISON

YUKIKO SATO,1 STEFAN BRÜCKNER2
1 Sophia University, Faculty of Foreign Studies, Tokyo, Japan, yukisato@sophia.ac.jp
2 Toyo University, Faculty of Business Administration, Tokyo, Japan, brueckner@toyo.jp

During the COVID-19 pandemic, news media fulfilled the vital role of disseminating information to the public and shaping public opinion, for example on governmental responses to the outbreak. Responses to the pandemic and news coverage of it vary across countries. This paper examines a random sample of newspaper articles from the German Bild, the Japanese Yomiuri Shimbun, and the American USA Today to clarify how these newspapers reported on COVID-19 during the initial stage of the pandemic, that is, from January to March 2020. It presents first results of comparing the three newspapers' coverage with regard to (1) which actors are mentioned, (2) which regions are depicted, and (3) which themes are mentioned. The Japanese Yomiuri Shimbun reports more frequently on the government's response to the pandemic, whereas the German Bild and the American USA Today more frequently report on how the pandemic affected the lives of citizens and on individual measures for dealing with the pandemic.

Keywords: COVID-19, news media, cross-cultural analysis, computer-assisted text analysis, cultural analytics

DOI https://doi.org/10.18690/um.feri.5.2023.9
ISBN 978-961-286-745-4

1 Introduction

On March 11, 2020, the World Health Organization (WHO) officially designated COVID-19 a global pandemic. At the time of this writing, there have been over 660 million confirmed cases and 6 million deaths attributed to the virus [1].
As COVID-19 has become endemic in many countries, these numbers continue to rise [2]. To control or prevent the spread of the virus, many countries adopted non-pharmaceutical interventions (NPIs) [3]. On May 18, 2020, the WHO propagated Public Health and Social Measures (PHSM), such as wearing masks, restricting social gatherings, closing schools and businesses, limiting domestic and international travel, as well as testing and quarantine, as a global guideline for NPIs [4]. Concurrently, governments around the world introduced policies to combat the virus, while people adapted to these new rules and guidelines or autonomously adopted new behaviors meant to protect themselves and their communities. How countries and citizens reacted to the spread of the virus, and which measures they adopted, vary due to differences in political and economic systems, laws, and culture [5]. For example, in countries such as South Korea or Japan, requirements to use face masks in public, a practice already common before the current pandemic, met with no noticeable resistance, whereas such requirements evoked protest in many Western countries, where the use of masks quickly receded after the relaxation of guidelines [6]. Restrictions on national or international travel also varied widely: some countries remained open to travel, whereas others demanded varying periods and forms of quarantine, testing, or vaccination, or banned international travel altogether [7].

During the pandemic, news media served a vital function in disseminating information on COVID-19 and related government measures to the public, but also in shaping national and international discourse on how to respond to the spread of COVID-19 [8]. Media coverage affects not only how the public understands the pandemic; it also influences the decision-making processes of politicians, corporations and scientists [9]. As the highly diverse range of media outlets and channels in current society can lead to the rapid spread of false or misleading news, an "infodemic" [10], people (re)turned to traditional news channels such as TV and newspapers to receive reliable information [11]. To understand the differing responses to COVID-19 across countries, and to contrast different ways of disseminating information during times of crisis such as the current pandemic, it is necessary to examine how the pandemic and related measures were portrayed in the media and how this differed between countries and media outlets.

Previous comparative studies on the media coverage of COVID-19 tend to focus either on quantitative (monolingual) comparisons [8, 12, 13] or are narrow in their thematic scope [14-16], and they do not consider changes in the coverage over time. In contrast, this study examines how COVID-19 was portrayed in the most widely circulated national newspapers in Germany, Japan, and the United States of America (see Figure 1), in the respective original languages. We study the period from January to March 2020 to clarify differences between the newspapers' coverage and changes over time.

Figure 1: Overview of research. Source: own.
First, we collected a simple random sample of newspaper articles (n=600) that include the term COVID-19 or a synonym from the German Bild, the Japanese Yomiuri Shimbun and the American USA Today for each of the months January, February, and March 2020. Through a compilation of categories used in or resulting from previous studies, and through close readings of the collected data, we defined 45 categories to identify (1) who, (2) where, and (3) what is mentioned in the news articles, and how this changes from January to March 2020. By comparing which actors are mentioned or cited in the newspaper articles, which localities are observed, and which topics are discussed, it becomes possible to grasp differences in national discourses on the pandemic,
policy information by comparing 5 major topics in American, British and Canadian news media. However, while the development of labels to understand the news coverage on COVID-19 is a necessary endeavor to clarify what is reported in the news and how, these categories are usually not connected or utilized for further, in particular cross-regional comparative, research. As such, this paper reports on a comparative content analysis to clarify how topics in the COVID-19 related news coverage vary in different regions, by investigating news articles in national daily national newspapers, from Germany, Japan, and the USA in their original language. In a previous report [28], Sato shows that the threat of the virus was downplayed in the three newspapers in early January. By extending the scope of analysis until March 2022, we can examine and compare the changes in news coverage with the growing awareness of the extent of the virus’ spread. 3 Method We col ected al newspaper articles including the term COVID-19 or a synonym published in the German Bild, the Japanese Yomiuri Shimbun, and the American USA Today between January 1 to March 31, 2020. In consideration of feasibility, we then drew a simple random sample for each newspaper and month (see Table 1) for the analysis. The three newspapers were chosen to represent each region, as, at the time of data collection in April 2022, they were the most widely circulated daily national newspapers in Germany, Japan, and the USA respectively [29-31]. Data was col ected from Nexus Uni and the Yomiuri Database Service and compiled into a spreadsheet. We utilized the search query “covid OR coronavirus OR (corona AND virus)” in English and German, and “’ corona uirusu’ [in Japanese characters] OR COVID” in Japanese. Data collected includes the year, month, and day it was published, page number, section, author, title and sub-title, and finally the article’s main text. We chose the period from January to March 2020 to examine how the media covered the spread of the virus from the initial outbreak in January 2020, up until the WHO declared COVID-19 a “Global Pandemic” in March 2020. The articles were imported into the qualitative data analysis software MAXQDA. MAXQDA is a tool for conducting computer-assisted qualitative and mixed-method data analysis, that enables researchers to intuitively create, assign, organize, and 212 PROCEEDINGS OF THE 33RD INTERNATIONAL CONFERENCE ON INFORMATION MODELLING AND KNOWLEDGE BASES EJC 2023 count codes and categories representing a segment of text (see Figure 2). It also provides an environment for collaboration between researchers during the coding of data. Table 1: Overview of the col ected data Newspaper Bild Yomiuri Shimbun USA Today TOTAL Country Germany Japan USA 1/2020 11 129 13 153 No. of articles 2/2020 52 801 45 898 3/2020 247 1,778 445 2,470 Total 310 2,708 503 3,521 1/2020 11 70 12 93 Random 2/2020 39 127 35 201 Sample 3/2020 94 139 113 346 Total 144 336 160 640 Figure 2: An overview of MAXQDA’s interface we utilized for this paper Source: own. Through a first round of close readings of the articles in the sample and based on a synthesis of previous studies [20, 21, 25, 27] we then developed a set of 45 categories to analyze (1) which actors (see Table 2) are mentioned in the articles, (2) which regions are discussed in the articles (see Table 3) and (3) what topics are mentioned (see Table 4). 
The authors, fluent in English, German and Japanese, then assigned these categories to each news article in a second round of close readings. Discussion between the authors ensured that the same criteria were used to code all articles during the analysis, with the code system revised when necessary. Similar to content analysis [32], we then counted the frequency with which each category was applied to the articles, counting each category only once per article (a sketch of this counting step follows the tables below).

Table 2: Overview of actor categories

WHO: The World Health Organization and its staff
Media: Media organizations
Academia: Researchers, scholars, and experts affiliated with academic institutions
Politicians: Politicians not directly part of the government
Government: Government, ministries, and their staff
Industry: Companies, industry organizations, and their staff
NGOs: Think tanks, public interest groups, foundations
Medical Experts: Persons affiliated with medical institutions
Health Officials: Public health agencies or institutions
Sports: Sport clubs, sport-related organizations (e.g., UEFA) and their staff
Celebrities: Celebrities, e.g., actors, singers, etc., including royalty
Citizens: Ordinary citizens

Table 3: Overview of location categories

Response Reports: Regional responses reported in the news articles
  Japan: Responses in Japan
  USA: Responses in the USA
  Germany: Responses in Germany
  China: Responses in China
  WHO: Responses by the WHO
  Others: Responses in other countries
Outbreak Reports: Reports on the COVID-19 outbreaks
  Japan: Outbreak in Japan
  USA: Outbreak in the USA
  Germany: Outbreak in Germany
  China: Outbreak in China
  Cruise Ship: Outbreaks on cruise ships
  Others: Outbreaks in other countries

Table 4: Overview of topic categories

Cases and Deaths: Infection numbers and deaths, portrayal of cases
Restrictions: Travel restrictions and lockdowns
Political Response: Responses of the government and political leaders
Leaders' Response: Actions of political leaders directed at the person (e.g., Angela Merkel)
Governmental Response: Actions of governmental departments and staff
Financial Support: Governmental financial support plans and actions
Medical/Health: Medical handling of COVID-19
Preventing Spread (Official): Political actions to prevent COVID-19
Preventing Spread (Personal): Wearing masks, washing hands, social distancing
COVID Tests: Virus tests for COVID-19
Treatment: Treatment of patients in hospitals, and patients
Research: Research on the virus and vaccines
Role of the Media: Function of the media during the pandemic
Explaining COVID: Providing information on symptoms, how the virus spreads, etc.
Chinese Censorship: Chinese governmental control of information
Information Accuracy: Issues of accurate information and misinformation
Social Effects: Effects on society
Public Events: Cancellation of or restrictions on social events
Work: Effects on work and the workplace
Education: Effects on education
Olympics: Issues regarding the Tokyo Olympics
Daily Lives: Effects on the daily lives of people
Economic Effects: Economic effects of COVID-19
Economy: Effects on the economy
Business: Effects on industry and companies
Stock Markets: Effects on financial markets
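The frequency count behind Figures 3-5 can be sketched as follows. Here, coded stands in for the MAXQDA coding export, assumed to contain one row per (article, category) assignment, and the per-column normalisation mirrors the heat map colouring described in Section 4.

```python
import pandas as pd

# Illustrative stand-in for the coding export: one row per assignment.
coded = pd.DataFrame([
    {"article_id": 1, "paper": "Bild", "month": "2020-01", "category": "WHO"},
    {"article_id": 1, "paper": "Bild", "month": "2020-01", "category": "WHO"},
    {"article_id": 2, "paper": "Bild", "month": "2020-02", "category": "Media"},
])

# Count each category at most once per article, then cross-tabulate
# by newspaper and month.
once = coded.drop_duplicates(["article_id", "category"])
freq = pd.crosstab(once["category"], [once["paper"], once["month"]])

# Normalise per column (per newspaper and month), as in the heat maps.
heat = freq / freq.sum(axis=0)
print(heat)
```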
4 Results

Below, we detail the results of our analysis. Figures 3-5 depict heat maps based on the frequency of assigned categories per newspaper and month. The heat maps are calculated per column; that is, red indicates a high frequency of a category within that particular newspaper and month. Overall, the higher number of articles in the Japanese Yomiuri Shimbun, particularly in January and February, reflects a greater geographical proximity to the original outbreak of the virus.

Figure 3: Heatmap depicting frequencies within the categories for "actors", between the Bild, USA Today, and Yomiuri Shimbun, from January (1) to March (3). Source: own.
Figure 4: Heatmap depicting frequencies within the categories for "regions", between the Bild, USA Today, and Yomiuri Shimbun, from January (1) to March (3). Source: own.

Figure 3 depicts the frequency with which a particular actor was mentioned in the news coverage per newspaper and month. Among the three newspapers, mentions of governmental institutions were most frequent in the Japanese Yomiuri Shimbun, as members of the government are often cited when reporting on the spread of the virus and on possible and actual countermeasures. Politicians other than members of the government are also frequently mentioned in the same light. While industry actors were mentioned in all three newspapers, usually in connection with the economic fallout of the pandemic, this was comparatively more frequent in the USA Today, especially in March. In contrast, the categories "Sports" and "Celebrities" were most frequent in the German Bild, possibly indicating a stronger focus on human interest stories. Health officials are not mentioned frequently in the Bild, although academics are mentioned in a similar function to health officials in the other two newspapers, that is, to provide expertise on the spread of the virus. In March 2020, the USA Today mentions the WHO comparatively frequently, in connection with the designation of COVID-19 as a global pandemic.

The heatmap in Figure 4 depicts the frequency with which a particular region was mentioned in the news coverage of each newspaper in each month. Broadly speaking, aside from reporting on the outbreak and response within the country they are based in, each newspaper also reported on the original outbreak in China and the response of the Chinese government. The USA Today in particular mentions the outbreak in China in reports on US citizens stranded there. In comparison to the Bild and USA Today, the Japanese Yomiuri Shimbun reported more frequently on how other countries, including Germany and the USA, responded to COVID-19. As citizens of the respective countries were involved, the Yomiuri and Bild more frequently mentioned COVID-19 outbreaks on cruise ships.
Figure 5: Heatmap depicting frequencies within the categories for "topics", between the Bild, USA Today, and Yomiuri Shimbun, from January (1) to March (3). Source: own.

Figure 5 shows the code frequencies of the three newspapers in the topic categories from January to March. Topics common to all three periods and newspapers were reports on the number of infections and deaths, as well as research on the virus and its effect on the economy. Restrictions, discussed and gradually put in place over the early stages of the pandemic, were frequently mentioned in Germany, especially in March, whereas they were not mentioned in the USA Today and comparatively less frequently in the Yomiuri Shimbun. The effects of the pandemic on work and education did not receive widespread attention until March. In comparison to the Bild, the Yomiuri and USA Today more frequently mentioned the fallout of the pandemic with respect to the overall economy, specific businesses, and the stock market. Each newspaper shows a specific tendency to focus on particular topics throughout the three months observed. The Bild frequently reported on public events and the restrictions placed on them, as well as on the daily lives of citizens during the pandemic. The USA Today less frequently mentions official efforts to prevent the spread of the virus but, in turn, more frequently reports on how to prevent further spread or infection through personal measures such as wearing a mask or disinfection. The Yomiuri focuses more strongly on the response of the Japanese government and on officially introduced methods of prevention.

5 Discussion and Conclusion

This study compared the actors, localities, and topics mentioned in American, German, and Japanese newspaper articles in their respective languages, aiming to clarify differences across newspapers and over time during the beginning of the pandemic, from January to March 2020. From January 2020, the German Bild, the Japanese Yomiuri Shimbun, and the American USA Today reported on the outbreak of COVID-19. The Yomiuri Shimbun shows a strong focus on the government's response to the outbreak and on official measures to prevent the spread of the virus, whereas the USA Today and Bild appear more concerned with the effects of the pandemic on citizens' daily lives and public events. At least in March, concrete reports on treatments of COVID-19 are comparatively less frequent in the Yomiuri Shimbun than in the Bild or USA Today. Overall, this could be an expression of a more paternalistic approach towards the virus in Japan, or at least in the Yomiuri Shimbun, than is evident in the other two newspapers. In contrast to the Yomiuri Shimbun, the USA Today reports frequently on personal
Furthermore, in contrast to the Bild, the USA Today and particularly the Yomiuri Shimbun frequently mention the pandemic’s effects on businesses, whereas the Bild focuses more on human interest stories. This could be interpreted as a result of the newspapers’ different readerships and journalistic approaches, with the German Bild as a tabloid focusing more on the social fallout of the pandemic, whereas particularly the Yomiuri Shimbun caters more to businesspeople. Overall, although by March 2020 cases of COVID-19 were confirmed in each country, newspapers tended to report less frequently on the concrete health hazards of the pandemic, and more frequently on its economic and societal effects. Furthermore, although the virus continued to spread, the frequency of articles providing the public with concrete information on how the virus spreads, what symptoms it can evoke, and how it can be treated does not noticeably change over time.

6 Limitations, Further Work and Reflections on Methodology

This paper presents selected first results of a cross-regional comparison of the media coverage on COVID-19 in Germany, Japan, and the USA, based on a random sample of articles from the period of January to March 2020. While this allows us to identify salient differences in the news coverage, it limits the extent to which the results of our coding analysis can be quantified. In addition, our selection of the most widely circulated newspaper in each country for inclusion in this analysis also led to a narrow sample of articles per country, as differences in journalistic approach, target readership, and political leanings between the newspapers accentuate differences in the articles. In further work, we plan to extend the analysis to all articles published in 2020, and to include further newspapers to provide a comprehensive and quantitatively interpretable comparison of the news coverage on COVID-19. Our rationale for using a random sample of articles from a limited number of newspapers as a first step was to identify salient categories in the articles and to create a first system of codes and thematic categories. Further analysis of a larger corpus of data requires the use of automatic coding, for which the codes and categories established here provide a first basis. As a next step, we first plan to continue the qualitative analysis, as depicted in this paper, for a larger random sample of articles published up until December 2020. This serves to further establish a system of thematic codes that enables us to efficiently grasp the articles’ contents. Based on this extended code system, we then aim to create a dictionary for automatic coding analysis; that is, we compile a list of search terms linked to the codes and categories, which we automatically assign to the overall corpus. Finally, we will conduct a qualitative in-context analysis of these automatically coded text segments. This combines qualitative and quantitative analysis in that categories are based on human interpretation of the data, which is then used as a basis for a quantitative comparison of articles, which is in turn again subjected to qualitative analysis.
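As an illustration, a minimal sketch of such dictionary-based automatic coding might look as follows (in Python; the codes, search terms, and article structure are hypothetical examples, not the dictionary we intend to compile):

    import re

    # Hypothetical coding dictionary: each code maps to its search terms.
    coding_dictionary = {
        "Restrictions": ["lockdown", "curfew", "school closure"],
        "Prevention": ["mask", "disinfection", "hand washing"],
        "Economy": ["recession", "unemployment", "stock market"],
    }

    def assign_codes(article_text):
        """Return the set of codes whose search terms occur in the article."""
        text = article_text.lower()
        return {
            code
            for code, terms in coding_dictionary.items()
            if any(re.search(r"\b" + re.escape(term) + r"\b", text) for term in terms)
        }

    # Tally code frequencies per newspaper and month over a (hypothetical) corpus,
    # yielding the counts from which heatmaps like Figures 3-5 can be drawn.
    corpus = [{"paper": "USA Today", "month": 3, "text": "Officials urged mask wearing ..."}]
    counts = {}
    for article in corpus:
        for code in assign_codes(article["text"]):
            key = (article["paper"], article["month"], code)
            counts[key] = counts.get(key, 0) + 1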
In contrast to methods of text mining or topic modelling, this combined approach allows for a more theoretically informed and interpretable analysis of textual data and can be used in contexts aside from the analysis of newspaper articles.

Acknowledgement

This work was supported by JSPS KAKENHI Grant Number 21K13444. I would like to thank my research project members at Keio University, Japan. I would also like to express my gratitude to the researchers and faculty members who shared their valuable insights and comments for this project.

References

[1] World Health Organization. WHO Coronavirus (COVID-19) Dashboard 2023 [Available from: https://covid19.who.int/.
[2] Mathieu E, Ritchie H, Rodés-Guirao L, Appel C, Giattino C, Hasell J, et al. Coronavirus Pandemic (COVID-19) [Online Resource]. OurWorldInData.org 2020 [Available from: https://ourworldindata.org/coronavirus.
[3] European Centre for Disease Prevention and Control. Non-pharmaceutical interventions against COVID-19 2021 [updated Nov. 23 2021. Available from: https://www.ecdc.europa.eu/en/covid-19/prevention-and-control/non-pharmaceutical-interventions.
[4] World Health Organization. Overview of public health and social measures in the context of COVID-19 2020 [Available from: https://apps.who.int/iris/bitstream/handle/10665/332115/WHO-2019-nCoV-PHSM_Overview-2020.1-eng.pdf.
[5] Wang D, Mao Z. A comparative study of public health and social measures of COVID-19 advocated in different countries. Health Policy. 2021;125(8):957-71.
[6] Offeddu V, Yung CF, Low MSF, Tam CC. Effectiveness of masks and respirators against respiratory infections in healthcare workers: a systematic review and meta-analysis. Clinical Infectious Diseases. 2017;65(11):1934-42.
[7] sherpa. Travel requirements map. Where's open, what's required? [Available from: https://apply.joinsherpa.com/map?affiliateId=sherpa&language=en-US.
[8] Krawczyk K, Chelkowski T, Laydon DJ, Mishra S, Xifara D, Gibert B, et al. Quantifying online news media coverage of the COVID-19 pandemic: Text mining study and resource. Journal of Medical Internet Research. 2021;23(6):e28253.
[9] Schwitzer G, Mudur G, Henry D, Wilson A, Goozner M, Simbra M, et al. What are the roles and responsibilities of the media in disseminating health information? PLoS Med. 2005;2(7):e215.
[10] World Health Organization. Infodemic 2023 [Available from: https://www.who.int/health-topics/infodemic#tab=tab_1.
[11] Ali SH, Foreman J, Tozan Y, Capasso A, Jones AM, DiClemente RJ. Trends and predictors of COVID-19 information sources and their relationship with knowledge and beliefs related to the pandemic: nationwide cross-sectional study. JMIR Public Health and Surveillance. 2020;6(4):e21071.
[12] Xu Y, Yu J, Löffelholz M. Portraying the Pandemic: Analysis of Textual-Visual Frames in German News Coverage of COVID-19 on Twitter. Journalism Practice. 2022:1-21.
[13] de Melo T, Figueiredo CM. Comparing news articles and tweets about COVID-19 in Brazil: sentiment analysis and topic modeling approach. JMIR Public Health and Surveillance. 2021;7(2):e24585.
[14] Amann J, Sleigh J, Vayena E. Digital contact-tracing during the Covid-19 pandemic: an analysis of newspaper coverage in Germany, Austria, and Switzerland. PLoS One. 2021;16(2):e0246524.
[15] Chen H, Huang X, Li Z. A content analysis of Chinese news coverage on COVID-19 and tourism. Current Issues in Tourism. 2022;25(2):198-205.
[16] Allen LD, Ayalon L.
“It’s pure panic”: The portrayal of residential care in American newspapers during COVID-19. The Gerontologist. 2021;61(1):86-97.
[17] Kousha K, Thelwall M. COVID-19 publications: Database coverage, citations, readers, tweets, news, Facebook walls, Reddit posts. Quantitative Science Studies. 2020;1(3):1068-91.
[18] Ogbodo JN, Onwe EC, Chukwu J, Nwasum CJ, Nwakpu ES, Nwankwo SU, et al. Communicating health crisis: a content analysis of global media framing of COVID-19. Health Promotion Perspectives. 2020;10(3):257.
[19] Gabore SM. Western and Chinese media representation of Africa in COVID-19 news coverage. Asian Journal of Communication. 2020;30(5):299-316.
[20] Ophir Y, Walter D, Arnon D, Lokmanoglu A, Tizzoni M, Carota J, et al. The framing of COVID-19 in Italian media and its relationship with community mobility: a mixed-method approach. Journal of Health Communication. 2021;26(3):161-73.
[21] Hubner A. How did we get here? A framing and source analysis of early COVID-19 media coverage. Communication Research Reports. 2021;38(2):112-20.
[22] Su Z, McDonnell D, Wen J, Kozak M, Abbas J, Šegalo S, et al. Mental health consequences of COVID-19 media coverage: the need for effective crisis communication practices. Globalization and Health. 2021;17(1):1-8.
[23] Basch CH, Hillyer GC, Meleo-Erwin Z, Mohlman J, Cosgrove A, Quinones N. News coverage of the COVID-19 pandemic: Missed opportunities to promote health sustaining behaviors. Infection, Disease & Health. 2020;25(3):205-9.
[24] Moon H, Lee GH. Evaluation of Korean-language COVID-19–related medical information on YouTube: cross-sectional Infodemiology study. Journal of Medical Internet Research. 2020;22(8):e20775.
[25] Mach KJ, Salas Reyes R, Pentz B, Taylor J, Costa CA, Cruz SG, et al. News media coverage of COVID-19 public health and policy information. Humanities and Social Sciences Communications. 2021;8(1).
[26] Morgan T, Wiles J, Williams L, Gott M. COVID-19 and the portrayal of older people in New Zealand news media. Journal of the Royal Society of New Zealand. 2021;51(sup1):S127-S42.
[27] Gozzi N, Tizzani M, Starnini M, Ciulla F, Paolotti D, Panisson A, et al. Collective response to media coverage of the COVID-19 pandemic on Reddit and Wikipedia: mixed-methods analysis. Journal of Medical Internet Research. 2020;22(10):e21597.
[28] Sato Y. Cross-cultural analysis of the American, German, and Japanese newspaper coverage on COVID-19. 2022 International Electronics Symposium (IES). 2022:595-600.
[29] Cision Media Research. Top 10 U.S. Daily Newspapers 2019 [Available from: https://web.archive.org/web/20190722203322/https://www.cision.com/us/2019/01/top-ten-us-daily-newspapers/.
[30] deutschland.de. Most read German newspapers 2020 [Available from: https://www.deutschland.de/de/topic/wissen/ueberregionale-zeitungen.
[31] The Bunka News. ABC協会新聞発行社レポート [Japan Audit Bureau of Circulations Report on Newspaper Publishers] 2021 [Available from: https://www.bunkanews.jp/article/237791/.
[32] Bengtsson M. How to plan and perform a qualitative study using content analysis. NursingPlus Open. 2016;2:8-14.

HOW TO INCORPORATE ACCESSIBILITY TO DESIGN PRINCIPLES FOR IS ARTEFACTS?
JUHO-PEKKA MÄKIPÄÄ, TERO VARTIAINEN
University of Vaasa, School of Technology and Innovations, Computing Sciences Department, Vaasa, Finland
juho-pekka.makipaa@uwasa.fi, tero.vartiainen@uwasa.fi

Design principles are used to specify design knowledge and describe the aim of artefact instantiation. Accessibility research aims to create artefacts that can be used by all users. However, schemes for design principles lack the tools to define accessibility explicitly. This study proposes extensions to a scheme for design principles for accessibility-related design science research. We draw accessibility domain-specific characteristics from the literature to include accessibility in design principles for Human-Computer Interaction (HCI) instantiations. We extended the components of design principles with the following attributes: HCI Artefact Features; Contextual Factors; Computer Input Modalities; Computer Output Media; Human Sensory Perception; Human Cognition; Human Functional Operations. We devised a checklist for researchers to follow the variations in accessibility. The extensions are intended to foster researchers to incorporate accessibility in producing a more accurate formulation of design principles.

Keywords: accessibility, design principles, design science research, IS artefacts, information systems

DOI https://doi.org/10.18690/um.feri.5.2023.10
ISBN 978-961-286-745-4

1 Introduction

Accessibility is a research topic often categorized as a sub-subject of the Human-Computer Interaction (HCI) discipline [1]. Accessibility research is interdisciplinary in nature and has domain-specific characteristics that need to be addressed in research. In general, accessibility-related research attempts to identify issues in HCI within a wide spectrum of human abilities and aims to discover solutions to information and system quality that enable users’ autonomous use of information and information technology (IT) [2]. Simply put, in practice, accessibility aims to create artefacts that can be used by all users. Accessibility, therefore, represents the extent to which users with their variable abilities in perception, cognition, and action can interact and operate with a system without external assistance (secondary users or assistive technology) [2]. In contrast, the goal of HCI research is to attempt to build and evaluate new behavioural solutions with a focus on interactions that increase human capabilities to interact with information, technologies, and tasks [3,4]. HCI research focuses, first, on advancing the knowledge base with descriptive knowledge by explaining human cognition, affect, and behaviour in interaction with technology. Secondly, HCI research provides prescriptive knowledge for IT system design and human process and interaction artefacts presented in the form of design theories and/or design entities [4]. According to Adam et al. (2021) [3:4], Design Science Research (DSR) can support three modes of HCI research: (1) ‘how to construct an HCI artefact for a given problem space,’ (2) ‘how individuals use the artefact in its environment,’ and (3) ‘building and evaluating novel composite solutions that improve synergies between technologies and human behaviour’. Mäkipää et al. (2022) [2] identified four domains in IT artefact development, the factors within them, and their roles and actions that influence the realization of accessibility.
The domains are (1) user, (2) management, (3) developers, and (4) features of the IT artefact. The factors within these domains and the relationships between the domains should all be considered to ensure the realization of accessibility. However, accessibility research barely uses design science as a research method, even though it is promising for HCI research [5]. Gregor et al. (2020) [6] derived a schema for researchers to specify a design principle for IT-based artefacts. The schema aids researchers in formulating the components of design principles and defining who is the implementer, in what context, by what mechanisms, for what purpose, for whom, and why the instantiation is intended. The components include aim, implementer, user, context, mechanism, and rationale. It clarifies the general role of the actors (implementer, users, and enactors) who are involved with the use of the design principle. In psychology and cognitive science, a schema is defined as a concept that describes a pattern of thought or behaviour that organizes categories of information and the relationships among them [7]. A schema is, in psychology, an internal mental model of a structure in the real world. People organize information into schemes, and a schema is used to understand added information. For example, a builder of an artefact (IT developer, researcher, etc.) has a schema about the user, that is, an idea of what the user is like. This schema allows the builder to identify different users as the same user type (c.f. user groups that are categorized based on certain characteristics). The schema also includes activities such as how artefacts are used by users, i.e., designers' assumptions. These assumptions need to be converted to realization by observing the real-world interaction behaviour of users. In accessibility research, this means that we need to focus on user abilities. However, user ability is a variable that depends on the individual. The nature of human abilities, their severity, and their mixture is complex. Moreover, due to assistive technology, potential accessibility barriers become even more complex to understand [8]. Gregor et al. (2020) [6] addressed the lack of a ‘people aspect’ in design principles and devised a design theory to make design principles more understandable and useful in real-world design contexts. Accessibility is therefore also one criterion of the reusability of design principles [9]. Addressing the lack of a more accurate description of the attributes of the components in the design principle scheme would enable accessibility researchers to incorporate accessibility into design principles. Therefore, we addressed this issue by asking: How to incorporate accessibility to design principles for IS artefacts? In this paper, we continue this work and propose an extension to the scheme presented by Gregor et al. (2020) [6]. The goal of this study is to extend the scheme for design principles with attributes of human-aspect factors in the use of an HCI artefact, including: HCI artefact features; variables of the context; mechanisms in HCI; and variables in user abilities. We draw upon theories related to the components of design principles and accessibility domain-specific characteristics that should be included and addressed in design principles for HCI instantiations.
This paper is organized as follows. The next section describes the theoretical foundation of DSR as well as DSR in HCI research. Then, we present the research methods. Finally, we propose the next steps to complement and justify the extensions.

2 Theoretical Foundation

2.1 Design Science Research

The goal of DSR is to generate prescriptive knowledge about the design of IS artefacts like software, methods, models, and concepts [10]. The design of the artefact, its precise definition, and the evaluation of its usefulness are the most central issues of DSR [11]. Design research differs from design in general by focusing more on generating and developing new knowledge, while design in general focuses on using existing knowledge [12]. Therefore, the design must combine behavioural and organizational theories to develop an understanding of business problems, context, solutions, and evaluation methods [11]. For strategies to be implemented in the business infrastructure effectively, organizational designing activities as well as information system designing activities are required [11]. These design activities are interdependent, and they reflect the most central research subjects in the field of IS. To be more precise, design activities show the relationship of business strategies and IT strategies to the infrastructures of the organization and information systems [11]. The design activities contain a sequence of activities that produces an innovative product, i.e., an artefact. The evaluation of the artefact produces feedback, based on which both the design process and the artefact and their quality can be developed. This type of iteration between build and review is typical before the final version of the artefact is complete. In design science, one contradiction must be accepted: design means both process and product (artefact); in other words, design means both doing and a thing [13]. Researchers must therefore consider both the design process and the artefact itself as part of the research [11]. March and Smith (1995) [14] indicated two types of design activities (construction and evaluation) and four types of artefacts (constructs, models, methods, and instantiations) produced by design scientists in IS studies. Construction refers to the process of constructing an artefact for a specific purpose. Construction is guided by the question of whether the artefact is feasible. Thus, the artefact itself becomes a research object that must be evaluated scientifically [14]. Evaluation is the process of deciding how well an artefact performs its task [14]. Evaluation requires developing metrics and measuring the artefact against those metrics. Metrics also determine what the artefact is trying to achieve. If the metrics have not been defined or the testing was not successful, it is impossible to scientifically prove the usefulness of the artefact. Hevner et al. (2004) [11] further emphasized that it is important to separate routine design work from design research. Routine design refers to the utilization of existing knowledge in the solution of an organization's known problems [15]. The key distinction between routine design and design science research is that in the latter it is precisely recognized what contribution the research makes to current knowledge, both in terms of basic knowledge and the methodological part [15].
Maedche et al. (2021) [12] proposed a reference framework for design research activities to help researchers position their own work and justify the type of contribution they want to make. The framework includes two dimensions along which design-oriented research varies. The first dimension concerns the researchers' explanation of their contribution to current knowledge and tells whether the explanation is prescriptive or descriptive. The second dimension comprises the role of researchers in relation to the artefact and shows whether researchers are creating a new artefact (Creation) or examining an existing artefact (Observation) [12].

2.2 Design Science in Human-Computer Interaction Research

HCI research focuses on producing information about how people interact with information, technology, and tasks [3,4]. The design research and HCI research streams can be seen as inherently related and highly overlapping [4]. The knowledge produced in HCI research can be classified as either descriptive knowledge aimed at explaining human behaviour and cognition with technology or as prescriptive knowledge aimed at guiding how IT systems should be constructed [3]. Adam et al. (2021) [3] presented three modes that DSR can focus on in HCI. First, they called ‘interior mode’ such research that focuses on IT system design technically and aims to solve problems of how to build and design an interface that enhances human performance. These HCI artefacts constitute constructs, models, methods, and instantiations for an interface design. Second, the ‘exterior mode’ focuses on the use of the artefact in its environment. Researchers focus on how individuals use the artefact by observing and analysing existing real-world use cases. Researchers primarily evaluate how effectively users interact with the IT system interface, basing the observations on qualitative and/or quantitative evidence to produce both prescriptive and descriptive knowledge around human behaviours [3]. Third, research projects that integrate both interior and exterior modes Adam et al. (2021) [3] called the ‘gestalt mode’. Gestalt-mode research focuses on the synergistic design of human behaviour and IT systems to improve human performance. In such projects, the selected evaluation methods should cover both system performance and human performance so that the improvements to the HCI application can be justified. These types of research projects contribute to guiding design theories to achieve synergies between people and systems, between socio-technical systems and technical components [3]. Hevner and Zhang (2011) [4] indicated that it is crucial to identify what constitutes an HCI artefact in design research. They categorized examples of HCI artefacts within the DSR artefact types: construct, model, method, or instantiation [14]. Constructs in HCI are defined as ‘vocabulary and symbols used to define design problems and solutions that provide a means to represent design ideas’ [4:58]. Examples of construct-type HCI artefacts include metaphors, constructs of interaction, visualization, and organization (layouts of HCI). Models in HCI are ‘…sensemaking arrangements of constructs that allow exploration of abstract design’ [4:58]. Examples of these types of HCI artefacts are graphical models, card stacks, 3D models, cognitive maps, etc. [4].
Method-type HCI artefacts are defined as ‘processes that provide guidance on how to solve problems and exploit opportunities’ [4:58]. Examples of these types of HCI artefacts are well-established participatory design, collaboration processes, human-centred design, and value-sensitive design [4]. Lastly, instantiations in HCI represent the ‘implementation of an artefact in a working system,’ ‘demonstrate feasibility and value,’ or ‘provide the ability to study uses and impacts on an embedded system’ [4:58]. Instantiation-type artefacts in HCI are websites, user interfaces, input/output devices, avatars, etc. [4].

2.3 Accessibility Domain-Specific Characteristics in Design Activities

The domain of accessibility contains basically three points of knowledge that can be considered field-specific characteristics required for successfully designing an accessible HCI artefact: (1) assumptions about users’ abilities, that is, developers should consider human senses one by one and assume that users may lack one of these abilities; (2) users’ actual needs, that is, developers should elicit users’ requirements related to the task and context of the HCI artefact that is the target of development; and (3) factors in the value chain that are related to the management and development of the artefact and influence accessibility [2]. Assumptions about users’ abilities entail the mindset that users may lack some human abilities; therefore, multimodal interaction should be provided. Users’ actual needs are detected in collaboration with users. Collaboration with users is the process of planning a partnership. Since its introduction, the method has been adapted and extended in the field of HCI. Similarly, the participatory design approach consists of a set of theories, practices, and studies related to end-user participation in technology development and design [16]. User participation and experimental research are also becoming increasingly important in IS research to study decision-making processes and user behaviour [17]. Overall, user participation as an approach contains several methods that can be used in various parts of the value chain, such as brainstorming, direct observation, activity diaries, cultural probes, surveys and questionnaires, interviews, group discussion, empathic modelling, user trials, scenarios and personas, prototyping, cooperative and participatory design, etc. [16]. To achieve a diverse view of users' needs, user participation should include users with different disabilities as representatives. However, some of the user participation methods have limitations when adopted for user requirements elicitation with certain users [16]. Of these methods and techniques, direct observation, scenarios, personas, and prototyping are evaluated to be appropriate as such for use with user groups with motion, vision, hearing, or cognitive and communication disabilities [16]. However, the information derived from observation, scenarios, and personas is mostly produced by the researcher, which means that only the use of a prototype can be classified as one that produces user-oriented information and can be used with users with disabilities without adjustment. Factors in the value chain refer to the accessibility implementation process, which includes different stakeholders and their input to the realisation of accessibility.
3 Design Science Methodology

We designed our study based on the design science research process model by Peffers et al. (2007) [18] (see Figure 1). The present study focuses on extending a scheme for design principles proposed by Gregor et al. (2020) [6]. Thus, we started with the objective-centred solution entry point [18]. We aimed to improve [15] the existing scheme [6] so that it is more accurate and adaptable for accessibility-related DSR. Therefore, we first defined the objectives of a solution.

Figure 1: Research Process Model. Source: Adopted from [18].

To define objectives, we performed a literature search to draw accessibility domain-specific characteristics that should be included and addressed in design principles for HCI instantiations. We adopted kernel theories such as the International Classification of Functioning, Disability and Health (ICF) agreed upon by the World Health Assembly [19], which helped us to identify variations in human abilities. The ICF is commonly used by disability experts in governments and other sectors [20]. Then, we based the search on the lens of the following components of the design principle scheme [6]: instantiation, context, mechanism, enactor, and user. We reasoned about and drew attributes related to the interaction with an HCI artefact. In this paper, we conducted the first three steps of the design science research process: problem identification; definition of objectives; and design and development [18]. We adopted existing knowledge [11] and constructed the first version of the extensions. We also included the communication phase in the present study, as we plan to develop the first proposal based on the peer-reviewing process [18]. In the Demonstration and Evaluation phases, we will apply and demonstrate the results with a focus group including accessibility researchers and evaluate the feasibility by interviewing accessibility researchers.

4 Theoretical Extensions to the Scheme for Formulating Design Principles in Accessibility-Related DSR

In this section, we illustrate the extensions to a scheme for design principles. The extensions are intended for accessibility-related research to incorporate accessibility to produce more accurate design principles. Figure 2 illustrates the components of design principles [6] and indicates our proposed attributes of these components that should be specified in the case of accessibility-related research.

Figure 2: Extensions for Components for Design Principles. Source: Area inside the dotted line is adopted from [6].

The knowledge, i.e., the building blocks, is drawn from the following kernel theories: studies [21,22] related to HCI Artefact Features; [23–27] related to Contextual Factors; [6,28] related to Computer Input Modalities; [28] related to Computer Output Media; [20,28,29] related to Human Sensory Perception; and [29–33] related to Human Cognition. Components four and seven in the original scheme for design principles [6] are extended with the consideration of assistive technology as one part of the enactors, and “Improved Access” as a goal of design principles related to accessibility.
In the following, the extensions for the components of design principles are described:

(1) HCI Artefact Features: HCI artefact instantiations such as websites and user interfaces can be sorted into specific features, where users interact with the artefact through the content, presentation style, functionality, interaction style, and structure [21,22]. Users construct their own conceptual version of the nature of an artefact with their personal judgment. This judgment is influenced by emotional consequences such as pleasure, satisfaction, etc., as well as behavioural consequences, such as the time spent with the artefact [21].

(2) Contextual Factors: The context of use can affect users' abilities. The context of use may vary due to environmental factors, including users’ emotional state, sociocultural factors, and socio-technical factors, comprising cultural, political, sociological, and historical aspects of context [23–26]. Moreover, user expectations of artefact behaviour often rely on past experiences, prejudice, evoked memories, unmet expectations, and convictions that strongly influence how users perceive and experience accessibility [27]. Furthermore, the expectations are related to the history of the context and the emotional state.

(3) Computer Input Modalities: Mechanisms, such as acts, activities, and processes [6], in the case of an HCI artefact relate to the mechanisms by which the user interacts with the HCI artefact. Referring to the basic model of HCI by [28], the mechanisms include the modalities through which the user can provide input to the HCI artefact: movements, force, sound, and images [28].

(4) Computer Output Media: After the computer has received input from a user, it processes the data and provides an output for the user through modalities such as visual, auditive, tactile, olfactory, gustatory, or vestibular media [28].

(5) Human Sensory Perception: Human sensory perceptions can differ in terms of abilities in sight, hearing, touch, smell, taste, and balance [20,28,29].

(6) Human Cognition: Cognitive abilities are different for everyone [32,33]. It is therefore necessary to consider each specific cognitive deficit rather than considering cognitive matters as a whole [33]. Cognitive abilities include possible variations in focusing attention, memory, thinking and speed of processing, reading and writing, mental functions of language, calculating and quantitative knowledge, solving problems, making decisions and reaction speed, psychomotor functions and sequencing complex movements and speed, emotional functions, perceptual functions, higher-level cognitive functions and domain-specific knowledge, experience of self and time functions, and comprehension-knowledge [29–31]. Awareness of individuals’ cognitive abilities to perform tasks in an HCI artefact and the adoption of this knowledge into the design activities are crucial for creating a successful interaction.

(7) Human Functional Operations: Human outputs for HCI artefacts, such as typing with a keyboard and using pointing devices, touch screens, and others, require at least one human functional ability [34].
Human functional abilities can be classified as follows: voice and speech functions (voice functions, articulation functions, fluency and rhythm of speech functions, alternative vocalisation functions) and neuromusculoskeletal and movement-related functions (functions of the joints and bones, muscle functions, movement functions) [29]. As the interaction with an HCI artefact can also be considered a social interaction [35], factors related to human abilities for social interaction, such as abilities for interpersonal interactions, relationships, and communication (receiving and producing, conversation, and the use of communication devices and techniques), should be considered in designing for accessibility [29]. Summing up the extensions, we devised a demonstration of a checklist for researchers to incorporate accessibility in design principles (Table 1).

Table 1: A Checklist to Incorporate Accessibility in Design Principles

HCI Artefact Features (What feature are you addressing?): Content; Presentational style; Functionality; Interactional style; Structure.
Context (What contextual factors influence your target?): Environmental; User’s emotional state; Socio-cultural; Socio-technical; Cultural; Political; Sociological; Historical; other…
Computer Input Modalities (In what way will the computer take the input from the user?): Movements; Force; Sound; Images; other…
Computer Output Media (How is the information presented?): Text; Image; Video; Graphs; Tables; Sound; other…
Human Sensory Perception (With what sense does the user receive the information?): Sight; Hearing; Touch; Smell; Taste; Balance.
Human Cognition (What human cognitive abilities are addressed?): Focusing attention; Memory; Thinking and speed of processing; Reading and writing; Mental functions of language; Calculating and quantitative knowledge; Solving problems; Making decisions and reaction speed; Psychomotor functions and sequencing complex movements and speed; Emotional functions; Perceptual functions; Higher-level cognitive functions and domain-specific knowledge; Experience of self and time functions; Comprehension-knowledge.
Human Functional Operation (How does the user perform the action?): Movement; Voice; Sight; other…

The checklist in Table 1 is not, however, comprehensive. The checklist does not separate different severity levels in human abilities. Moreover, not all variables in computer input modalities, computer output media, and contexts are presented. However, the checklist is intended for researchers in accessibility-related research to incorporate a wide aspect of accessibility and improve accuracy by identifying attributes related to context, human abilities, and interaction. For example, studies exploring IT use by blind individuals often ignore the fact that these same individuals may have variations in cognitive abilities. The checklist therefore helps to specify more accurately what human factors the design principles intend to cover.
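For researchers who want to apply the checklist systematically, it could also be recorded as a simple machine-readable structure, one instance per design principle. The sketch below (in Python) is only an illustrative assumption; the field names and example values are hypothetical and not part of the proposed scheme:

    # Hypothetical sketch: recording the Table 1 checklist for one design
    # principle as a plain data structure (field names are illustrative).
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class AccessibilityChecklist:
        artefact_features: List[str] = field(default_factory=list)      # e.g. content, structure
        contextual_factors: List[str] = field(default_factory=list)     # e.g. environmental
        input_modalities: List[str] = field(default_factory=list)       # e.g. movements, sound
        output_media: List[str] = field(default_factory=list)           # e.g. text, image
        sensory_perception: List[str] = field(default_factory=list)     # e.g. sight, hearing
        cognition: List[str] = field(default_factory=list)              # e.g. memory
        functional_operations: List[str] = field(default_factory=list)  # e.g. voice

    # Example: a hypothetical design principle for screen-reader-friendly navigation.
    principle_checklist = AccessibilityChecklist(
        artefact_features=["structure", "content"],
        contextual_factors=["environmental"],
        input_modalities=["movements", "sound"],
        output_media=["text", "sound"],
        sensory_perception=["hearing", "touch"],
        cognition=["focusing attention", "memory"],
        functional_operations=["voice", "movement"],
    )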
5 Conclusion and Next Phases

In this paper, we presented a tentative illustration of the extensions to a scheme for design principles. Our extensions are intended for accessibility-related research to incorporate accessibility to produce a more accurate formulation of design principles. We aim to contribute improvements to the scheme for design principles presented by [6] so that it is more adaptable to accessibility-related DSR. We provided seven attributes to extend the components of design principles and devised a checklist for researchers to incorporate accessibility in design principles. We conducted the first three steps of the design science research process: problem identification; definition of objectives; and design and development. Next, we will conduct the evaluation of the proposed extensions. We will apply the Demonstration and Evaluation phases [18], and include accessibility researchers to evaluate the usefulness of the extensions.

References

[1] Lewthwaite S, Sloan D. Exploring pedagogical culture for accessibility education in computing science, in: Proceedings of the 13th International Web for All Conference, Association for Computing Machinery, New York, NY, USA, 2016: pp. 1–4. doi:10.1145/2899475.2899490.
[2] Mäkipää J-P, Norrgård J, Vartiainen T. Factors Affecting the Accessibility of IT Artifacts: A Systematic Review, CAIS. 51 (2022) 666–702. doi:10.17705/1CAIS.05129.
[3] Adam M, Gregor S, Hevner A, Morana S. Design Science Research Modes in Human-Computer Interaction Projects, THCI. (2021) 1–11. doi:10.17705/1thci.00139.
[4] Hevner A, Zhang P. Introduction to the AIS THCI Special Issue on Design Research in Human-Computer Interaction, THCI. 3 (2011) 56–61. doi:10.17705/1thci.00026.
[5] Mack K, McDonnell E, Jain D, Lu Wang L, Froehlich J.E, Findlater L. What Do We Mean by “Accessibility Research”? A Literature Survey of Accessibility Papers in CHI and ASSETS from 1994 to 2019, in: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, Association for Computing Machinery, New York, NY, USA, 2021: pp. 1–18. http://doi.org/10.1145/3411764.3445412 (accessed January 8, 2022).
[6] Gregor S, Chandra Kruse L, Seidel S. The Anatomy of a Design Principle, Journal of the Association for Information Systems. 21 (2020) 1622–1652. doi:10.17705/1jais.00649.
[7] DiMaggio P. Culture and Cognition, Annual Review of Sociology. 23 (1997) 263–287. doi:10.1146/annurev.soc.23.1.263.
[8] Vollenwyder B, Iten G.H, Brühlmann F, Opwis K, Mekler E.D. Salient beliefs influencing the intention to consider Web Accessibility, Computers in Human Behavior. 92 (2019) 352–360. doi:10.1016/j.chb.2018.11.016.
[9] Iivari J, Hansen M.R.P, Haj-Bolouri A. A proposal for minimum reusability evaluation of design principles, European Journal of Information Systems. 30 (2021) 286–303. doi:10.1080/0960085X.2020.1793697.
[10] vom Brocke J, Maedche A. The DSR grid: six core dimensions for effectively planning and communicating design science research projects, Electron Markets. 29 (2019) 379–385. doi:10.1007/s12525-019-00358-7.
[11] Hevner A, March S.T, Park J, Ram S. Design Science in Information Systems Research, MIS Quarterly. 28 (2004) 75–105.
[12] Maedche A, Gregor S, Parsons J. Mapping Design Contributions in Information Systems Research: The Design Research Activity Framework, CAIS. 49 (2021) 355–378. doi:10.17705/1CAIS.04914.
[13] Walls J.G, Widmeyer G.R, El Sawy O.A. Building an Information System Design Theory for Vigilant EIS, Information Systems Research. 3 (1992) 36–59. doi:10.1287/isre.3.1.36.
[14] March S.T, Smith G.F. Design and natural science research on information technology, Decision Support Systems. 15 (1995) 251–266. doi:10.1016/0167-9236(94)00041-2.
[15] Gregor S, Hevner A. Positioning and Presenting Design Science Research for Maximum Impact, MIS Quarterly. 37 (2013) 337–355.
[16] Stephanidis C.
ed., The Universal Access Handbook, CRC Press, Boca Raton, 2009. doi:10.1201/9781420064995.
[17] Greif-Winzrieth A, Maedche A, Weinhardt C. Designing a Public Experimental Terminal for Citizen Engagement, (2021) 11.
[18] Peffers K, Tuunanen T, Rothenberger M.A, Chatterjee S. A Design Science Research Methodology for Information Systems Research, Journal of Management Information Systems. 24 (2007) 45–77. doi:10.2753/MIS0742-1222240302.
[19] World Health Organization. Towards a Common Language for Functioning, Disability and Health: ICF. (WHO/EIP/GPE/CAS/01.3), (2002). https://www.who.int/classifications/icf/icfbeginnersguide.pdf?ua=1.
[20] World Health Organization. How to use the ICF: A practical manual for using the International Classification of Functioning, Disability and Health (ICF), (2013).
[21] Hassenzahl M. The Thing and I: Understanding the Relationship Between User and Product, in: M.A. Blythe, K. Overbeeke, A.F. Monk, and P.C. Wright (Eds.), Funology, Springer Netherlands, Dordrecht, 2003: pp. 31–42. doi:10.1007/1-4020-2967-5_4.
[22] W3C. Web Content Accessibility Guidelines (WCAG) 2.1, (2018). https://www.w3.org/TR/WCAG21/ (accessed June 14, 2020).
[23] Lyytinen K, Newman M. Explaining information systems change: a punctuated socio-technical change model, European Journal of Information Systems. 17 (2008) 589–613. doi:10.1057/ejis.2008.50.
[24] McKay J, Marshall P, Hirschheim R. The Design Construct in Information Systems Design Science, Journal of Information Technology. 27 (2012) 125–139. doi:10.1057/jit.2012.5.
[25] Meiselwitz G, Wentz B, Lazar J. Universal Usability: Past, Present, and Future, Foundations and Trends in Human-Computer Interaction. 3 (2010) 213–333.
[26] Sharp H, Lotz N, Mbayi-Kwelagobe L, Woodroffe M, Rajah D, Turugare R. Socio-cultural factors and capacity building in Interaction Design: Results of a video diary study in Botswana, International Journal of Human-Computer Studies. 135 (2020) 102375. doi:10.1016/j.ijhcs.2019.102375.
[27] Aizpurua A, Arrue M, Vigo M. Prejudices, memories, expectations and confidence influence experienced accessibility on the Web, Computers in Human Behavior. 51 (2015) 152–160. doi:10.1016/j.chb.2015.04.035.
[28] Schomaker L, Hartung K. A Taxonomy of Multimodal Interaction in the Human Information Processing System, Rep. Esprit Proj. 8579 (1995).
[29] WHO. ICF Browser, (2022). https://apps.who.int/classifications/icfbrowser/ (accessed March 24, 2021).
[30] Carroll J. Human-computer interaction: Psychology as a science of design, Annual Review of Psychology. 48 (1997) 61–83. doi:10.1146/annurev.psych.48.1.61.
[31] McGrew K.S. CHC theory and the human cognitive abilities project: Standing on the shoulders of the giants of psychometric intelligence research, Intelligence. 37 (2009) 1–10. doi:10.1016/j.intell.2008.08.004.
[32] Berget G, Mulvey F, Sandnes F.E. Is visual content in textual search interfaces beneficial to dyslexic users?, International Journal of Human-Computer Studies. 92–93 (2016) 17–29. doi:10.1016/j.ijhcs.2016.04.006.
[33] Sevilla J, Herrera G, Martínez B, Alcantud F. Web accessibility for individuals with cognitive deficits: A comparative study between an existing commercial Web and its cognitively accessible equivalent, ACM Trans. Comput.-Hum. Interact. 14 (2007) 12-es. doi:10.1145/1279700.1279702.
[34] Carroll J.B.
Human Cognitive Abilities: A Survey of Factor-Analytic Studies, Cambridge University Press, Cambridge, 1993. doi:10.1017/CBO9780511571312.
[35] Lee K.M, Nass C. Designing social presence of social actors in human computer interaction, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Association for Computing Machinery, New York, NY, USA, 2003: pp. 289–296. doi:10.1145/642611.642662.

THAMMASAT AI CITY DISTRIBUTED PLATFORM AND ITS VALIDATION IN SOCIAL DISTRIBUTION AND AMBIENT LIGHTING

VIRACH SORNLERTLAMVANICH,1,2 SOMRUDEE DEEPAISARN,3 THATSANEE CHAROENPORN1
1 Musashino University, Faculty of Data Science, Asia AI Institute (AAII), Tokyo, Japan. virach@musashino-u.ac.jp, thatsane@musashino-u.ac.jp
2 Thammasat University, Faculty of Engineering, Pathumthani, Thailand. virach@musashino-u.ac.jp
3 Thammasat University, Sirindhorn International Institute of Technology, Pathumthani, Thailand. somrudee@siit.tu.ac.th

The Thammasat AI City distributed platform is a proposed AI platform designed to enhance intelligent city management. It addresses the limitations of current smart city architecture by incorporating cross-domain data connectivity and machine learning to support comprehensive data collection. The validation of the AI City in this study focuses on two main areas: monitoring and visualization of city ambient lighting, and indoor human physical distance tracking. The smart street light monitoring system provides real-time visualization of street lighting status, energy consumption, and maintenance needs, which helps to optimize energy usage and reduce maintenance costs. The indoor camera-based system for human physical distance tracking can be used in public spaces to monitor social distancing and ensure public safety. The overall goal of the platform is to improve the quality of life in urban areas and align with sustainable urban development concepts.

Keywords: AI City, smart city, social distribution, city ambient lighting, AI platform

DOI https://doi.org/10.18690/um.feri.5.2023.11
ISBN 978-961-286-745-4

1 Introduction

As cities continue to grow and become more connected, there is an increasing need for advanced technologies such as artificial intelligence (AI) to manage and optimize the complex systems that make up a modern city. One area where AI can have a significant impact is the field of smart cities, where AI can be applied to improve the quality of life for residents and make cities more efficient and sustainable. As technology advances and the cost of sensors and network devices decreases, the concept of the Internet of Things (IoT) is becoming more feasible. Cities are able to use digital infrastructure and high-speed communication to connect various devices and systems, allowing for greater data collection and analysis, and improved efficiency and decision-making. This is where artificial intelligence and machine learning come into play to maximize data’s value. Machine learning can be used to process the large amounts of data generated by connected devices in the IoT, which can lead to more efficient and effective decision-making. Machine learning can also be used to optimize the performance and energy efficiency of connected devices and systems, and to improve the accuracy and reliability of their data.
In conjunction with the data streaming from various possible sources, including IoT devices, to a platform and the analytic results from machine learning, data-driven artificial intelligence is well-suited to form the analytical foundation of the AI City. The Thammasat AI City initiative is a determined program that aims to establish a resilient AI platform at the Rangsit campus of Thammasat University. The initiative focuses on four key domains, including elderly and healthcare, mobility, agriculture, and the environment, and it is designed to identify the opportunities and challenges of AI disruption and to create a role model for the full activation of data and physical availability. The Rangsit campus location is an ideal setting for the project, as it allows for testing AI solutions in a real-world scenario and involving multiple stakeholders. The initiative aligns with the societal changes and technology trends that have emerged in the wake of the COVID-19 pandemic, namely the distributed city, human traceability, new reality, home-office integration, contactless technology, digital lending, and frugal innovation. The goal of the initiative is to create a model for a smart city that is efficient, sustainable, and responsive to the needs of its citizens.

The remainder of the paper is organized as follows. Section 2 discusses the challenges of urbanization. Section 3 explains the architecture and design of the AI City initiatives. Section 4 elaborates on the AI City domain-specific connectivity with the proposed technologies. Section 5 concludes the paper.

2 The Challenges of Urbanization

Thammasat University, Rangsit Campus, is one of the leading universities in Thailand, known for its research and education programs in various fields including agriculture, economy, politics, and science. The Rangsit campus is situated in Pathum Thani province, a city near Bangkok, which covers an area of 1,526 square kilometers. The city has a population of 985,643 according to a 2020 report of the National Statistical Office of Thailand. It serves as an important hub for higher education, hosting ten renowned universities, Thailand Science Park, and seven mega economic areas, which include shopping malls and agricultural markets. Agricultural land accounts for 35.11 percent of the total land area. Similar to many cities in Thailand, Pathum Thani Province is facing a rising population. This increase in population is accompanied by a number of urbanization problems, such as insufficient elderly care facilities, environmental deterioration, and traffic congestion. These issues can have a negative impact on the overall quality of life of the inhabitants of the province. According to the Pathum Thani Plan 2018–2022 of the Pathum Thani Provincial Office (2018) [1], urbanization in Pathum Thani can lead to increased problems and challenges, including traffic congestion and road safety problems, environmental problems, economic and tourism problems, and lifestyle problems. To address these challenges, it is important for the local government and community to work together to develop sustainable solutions that balance the needs of the growing population with the preservation of the environment and quality of life for residents.
This may include efforts to improve public transportation, promote sustainable development, and increase access to healthcare and other services. Additionally, creating an efficient solid waste management system, developing the capabilities of the target industries, and promoting social development and the basic security of the people can help to mitigate the negative impacts of urbanization on the environment and economy.

3 AI City Initiatives

Thammasat University's AI City networking in RUN project is an initiative to model AI capacity on a city scale within the Rangsit campus, which is 2.8112 square kilometers in size. The project aims to address the current limitations of AI research caused by the insufficiency and limited diversity of data. Reliable and connected data will be collected and made available to fully demonstrate AI's capabilities in real-life applications. The project functions as a base platform [2] for four high-impact domains in Rangsit city, including healthcare, environment, mobility, and agriculture. The project is equipped with various AI-enabled devices, namely healthcare monitoring devices [3], noninvasive bed sensors [4], environmental sensors, video analytics cameras, street lights, indoor tracking devices [5], and drones for aerial photography. Figure 1 below depicts the project’s architecture and its domain-specific connectivity.

Figure 1: AI City architecture and domain-specific connectivity. Source: own.

In order to make the data from several sources available for modeling, the data is stored and sent to the cloud services using a low-energy mesh network (the 6LoWPAN protocol in the case of smart lighting, Bluetooth Low Energy (BLE) in the case of indoor positioning and bed sensors, etc.). To reduce the bandwidth consumption of high-bandwidth devices such as the video streaming of surveillance cameras, LAN connectivity and several techniques1 (steady state at rest, motion detection, etc.) are introduced. In addition, for the bed sensors of the elderly care systems, the detection of the type of on-bed position is localized, not only to realize real-time warning but also to conserve bandwidth by sending the compressed results to the cloud.

The AI City Project is a comprehensive approach to integrating AI into city operations and infrastructure through four main layers: the accumulation layer, the knowledge layer, the understanding layer, and the decision-making layer, as illustrated in Figure 2. Data from IoT devices is analyzed and connected to produce models and prediction results in the four targeted domains. The primary layer accumulates physical raw data through various sensor network devices as the foundation for the subsequent layers. The next layer extracts the data into knowledge; once the knowledge has been created from newly arriving data, good understanding can be built and good decisions provided at the final layer through advances in deep learning, neural networks, and machine learning. Model training for specific tasks is conducted in the understanding layer. The appropriate machine learning paradigms are introduced and evaluated to produce the results in the decision-making layer. The connectivity and selection of the data from various sources are crucial for implementation in the city-scale AI platform development.
The platform is composed of a health & aging platform, an environment platform, a mobility platform, and an economy platform [6].

3.1 Smart Health and Aging Platform

As Thailand becomes a completely aging society, the project aims to address the growing burden of caring for the elderly. It seeks to improve the quality of life and access to health services for communities near Thammasat University, focusing on five areas: social, activity, health, medicine, and sleep condition. The project involves the development of a technology-based health platform for the aging population, consisting of five main components: the sleep quality analysis system, the medicine identification system, the indoor positioning & services system, the daily health monitoring system, and the elderly follow-up and care robotic system.

1 https://info.verkada.com/surveillance-features/bandwidth/

3.2 Smart Mobility

Focusing on Thammasat University, Rangsit Campus, and Pathum Thani Province, this platform aims to enhance mobility, reduce air pollution, improve safety, and boost tourism by analyzing travel patterns and providing in-depth insights. The platform consists of the intelligent traffic data acquisition system, the car sharing service system, and the crime reporting and monitoring system.

3.3 Smart Environment Platform

The smart environment platform is aimed at addressing environmental issues and fulfilling the goal of smart city development. The platform leverages IoT technologies and AI systems to monitor the urban environment. The essential systems according to the concept of urban environment monitoring include the air pollution level monitoring system, the weather forecasting system, and the vehicle front-camera-based inspection system for suspicious objects.

3.4 Smart Economy

The Thammasat University Rangsit Campus and Pathum Thani Province are home to various educational and research organizations, including AIT and NSTDA, that conduct research in various fields. However, there is a lack of promotion of and connection between these organizations and the private sector. Thus, a Smart Economy Platform is being developed to connect these organizations and bring research results to real-life applications. This will involve collaboration between research universities and researchers to improve innovation and foster cross-disciplinary and geographic cooperation. The platform leverages big data and deep learning to analyze and process information, and aims to drive collaboration between the public and private sectors, improve the value and productivity of the agricultural sector, develop targeted industries and services, and support local entrepreneurship by integrating research with real-world problems.

Figure 2: Deep intelligent IoT in fully connected network. Source: own.

4 AI City Domain-Specific Connectivity

Regarding domain-specific connectivity, there are four types of technology that we focus on, namely the distributed city, human traceability, contactless technology, and frugal innovation. In this paper, we will mainly focus on the distributed city and contactless technology.
Countries around the world have had to acknowledge that COVID-19 has evolved strains that survive and spread easily and quickly, and many countries have eventually accepted that COVID-19 is an endemic disease. Responses ranged from trying to block borders to minimize the spread, to efforts to eradicate the disease within a limited time at the cost of enormous social and economic losses. Finally, we have to adapt and learn to live safely with COVID-19 while keeping life as balanced as possible. The concept of the new normal, in which our way of living moved to rely on online communities, has now given way to the next normal. Although epidemic prevention measures are now sparse, people's lives have changed considerably from before. COVID-19 has changed our behaviors and attitudes in life, especially the exposure to technology, including online platforms that meet the needs of convenience, safety, and hygiene. These behaviors are leading to the next-normal way of living, which will impact the economy and environment in the future. People are aware of safety, stability, and greater flexibility in living, and of a deeper understanding that health is fundamental to life. The stay-at-home economy, the touchless society, physical distancing, and elevated health and wellness concerns are concrete examples of the next normal. Although the situation has improved, there is still no vaccine that is 100 percent effective in preventing infection. This leads people to maintain their distance and avoid entering crowded areas. However, job duties or social contact sometimes make it difficult to avoid public areas. The contactless or touchless society then comes into consideration. The focus is on adopting technology and practices that allow social and economic activities to continue while minimizing the risk of disease transmission through physical contact. This can include contactless payment systems, virtual meetings, facial recognition technology, and the use of robots for tasks that would normally require human contact. The goal is to reduce the spread of disease while maintaining as much normalcy and efficiency as possible in daily life. The contactless approach to the Internet of Things and wearable medical technology emphasizes the need for comfort and ease of use, while also respecting privacy and avoiding interference with daily activities. Following the contactless approach, we aim to create systems that can provide medical monitoring and care without the need for invasive or uncomfortable devices, through the use of contactless technology such as sensors and platforms. These systems include health monitoring, elderly care, and medicine identification, which are designed to provide medical personnel with the information they need to ensure the well-being of their patients.

4.1 Indoor Camera-Based System for Human Physical Distance Tracking

The proposed method for determining physical distance in indoor environments introduces the use of end-to-end cameras to track a person's position, movement direction, and seat activity. The system does not identify individuals, but instead records their location for the purpose of monitoring physical distancing.
The research focuses on two main aspects: detecting a person's location and detecting seat positions [7].

4.1.1 Seat Detection

The seat detection in the proposed system uses contour color extraction [8] on the top part of the seat or table. It extracts the unique contour shape as a feature, as shown in Figure 1. The results indicate that contour detection works well for the first rows of seats, but may have difficulties detecting seats in the remaining rows. The perspective transformation [9], [10] is applied to change the original image projection into a new visual plane for the system requirements, using Equation (1):

(x', y', w')^T = M (x, y, 1)^T, (1)

where M is the 3x3 perspective transformation matrix and the transformed point is given by (x'/w', y'/w'). To improve the seat detection ability, we apply the perspective transformation, with the top seat area in the room selected as the region of interest (ROI), before seat contour detection. This process allows the system to accurately detect the seats in the room.

4.1.2 Person Detection

The proposed system detects a person in the specified area using the YOLOv3 algorithm, which uses trained weights and data sets to detect objects. The pre-trained model and function selected specifically for human body detection were utilized in the algorithm, as illustrated in Figure 2. In the standard pre-trained YOLOv3 [11], [12], the bottom center of the detection box is used as the reference point for a person's location. The proposed system improves the person location reference point by first calculating and rounding the height-to-width ratio of the detected human object, which is better suited to the indoor environment being monitored. The ratio is denoted as QBox = hBox/wBox, where hBox and wBox represent the height and the width of the box, respectively. The classification of standing and sitting is based on the ratio QBox:

− if QBox >= 1.6, the box contains a standing person;
− if QBox < 1.6, the box contains a sitting person.

The distinction between the standing and sitting scenarios is illustrated in Figures 3(b) and 3(c), compared to Figure 3(a), which shows the conventional algorithm for determining the reference location of a person. Here, the location of the person in each frame is identified using a modified algorithm. The conventional point at the bottom-middle of the detected human body has been replaced with a reference point at a vertical proportion c from the top-middle of the box, where c takes the value 0.775 for a standing person or 0.675 for a sitting person:

yref = ytop + c * hBox, with c = 0.775 (standing) or c = 0.675 (sitting). (2)

The reference points for the standing and sitting cases are calculated using Equation (2), and this approach was used to improve the accuracy of locating persons in a room environment.

Figure 3: Illustration of the human body detection reference point (green). (a) original approach (at the bottom-middle point of the green box); (b) a person standing between seats (77.5% of vertical proportion from the top-middle point, as himage1/wimage1 >= 1.6); (c) a person sitting on the seat (67.5% of vertical proportion from the top-middle point, as himage2/wimage2 < 1.6). Source: own.
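The posture classification and reference-point rules above can be condensed into a short Python sketch of Equations (1) and (2); the function names, the box format, and the ROI corner values are illustrative assumptions, not taken from the paper's implementation.

import cv2
import numpy as np

# Equation (1): map a hand-selected ROI covering the seat area onto a
# rectangular plane before contour-based seat detection (the corner values
# below are placeholders for the ROI chosen in the actual room).
src = np.float32([[120, 80], [520, 90], [600, 400], [40, 390]])
dst = np.float32([[0, 0], [640, 0], [640, 480], [0, 480]])
M = cv2.getPerspectiveTransform(src, dst)      # 3x3 homography matrix
# warped = cv2.warpPerspective(frame, M, (640, 480))

def classify_posture(h_box: float, w_box: float) -> str:
    """Classify a detection box as standing or sitting via QBox = h/w."""
    return "standing" if h_box / w_box >= 1.6 else "sitting"

def reference_point(x_left: float, y_top: float, w_box: float, h_box: float):
    """Equation (2): reference point at proportion c below the top-middle."""
    c = 0.775 if classify_posture(h_box, w_box) == "standing" else 0.675
    return x_left + w_box / 2.0, y_top + c * h_box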
4.1.3 Physical Coordinate Calculation

The reference point coordinates for seats and persons were determined and used to calculate physical coordinates by converting the image coordinates to physical coordinates using two reference parameters derived from the room's physical dimensions: the real size of the room and the pixel size of the region of interest in the image frame. The ratio between the image and the room size was calculated using Equation (3):

scale = (real room dimension in cm) / (ROI dimension in pixels), (3)

so that a physical coordinate is obtained as the image coordinate multiplied by the scale.

4.1.4 Performance Evaluation and Discussion

The proposed system uses seat contour detection to initiate seat positioning in a room. Nonetheless, the accuracy of the detection decreases in the back rows owing to the constraints imposed by more distant seats. To improve the detection ability for the back-row seats, a perspective transformation is proposed. This transformation increases the ability of the seat detection algorithm to detect seats in the back rows. For physical distancing monitoring, the system uses the upper left-hand corner of the seat boundary area after the perspective transformation as the reference point. This point is used to transform the image space into a bird's-eye view, allowing for the calculation of the physical locations of seats and people [13]. While the system can estimate actual distances in real-world units, there may still be errors from the transformation, which have been studied in various related works [14], [15], [16]. To improve the accuracy of image-based distance measurement, we need to select the proper area of interest and an algorithm to calibrate the image-physical distance. A camera-based system was implemented to determine the physical locations of seats and people in a room. The average error of seat location determined by the system was ±5.25 cm with a standard deviation of 4.64 cm. The small error indicates the good overall performance of the system in determining real-world locations. The system can accurately determine whether people are at an appropriate physical distance (180 cm) from others for COVID-19 prevention, and the uncertainty of 5.25±4.64 cm has a small effect on the scale of distancing. The system is reliable in providing proper physical distancing suggestions. To classify whether a person is sitting or standing, we used the height-to-width ratio of the area surrounding a detected person as the criterion. Based on this classification, an appropriate reference point was determined to represent the spatial location of the person in the room space. The maximum boundary for a person sitting on a seat was found to be 46 x 87 cm, which is reasonable given the seat dimensions. The system is able to accurately determine whether a seat is occupied or available. This method solved the problem of the part of the human body hidden behind the seat, allowing for more precise location identification. The improvement of the body reference point shows the potential for developing a more precise location identification system in the future. Figure 5 illustrates the system workflow of the camera-based indoor physical distancing log recording system.

Figure 5: The system workflow for the camera-based indoor physical distancing log recording system. Source: own.
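A minimal sketch of the conversion in Equation (3) and the 180 cm distancing check follows; the room and ROI dimensions, along with the helper names, are illustrative assumptions rather than values from the paper.

import math

ROOM_W_CM, ROOM_H_CM = 800.0, 600.0   # real room size (assumed values)
ROI_W_PX, ROI_H_PX = 640.0, 480.0     # pixel size of the region of interest

SCALE_X = ROOM_W_CM / ROI_W_PX        # cm per pixel, horizontal
SCALE_Y = ROOM_H_CM / ROI_H_PX        # cm per pixel, vertical

def to_physical(x_px: float, y_px: float):
    """Equation (3): convert an image coordinate (pixels) to cm."""
    return x_px * SCALE_X, y_px * SCALE_Y

def is_properly_distanced(p1_px, p2_px, min_cm: float = 180.0) -> bool:
    """Check whether two reference points satisfy physical distancing."""
    (x1, y1), (x2, y2) = to_physical(*p1_px), to_physical(*p2_px)
    return math.hypot(x2 - x1, y2 - y1) >= min_cm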
4.2 Smart Street Light Monitoring and Visualization Platform

Smart lighting is a concept that utilizes artificial intelligence (AI) and Internet of Things (IoT) technology to manage and control lighting systems in cities. The implementation of smart lighting has become a popular trend in modern cities, as it enables efficient and effective management of lighting equipment, reduces energy consumption and costs, and enhances the overall safety and security of the city. Smart lighting has been implemented in many major cities worldwide as an important solution for smart city management. The ability to monitor and control lighting equipment in real time, collect and analyze data from devices and sensors, and the ease of use of the platform make it a convenient solution for managing and improving the lighting systems in cities. The Smart Lighting project at Thammasat University, Rangsit Campus, is an example of the implementation of this concept. The project involves the development of a web application on a cloud platform that serves as a central control system for monitoring, controlling, and collecting data from lighting equipment and sensors on the university campus in real time. The platform provides a user-friendly interface for monitoring and controlling the lighting equipment, and it also visualizes the data collected from the devices and sensors in the area, making it easier for users to analyze and understand the information, as well as supporting efficient maintenance by campus staff. Moreover, the platform aims to improve energy efficiency and align with Thammasat University's sustainable development goals through monitoring and data analysis for optimal energy consumption [17]. In the first phase, 167 smart light poles, equipped with LED lamps with adjustable dimming levels, were installed. The pole-to-pole separation of approximately 20 meters ensures optimal coverage in line with the general street-lighting regulations in Thailand and provides a more efficient and effective lighting solution.

4.2.1 Smart Street Light Installation

In collaboration with MinebeaMitsumi Inc. of Japan, smart street lights and environmental sensors are installed in six zones on five roads within the Rangsit campus. The project equipment consists of 167 controllable lighting devices with brightness sensors attached to each lighting device, an environment station including weather-condition and light sensors, and three gateways for connecting all equipment to an external control system. As depicted in Figure 6, the smart LED lights and their associated illuminance sensors are connected to the control node and the CMS Neptune SC-v6.0.3 platform operated by the manufacturer (Paradox Engineering, Switzerland). The API can be accessed via REST (Representational State Transfer) over HTTP (Hypertext Transfer Protocol) to collect data and control the local devices. The back end of the system must be able to send HTTP requests and receive responses from the CMS API. The front-end visualization is presented as a web application, which is also connected to the lighting system via the CMS API, is accessible from any device with a web browser, and can display received data and device status. Advantageously, the web application can be accessed from anywhere at any time.

Figure 6: Device connection in the platform. Source: own.
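The REST interaction described above can be sketched in Python as follows; the base URL, endpoint paths, payload fields, and the 0-100 dimming range are hypothetical placeholders, since the actual Neptune CMS API is operated by the manufacturer and is not documented in this paper.

import requests

CMS_BASE = "https://cms.example.org/api"   # placeholder base URL, not the real CMS

def fetch_device_status(session: requests.Session, device_id: str) -> dict:
    """Poll one lighting device's data over REST (hypothetical endpoint)."""
    resp = session.get(f"{CMS_BASE}/devices/{device_id}/status", timeout=10)
    resp.raise_for_status()
    return resp.json()

def set_dimming_level(session: requests.Session, device_id: str, level: int) -> None:
    """Send a control command; the dimming scale is an assumption."""
    resp = session.post(f"{CMS_BASE}/devices/{device_id}/dimming",
                        json={"level": level}, timeout=10)
    resp.raise_for_status()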
4.2.2 Development of the Back-end and Front-end Systems

To develop the web application, it is necessary to establish both the back end, or server side, and the front end, or client side, of the application. Node.js, a JavaScript runtime environment, is used in the web application development, as it allows JavaScript to run on both ends [18]. On top of Node.js, the Express.js web framework module is introduced because of its highly configurable nature, thus allowing for more customization in the web application development [19]. Express.js provides a robust and convenient routing system that allows developers to create custom API endpoints to handle HTTP requests and send responses. The Axios module [20] makes it easy to send HTTP requests and receive responses, which is useful when connecting to third-party APIs like the CMS API. For security, it is common to implement authentication and authorization mechanisms to protect the API and its resources. The use of cookies to store user tokens can help to improve the performance of the authentication process by reducing the number of requests to the CMS server. By using these technologies, developers can create a back-end application that integrates with the CMS API and provides a secure and efficient way for users to access and control devices through the CMS. The back-end application communicates with the CMS API to retrieve data from the lighting devices and environmental sensors. It processes and reformats the data received from the API and sends it as an HTTP response in JSON format to the front-end application, making it easier to display and use. The API also provides additional information, such as device information and status, to support the monitoring system on the front end. The process in the back-end application is illustrated in Figure 7.

Figure 7: The connection between the CMS API and the back-end processing of the web application. Source: own.

Figure 8: The dashboard interface used in the web application. Source: own.

The front-end web application is designed to be user-friendly and accessible via the internet or a cellular network using the HTTP protocol. To ensure that the web application behaves the same across all devices, including mobile and desktop devices, responsive web application design with Bootstrap is implemented. This allows for fast and optimized rendering of web pages on devices with different screen sizes, improving the user experience on mobile devices. Chart.js, an open-source JavaScript library [21], is used for data visualization on the dashboard. The library provides a wide range of customization options and is used in the web application to display data from the environmental sensors, including temperature and humidity from the past 2 hours, in the form of line graphs. The numerical data of illuminance, Ultraviolet A and B indices, wind velocity, wind direction, and air pressure are updated every 10 minutes. The dashboard interface is displayed in Figure 8. Finally, the device locations are displayed on an interactive map using Leaflet.js [22]. The interactive map provides real-time device information and precise locations.
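The reformatting step performed by the back end, reducing raw sensor records to the series the dashboard plots for the past 2 hours, might look like the following sketch; it is written in Python for brevity, whereas the paper's actual implementation is in Node.js, and the record field names are assumptions, not the CMS schema.

from datetime import datetime, timedelta

def dashboard_series(records: list, now: datetime) -> dict:
    """Keep the last 2 hours of records and reshape them for line graphs."""
    cutoff = now - timedelta(hours=2)
    recent = [r for r in records if r["timestamp"] >= cutoff]
    return {
        "labels": [r["timestamp"].strftime("%H:%M") for r in recent],
        "temperature": [r["temperature"] for r in recent],
        "humidity": [r["humidity"] for r in recent],
    }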
The web application, a combination of front-end and back-end development, is deployed on Microsoft Azure App Service [23], which supports Node.js applications and has a streamlined deployment process. After deployment, a public URL is available for accessing the web application to monitor the smart lighting system and retrieve data from the environmental sensors.

4.2.3 Illumination Data Analytics and Prediction Models

Figure 9 illustrates the results of the first set of analyses, which focuses on examining the hourly average illuminance values over a period of nine months (February to October 2022). It was observed that the natural light illuminance in Thailand exhibited a predictable pattern: increasing from 06:00, peaking during midday, and declining to zero at approximately 18:00. Additionally, a correlation matrix was analyzed, as shown in Figure 10, to assess the relationships between the selected dataset features. The results indicated that Ultraviolet A and Ultraviolet B were highly correlated with the illuminance values.

Figure 9: The hourly average of illuminance values over the period of nine months, covering February to October 2022. Source: own.

Figure 10: Investigation of the strong positive correlation between Ultraviolet A and illuminance values by generating a correlation matrix for every pair of variables. Source: own.

Machine learning algorithms were implemented to forecast future illuminance timestamps in this study. The correlation matrix showed a significant correlation of Ultraviolet A and Ultraviolet B with the illuminance values; therefore, they were eliminated from the dataset to prevent overfitting. Five environmental parameters, namely humidity, temperature, air pressure, illuminance, and wind velocity, were retained as input parameters. Date and time were also included as parameters during training. The performance of each model was assessed using a correlation coefficient metric, comparing the predicted values with the actual values. Table 1 presents the results of the experiment, which compared the performance of four different machine learning models, namely Gradient Boosting, XGBoost, Random Forest, and Decision Tree, across different analysis window sizes (3, 4, 5, 6, and 7 days) for predicting illuminance values in the environmental dataset.

Table 1: Evaluation of the correlation coefficient between predicted and actual illuminance values on test data using varied machine learning models and window sizes

Models              3 Days   4 Days   5 Days   6 Days   7 Days
Gradient Boosting   0.918    0.914    0.919    0.912    0.912
XGBoost             0.922    0.920    0.920    0.919    0.903
Random Forest       0.919    0.917    0.918    0.915    0.918
Decision Tree       0.839    0.840    0.848    0.838    0.840

Typically, a model is deemed better if it has a higher correlation coefficient between predicted and actual values. According to the results in Table 1, the optimal machine learning model varied based on the size of the analysis window. For a 3-day window size, the XGBoost model had the highest correlation coefficient of 0.922, making it the best-performing model among the selected ones.
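A sketch of how such a comparison can be run is shown below; the sliding-window construction, the chronological 80/20 split, and the default model hyperparameters are assumptions for illustration, not the paper's exact pipeline.

import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from xgboost import XGBRegressor

def make_windows(X: np.ndarray, y: np.ndarray, window: int):
    """Flatten each sliding window of past feature rows into one sample."""
    Xw = np.stack([X[i:i + window].ravel() for i in range(len(X) - window)])
    return Xw, y[window:]

def evaluate(models: dict, X: np.ndarray, y: np.ndarray, window: int) -> dict:
    """Score each model by the correlation of predicted vs. actual values."""
    Xw, yw = make_windows(X, y, window)
    split = int(0.8 * len(Xw))            # chronological train/test split
    scores = {}
    for name, model in models.items():
        model.fit(Xw[:split], yw[:split])
        scores[name] = pearsonr(model.predict(Xw[split:]), yw[split:])[0]
    return scores

models = {
    "Gradient Boosting": GradientBoostingRegressor(),
    "XGBoost": XGBRegressor(),
    "Random Forest": RandomForestRegressor(),
    "Decision Tree": DecisionTreeRegressor(),
}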
5 Conclusion

The Thammasat AI City platform aims to enhance city management by incorporating cross-domain data connectivity and machine learning to support comprehensive data collection. In this paper, the platform has been validated in two areas: street light monitoring and indoor human physical distance tracking. The street light monitoring system provides real-time information on lighting status, energy consumption, and maintenance needs, to optimize energy usage and reduce costs. The indoor camera-based system monitors social distancing to ensure public safety. The overall goal of the platform is to improve the quality of life in urban areas and align with sustainable urban development.

Acknowledgement

This work was supported by the Thailand Science Research and Innovation Fundamental Fund, Contract Numbers TUFF19/2564 and TUFF24/2565, for the project "AI Ready City Networking in RUN", based on the RUN Digital Cluster collaboration scheme.

References

[1] Klaylee J, Iamtrakul P, Kesorn P. Driving Factors of Smart City Development in Thailand. In: Proceedings of the International Conference and Utility Exhibition on Energy, Environment and Climate Change (ICUE); 2020, p. 1-9. doi: 10.1109/ICUE49301.2020.9307052
[2] Ota N. Create Deep Intelligence™ in the Internet of Things; 2014. URL http://on-demand.gputechconf.com/gtc/2015/presentation/S5813-Nobuyuki-Ota.pdf
[3] Singh KK, Singh A, Lin J-W, Elngar A. Deep Learning and IoT in Healthcare Systems: Paradigms and Applications. CRC Press; 2021 Dec.
[4] Viriyavit W, Sornlertlamvanich V. Bed Position Classification by a Neural Network and Bayesian Network Using Noninvasive Sensors for Fall Prevention. Journal of Sensors: Hindawi. 2020 Jan; Volume 2020, Article ID 5689860, p. 1-14. https://doi.org/10.1155/2020/5689860
[5] Kovavisaruch L, Sanpechuda T, Chinda K, Kamolvej P, Sornlertlamvanich V. Museum Layout Evaluation based on Visitor Statistical History. Asian Journal of Applied Sciences. 2017 Jun;5(3), p. 615-622.
[6] Virach Sornlertlamvanich, Pawinee Iamtrakul, Teerayuth Horanont, Narit Hnoohom, Konlakorn Wongpatikaseree, Sumeth Yuenyong, Jantima Angkapanichkit, Suthasinee Piyapasuntra, Prittipoen Lopkerd, Santirak Prasertsuk, Chawee Busayarat, I-soon Raungratanaamporn, Somrudee Deepaisarn, and Thatsanee Charoenporn. Data Analytics and Aggregation Platform for Comprehensive City-Scale AI Modeling. Proceedings of the 32nd International Conference on Information Modelling and Knowledge Bases (EJC2022), Hamburg, Germany, May 30 - June 3, 2022, pp. 97-112.
[7] Somrudee Deepaisarn, Angkoon Angkoonsawaengsuk, Charn Arunkit, Chayud Srisumarnk, Krongkan Nimmanwatthana, Nanmanas Linphrachaya, Nattapol Chiewnawintawat, Rinrada Tanthanathewin, Sivakorn Seinglek, Suphachok Buaruk, and Virach Sornlertlamvanich, "Camera-Based Log System for Human Physical Distance Tracking in Classroom," Proceedings of the 2022 APSIPA Annual Summit and Conference, Chiang Mai, Thailand, November 7-10, 2022.
[8] S.-W. Hong and L. Choi, "Automatic recognition of flowers through color and edge based contour detection," in 2012 3rd International Conference on Image Processing Theory, Tools and Applications (IPTA), pp. 141-146, IEEE, 2012.
[9] N. I. Hassan, N. M. Tahir, F. H. K. Zaman, and H. Hashim, "People detection system using yolov3 algorithm," in 2020 10th IEEE International Conference on Control System, Computing and Engineering (ICCSCE), pp. 131-136, IEEE, 2020.
[10] I. Ansari, Y. Lee, Y. Jeong, and J. Shim, "Recognition of car manufacturers using faster r-cnn and perspective transformation," Journal of Korea Multimedia Society, vol. 21, no. 8, pp. 888-896, 2018.
[11] N. I. Hassan, N. M. Tahir, F. H. K. Zaman, and H. Hashim, "People detection system using yolov3 algorithm," in 2020 10th IEEE International Conference on Control System, Computing and Engineering (ICCSCE), pp. 131-136, IEEE, 2020.
[12] P. Gupta, V. Sharma, and S. Varma, "People detection and counting using yolov3 and ssd models," Materials Today: Proceedings, 2021.
[13] J. C. Marutotamtama and I. Setyawan, "Physical distancing detection using yolo v3 and bird's eye view transform," in 2021 2nd International Conference on Innovative and Creative Information Technology (ICITech), pp. 50-56, IEEE, 2021.
[14] S.-F. Lin, J.-Y. Chen, and H.-X. Chao, "Estimation of number of people in crowded scenes using perspective transformation," IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, vol. 31, no. 6, pp. 645-654, 2001.
[15] J. Yu, N. Gao, Z. Meng, and Z. Zhang, "High-accuracy projector calibration method for fringe projection profilometry considering perspective transformation," Optics Express, vol. 29, no. 10, pp. 15053-15066, 2021.
[16] V. Kocur and M. Ftáčnik, "Detection of 3d bounding boxes of vehicles using perspective transformation for accurate speed measurement," Machine Vision and Applications, vol. 31, no. 7, pp. 1-15, 2020.
[17] Somrudee Deepaisarn, Paphana Yiwsiw, Chanon Tantiwattanapaibul, Suphachok Buaruk, and Virach Sornlertlamvanich, "Smart Street Light Monitoring and Visualization Platform for Campus Management," 2022 17th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), Chiang Mai, Thailand, November 5-7, 2022, pp. 1-5, doi: 10.1109/iSAI-NLP56921.2022.9960257.
[18] N. Chhetri, A Comparative Analysis of Node.js (Server-Side JavaScript). Master's thesis, St. Cloud State University, 2016.
[19] A. Mardan, "Using express.js to create node.js web apps," Practical Node.js, pp. 51-87, 2018.
[20] "Axios." Available at https://axios-http.com/, 2022.
[21] H. M. Millqvist and N. Bolin, A comparison of performance and scalability of chart generation for Javascript data visualisation libraries: A comparative experiment on Chart.js, ApexCharts, Billboard, and ToastUI. Bachelor's thesis, University of Skövde, 2022.
[22] V. Agafonkin, "Leaflet: an open-source JavaScript library for interactive maps." https://leafletjs.com/, 2022.
[23] V. P. Desai, K. S. Oza, P. P. Shinde, and P. G. Naik, "Microsoft azure: Cloud platform for application service deployment," International Journal of Scientific Research in Multidisciplinary Studies, vol. 7, no. 10, pp. 20-23, 2021.

DIGITAL MODELING THE IMPACT OF EU ENERGY SECTOR TRANSFORMATIONS ON THE ECONOMIC SECURITY OF ENTERPRISES

OLENA KHADZHYNOVA,1 ŽANETA SIMANAVIČIENĖ,1 OLEKSIY MINTS,2 KATERYNA POLUPANOVA1
1 Mykolas Romeris University, Vilnius, Lithuania
zaneta.simanaviciene@gmail.com, polupanova.pstu@gmail.com
2 SHEI "Pryazovskyi State Technical University", Dnipro, Ukraine
mints_a_y@pstu.edu

The main purpose of this position paper is to consider the prerequisites for digital modeling of the impact of EU energy sector transformations on the economic security of enterprises. The energy security of the EU is a current issue for all member countries.
The EU's energy policy aims for the diversification of energy resources and energy independence. After 2022, this issue has worsened. The article analyzes the main risks to the energy security of EU countries and the industries located in these countries. The dynamics of energy consumption by different sectors of the EU economy are considered, and the impact of changes in the energy sector on the economic security of businesses is evaluated. Approaches to modeling the impact of transformations in the energy sector on the economic security of businesses are discussed. The most promising approaches for modeling catastrophic changes are highlighted.

Keywords: economic security, energy security, enterprises, industry, modeling, European Union

DOI https://doi.org/10.18690/um.feri.5.2023.12
ISBN 978-961-286-745-4

1 Introduction

The energy sector is a vital component of the European Union's economy and plays a significant role in ensuring economic security for enterprises. In recent years, the EU has undergone significant transformations in its energy sector, aimed at reducing its dependence on fossil fuels, increasing the use of renewable energy sources, and enhancing energy efficiency. The shift towards cleaner energy sources has led to the creation of new business opportunities and the expansion of existing ones, particularly in the renewable energy and energy efficiency sectors. The increased investment in these areas has created jobs, stimulated innovation, and helped drive economic growth. However, the transformation of the energy sector also presents challenges for enterprises. The transition to new energy sources and technologies can result in changes to traditional business models, leading to increased competition and reduced profits. It can also cause uncertainty for companies that are heavily invested in fossil fuels, leading to potential job losses and financial instability. To mitigate these risks, the EU has implemented various measures, including providing financial support to companies, encouraging the development of new technologies, and promoting the adoption of energy-efficient practices. These initiatives aim to support enterprises in the transition to a more sustainable energy future and ensure their long-term economic security. The EU is heavily dependent on energy imports from other countries, particularly for oil and natural gas. This dependence on energy imports leaves the EU vulnerable to price fluctuations and supply disruptions, which can have significant impacts on the economy and individual enterprises. For a long time, the Russian Federation was perceived as a reliable supplier of natural gas. However, 2022 marked the beginning of a hybrid economic war between the Russian Federation and the EU, during which the probability of a complete cessation of natural gas supplies from the Russian Federation to the EU increased significantly. The war between Ukraine and the Russian Federation has already led to significant transformations in the energy sector of the European Union. Moreover, this process continues, and the final configuration of the EU's energy sector is still difficult to predict. But it is already clear that the energy sector transformations will greatly affect the economic security of industrial enterprises in Europe.
The main purpose of this article is to consider the prerequisites for modeling the impact of EU energy sector transformations on the economic security of enterprises. In order to accomplish this objective, we will conduct a literature review on the topics of the economic security of the EU, trends in the development of the European energy sector, and the modeling of their interrelationships. Subsequently, we will analyze the development of the energy infrastructure of the EU and examine trends in the dynamics of energy consumption by the industrial sector. Based on the findings, we will identify approaches to modeling the transformations of the EU energy sector and their impact on the economic security of enterprises. In this regard, we will consider possible methods within the framework of inductive and deductive approaches.

2 Literature review

The close relation between the energy and economic security of the EU countries has been evident for a long time. Although the need for a dramatic transformation of the energy sector became critical only in 2022, the prerequisites for modeling such a situation were considered in the works of many scientists. The authors (Sharples, 2013; Smiech, 2013) show the key aspects of economic security for EU energy in the context of climate change. Their articles deal with the concept of energy security and the economic problems of the EU countries. The problem of natural gas consumption as a 'green fuel' for Europe is considered. Another group of authors (Jonsson et al., 2015; Sytailo & Okhrimenko, 2020) defines the key security indicators of the EU energy market along the following aspects: energy security, security of supply, security of demand and revenue, and other political, social, technical, and environmental risk factors. Energy consumption in the context of low-carbon energy transitions is therefore described not only as a technological problem but also as a matter of market supply and demand. An important component of the economic security of EU energy consumption is the development and implementation of renewable energy sources (Zherlitsyn et al., 2020). The authors consider the economic aspects of implementing the appropriate technological solutions and evaluate forecasting and economic efficiency models for such projects. The relationship between EU foreign policy and energy strategies is another part of the economic security aspect (Youngs, 2020). In particular, factors exogenous to energy policy can be significant, and these can contain contestation as well as generate it. These aspects became key in 2022. Contemporary works show that EU economic security requires improved external energy security. The author (Misik, 2022) shows that the EU's response in energy security policy has been rather slow, directed mainly towards the revision of the Union's internal mechanisms rather than the creation of a common external energy policy. The study by Perdana et al. (2022) examines the economic repercussions for European energy producers of a complete ban on fossil fuel imports from Russia. Of particular interest is the analysis of the effects of cutting off energy imports from Russia for each individual EU country. Another study, by Chachko and Linos (2022), evaluates the EU's energy consumption and identifies security and defense strategies in response to Russia's invasion of Ukraine.
The crisis revealed a lack of attention paid to economic security aspects in the EU's energy sector. Early 21st-century authors placed significant emphasis on the promotion of "green fuel" and the diversification of energy sources for EU countries. Nonetheless, the consequences of the partial and total embargoes on fossil fuel imports from Russia in 2022 exposed a deficiency of economic security in the energy sector of EU countries.

3 Results and discussion

Energy security is considered to be an important aspect of ensuring a stable and reliable energy supply for individuals, businesses, and governments. This includes ensuring the availability of resources such as oil, natural gas, and coal, as well as promoting the use of renewable energy sources and reducing dependence on single sources of energy. Energy security also involves protecting energy infrastructure and addressing potential risks such as supply disruptions, price volatility, and environmental impacts. Thus, energy security is an important part of economic security. Energy security refers to the availability, reliability, and affordability of energy supplies, as well as the stability of the energy systems and infrastructure that support them. It involves ensuring a consistent and sufficient supply of energy to meet the needs of individuals, businesses, and governments, while also mitigating the risks and impacts of energy production and consumption, such as price volatility, supply disruptions, and environmental degradation. The goal of energy security is to ensure a stable and sustainable energy system that supports economic growth and protects national economic security. From the perspective of enterprises, energy security is the assurance that they have access to a reliable and cost-effective supply of energy to meet their operational needs and support their business goals. It involves minimizing the risks and uncertainties associated with energy prices and supply, and ensuring the resilience of the energy infrastructure that supports their operations. Companies may adopt strategies to improve their energy security, such as diversifying their energy sources, implementing energy-efficient technologies, and investing in renewable energy. In the economic context, the ultimate goal for companies is to minimize their exposure to energy price volatility and supply disruptions, while ensuring the sustainability of their energy consumption and supporting their bottom line. Energy security is a key concern for the European Union, as the bloc relies heavily on imported energy, particularly oil and natural gas. The EU has taken various steps to improve its energy security by diversifying its energy mix and reducing its dependence on single sources of energy. This includes promoting the use of renewable energy sources, such as wind and solar power, and increasing energy efficiency through the implementation of policies and regulations. The EU has also established a number of initiatives aimed at improving energy security and the stability of the energy market. For example, the EU has established a single energy market, which facilitates the flow of energy between member states and helps to reduce dependence on single sources of energy.
The bloc has also established a number of interconnections between its national energy grids, which help to increase the security of energy supply and reduce the impact of disruptions. The EU is also working to improve the security of its energy infrastructure, particularly in the context of increasing concerns about cyber security threats. The EU has established various initiatives aimed at enhancing the resilience of its energy systems, including the development of a comprehensive framework for the security of the electricity grid and the strengthening of emergency response mechanisms. Overall, energy security is a high priority for the EU, and the bloc is taking a multifaceted approach to address the challenges it faces in ensuring a secure, reliable, and sustainable energy supply, which is confirmed by the analysis below. To assess the impact of changes in the energy sector structure on the economic security of enterprises, the dynamics of energy consumption by various sectors of the economy should be considered (Fig. 1). The decline in energy consumption by the industrial sector, as shown in Fig. 1, is a noteworthy trend in the EU's economy. The reduction in industrial energy consumption relative to the 1990 level, to 79% and further to 77% in 2021, highlights the efforts made by the EU to improve energy efficiency and reduce its carbon footprint. The decrease in energy consumption in the industrial sector can be attributed to the implementation of energy-saving technologies, as well as increased awareness of the environmental impact of energy consumption. On the other hand, the data in Fig. 1 show that the other sectors of the EU economy experienced a steady increase in energy consumption until 2006-2010. The subsequent decrease in energy consumption after this period can be attributed to the implementation of energy-efficient technologies and practices, such as the use of renewable energy sources and the development of more efficient buildings and appliances.

Figure 1: Dynamics of energy use by groups of consumption, compared with 1990 (final energy consumption in total and in the industry, transport, and other sectors; 1990, 2000, 2010, 2021). Source: Authors' estimations and [2].

The changes in energy consumption by the industrial sector of EU countries began much earlier and can be traced back to the early 1990s. Fig. 2 provides a more detailed look at these changes, highlighting the specific industries that have seen the greatest reductions in energy consumption. Understanding the changes in energy consumption by industry is crucial for policymakers, as it provides valuable insights into the effectiveness of energy efficiency policies and the areas where additional efforts are needed to reduce energy consumption.
Figure 2: Dynamics of energy consumption by groups of industries, compared with 1990 (iron and steel; chemical and petrochemical; non-ferrous metals; non-metallic minerals; transport equipment; machinery; mining and quarrying; food, beverages and tobacco; paper, pulp and printing; 1990, 2000, 2010, 2021). Source: Authors' estimations and [2].

The data shown in Fig. 2 reveal that most industries have experienced a decrease in energy consumption. The printing, food, and transport industries are exceptions, yet they too have seen either a decrease or no increase in energy consumption since 2000. Such dynamics, in the context of growing GDP in EU countries, indicate that industrial enterprises had two main options: to become more energy efficient or to shut down. An additional analysis of industrial production statistics showed that most industries saw an increase in production over the period under review. This is even true for energy-intensive industries such as metal production (with a 5% increase from 1991 to 2021), motor vehicle production (with a 47% increase over the same period), mining, and others (Eurostat, 2022). Let us consider the approaches to modeling the transformations of the EU energy sector and their impact on the economic security of enterprises. A positive factor in this case is the availability of sufficient statistical data reflecting the dynamics of the development of various industries in the EU and their energy consumption, including a breakdown by types of fuel used (Eurostat, 2022). This allows for the use of the following inductive models. Statistical models use historical data to estimate the relationship between energy sector transformations and economic security and can be used to make predictions about the future. This type of model can be used to estimate the impact of changes in energy prices, the energy mix, and energy efficiency on the economy and individual enterprises. Input-output models look at the flow of goods and services between different sectors of the economy, including the energy sector. By analyzing the interconnections between different sectors, input-output models can be used to estimate the indirect and spillover effects of energy sector transformations on the economy and individual enterprises; a minimal example is sketched below. Using inductive models allows us to establish and statistically confirm the main trends and interrelations in the transformation of the EU energy sector that have historically arisen over the past decades. Unfortunately, inductive models cannot adequately predict the reaction of the research object in the case of sharp changes in external conditions. This is the situation that arose in 2022, when the EU countries were forced to sharply restructure their energy supply chains and the structure of the energy market due to the start of the war in Ukraine and the "gas blackmail" by the Russian Federation.
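As an illustration of the input-output idea, the classic Leontief quantity model computes the total output needed across sectors to satisfy a given final demand, so a shock to the energy sector propagates to the others; the three-sector breakdown and the coefficient values below are invented for demonstration only, not Eurostat data.

import numpy as np

# Leontief input-output model: x = (I - A)^(-1) d, where A holds the
# technical coefficients (inter-sector input shares) and d is final demand.
sectors = ["energy", "industry", "services"]
A = np.array([[0.10, 0.20, 0.05],    # energy inputs per unit of each sector's output
              [0.15, 0.25, 0.10],    # industry inputs
              [0.05, 0.10, 0.15]])   # services inputs

d = np.array([100.0, 200.0, 300.0])           # baseline final demand
x = np.linalg.solve(np.eye(3) - A, d)         # total output by sector

# Spillover of an energy-sector shock: cut final demand for energy by 30%
# and observe the induced change in every sector's required output.
d_shock = d * np.array([0.7, 1.0, 1.0])
x_shock = np.linalg.solve(np.eye(3) - A, d_shock)
for s, dx in zip(sectors, x_shock - x):
    print(f"{s}: change in required output {dx:.1f}")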
To model the transformations of the EU energy sector and their impact on the economic security of enterprises in such a case, the following models based on a deductive approach can be used. Scenario analysis considers different future scenarios for the energy sector, taking into account different policy interventions and technological developments. Scenario analysis can be used to explore the potential impact of energy sector transformations on the economy and individual enterprises, allowing for a more comprehensive understanding of the risks and opportunities associated with the transition to a more sustainable energy future. Simulation modeling is a broad class of models that includes system dynamics models, agent-evolutionary models, service system models, and others. The use of simulation models involves formulating assumptions about the structure, properties, and internal relationships of the modeled objects. By constructing virtual representations of real-world phenomena, simulation models enable us to experiment with different scenarios, identify cause-and-effect relationships, and make informed decisions about how to optimize and manage these systems. Based on these assumptions, the behavior of the object (the energy sector, or an industry) in the future is calculated, taking into account both random and planned deviations in its conditions of existence. The deductive approach to modeling the impact of bifurcation changes in the energy sector on the economic security of enterprises does not exclude the use of inductive modeling methods to confirm specific trends in the economies of EU countries. Economic development processes are inertial to a sufficient degree, and it should be expected that even under conditions of global catastrophes, the main vector of development will remain unchanged. Even a cursory analysis of statistics and literature sources allows us to note that the use of "green energy" and high energy efficiency in production are characteristic of EU countries. However, additional research is needed to establish the impact of these factors more accurately. It is important to note that these models are just a tool, and the results they provide should be interpreted with caution. The assumptions made, the data used, and the methodologies employed will affect the results, so it is essential to carefully consider these factors when using models to understand the impact of energy sector transformations on the economic security of enterprises.

4 Conclusion

The transformation of the EU energy sector presents both opportunities and challenges for enterprises. While the shift to cleaner energy sources offers new business opportunities, it also requires companies to adapt to changes in the market and the energy sector. Through various measures and initiatives, the EU is working to ensure that enterprises can thrive in a changing energy landscape and contribute to a more sustainable and economically secure future. It is currently impossible to accurately predict all the consequences of the transformation of the EU's energy sector as a result of the military conflict between Ukraine and Russia. Industrial enterprises in the EU have long been focused on improving energy efficiency, which has helped to bolster their economic security against energy threats from Russia.
The potential physical shortage of natural gas supply is driving the EU to increase its use of renewable energy and electric vehicles. The use of digital models will allow for the establishment of the most likely scenarios of event development and reduce the level of uncertainty for industries.

Acknowledgement

This research was funded by the European Social Fund under measure No 09.3.3-LMT-K-712-23-0211, "Transformation of the economic security system of enterprises in the process of digitalization".

References

[1] Chachko, E., & Linos, K. (2022). Ukraine and the Emergency Powers of International Institutions. American Journal of International Law, 116(4), 775-787. doi:10.1017/ajil.2022.57
[2] Eurostat (2022). Energy statistics - an overview. Eurostat. https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Energy_statistics_-_an_overview#
[3] Jonsson, D. K., Johansson, B., Mansson, A., Nilsson, L. J., Nilsson, M., & Sonnsjo, H. (2015). Energy security matters in the EU Energy Roadmap. Energy Strategy Reviews, 6, 48-56. doi:10.1016/j.esr.2015.03.002
[4] Misik, M. (2022). The EU needs to improve its external energy security. Energy Policy, 165, 5. doi:10.1016/j.enpol.2022.112930
[5] Perdana, S., Vielle, M., & Schenckery, M. (2022). European Economic impacts of cutting energy imports from Russia: A computable general equilibrium analysis. Energy Strategy Reviews, 44, 15. doi:10.1016/j.esr.2022.101006
[6] Sharples, J. D. (2013). Russian approaches to energy security and climate change: Russian gas exports to the EU. Environmental Politics, 22(4), 683-700. doi:10.1080/09644016.2013.806628
[7] Smiech, S. (2013). Some aspects of energy security in the EU member countries in the period 2000-2010. Proceedings of the 7th Professor Aleksander Zelias International Conference on Modelling and Forecasting of Socio-Economic Phenomena, 157-164.
[8] Sytailo, U., & Okhrimenko, O. (2020). Evaluating the level of economic security of the EU energy markets. Eastern Journal of European Studies, 11(2), 353-377.
[9] Youngs, R. (2020). EU foreign policy and energy strategy: bounded contestation. Journal of European Integration, 42(1), 147-162. doi:10.1080/07036337.2019.1708345
[10] Zherlitsyn, D., Skrypnyk, A., Rogoza, N., Saiapin, S., & Kudin, T. (2020). Green tariff and investment in solar power plants. Studies of Applied Economics, 38(4). doi:10.25115/eea.v38i4.3994

6 Communication and Collaboration

LESSONS LEARNED FROM COLLABORATIVE PROTOTYPE DEVELOPMENT BETWEEN UNIVERSITY AND ENTERPRISES

JANNE HARJAMAKI, MIKA SAARI, MIKKO NURMINEN, PETRI RANTANEN, JARI SOINI, DAVID HASTBACKA
Tampere University, Pori, Finland
janne.harjamaki@tuni.fi, mika.saari@tuni.fi, mikko.nurminen@tuni.fi, petri.rantanen@tuni.fi, jari.soini@tuni.fi, david.hastbacka@tuni.fi

In this article, the focus is on the KIEMI research project ("Less is More: Towards the Energy Minimum of Properties" in English) conducted at Tampere University during the period 2019-2022.
In this project, we used the earlier developed Descriptive Model of the Prototyping Process (DMPP) to guide university-enterprise collaboration. The project consisted of several pilot cases, with prototypes, which were carried out in collaboration with companies, tackling real-world problems. In this article, we review and evaluate the suitability of the DMPP for use in a research project. The article explores the topic from two directions: the collaboration of university and enterprises, and the reusability of artifacts within the DMPP. The paper introduces several pilot cases carried out in the KIEMI project and describes the usage of the DMPP in them. Furthermore, the paper evaluates the model, sets forward the challenges faced, and, finally, discusses topics for future research.

Keywords: artifact, reusability, collaboration, DMPP, prototyping, iterative design, process model

DOI https://doi.org/10.18690/um.feri.5.2023.13
ISBN 978-961-286-745-4

1 Introduction

Universities and other research organizations produce research results, typically in the form of publications, such as papers and technical reports. In addition, applied research produces prototypes with proofs of concept (PoC). This study presents the outcome of one university project, where proofs of concept were mainly implemented by building data-gathering prototypes. The focus of this study is on the findings of the KIEMI project ("Vähemmällä Enemmän – Kohti Kiinteistöjen Energiaminimiä", or "Less is More: Towards the Energy Minimum of Properties" in English). The aim of the project was to develop proof-of-concept demonstrations and prototype applications that illustrate how cost-effective, open, and modular solutions could be utilized to improve the energy efficiency of existing, older buildings [1]. The KIEMI project was selected for analysis in this paper because of its large number of pilot use cases. The goal of the KIEMI project was to save energy, and we worked towards this goal by developing and constructing data-gathering IoT sensor systems. We used the previously developed SW/HW framework [2] and the formerly developed descriptive model of the prototyping process (DMPP) [3]. The SW/HW framework generalizes prototype development into a group of necessary components; more precisely, the framework defines guidelines for constructing prototype systems to collect data for different purposes by reusing the required software and hardware components [2]. The DMPP was developed to guide the IoT prototype development process and can be used as a guideline when building a prototype. The DMPP contains the prototype development practices that have been applied in research projects between our university and enterprises. With these developed IoT prototypes, developers can receive valuable feedback on the feasibility of implementing the application [3]. The following research questions were formulated during the project work. For this study, we wished to gain insight into the following topics:

− RQ1: Collaboration. How was university-enterprise collaboration executed in practice using the DMPP?
− RQ2: Reusability. How did the reusability of the artifacts in the DMPP steps support the workflow of the pilot cases?
University-enterprise collaboration (part of universities' third mission [4], [5]) has been used in previous projects, and the DMPP model was developed into its current format based on the pilot cases of these previous projects. The KIEMI project also aimed to build prototypes in collaboration with companies for IoT-type data gathering. Since we already had a completed process template, it was decided to put it to good use in this project as well, and RQ1 looks at how this succeeded. Further, RQ2 focuses on the operation of the DMPP sub-processes and how templates were created from them. The use of templates was intended to accelerate the work. At the beginning, their significance was not understood, but by following the model, the usefulness of the templates was noted. The same practices were observed when using the process model, so reuse was included in the review. The benefit and reusability of templates, created specifically for reporting, were monitored, as they were expected to speed up the implementation of some steps. The structure of this paper is as follows: In Section II, we review the related research on universities' third mission and industry collaboration, and explain the background of the KIEMI project. In Section III, we introduce the DMPP and its connections with project work. Further, the implementation of university-enterprise collaboration in prototype development is described by means of process modeling notation. Section IV introduces the KIEMI project: its purpose, activities, goals, and outcome. Section V continues by describing the prototyping pilot cases performed during the KIEMI project. Section VI evaluates the usability of the DMPP in the KIEMI project, highlighting the results of the project and pilot cases. Section VII summarizes the study, and includes a discussion and suggestions for future research on the topic.

2 Background

2.1 Third mission

It is a common conception that the modern university serves three main purposes: teaching, research, as well as a broader social function. The last of these functions, commonly dubbed "The Third Mission" [4], [5], is regarded as including measures contributing to social impacts and interaction. Industry-academia collaboration benefits those organizations that do not have their own R&D facilities. For example, companies can utilize the resources of a university to understand their modern-day software engineering problems. Industry has realized that it can support innovation and development processes when collaborating with researchers [6]. Figure 1 illustrates how the process model approach can be used to align European Union policy and Finnish universities' missions in the form of applied research and collaboration.

Figure 1: Third mission concept with the KIEMI project. Source: own.

The EU cohesion policy and EU Structural Funds (SF) are used through Operational Programmes (OPs) to make it possible to create innovative collaboration projects for local stakeholders. Finnish universities have extended their traditional teaching and research activities within the third mission (TM) to exploit research results for peripheral areas, i.e., in the form of collaboration with local stakeholders [7]. The University Consortium of Pori (UC Pori) has longstanding and specialized experience of creating collaboration with local stakeholders using the EU SF and OPs through university facilities and resources [7].
The KIEMI project represents a continuation of the series of OPs executed at UC Pori in recent years.

The transfer of technology is an important part of collaboration, because it stimulates development processes, and innovative products achieve improved business competitiveness. In the study by Punter et al. [8], innovation is considered as a process consisting of two phases: technology creation and technology transfer. As seen in Figure 1, the KIEMI project was a framework for implementing collaboration and applied research methods in the form of innovative ICT application pilot cases for local stakeholders. The descriptive model for the prototyping process (DMPP) was the spearhead of the process, pulling all the pieces together.

2.2 Collaboration channels for interactions

Interaction between public research organizations and industry can be implemented through many kinds of collaboration channels. One way to classify collaboration channel types was presented in [9], where channels were divided into four groups: traditional, services, commercial, and bi-directional. In this paper, collaboration in SF OPs can be seen as bi-directional collaboration between university and industry, where both parties benefit from the acquisition and development of the technological know-how necessary for the prototype. In addition to the technical content, work on the prototype must take into account the development of the interconnections necessary for university-enterprise collaboration and their impact on future cooperation activities.

2.3 Innovation models for collaboration

In projects like KIEMI, collaboration activities are carried out several times, mostly each time with different SMEs or public organizations (or some unit or department of their organization). To simplify this for the reader, we use the term industrial development (ID) for these collaboration parties or stakeholders. In addition, in case some ID has their own research group or department, or if there is a CEO with a researcher's mindset, their staff can be referred to as industrial research (IR). Similarly, the university research unit, as in the KIEMI project, can be defined as academic research (AR).

For successful collaboration management between ID and AR, it is useful to have a framework or process model to ensure that the collaboration and innovation activities inside it create solutions and PoCs along with pilot cases and receive strong support from all parties from the very beginning. In the study by Punter et al. [8], two main stakeholder groups were identified: researchers and industrial practitioners, where the former (AR) act as a technology provider and the latter (ID) as a technology receiver. They also pointed out that AR and ID may have completely different values and targets for technology and collaboration activities. AR is interested in proving concepts for technology via pilot cases during projects. ID is looking for a statement or evaluation of the business benefits and costs of the technology and may see AR's PoC as a technology study without the necessity for proof, i.e., a production-proof version. With an EU OP (such as KIEMI), the ID types of collaboration are predefined in the OP requirements.
The same set of requirements also contains targets for project results, which can be related to certain products or services through ID, or a target may be related to co-creation activities or to research and development activities between AR and ID. In this project, a production-proof version is not included, only PoCs. It is assumed that ID will continue towards the production-proof version from the results of the project.

The model used should take different types of ID into account. It should also take into consideration the fact that innovation activities and technology transfer may happen in all phases or steps. As an example, Punter et al. [8] highlight a case where design work was able to add value for ID. Similarly, in projects, value can be produced in cases where some commercial product, already designed for a certain usage, has been applied in a new environment through pilot case activities. Naturally, activities to develop a suitable collaboration model fall mostly to the party responsible for the project, here the AR side. The model and its efficiency define success for current and future collaboration between AR and ID.

A study [10] presents the Certus model, which was developed at a Norwegian research-based innovation center. Their needs for a collaboration model contained similar elements to the DMPP model. They required deeper research knowledge of co-creation activities via problem definition and solving tasks, and more active dialog between researchers and practitioners to align their expectations. They also wanted to ensure that the results and outputs created in research projects have practical relevance and benefit for their partners and that the results can be transferred and exploited effectively by their partners. The Certus model [10] contains seven phases, from problem scoping to market research. Whereas the first four phases (problem scoping, knowledge conception, knowledge and technology development, and knowledge and technology transfer) can be regarded as similar to proof-of-concept development, the following three phases (knowledge and technology exploitation, organizational adoption, and market research) are more related to production-proof activities.

2.4 The KIEMI project

The reduction of greenhouse gas emissions is one of the most challenging global objectives of the near future. Low carbon emissions, energy savings, a climate-friendly approach, and ecologically sustainable choices require new and innovative services, solutions, and products. In Finland, energy use in properties is one of the biggest potential areas where savings can be made. The KIEMI project, carried out by the Tampere University Pori unit, designed and developed methods and technologies that aid in finding and achieving the property- and situation-specific "energy minimum", i.e., a situation where the minimum amount of energy is used while still preserving a comfortable environment within the building. In the KIEMI project, the primary focus was not on new properties or so-called "smart buildings", but on older buildings and apartments that do not contain the modern automatic and intelligent devices commonly used for controlling the quality of the living and working environment.
Proof-of-concept demonstrations and prototype applications were developed in the KIEMI project that illustrate how cost-effective, open, and modular solutions can be utilized to improve the energy efficiency of buildings. Further, a decrease in overall energy usage will lead to cost savings related to energy expenses and reduce the carbon footprint caused by, for example, the heating, cooling, and air conditioning of buildings. In the present world situation in 2023, the theme of the project, energy savings, is especially topical, at least in Europe.

The KIEMI project partners consisted of organizations and companies who were able to take part in the pilot cases implemented during the project by providing properties, equipment, sensors, and measurement data, or by acting as experts. The results of the project can be utilized by all those involved with the energy and resource efficiency of properties and housing-related wellbeing, as well as the relevant private (companies) and public bodies (municipalities). The commitment of the project partners to the project activities was based on the DMPP collaboration model developed in previous projects. In the KIEMI project, the focal point of the partner-specific co-operation varied, depending on how the partner wished to participate and how they were able to contribute to the research. Collaboration and contribution to the project pilot cases took place roughly according to the following breakdown:

1. Identifying premises for use in the project (condition measurements in the properties)
2. Handing over existing property data for use in the project (interfaces with existing property measurement systems)
3. Determining measurement needs and planning pilot cases together (tailored needs for condition measurement of the target)
4. General development of condition measurement (developing sensor and measurement systems in collaboration with a project partner)

Figure 2: Timeline of pilots in KIEMI. Source: own.

During the project, a total of 23 different types of pilot cases were carried out related to the energy efficiency and condition measurement of properties. The pilot cases conducted during the KIEMI project, as well as the prototype systems developed for them and the technology testing, have been reported extensively in the form of scientific articles (several internationally peer-reviewed research publications). Figure 2 shows the schedule of pilot case implementation by month and quarter over the duration of the project. For interrupted pilot cases, the timetable describes the time interval during which discussion and reflection took place.

3 Process model for prototyping: Descriptive model for the prototyping process (DMPP)

The purpose of this section is to present how the selected process model has supported the work within the projects. Our descriptive software process model for IoT prototyping was introduced in [3]. The DMPP was developed during a previous project where the prototyping focused on one area. The DMPP was developed using the descriptive process model (DPM) approach [11]. The basic concepts related to processes are role, activity, resource, and artifact. As an example, a developer (role) involved in software development (activity) uses a programming tool (resource), and the activity produces some software (artifact) used in a prototype system.
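To make the DPM vocabulary concrete, the sketch below encodes the role, activity, resource, and artifact concepts as plain data types. This is an illustration only; the DPM approach [11] and the DMPP do not prescribe any particular implementation, and all names in the sketch are hypothetical.

```python
# Minimal sketch (not part of the DMPP itself): the four DPM concepts
# encoded as plain Python data types. All names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Role:
    name: str                      # e.g., "developer"


@dataclass
class Resource:
    name: str                      # e.g., "programming tool"


@dataclass
class Artifact:
    name: str                      # e.g., "prototype software"
    reusable: bool = True          # the DMPP encourages reusing artifacts


@dataclass
class Activity:
    name: str                      # e.g., "software development"
    performed_by: Role
    uses: list[Resource] = field(default_factory=list)
    produces: list[Artifact] = field(default_factory=list)


# The example from the text: a developer develops software with a tool.
developer = Role("developer")
tool = Resource("programming tool")
prototype_sw = Artifact("prototype software")
development = Activity("software development", developer, [tool], [prototype_sw])
```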
The process data for the model was collected through interviews with the developers involved in four different prototype development processes. These four prototype development projects and their outcomes were reported in several studies [12], [13], [14], [15]. The common factor in all of the studies is that they present developed IoT prototype systems that gather data. When the KIEMI project started, we noticed that the DMPP could be a suitable way to approach the subject. During the project, we actively searched for pilot cases (Step 0) where previously collected knowledge about prototyping IoT data-gathering systems could be used.

Figure 3 presents the DMPP [3], including steps one to six. A pilot case starts with an issue related to a situation suitable for the research group. The pilot case ends after it has been presented to the customer and other reports have been published. After the pilot case, there is also the possibility to add step 7 (the production proof mentioned in Section 2.3), which consists of following up the procedure, e.g., the client or someone outside of the original pilot case group wishes to utilize the prototype or parts of it. The second possibility is that the developed prototype system goes into production and needs further support (this kind of situation is reported in [14]).

Figure 3: Process model for prototype development. Source: Adapted from [3].

Figure 3 presents the DMPP model. The model includes six steps, and the roles, activities, and artifacts can be described as follows, using the SW/HW framework [2] and the DMPP [3]:

1. The first step starts from the requirements definition, a collaborative discussion between the developers and the client. The client defines what kind of data would be useful. The developer group starts to define the hardware and overall architecture of the system and how the data will be collected by the software. The selected hardware mostly determines the software environment and tools used. Benefit: clarification of the problem item together with the customer. Limitation: does the development team have sufficient expertise in the subject area?

2. The outcome of the discussion is the first artifact: for example, the prototype system requirements in the discussion notes. The developer group constructs the first architecture model of the component interconnections. For example, in IoT systems, we describe the practice of how to define a system by reusing the system definitions of previous prototypes. Light documentation has been found to speed up stage completion, but may cause problems later if the system is put into production.

3. The third step is the software/hardware prototype development by the research group, including the project manager and SW/HW developers. The IDs' representatives are involved in the development process in the role of instructor. In this step, the SW/HW framework is used as the guideline for selecting the components for the prototype.
The SW/HW framework gives guidelines and speeds up development when the operating process of suitable components has at least partially been thought through in advance. Reuse of components also makes development easier, as the number of background studies decreases.

4. The fourth step introduces the working prototype artifact, which consists of the developed software and hardware components. The interconnections of the components are also tested. The testing process overall is usually only the functional testing of the prototype system. Additionally, the gathered data is inspected and, if possible, compared to the expected results. Another notable issue is the fact that, if the system is later put into production, testing must be carried out more thoroughly.

5. The fifth step includes preparing the outcome of the development process. Further, this step includes presenting the prototype and its functionality to the ID. The SW/HW framework can be complemented if necessary.

6. The sixth step is to publish the results, for example, the prototype system, collected data, and analysis of the project. For example, in a university environment, the publication of results is important for supporting future research projects.

The process model in Figure 3 is a simplified presentation of the prototype development process. It gives abstract instructions for the operation, with defined steps to implement the pilot case from start to finish. If all of the steps are performed, the level of the outcome is predictable. The model is sufficient for developing a prototype, and also makes it possible to add more activities if needed. For example, procedures such as iterations, testing, and customer testing could be included in the process. Further, because the model was developed from university pilot cases, it combines two factors: software/hardware prototype development and collaboration with customers. Both of these are discussed in the following section, where the usability of the DMPP in the KIEMI project is evaluated.

4 DMPP utilization in the KIEMI project and technology transfer

The purpose of this section is to describe how the DMPP model was utilized in the work process of the KIEMI project. This section also describes how different parties were involved in the project, what kind of collaboration actions were taken during the DMPP steps, and which technology transfer actions occurred during the work process. Figure 4 presents an overall picture of the project, collaboration, and DMPP process in the form of the Business Process Model and Notation (BPMN, [16]).

4.1 Project partners

In the overall picture (Figure 4), four groups can be recognized, each in its own swimlane:

1. EU OP and its program documents and goals (via OP documents and goals), which must be taken into account for project content and implementation.

2. University within its third mission (TM) and its strategy (via University Strategy), which gives guidelines for research group activities and publishing of project work.
3. Project (like KIEMI) activities, carried out by project team members (academic researchers, AR); these activities can be divided into three subcategories: a) Project management (Management) is responsible for implementation of the project plan (Project Plan) and for reporting project results to the funding representatives of the EU OP (OP supervision), as well as keeping track of research publications for university representatives (Research supervision). Project management also acts as the selector of new prototypes in the form of collaboration and pilot case actions. b) The DMPP process (DMPP) and its six steps (1-6), which are linked to each other and to collaborative actions with IDs via prototype and pilot case actions. c) Collaboration and Piloting (Collaboration/Piloting), which contains actions and paths supporting the DMPP process steps.

4. Collaborative Organization(s) are representatives of collaborating IDs, with whom the content of prototypes and their usage via pilot cases is co-created and co-developed. Technology transfer (and technology creation) takes place between AR and ID via project work and the work process used in it.

4.1.1 The work process

In Figure 4, the work process of project work can be divided into the following actions (one to eight):

1. The project starts when the project administration (Management) is organized. The project administration defines/selects an appropriate pilot case (Select New Pilot Case) and the resources and actions required for the content, and launches the pilot case (Start Pilot Case).

2. From the point of view of the project, a single collaborative pilot case starts (in Collaboration/Piloting) with the invitation of the collaborator (Collaboration Call) and the agreement on cooperation (Collaboration Ignition). For pilot cases #17, #18, #19, and #23, invitations to collaborating IDs were sent via a third party.

3. The first phase of the DMPP process (Discuss Requirements) starts when the project has established contact with the collaborator (ID) and the actual discussion of requirements and objectives begins (Requirement Discussion). For pilot cases #17, #18, #19, and #23, we also received positive responses to collaborate. The project utilizes the discussion base created in previous discussions (Archived Prototype Pilot Requirement Notes) as a basis for a new discussion. ID brings their views (needs, support, and available partners or technical vendors (TV)) to the discussion. For example, needs can be related to certain sensors or measurements, and support can be related to the facilities where measurements are made. This starts technology transfer actions between AR and ID/TV. The discussion results in a decision on whether to continue cooperation and (in the case of a positive decision) the content of the next phase of the DMPP process, namely the requirement notes (Prototype Pilot Requirement Notes).

When the discussion produces a positive decision (OK To Initiate Prototype Pilot?), a pilot case (Prototype Pilot Ignition) and the third phase of the DMPP process (Develop Software) begin (Start Prototype Develop). On the ID side, the corresponding decision (OK To Initiate Prototype Pilot?) to proceed initiates support for prototype development and supports prototype piloting activities.
In the event of the discussion producing a negative decision (or cooperation ending without a successful agreement), the pilot case is reported to the administration as interrupted (Pilot Case Aborted), which then processes the interruption result. For pilot cases #98 and #99, collaboration was ended in the first phase of the DMPP process (Discussion).

4. In the third phase of the DMPP process (Develop Software), the prototype artifacts (software and hardware) needed in the pilot case are developed. The development of the prototype (Develop Prototype (SW/HW)) is guided by the requirements recorded in the previous phase (Prototype Pilot Requirement Notes in Requirement Notes) and utilizes any artifacts (Development Artifacts) that may have been generated in previous cases. Prototype development involves discussions and exchanges of information (Technical Discussion) with the ID and TV brought into the pilot case. New and advanced artifacts resulting from the prototype development phase are introduced to artifact management (Manage Artifacts in Development Artifacts), representing the fourth stage of the DMPP process. Pilot case #11 was an example of a case where both technology creation and technology transfer occurred between AR and ID.

5. The completion of the prototype development phase (Prototype Develop Ready) initiates the prototype pilot case execution phase (Execute Prototype Pilot in Collaboration/Piloting), where pilot case data and results are collected from the use of the prototype at the pilot case site (received from ID). The data collected in the prototype pilot case is added to the Development Artifacts (via Manage Artifacts) generated in the third step (Develop Software). The piloting of a single prototype could take several weeks. For pilot case #19, data was collected for a period of several months and data collection was monitored online. On the other hand, pilot case #13 contained data for a period of over one year, and the data was collected afterwards from ID's database. The latter case also contained technology transfer between AR and ID to tune up ID's interface with respect to database metadata information.

Figure 4: Technology transfer in the KIEMI project. Source: The figure is available in [17].

6. At the end of the prototype pilot case (Start Prototype Presentation), the penultimate stage of the DMPP process, the preparation phase for the presentation of the results, is initiated. In this phase (Prepare Presentation in Prepare & Conduct Presentation), the artifacts generated during the prototype pilot case are compiled (via Manage Artifacts in Development Artifacts) into presentation materials for the final stage of the DMPP process (via Manage Publications in Presentation Slides), followed by the presentation of the materials to ID (Conduct Presentation in Prepare & Conduct Presentation). In the preparatory phase, previous presentation materials (Archived Slides via Manage Slides) can be utilized. The presentation schedule is discussed with ID (Call For Presentation), who gathers their team and TV for the meeting (Receive Presentation in Collaborative Organization(s)). The presentation ends steps five and six of the DMPP process for collaboration tasks (Prototype Presentation Ready).
Pilot cases #17, #18, #19, and #23 were examples of technology transfer via a presentation and delivered report documents. Case #23 also included a representative from ID's TV side.

7. There is usually a feedback discussion (Ask Feedback/Give Feedback in Collaboration/Piloting) following the presentation (Prototype Presentation Ready) on the results obtained from the use of the prototype and the implementation of its piloting, as well as on the success of the collaboration. Feedback processing concludes the collaborative pilot case (Pilot Case Ready) and the technology transfer actions between AR and ID/TV. Pilot case #10 contained a feedback discussion where ID felt that the collaboration was very successful; they requested another pilot case (#16 in the list) after the issue at the target facility had been solved thanks to the first pilot case.

8. At the end of the pilot case (Pilot Case Ready), the information is sent to the administration (Pilot Reporting), which records the project indicators and progress (via Project Indicators) for reporting to the EU OP financier (OP Supervision) on the pilot case. The administration is also responsible for sharing the research results (Research Reporting) through communication channels (via Project Publications) and to the university (Research supervision via Research Publications). Actions for communication tasks are also reported to the EU OP financier (OP Supervision). Artifacts and publication slides generated in the DMPP process may be published or distributed in connection with the news blog. Pilot cases #17, #18, and #19 were examples of (one-way) technology transfer via news blogs for any other ID or individual interested in the topic.

When a single collaborative pilot case has ended, management decides on the need for another pilot case (Is Project Completed?). Once the required number of prototypes and their piloting work have been completed (or the project time is coming to an end), the final tasks and the end of the project follow.

5 Pilot cases in KIEMI

The purpose of this section is to present the background and characteristics of the pilot cases (comparison table) as well as to compare the levels of collaboration activity associated with the pilot cases. Table 1 contains pilot-case-specific reference parameters. Pilot cases are numbered with a running identification number according to their starting time (see the pilot case timeline in Figure 2). Comparative data has been compiled for each pilot case using six parameters. The User Group parameter describes the classification of the piloting target; options include company (A), public operator (B), entity (C), and others (D). The Stakeholders parameter describes the classification of the parties who joined the piloting target; alternatives include subscriber (E), users (F), technical vendor (G), and developer (H). Several parties may have been involved in the piloting. The DMPP usage parameter describes the number of steps in the DMPP process utilized at the piloting site; each pilot case may have utilized one or more, or even all, of the steps. The OTS used parameter contains information on whether off-the-shelf components were used in the pilot case. The Publish content parameter includes information on whether the results of the pilot case were released in a transparently available format through a research publication (X), a project news blog (Y), or both.
Some pilot case results were only handled internally. The Collaboration activity level parameter describes the collaboration activity of ID during the work process (in Figure 4). For a couple of pilots, some information was not yet available during the writing of this paper; that information is marked with (*).

Table 1: Properties of pilot cases in the KIEMI project

5.1 Pilot cases with high-level collaboration

In high-level collaboration, the counterpart (ID) demonstrates active cooperation at all stages of the work process. ID brings to the discussion stage a view of the features required for the prototype and its operating environment. ID also demonstrates its interest in the technical content of the prototype resulting from the development phase and is involved in processing the observations made during the pilot case phase. In high-level cooperation, ID shows interest in the content of the results (report) and highlights their views on the exploitation of the results. It is clear that ID benefits from high-level collaboration in many ways.

Pilot case #10 is a good example of high-level collaboration. The target was a daycare center, which had received feedback about poor air quality inside the building. The first goal was to measure the temperature, humidity, and CO2 values at different times and report the readings to the partner. The first results showed that at certain moments the temperature and CO2 values had risen. During the early-phase meeting where the results were shown, we decided with the partner (ID) to continue and expand the pilot case. Expansion meant contacting the air conditioning equipment supplier (TV). This gave us an interface with the air conditioning system. In addition, the number and type of sensors were expanded to collect data that was more specifically environmental. Our project team also applied the previously developed visualization tool in this pilot case. Outcome: this was the widest pilot case, with several partners (TV and ID), using previously used and developed components.

5.2 Pilot cases with mid-level collaboration

In mid-level collaboration, the counterpart (ID) is involved at the beginning and end of the work process and in some way also involved in the development content of the work process. ID support may be required, particularly in situations where part of the prototype content is sourced from an ID-managed data source. In general, ID benefits from mid-level collaboration, at least from the perspective of external testing obtained for its own functions.

Pilot case #13 can be used as an example of mid-level collaboration. In this case, ID had a large number of facilities at their disposal, and they had already implemented a data sensor system and were using data analysis tools via their TV. For the pilot case, ID allowed AR to use their data (collected by ID's TV) in AR's tools to produce a different kind of analysis from the data. ID did not participate in the actual SW development, but the use of data via ID's API during piloting required technical discussions. The benefit for ID from the piloting case was related to experience gained about their API and the knowledge received via the pilot case report.
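As an illustration of the mid-level pattern above, where AR re-analyzes data exposed through an ID-managed API, the sketch below pulls sensor readings from a hypothetical partner endpoint and computes a simple summary. The paper does not specify the actual API, data format, or analysis; every name here (URL, JSON fields, threshold) is an assumption made for illustration only.

```python
# Hypothetical sketch of the pilot case #13 pattern: academic research (AR)
# re-analyzing data served by an ID-managed API. The endpoint, JSON fields,
# and CO2 threshold below are illustrative assumptions, not from the paper.
import statistics

import requests

API_URL = "https://partner.example.com/api/v1/readings"  # hypothetical endpoint
CO2_LIMIT_PPM = 1000  # illustrative indoor air quality guideline value


def fetch_readings(room: str, start: str, end: str) -> list[dict]:
    """Fetch time-stamped sensor readings for one room from the partner API."""
    resp = requests.get(API_URL, params={"room": room, "from": start, "to": end})
    resp.raise_for_status()
    return resp.json()  # assumed: [{"time": ..., "temp_c": ..., "co2_ppm": ...}, ...]


def summarize(readings: list[dict]) -> dict:
    """Produce an alternative analysis: averages and CO2 limit violations."""
    co2 = [r["co2_ppm"] for r in readings]
    temp = [r["temp_c"] for r in readings]
    return {
        "mean_temp_c": statistics.mean(temp),
        "mean_co2_ppm": statistics.mean(co2),
        "co2_over_limit": sum(1 for v in co2 if v > CO2_LIMIT_PPM),
    }


if __name__ == "__main__":
    data = fetch_readings("room-1", "2021-01-01", "2021-12-31")
    print(summarize(data))
```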
5.3 Pilot cases with low-level collaboration

In low-level collaboration, the counterpart (ID) is involved in the work at the beginning (Discuss Requirements) and end of the process (Presentation Slides). In these cases, the project team has most often conducted a search for actors interested in collaboration and able to provide the test target, giving the ID the opportunity to obtain new information about its application through the report. Thus, AR also provides technology transfer to ID. For a project, low-level collaboration can also be beneficial. Piloting over a longer time period does not necessarily burden the project staff, and the results obtained from the pilot case can be very useful for demonstrating the functionality of the prototype. Low-level collaboration is also no obstacle to publicizing the results of the project; on the contrary, for example, pilot cases #17, #18, and #19 (entities as user groups) and the disclosures generated from their results have contributed to the local visibility and reputation of the project. The presentation materials have also been utilized to obtain new, higher-level collaborative cases.

5.4 Failed pilot cases

In addition to the above levels of collaboration, it is also useful to point out exceptions where piloting collaboration ended or was interrupted. In the work process, piloting can usually be interrupted only in its initial stages. The reason may be ID's reluctance (or resource shortage) to initiate collaboration: ID is not interested even in free piloting if it does not promise immediate benefit, since in practice even free piloting requires some involvement. Piloting may also involve a TV on ID's part whose participation is necessary but who is reluctant (similar to ID's own reluctance). Another reason may be that something comes up during the discussion stage (Discuss Requirements) that makes it impossible or not meaningful to continue the piloting. Even after progressing to the technical stage of the DMPP process (Develop Software), a situation may arise where a developed prototype is found to be unworkable. From the point of view of collaboration, the work process is interrupted, although from the point of view of research, a non-working prototype is also part of the results of the research. If the idea works, the hardware can be replaced with more suitable hardware in the next iteration round.

Pilot cases #98 and #99 are examples of cases where collaboration was interrupted. In case #99, ID was interested in collaboration, but access to the required data was managed via ID's TV's API, and the TV had little or no interest in collaboration. In case #98, ID was also interested in collaboration, but during the discussion stage AR noticed that it would be too difficult to produce data in a form that would work for ID's needs. In both cases, proceedings (in the discussion stage) were paused and finally project management decided to shelve the piloting case. It is worth mentioning that in the work process there were also some cases where project management was asked to help communicate with ID to make sure that the collaboration would continue. Interruptions in collaboration cause serious harm to the work process. For example, due to material limitations, when the test equipment is reserved at one site, the next piloting target cannot be handled.
6 Usability and evaluation of the DMPP in the KIEMI project

The DMPP was developed for the production of prototypes at the university. The goal has always been to produce scientific results from the prototypes. The research group is from non-commercial institutions, and therefore the focus is not on achieving financial goals. This section clarifies the advantages of the different phases of the DMPP. The KIEMI project used the DMPP model to create prototypes together with collaborative partners. The project and its approach to the subject through prototyping demonstrated the functionality of the DMPP model, especially in prototyping projects like this one. The suitability of the different phases of the DMPP model can be assessed through the KIEMI project pilot cases as follows:

Discuss requirements: Most pilot case projects involve an external partner (ID) when discussing objectives. The level of collaboration varies a lot. In low-level collaboration, e.g., in pilot cases #19 and #22, the partner provided the premises for performing the measurements. The partner does not make any special requests; the output for the partner is a report, which may lead to further actions. If the collaboration is closer, as when the partner takes part in further discussions, the starting point is also directed more by the partner. In these cases, the partner mostly has some issue which should be researched, e.g., they have been notified of poor indoor air quality (pilot case #10). Usually in these cases, the original task assignment expands during the pilot case and more partners join in. The DMPP is suitable for this kind of activity because the non-commercial leader – the university research team – is focused on research goals rather than financial goals. Further, the additional research/technical goals set by partners proved compatible with the operation of the model within the iteration rounds. The best example of this kind of activity is pilot case #10, where the university research team led the pilot case and collected the necessary partners (e.g., the ventilation technology supplier and the building caretaker).

Requirements notes are an important part of the documentation, and their main purpose is to guide the pilot case in the selected direction. The usage of the DMPP shows the advantage of "light documentation" for getting things started; the usage of previously defined architecture models and device configurations also speeds up the operation. The term "light documentation" also covers the reuse of the technological choices and definitions made in earlier pilot cases. The exception is pilot case #23, where the final report included a section on desired goals. Internal requirements are also mentioned in several cases, e.g., when the research group wants to change or update some specific feature. The "light documentation" idea is based on the principle "Some Things Are Better Done than Described" [18]. Light documentation and process modeling are focused on university and other research institution environments where the aim is prototyping rather than the development of commercial products. Of course, this leads to a larger amount of work if technology transfer to some partner starts from the prototype.

The Develop software phase uses the artifacts of previous requirements as a loose guideline. For example, the UI [19] and backend [20] software developed in pilot case #09 were used in all subsequent pilot cases (excluding #11).
In the DMPP, changes to the requirements are possible if they are seen to be of some benefit. Further, requirement changes were not normally discussed with partners unless something was needed from them. The DMPP does not set requirements for the software or hardware components used, but we noticed that the usage of off-the-shelf components accelerated prototype development. The second advantage of these kinds of components is the ability to vary the prototype solutions when we have to conform to the requirements of the selected components.

Development artifacts are typically fully working prototype systems, which are also the main goal of this phase of the DMPP. In the KIEMI project, this phase usually involved installing the prototype to collect data at a target provided by the partner. Most of the prototypes were working SW/HW prototypes, but there were also SW-only prototypes for analyzing and visualizing the customer's collected data (#12 and #13). The main purpose of the DMPP is to produce a working prototype, and therefore only the main functions of the prototype are utilized. Additionally, the documentation or testing could be done only partially. This kind of approach speeds up development but could slow down the technology transfer later on.

The Prepare & conduct presentation phase is for reporting the results. In longer projects, we noticed that document reuse in the form of skeleton reports accelerated this phase. In pilot cases #20 and #23, in the final phase of the KIEMI project, we derived a skeleton report from pilot case #19. This reuse sped up the reporting phase and shows that, when using the DMPP model, reporting will mostly include the same components.

Presentation and publishing of the results is the last phase of the DMPP. In successful pilot cases, the partners are usually interested in further developing the prototype, and the technology transfer will continue from this point. One significant advantage of the DMPP is its ultimate purpose of publishing the scientific material (pilot cases #03, #09, #10, #11, #15, and #16 have been published) and other public material from the pilot cases.

Overall, the DMPP's suitability for projects of this kind was demonstrated in the KIEMI project. Two approaches were used in the project: the software development style and the collaboration style. The DMPP is able to connect both styles. The project was shown to be successful for university-enterprise (AR-ID) collaboration in the context of prototype development. Further, based on the results in creating usable prototypes, the model can be seen as a success.

7 Conclusions

RQ1: Collaboration. How was university-enterprise collaboration executed in practice using the DMPP? The DMPP process was part of a project (Figure 4) where the content was guided by the objectives set for the project (Management) and an individual prototype was made through collaboration (Collaboration/Piloting). The DMPP process was in the background (invisible to ID), but it was able to provide support for collaboration (AR-ID) through all of its six phases. The ability of the DMPP process to support technology transfer was highlighted in phases 1, 3, 4, and 5. For Step 2 (Requirement Notes), the content was usually left up to the project team (AR) alone.
Regarding the companies (ID and their TV), it is unknown whether they had similar methods of their own in place. At the very least, communication (emails) enabled ID (and their TV) to receive and store requirement-related data. As far as Step 6 is concerned, ID received a report on the content and results of most pilot cases. For pilot cases whose content was distributed through open channels (such as the project news blog and GitHub in Presentation Slides), ID (and TV) had the opportunity to catch up, not only with their own content, but also with the content of other pilot cases.

The collaboration also demonstrated that university and corporate representatives have very different views of technology, and therefore of the pilot cases as a whole. Especially in small companies, the desire and ability to recognize the value and benefits contained in the prototype is often low, and the university needs to convince the collaborator of the benefits of a prototype that requires effort on their part. In a longer-term project, it should be considered whether each prototype is intended for actual technology transfer or whether that stage will only come when satisfactory prototypes have been achieved. In practice, the project requires that pilot cases at the beginning of the project are conducted mainly with organizations offering test environments, and only at the end does the content begin to involve technology transfer.

There was no investment in cost calculations or business models in the design of the university prototypes, and this may have contributed to the amount of interest shown by companies. To improve collaboration, it would be good to add a point where the company provides a (suitably general-level) assessment of the prototype as well as the associated return on investment (ROI). With the feedback received, the research team would accumulate expertise in designing the next prototype and opportunities to produce a result that is of more interest to the company. The ability to produce prototypes valued by companies is a significant strength and advantage for a university operator that organizes projects. It is also an advantage for future project partner searches.

RQ2: Reusability. How did the reusability of the artifacts in the DMPP steps support the workflow of the pilot cases? The use of the DMPP model led to the reuse of artifacts, as the mode of operation remained the same even though the pilot cases changed. In the prototypes, we mainly used the same software and hardware components that had been used before. Further, we also always tried to introduce some new components, because this increased knowledge and expanded component-based variation. The DMPP uses light documentation to speed up prototype development, but we noticed that separate phases in different pilot cases started to contain the same types of documents. Therefore, the conclusion is that the DMPP leads to the reuse of skeleton documents in different pilot cases.

The findings of the research presented above represent the context of a Finnish university, and more research would be required to obtain universally applicable results. However, these observations and findings provide a basis for extending the research to an external comparison between universities in different countries.

8 Summary

This article focused on the KIEMI research project conducted at the Pori unit of Tampere University during 2019-2022.
The project used the earlier developed Descriptive Model of the Prototyping Process (DMPP) to guide university-enterprise collaboration. The project consisted of several pilot cases and prototypes, which were developed in collaboration with companies and addressed real-world problems. This article reviewed and evaluated the suitability of the DMPP for this purpose. The article dealt with the collaboration between university and enterprises, and reusability within the DMPP. The paper presented several pilot cases made in KIEMI and described the usage of the DMPP in them. Finally, the paper evaluated the model, presented some of the challenges faced, and discussed future research topics.

Acknowledgements

This work is part of the KIEMI project and was funded by the European Regional Development Fund and the Regional Council of Satakunta.

References

[1] Saari M, Sillberg P, Grönman J, Kuusisto M, Rantanen P, Jaakkola H, et al. Reducing Energy Consumption with IoT Prototyping. Acta Polytechnica Hungarica. 2019;16(9, SI):73-91.
[2] Saari M, Rantanen P, Hyrynsalmi S, Hästbacka D. Framework and Development Process for IoT Data Gathering. In: Sgurev V, Jotsov V, Kacprzyk J, editors. Springer International Publishing; 2022. p. 41-60. Available from: https://doi.org/10.1007/978-3-030-78124-8_3.
[3] Saari M, Soini J, Grönman J, Rantanen P, Mäkinen T, Sillberg P. Modeling the software prototyping process in a research context. In: Tropmann-Frick M, Thalheim B, Jaakkola H, Kiyoki Y, Yoshida N, editors. Information Modelling and Knowledge Bases XXXII. vol. 333. IOS Press; 2020. p. 107-18.
[4] Vorley T, Nelles J. Building Entrepreneurial Architectures: A Conceptual Interpretation of the Third Mission. Policy Futures in Education. 2009;7:284-96. Available from: http://journals.sagepub.com/doi/10.2304/pfie.2009.7.3.284.
[5] Zomer A, Benneworth P. The Rise of the University's Third Mission. In: Reform of Higher Education in Europe. 2011. p. 81-101. Available from: http://link.springer.com/10.1007/978-94-6091-555-0_6.
[6] Basili V, Briand L, Bianculli D, Nejati S, Pastore F, Sabetzadeh M. Software Engineering Research and Industry: A Symbiotic Relationship to Foster Impact. IEEE Software. 2018;35:44-9. Available from: https://ieeexplore.ieee.org/document/8409904/.
[7] Salomaa M, Charles D. The university third mission and the European Structural Funds in peripheral regions: Insights from Finland. Science and Public Policy. 2021;48(3):352-63. Available from: https://academic.oup.com/spp/article/48/3/352/6126876.
[8] Punter T, Krikhaar RL, Bril RJ. Software engineering technology innovation – Turning research results into industrial success. Journal of Systems and Software. 2009;82(6):993-1003.
[9] Arza V, Carattoli M. Personal ties in university-industry linkages: a case-study from Argentina. The Journal of Technology Transfer. 2017;42:814-40. Available from: http://link.springer.com/10.1007/s10961-016-9544-x.
[10] Dusica M, Arnaud G. Industry-Academia research collaboration in software engineering: The Certus model. Information and Software Technology. 2021;132:106473. Available from: https://linkinghub.elsevier.com/retrieve/pii/S0950584920302184.
[11] Becker-Kornstaedt U, Webby R. A Comprehensive Schema Integrating Software Process Modeling and Software Measurement. IESE-Report No 04799/E. 1999.
[12] Grönman J, Rantanen P, Saari M, Sillberg P, Vihervaara J.
Low-cost ultrasound measurement system for accurate detection of container utilization rate. In: 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO). IEEE; 2018.
[13] Soini J, Sillberg P, Rantanen P. Prototype System for Improving Manually Collected Data Quality. In: Budimac Z, Galinac Grbac T, editors. Proceedings of the 3rd Workshop on Software Quality Analysis, Monitoring, Improvement, and Applications, SQAMIA 2014, September 19-22, 2014, Lovran, Croatia. CEUR Workshop Proceedings. M. Jeusfeld c/o Redaktion Sun SITE; 2014. p. 99-106.
[14] Soini J, Kuusisto M, Rantanen P, Saari M, Sillberg P. A Study on an Evolution of a Data Collection System for Knowledge Representation. In: Dahanayake A, Huiskonen J, Kiyoki Y, editors. Information Modelling and Knowledge Bases XXXI. vol. 321. IOS Press; 2019. p. 161-74.
[15] Grönman J, Sillberg P, Rantanen P, Saari M. People Counting in a Public Event – Use Case: Free-to-Ride Bus. In: 2019 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO). IEEE; 2019.
[16] Object Management Group, Inc. Business Process Model and Notation; 2023. Accessed January 13, 2023. Available from: https://www.omg.org/spec/BPMN/2.0.2/About-BPMN.
[17] Harjamäki J. Technology transfer in the KIEMI project; 2023. Accessed January 23, 2023. Available from: https://cawemo.com/share/bb6b8086-13b7-4ab9-bb86-92cdaf9a5d18.
[18] Hunt A, Thomas D. The Pragmatic Programmer. Addison-Wesley; 2000.
[19] Nurminen M, Lindstedt A, Saari M, Rantanen P. The Requirements and Challenges of Visualizing Building Data. In: 2021 44th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO). IEEE; 2021.
[20] Nurminen M, Saari M, Rantanen P. DataSites: a simple solution for providing building data to client devices. In: 2021 44th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO). IEEE; 2021.

A PRIMITIVE ACTION-DRIVEN RECOGNITION METHOD FOR THE REALIZATION OF GLOBAL HETEROGENEOUS SIGN LANGUAGE RECOGNITION

TAKAFUMI NAKANISHI,1,2 AYAKO MINEMATSU,2 RYOTARO OKADA,1,2 OSAMU HASEGAWA,1,2 VIRACH SORNLERTLAMVANICH1,2

1 Musashino University, Department of Data Science, Tokyo, Japan
takafumi.nakanishi@ds.musashino-u.ac.jp, ryotaro.okada@ds.musashino-u.ac.jp, osamu@ds.musashino-u.ac.jp, virach@musashino-u.ac.jp
2 Musashino University, Asia AI Institute, Tokyo, Japan
takafumi.nakanishi@ds.musashino-u.ac.jp, ayako.minematsu@ds.musashino-u.ac.jp, ryotaro.okada@ds.musashino-u.ac.jp, osamu@ds.musashino-u.ac.jp

We present a primitive action-driven recognition method for realizing global heterogeneous sign language recognition. For global communication, sign languages from various linguistic areas should be recognizable as easily as possible. However, most current sign language recognition methods realize specific sign language recognition for individual linguistic regions, and sign language recognition among multilingual regions must currently be implemented in an ad hoc manner.
To develop multilingual sign language recognition, it is necessary to realize a new method to handle various sign systems in a unified manner. This method defines common primitive actions of various sign language systems worldwide and describes what the combination of these primitive actions indicates in various sign language systems, to realize sign language recognition. This method consists of multiple primitive action recognition modules and a primitive action composition module. Each primitive action recognition module recognizes a primitive action common to all sign languages. The primitive action composition module determines the actual sign meaning from the combination of recognition results from the multiple primitive action recognition modules.

Keywords: sign language recognition, primitive actions, global communication platform, global heterogeneous sign language, action-driven recognition

DOI https://doi.org/10.18690/um.feri.5.2023.14
ISBN 978-961-286-745-4

1 Introduction

Communicating with diverse people from different linguistic backgrounds is becoming increasingly important. Sign language, mainly used by people who are deaf or hard of hearing for everyday communication, has established itself as its own language system. For all people to communicate naturally and easily with each other, it is important not only to translate between languages but also to recognize and work seamlessly with other methods, such as sign language. It is important to realize a global communication platform that helps all diverse people communicate.

So far, we have been studying sign language recognition methods [1][2][3][4]. This research focuses on methods to achieve sign language recognition with small training data. We have found it difficult to collect enough sign language video data; therefore, applying machine learning methods to sign language recognition is generally difficult. In our research [1][2][3][4], recognition is realized by extracting time-series skeletal features from training data in advance, extracting time-series skeletal features from input videos, and computing similarity weighting. These methods [1][2][3][4] were realized for Japanese Sign Language and can be applied to other sign languages. However, the cost is too high to apply them to many sign languages quickly. According to [5], there are more than 400 different sign languages worldwide, depending on the country, region, etc. For all people to communicate naturally and easily, recognition and composition capabilities for these 400+ sign languages must be realized and seamlessly coordinated. We need to create a system that facilitates the application of recognition and composition to these various sign languages.

Most current sign language recognition methods realize specific sign language recognition for individual linguistic regions. When we realize sign language recognition among multilingual regions, we must implement it in an ad hoc manner. To develop multilingual sign language recognition, it is necessary to realize a new method to handle various sign systems in a unified manner.

We present a primitive action-driven recognition method for realizing global heterogeneous sign language recognition. This method defines common primitive actions of various sign language systems worldwide. It describes what the combination of these primitive actions indicates in various sign language systems to realize sign language recognition.
This method consists of multiple primitive action recognition modules and a primitive action composition module. Each primitive action recognition module recognizes a primitive action common to all sign languages. The primitive action composition module determines the actual sign meaning from the combination of recognition results from the multiple primitive action recognition modules. When introducing a new sign language system, this method can be implemented simply by adding the combinations of primitive actions and their meanings to the knowledge base in the primitive action composition module. In other words, by realizing this method, it will be possible to integrate more than 400 sign language systems without implementing ad hoc recognition and synthesis systems for each. This method will realize a new global communication platform that avoids the communication divide and allows people to communicate freely in the current situation, where people communicate in many ways.

This paper uses HamNoSys (the Hamburg Sign Language Notation System) [6], a transcription system common to all signs, to realize the multiple primitive action recognition modules. HamNoSys is a transcription system for all sign languages with a direct correspondence between symbols and gesture aspects, such as hand location, shape, and movement. We can realize each primitive action recognition function according to each handshape chart in HamNoSys.

This paper makes the following contributions to the broader research field:

− We propose a new method, a primitive action-driven recognition method, to realize global heterogeneous sign language recognition.
− To realize our method, we apply HamNoSys [6] to the multiple primitive action recognition modules.

This paper is organized as follows. In Section 2, we present some related works. Section 3 provides an overview of the existing study, HamNoSys [6]. Section 4 presents our primitive action-driven recognition method to realize global heterogeneous sign language recognition. In Section 5, we describe some results of preliminary experiments. Finally, in Section 6, we summarize this paper.

2 Related Works

Our previous works [1][2][3][4] present sign language recognition methods. In these methods, recognition is realized by extracting time-series skeletal features from training data in advance, extracting time-series skeletal features from input videos, and computing similarity weighting. We have found it difficult to collect enough sign language video data; therefore, applying machine learning methods to sign language recognition is generally difficult. Our previous papers [1][2][3][4] also described some related works on the realization of sign language recognition.

Reference [7] surveys machine learning methods applied in sign language recognition systems. According to [7], sign language involves the usage of the upper part of the body, such as hand gestures [8], facial expression [9], lip-reading [10], head nodding, and body postures, to disseminate information [11][12][13]. We classify hand gestures and lip reading as verbal behavior. We classify head nodding and body postures to disseminate information as emotional behavior. We classify facial expressions as both verbal and emotional behavior.
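As context for the methods surveyed next, the sketch below illustrates how a similarity computation over time-series skeletal features can work, using textbook dynamic time warping (DTW) as also employed in related work [14]. This is a minimal, generic sketch under assumed feature shapes and a Euclidean frame distance; the exact feature extraction and similarity weighting used in our previous works [1][2][3][4] differ.

```python
# Generic DTW sketch for comparing two time series of skeletal feature
# vectors (frames x features). A textbook illustration only, not the exact
# similarity weighting of [1][2][3][4].
import numpy as np


def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """DTW distance between sequences a (n x d) and b (m x d)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])  # frame-wise Euclidean distance
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return float(cost[n, m])


def recognize(query: np.ndarray, templates: dict[str, np.ndarray]) -> str:
    """Nearest-neighbor recognition over pre-extracted training sequences."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))
```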
We classify facial expressions as both verbal and emotional behavior. According to reference [3], sign language recognition methods can be divided into two categories: continuous recognition of multiple sign words and non-continuous recognition. To realize continuous recognition, there are works such as the method combining a hidden Markov model (HMM) and dynamic time warping (DTW) [14], or the methods using Random Forest, artificial neural network (ANN), and support vector machine (SVM) [15]. To realize non-continuous recognition, there are works such as the methods of k-nearest neighbor (k-NN) [16], SVM [17], and sparse Bayesian classification of feature vectors generated from motion gradient orientation images extracted from input videos [18]. To realize sign language recognition for non-continuous and non-time-series data, there are works such as the methods of k-NN [19], similarity calculation using Euclidean distance [20], cosine similarity [19][21], ANN [22], SVM [23], and convolutional neural network (CNN) [24]. Reference [25] provides a research survey on recognizing emotions from body gestures. These works address some of the functions required for sign language recognition.

However, enough training data is necessary to realize these sign language recognition methods. Adequately preparing sign language videos and their labeled training data is often impossible. In addition, according to reference [5], there are more than 400 different sign languages worldwide, depending on the country, region, etc.; it is not realistic to implement a recognition system for more than 400 different sign languages in an ad hoc manner. We must realize a recognition platform that can easily and uniformly be applied to each sign language.

The concept of "primitive" was proposed by Kiyoki et al. [26] in their metadatabase system architecture. The metadatabase system connects several legacy databases; for this purpose, each legacy database provides primitive functions. Applying reference [26], our method has the recognition function for each basic hand movement as a primitive action recognition module. It derives the meaning of sign language from the primitive action composition module that integrates them. Our method can be extended to another sign language recognition system simply by adding the combinations of primitive actions and their meanings to the knowledge base in the primitive action composition module. In other words, by realizing this method, it will be possible to integrate more than 400 sign language systems without implementing ad hoc recognition and synthesis systems for each. This method will realize a new global communication platform that avoids the communication divide and allows people to communicate freely in the current situation, where people communicate in many ways.

3 HamNoSys (The Hamburg Sign Language Notation System)

This paper uses HamNoSys (the Hamburg Sign Language Notation System) [6], a transcription system common to all signs, to realize the multiple primitive action recognition modules. HamNoSys is a transcription system for all sign languages with a direct correspondence between symbols and gesture aspects, such as hand location, shape, and movement.
We can realize each primitive action recognition function according to each handshape chart [27] in HamNoSys.

Figure 1 shows the HamNoSys handshape chart. The description of each handshape in Figure 1 comprises symbols for the basic forms and diacritics for thumb position and bending. By this approach, the handshape descriptions should cover all handshapes used in sign languages worldwide. HamNoSys can be applied internationally because it does not refer to nationally diversified finger figures.

Figure 1: HamNoSys Handshape Chart
Source: [27].

We construct a primitive action recognition module that recognizes each finger shape appearing in Figure 1. All finger-expressed signs are combinations of the finger shapes in Figure 1. We can potentially recognize all the world's sign languages by building a system that can recognize all these finger shapes. We only need to prepare a knowledge base recording which combinations of finger shapes indicate which meaning. The primitive action composition module is the function that derives meaning using this knowledge base. In other words, our method can realize the recognition of various signs without training data from sign language videos, simply by appending to the knowledge base referenced by the primitive action composition module. This eliminates the need to collect the large amount of sign language video content otherwise necessary to achieve sign language recognition. We can easily realize a unified global sign language recognition system. This method will realize a new global communication platform that avoids the communication divide and allows people to communicate freely in the current situation, where people communicate in many ways.

4 Primitive Action-driven Recognition Method

This section presents our new method, a primitive action-driven recognition method. This sign language recognition method can be easily and uniformly applied to various sign language systems, realizing the global communication platform described above.

4.1 Overview

The primitive action-driven recognition method consists of preprocessing (time-series skeletal feature extraction), multiple primitive action recognition modules, and a primitive action composition module, as shown in Figure 2. The preprocessing extracts time-series skeletal feature data from the input sign language video data. The time-series skeletal feature data are used to recognize basic hand movements in the multiple primitive action recognition modules. We construct a primitive action recognition module for each finger shape in HamNoSys [6][27], as shown in Figure 1. Each module recognizes the basic action of each hand from the time-series skeletal features and converts it into symbols specified by HamNoSys, called a sign language spelling. The primitive action composition module determines the actual sign meaning from the combination of recognition results from the multiple primitive action recognition modules by using a sign language spelling knowledge base.
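To make this dataflow concrete, the following is a minimal Python sketch of the three-stage pipeline. It is an illustration under assumptions, not the published implementation: the function names, the toy symbols, and the knowledge-base entry are all invented for this example.

```python
from typing import Callable, Dict, List, Sequence

FrameFeatures = Sequence[float]  # one skeletal feature vector per frame

def recognize_primitives(
    frames: Sequence[FrameFeatures],
    classify: Callable[[FrameFeatures], str],
) -> List[str]:
    """Primitive action recognition: assign one HamNoSys-style symbol per
    frame, then delete consecutive identical symbols (see Section 4.3)."""
    symbols = [classify(f) for f in frames]
    return [s for i, s in enumerate(symbols) if i == 0 or s != symbols[i - 1]]

def compose(symbols: List[str], knowledge_base: Dict[str, str]) -> str:
    """Primitive action composition: map the recognized symbol sequence to
    a word via the sign language spelling knowledge base (Sections 4.4/4.5)."""
    return knowledge_base.get(" ".join(symbols), "<unknown>")

# Toy usage with placeholder symbols; a real system would run one
# recognition module per HamNoSys handshape on 126-dimensional features.
kb = {"V-straight V-bent": "hello (JSL)"}           # assumed toy entry
frames = [[0.0] * 126, [0.0] * 126, [1.0] * 126]    # fake feature vectors
toy_classify = lambda f: "V-straight" if f[0] == 0.0 else "V-bent"
print(compose(recognize_primitives(frames, toy_classify), kb))  # hello (JSL)
```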
When introducing a new sign language system, this method can be implemented simply by adding the combinations of primitive actions and their meanings to the sign language spelling knowledge base in the primitive action composition module.

Figure 2: An overview of the primitive action-driven recognition method. The method consists of preprocessing (time-series skeletal feature extraction), multiple primitive action recognition modules, and a primitive action composition module. The multiple primitive action recognition modules recognize basic hand movements. The primitive action composition module integrates their recognition results according to the descriptions in the sign language spelling knowledge base to infer the meaning of the input sign language.
Source: own.

The sign language spelling knowledge base consists of pairs of a word and a sequence of symbols, called a sign language spelling, defined by HamNoSys. When users can use HamNoSys as a reference for sign spellings and the hand shapes that make up the sign language, they can easily add new words to this knowledge base. This is the most important feature of this method. Most previous sign language recognition methods required enough labeled video content representing sign language as training data. However, our works [1][2][3][4] have shown that it is difficult to collect enough sign language video content to apply existing machine learning methods. Furthermore, according to reference [5], to cover the sign languages used worldwide, as many as 400 different sign languages must be realized in an integrated system. An existing study we use to implement a simplified knowledge base description method is HamNoSys [6][27]. HamNoSys is a transcription system for all sign languages with a direct correspondence between symbols and gesture aspects, such as hand location, shape, and movement. The handshape descriptions should cover all handshapes used in sign languages worldwide. HamNoSys can be applied internationally because it does not refer to nationally diversified finger figures. When we can create a sign language spelling knowledge base of spelling-word pairs using HamNoSys for each of the various sign languages, we can realize an integrated sign language recognition system across the different sign languages.

4.2 Preprocessing (Time-series Skeletal Feature Extraction)

The preprocessing extracts time-series skeletal features representing both hands' positions at each time from sign language video data. Figure 3 shows the details of the time-series feature extraction modules.

Figure 3: Preprocessing (Time-series Skeletal Feature Extraction)
Source: own.

First, it converts the input sign language video data into a set of images, one per time step, as the time-series media content set. Next, it extracts features representing both hands' positions in each image. Through this process, we obtain multiple features at each time step. In this paper, we apply Mediapipe [28] to feature extraction.
Mediapipe can extract skeletal features of the hands, face, arms, and body. This paper uses the landmarks of both hands as features. Mediapipe extracts the normalized position (x, y, z) of each of 42 landmarks (21 per hand) from each image, so we obtain 126 features per time step as time-series features. Therefore, the preprocessing generates a 126 × t time-series feature matrix, where t is the number of frames. This matrix shows the 126 features of the motion extracted from the sign language represented in the input video and their temporal variation.

4.3 Multiple Primitive Action Recognition Modules

The multiple primitive action recognition modules recognize basic hand movements, as shown in Figure 4. We construct a primitive action recognition module for each finger shape in HamNoSys [6][27]. From the time-series skeletal features extracted by preprocessing, all primitive action recognition modules are executed for each time step, and the corresponding primitive actions are derived. In other words, these modules assign a single HamNoSys symbol representing the hand movement at each time step. We obtain the recognition results over the frames as a sequence of HamNoSys symbols. The symbol sequence extracted frame by frame contains runs of the same symbol, so the system deletes consecutive identical symbols. Through these modules, we obtain HamNoSys symbol sequences from the time-series skeletal feature data.

Figure 4: An overview of the multiple primitive action recognition modules
Source: own.

4.4 Sign Language Spelling Knowledge Base

The sign language spelling knowledge base consists of pairs of a word and a sequence of symbols, called a sign language spelling, defined by HamNoSys. When users can use HamNoSys as a reference for sign spellings and the hand shapes that make up the sign language, they can easily add new words to this knowledge base. This is the most important feature of this method.

Figure 5 shows how to compose a sign language spelling. In general, a sign consists of multiple hand gestures. The sign language spelling describes which handshapes occur in which order. Figure 5 shows an example of the sign meaning "hello" in Japanese. This sign is performed by a hand gesture with the index and middle fingers raised, followed by a hand gesture with the index and middle fingers bent. Each hand shape is assigned a symbol determined within HamNoSys. The sign language spelling is represented by one or more symbol sequences denoted by HamNoSys. By applying the same methodology, creating a knowledge base for sign language recognition worldwide is possible. The HamNoSys set of finger shapes is common to the world's sign languages. By introducing such sign language spellings, building a knowledge base with simple descriptions is possible without creating or collecting new sign language videos. This knowledge base can be created for each different sign language system. Table 1 shows an example of the sign language spelling knowledge base.

Figure 5: How to compose a sign language spelling
Source: own.
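As a minimal sketch of how such a knowledge base can be represented and queried with edit-distance matching (the matching step is described in Section 4.5 below), consider the following Python fragment. The spellings and entries are invented stand-ins, not actual HamNoSys notation, and the function names are assumptions for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-symbol insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

# Toy knowledge base: spelling -> word. The keys are placeholders for
# HamNoSys symbol sequences (e.g., "V Vb" for raised-then-bent fingers).
SPELLING_KB = {
    "V Vb": "hello (JSL)",        # cf. the Figure 5 example
    "B Bf": "toy entry",          # invented for illustration
}

def lookup(symbol_sequence: str) -> str:
    """Return the word whose spelling is closest to the recognized sequence."""
    best = min(SPELLING_KB, key=lambda s: levenshtein(symbol_sequence, s))
    return SPELLING_KB[best]

print(lookup("V Vb"))  # -> hello (JSL), even for slightly noisy sequences
```

Adding a new sign language then amounts to adding new spelling-word pairs to the dictionary, which matches the extensibility claim of this method.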
Table 1: An example of the sign language spelling knowledge base. The knowledge base consists of pairs of a word and a sign language spelling, a sequence of symbols defined by HamNoSys.

4.5 Primitive Action Composition Module

The primitive action composition module derives appropriate words by matching the symbol sequences extracted by the multiple primitive action recognition modules against each sign language spelling in the sign language spelling knowledge base. The primitive action composition module must measure the similarity between sign language spellings and symbol sequences. A word matching the sign can be derived by comparing the sequence of symbols extracted by the multiple primitive action recognition modules with the sign spellings in the knowledge base using the Levenshtein distance. In this paper, we apply the Levenshtein distance to measure the similarity between sign language spellings. The Levenshtein distance is an edit distance, defined as the minimum number of single-symbol edits (insertions, deletions, and substitutions) required to transform one string into another.

5 Conclusion

This paper presented a primitive action-driven recognition method for the realization of global heterogeneous sign language recognition. This method defines common primitive actions of various sign language systems worldwide. It describes what the combination of these primitive actions indicates in various sign language systems to realize sign language recognition. This method will realize a new global communication platform that avoids the communication divide and allows people to communicate freely in the current situation, where people communicate in many ways. We apply our proposed method to create a new global communication platform and develop our method to bridge diverse communities. It is important to seamlessly bridge diverse speaking communities (including the sign language speaker community). Our proposed method realizes a platform bridging diverse communities. As future work, we will realize a new function that recognizes words and sentences in sign language. We will apply our method to the sign languages of various countries. We need to establish a sign language segmentation method for this to work. Moreover, developing a co-editing environment for the sign language spelling knowledge base is necessary to realize this method, verify its effectiveness using large-scale data, and conduct experiments with native signers.

References

[1] Nitta, T., Hagimoto, S., Yanase, A., Okada, R., Sornlertlamvanich, V., Nakanishi, T. Realization for Finger Character Recognition Method by Similarity Measure of Finger Features, International Journal of Smart Computing and Artificial Intelligence, Vol. 6, No. 1, 2022.
[2] Hagimoto, S., Nitta, T., Yanase, A., Nakanishi, T., Okada, R., Sornlertlamvanich, V. Knowledge Base Creation by Reliability of Coordinates Detected from Videos for Finger Character Recognition, In proc. of 19th IADIS International Conference e-Society 2021, FSP 5.1-F144, 2021. pp. 169-176.
[3] Nitta, T., Hagimoto, S., Yanase, A., Nakanishi, T., Okada, R., Sornlertlamvanich, V.
Finger Character Recognition in Sign Language Using Finger Feature Knowledge Base for Similarity Measure, In Proceedings of the 3rd IEEE/IIAI International Congress on Applied Information Technology (IEEE/IIAI AIT 2020), 2020.
[4] Nakanishi, T., Minematsu, A., Okada, R., Hasegawa, O., Sornlertlamvanich, V. Sign Language Recognition by Similarity Measure with Emotional Expression Specific to Signers, 32nd International Conference on Information Modelling and Knowledge Bases, 2022.
[5] SIL International (2018a). Sign Languages. https://www.sil.org/sign-languages
[6] Hanke, T. HamNoSys-representing sign language data in language resources and language processing contexts. In: Streiter, Oliver, Vettori, Chiara (eds): LREC 2004, Workshop proceedings: Representation and processing of sign languages. Paris: ELRA; 2004. pp. 1-6.
[7] Adeyanju, I. A., Bello, O. O., Adegboye, M. A. Machine learning methods for sign language recognition: A critical review and analysis. Intelligent Systems with Applications, 12, 2021, 200056.
[8] Gupta, R., Rajan, S. Comparative analysis of convolution neural network models for continuous Indian sign language classification, Procedia Computer Science, 171, 2020, pp. 1542-1550.
[9] Chowdhry, D.A., Hussain, A., Ur Rehman, M.Z., Ahmad, F., Ahmad, A., Pervaiz, M. Smart security system for sensitive area using face recognition, Proceedings of the IEEE conference on sustainable utilization and development in engineering and technology, IEEE CSUDET 2013, pp. 11-14.
[10] Cheok, M.J., Omar, Z., Jaward, M.H. A review of hand gesture and sign language recognition techniques, International Journal of Machine Learning and Cybernetics, 10 (1), 2019, pp. 131-153.
[11] Butt, U.M., Husnain, B., Ahmed, U., Tariq, A., Tariq, I., Butt, M.A., Zia, M.S. Feature based algorithmic analysis on American sign language dataset, International Journal of Advanced Computer Science and Applications, 10 (5), 2019, pp. 583-589.
[12] Rastgoo, R., Kiani, K., Escalera, S. Sign language recognition: A deep survey, Expert Systems with Applications, 164, 2021, Article 113794.
[13] Lee, C.K.M., Ng, K.H., Chen, C.H., Lau, H.C.W., Chung, S.Y., Tsoi, T. American sign language recognition and training method with recurrent neural network, Expert Systems with Applications, 167, 2021, Article 114403.
[14] Huang, Y., Monekosso, D., Wang, H., Augusto, J.C. A hybrid method for hand gesture recognition, 2012 Eighth International Conference on Intelligent Environments, Guanajuato, Mexico, June 2012. pp. 297-300.
[15] Yuan, S. et al. Chinese sign language alphabet recognition based on random forest algorithm. 2020 IEEE International Workshop on Metrology for Industry 4.0 & IoT, June 2020. pp. 340-344.
[16] Izzah, A., Suciati, N. Translation of sign language using generic fourier descriptor and nearest neighbour. IJCI, vol. 3, no. 1, February 2014. pp. 31-41.
[17] Raheja, J.L., Mishra, A., Chaudhary, A. Indian sign language recognition using SVM, Pattern Recognit. Image Anal., vol. 26, April 2016. pp. 434-441.
[18] Wong, S.F., Cipolla, R. Real-time adaptive hand motion recognition using a sparse bayesian classifier. Computer Vision in Human-Computer Interaction, Berlin, Heidelberg, 2005, pp. 170-179.
[19] Mahmud, I., Tabassum, T., Uddin, Md.P., Ali, E., Nitu, A.M., Afjal, M.I. Efficient noise reduction and HOG feature extraction for sign language recognition. 2018 International Conference on Advancement in Electrical and Electronic Engineering (ICAEEE), 2018. pp. 1-4.
[20] Hartanto, R., Susanto, A., Santosa, P.I. Real time static hand gesture recognition system prototype for Indonesian sign language. 2014 6th International Conference on Information Technology and Electrical Engineering, Yogyakarta, Indonesia, 2014. pp. 1-6.
[21] Anand, M.S., Kumar, N.M., Kumaresan, A. An efficient framework for Indian sign language recognition using wavelet transform. Circuits and Systems, vol. 07, no. 8, June 2016. pp. 1874-1883.
[22] Hasan, M.M., Khaliluzzaman, Md., Himel, S.A., Chowdhury, R.T. Hand sign language recognition for Bangla alphabet based on Freeman Chain Code and ANN. 2017 4th International Conference on Advances in Electrical Engineering (ICAEE), Dhaka, September 2017. pp. 749-753.
[23] Athira, P.K., Sruthi, C.J., Lijiya, A. A signer independent sign language recognition with co-articulation elimination from live videos: an Indian scenario. J. King Saud Univ. - Comput. Inf. Sci., vol. 34, no. 3, March 2022. pp. 771-778.
[24] Aloysius, N., Geetha, M. A scale space model of weighted average CNN ensemble for ASL fingerspelling recognition. Int. J. Comput. Sci. Eng., vol. 22, no. 1, May 2020. pp. 154-161.
[25] Noroozi, F., Kaminska, D., Corneanu, C., Sapinski, T., Escalera, S., Anbarjafari, G. Survey on emotional body gesture recognition. IEEE Transactions on Affective Computing, 12(02), 2021. pp. 505-523.
[26] Kiyoki, Y., Hosokawa, Y., Ishibashi, N. A metadatabase system architecture for integrating heterogeneous databases with temporal and spatial operations. Advanced Database Research and Development Series 10, 2000. pp. 158-165.
[27] HamNoSys Handshapes, https://www.sign-lang.uni-hamburg.de/dgs-korpus/files/inhalt_pdf/HamNoSys_Handshapes.pdf
[28] Mediapipe, https://google.github.io/mediapipe/

ALGORITHM OUTLINE FOR SKETCH MAP DRAWING FROM SPATIAL DATA DISTILLED FROM NATURAL LANGUAGE DESCRIPTIONS

MAREK MENŠÍK, PETR RAPANT, ADAM ALBERT
VSB - Technical University of Ostrava, Ostrava, Czech Republic
marek.mensik@vsb.cz, petr.rapant@vsb.cz, adam.albert@vsb.cz

Much knowledge about the real world is recorded in plain text, e.g., as messages on social networks. These messages contain, among other things, spatial information, which can be distilled from them by natural language processing. The extracted information can be represented as a plain topological graph stored as tuples describing individual edges. This paper presents an outline of an algorithm that uses these tuples for creating a sketch map.

Keywords: motion verbs, TIL, spatial data, topology, sketch map

DOI https://doi.org/10.18690/um.feri.5.2023.15 ISBN 978-961-286-745-4

1 Introduction

Much knowledge about the real world is recorded in plain text, e.g., messages on social networks. These texts contain, among other things, spatial data. Our research aims to extract these spatial data from plain text, compile a topological graph of the described area, and then visualize it as a sketch map. A sketch map is defined as 'an outline map drawn from observation rather than from exact survey measurements and showing only the main features of the area'.1
It is usually a hand drawing of an area drawn without scale; it usually shows the main characteristics of an area and is not cluttered with unnecessary detail. A sketch map has a low degree of positional accuracy and therefore does not correctly represent the distances, dimensions, and shapes of objects. On the other hand, it can have a high degree of logical accuracy, meaning that the spatial relationships (topology, spatial order) between objects are correctly represented.2

1 See https://www.merriam-webster.com/dictionary/sketch%20map
2 See https://www.tariffnumber.com/info/abbreviations/12485

The process of creating a sketch map from plain text data consists of three steps [1]: (i) identification of spatial entities and their spatial relations by natural language processing, (ii) creation of a plain topological graph that captures the identified spatial entities and their spatial relations, and (iii) conversion of this graph into a sketch map. This paper deals with the first results related to the third step.

Converting a topological graph into a sketch map involves, in principle, the dynamic placing of spatial entities on the canvas in such a way that all the known spatial relations between them are kept. Some authors have dealt with this process. The authors of [1] represented each entity as a rectangle, the size and position of which are adjusted stepwise to fit all spatial relations. They mainly dealt with data describing urbanized areas. The authors of [2], on the other hand, focused on creating a sketch map of an open landscape using descriptions of individual routes created by orienteers. The resulting sketch map captured the relative position of each spatial entity in relation to other entities. Their approach was based on a genetic algorithm.

The approach described here is different. We use TIL constructions to represent the captured spatial data and create a plain topological graph of the described area. Mathematical logic tools are used to process this graph to compile the individual circles in the graph, and these circles are then concatenated to create the final sketch map.

The following chapters are organized as follows. Chapter 2 provides a detailed description of the input data format utilized by the algorithm, along with essential definitions. In Chapter 3, we focus on the numbering of directions and the identification of the bounding circles in our dataset. The modification of data required for the application of the algorithm is discussed in Chapter 4, while Chapter 5 presents a thorough description of the algorithm itself. Chapter 6 is dedicated to a case study that showcases the application of the algorithm. Finally, Chapter 7 concludes the paper.

2 How We Obtain Our Data

We start with descriptions of the agents' journeys, which we consider coherent both in space and time. In [3], we introduced heuristic functions that manipulate these descriptions of journeys. These functions incrementally build a TIL construction describing spatial data. TIL is a typed hyperintensional λ-calculus of partial functions founded by Pavel Tichý in the early 1970s. TIL exploits procedural semantics, i.e., natural language expressions encode algorithmically structured procedures as their meaning. Tichý defined six kinds of such meaning procedures, which he coined TIL constructions, as the centerpiece of his system; see [4].
Constructions produce extensional or intensional entities, or even lower-order procedures, as their products or, in well-defined cases, fail to produce anything. TIL has been introduced and thoroughly described in numerous papers, such as [4], [5], [6], [7].

A journey can be informally described as a sequence of natural language sentences, each containing information about part of an agent's journey. To formalize the sentences, we exploit a class of motion verbs (e.g., to go, to walk, to cross, to turn) that bind other sentence constituents to them via valency. The valency of the verb is described in valency frames using functors. Functors define the semantic-syntactic relationship between the verb and its complement. Details are given, for example, in [8], [9], [10], [1]. For example, the sentence "John walks swiftly 15 minutes from the gym to home." can be analyzed by exploiting the valency frame of the verb to walk as follows:

− ACT (who): John
− DIR1 (from where): gym
− DIR3 (to where): home
− EXT (for how long / how far): 15 minutes
− MANN (manner): swiftly

Using valency frames and the information that follows from them, we might obtain a formal description of one's journey by formalizing it in the expressive language of TIL.3

λwλt [[′ACTwt ′John ′walk] ∧ [′DIR1wt ′gym ′walk] ∧ [′DIR3wt ′home ′walk] ∧ [′EXTwt ′15 ′walk] ∧ [′MANNwt ′swiftly ′walk]]   (1)

Types: DIR1, DIR3/(oπν)τω; ACT/(oιν)τω; MANN/(oαν)τω; EXT/; John/ι; gym, home/π; walk/ν, where π is a type of places and ν is a type of the activity denoted by a verb.

3 For the sake of readability, we will use just the TIL language to display examples. Our computations are executed over TIL-Script constructions.

This approach is based on our previous research, in which we introduced an algorithm of symbolic supervised machine learning that incrementally builds an explication of vague or inaccurate expressions into an adequately accurate one. For more information, see [11], [7].

In this paper, we process the data obtained from the TIL constructions. These data contain information about the places the agents visited and the directions of movement between pairs of these places. In [3], we introduced several definitions that identify spatial data in TIL constructions. The following two help us identify the places visited by agents.

Definition 1 (node, edge). Let V be a motion verb, let S = {B | (′DIR1 or ′DIR3) and V are constituents of B} and let DV = {C | V is a constituent of C} \ S. Then DV is a set of edges and S is a set of nodes.

Definition 2 (place, functor, value). Let [α x v] be a node and let [β y v1] be an edge description. Then x is a place; α, β are functors; and y, v, v1 are values.

A simple sentence might connect two places by verb valency with additional information. In the case of construction (1), by Definition 1, [′DIR1wt ′gym ′walk] and [′DIR3wt ′home ′walk] are nodes. The rest, [′ACTwt ′John ′walk], [′EXTwt ′15 ′walk], [′MANNwt ′swiftly ′walk], is an edge. By Definition 2, home and gym are places; they are constituents of nodes.

Definition 3 (relative direction RD).
Let [′MANN x v] be an edge description, and let the value x be one of the following: ′straight, ′slightly-left, ′left, ′sharp-left, ′back, ′sharp-right, ′right and ′slightly-right. Then x is the relative direction.

Using Definitions 1, 2 and 3, we extract the spatial information used for map sketching, namely, the place that is the origin of the agent's movement, the direction of movement, and the place to which the agent is heading. We obtain input data as a tuple [DIR1, direction, DIR3].

3 Numbering

Directions are essential elements in map sketching. We encode the relative directions from the input data using numbers. The mapping of relative directions into natural numbers is shown in Table 1. The data in Table 1 represent the basic eight directions that we obtain, i.e., straight (F), slightly left (FL), left (L), sharp left (BL), back (B), sharp right (BR), right (R) and slightly right (FR).

Table 1: TIL representations (TIL), relative directions (RD) and their numerical representation.

TIL             RD  Encoding | TIL              RD  Encoding
′straight       F   0        | ′back            B   4
′slightly-left  FL  1        | ′sharp-right     BR  5
′left           L   2        | ′right           R   6
′sharp-left     BL  3        | ′slightly-right  FR  7

Based on Table 1, we define the relative direction number (RDN):

Definition 4 (relative direction number RDN). Let C be a graph circle. RDN is the value assigned to each edge in C based on Algorithm 1.

RDNs, together with the relevant relative directions, are visually represented in Figure 1.

Figure 1: Absolute [RDN/ASDN] and ⊕ function for computation of individual node coordinates
Source: own.

In the first step of the map sketching algorithm, we assign RDNs to edges according to Algorithm 1.

Algorithm 1. Relative direction number (RDN) assignment
Require: ◦ - numbering function (Table 2), G - set of tuples [α DIR1 RD DIR3] representing the graph, C ⊆ G representing the circle, S - starting node, where RD1 = front
  RDN1 ← 0
  for i := 2 to |C| do
    RDNi ← RDNi−1 ◦ RDi
  end for

Table 2: Computation of RDN: application of the ◦ function

◦  F(0) FL(1) L(2) BL(3) B(4) BR(5) R(6) FR(7)
0  0    1     2    3     4    5     6    7
1  1    2     3    4     5    6     7    0
2  2    3     4    5     6    7     0    1
3  3    4     5    6     7    0     1    2
4  4    5     6    7     0    1     2    3
5  5    6     7    0     1    2     3    4
6  6    7     0    1     2    3     4    5
7  7    0     1    2     3    4     5    6

Afterward, we identify all bounding circles, defined by Definition 5, in our input data.

Definition 5 (bounding circle BC). Let G be a plane graph and C be a graph circle. If each edge of the circle C has the maximal number Cn calculated by the equation

Cn = ((RDNout − RDNin + 8) mod 8)   (2)

then C is called the bounding circle.4

4 ASDNout is the ASDN of the outgoing incident edge and ASDNin represents the ASDN of the edge entering the node.

An edge may belong to multiple circles with different RDNs. To deal with this situation, we need to run the rotation algorithm (see Section 4.1) to obtain ASDNs (defined below) and unify all RDNs.
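Since Table 2 shows that the ◦ function is simply addition modulo 8, Algorithm 1 can be sketched in a few lines of Python. The function and variable names below are illustrative assumptions, not part of the published implementation.

```python
# Relative-direction encoding from Table 1.
ENCODING = {"F": 0, "FL": 1, "L": 2, "BL": 3, "B": 4, "BR": 5, "R": 6, "FR": 7}

def assign_rdns(relative_directions):
    """Algorithm 1: the first edge of the circle gets RDN 0 (assuming
    RD1 = front); each following edge combines the previous RDN with its
    relative direction via the ring operation of Table 2, i.e. (a + b) mod 8."""
    rdns = [0]
    for rd in relative_directions[1:]:
        rdns.append((rdns[-1] + ENCODING[rd]) % 8)
    return rdns

# Edges of a toy circle described as relative directions.
print(assign_rdns(["F", "F", "R", "R", "F"]))  # -> [0, 0, 6, 4, 4]
```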
For example, assume that we have a segment of a description of some bounding circle in which an agent went right from A to B, and assume that the agent came to node A from relative direction 0. Our input data would be in the form [A, right, B]. Therefore, we can assign the edge from A to B the RDN 6.

4 Bounding Circle Adjustment

Before we can join two bounding circles at their common sections, it is often necessary to modify them. In general, there are two situations that we need to consider. In the first one, the same sections of the bounding circles are described by agents from different directions. Therefore, it is necessary to rotate one bounding circle so that their common sections lead in the same direction. The second occurs when the distances between nodes in the common sections are unequal. In this case, the nodes of one bounding circle must be moved so that the distances of the nodes in the common sections are the same, while the directions of edges in the modified bounding circle remain the same.

4.1 Rotation

The common edges of two bounding circles can be described in different directions. One agent can turn onto a street from the left, and another can turn onto the same street from the right. Therefore, the unification of the RDNs of two bounding circles with common edges is necessary for merging those two bounding circles. The unification is done by rotation, and rotation is achieved by recalculating all RDNs using equation (3):5

RDNi+1 = (RDNi + 1) mod 8   (3)

5 The rotation is applied as many times as necessary to unify the RDNs.

Definition 6 (absolute sketch direction number ASDN). Let e be an edge. The RDN of the edge e is called the ASDN if, for every circle of which the edge e is part, e has the same RDN.

Remark: If all the data are consistent, then rotation gives ASDNs. If some RDN cannot be transformed into an ASDN, then the data are not consistent, and the user can be notified which edge is inconsistent.

4.2 Node Adjustment

The node adjustment consists of verifying that a node is consistently positioned relative to its neighboring nodes with respect to the RDN, and of moving it in the direction of the RDN if necessary. If a node is moved, the consistency of the neighboring nodes is verified. The function ⊕ is used to place the nodes at particular coordinates consistently. For example, if node A has coordinates [0,0] and node B is in direction 7, that is, edge (A, B) has RDN = 7, then according to the function represented in Figure 1, node B has coordinates [n,n], where n > 0.

5 Algorithm for Computing Coordinates of Nodes

1. Sort all bounding circles (BCs).
2. Pick the longest one and place any node at coordinate [0,0].
3. Go through all the nodes in the circle and calculate the coordinates according to their ASDNs. Check whether there are any edge intersections. If so, make adjustments.6
4. Withdraw the used circle from the set.
5. From the rest of the circles, find the one with the longest common part with the already processed circles. If there are known coordinates, use them. Otherwise, recalculate and adjust all affected BCs.7
6. Set one common node of the new circle to the already known coordinates and continue with step 3.

6 Move nodes to other coordinates in the same direction, as described in Chapter 4.
7 The longest common part means the BC with the most common edges.

6 Case Study

In our case study, we demonstrate the functionality of the algorithm outline. First, we present the input data visualized using the topological graph.
We identify the bounding circles from the input data, and from this point on we visualize the processed data using graph visualizations for simplicity. The next step demonstrates the adjustment of the bounding circles and, subsequently, their merging into a sketch map.

Table 3 represents our input data. The column Places contains triplets (H, A, F), meaning an agent went from place H via place A to place F. The Abs and Rel columns give the absolute and relative directions, respectively. For example, the pair (ne, sw) means that an agent came to A from the northeast (place H) and continued to the southwest (place F); therefore, the relative direction is straight. The absolute directions in Table 3 are generated from the map data and are given here only to verify the correctness of our algorithm; they are not required for its correct functionality.

Table 3: Input data

Places   Abs    Rel  Street      | Places   Abs    Rel  Street
(H,A,F)  ne,sw  F    ′Bowery     | (F,A,H)  sw,ne  F    ′Bowery
(A,F,G)  ne,sw  F    ′Bowery     | (G,F,A)  sw,ne  F    ′Bowery
(F,G,E)  ne,sw  F    ′ChathamSQ  | (E,G,F)  sw,ne  F    ′Bowery
(G,E,J)  ne,sw  F    ′ChathamSQ  | (J,E,G)  sw,ne  F    ′ChathamSQ
(E,J,D)  ne,sw  F    ′ChathamSQ  | (D,J,E)  sw,ne  F    ′ChathamSQ
(J,D,B)  ne,nw  R    ′MottST     | (B,D,J)  nw,ne  L    ′ChathamSQ
(D,B,M)  se,ne  R    ′MottST     | (M,B,D)  ne,se  L    ′MottST
(B,M,I)  sw,se  R    ′BayardST   | (I,M,B)  se,sw  L    ′MottST
(M,I,H)  nw,se  F    ′BayardST   | (H,I,M)  se,nw  F    ′BayardST
(I,H,A)  nw,sw  R    ′Bowery     | (A,H,I)  sw,nw  L    ′BayardST
(D,B,C)  se,e   BR   ′PellST     | (C,B,D)  e,se   BL   ′MottST
(B,C,O)  w,e    F    ′PellST     | (O,C,B)  e,w    F    ′PellST
(C,O,L)  w,e    F    ′PellST     | (L,O,C)  e,w    F    ′PellST
(O,L,P)  w,e    F    ′PellST     | (P,L,O)  e,w    F    ′PellST
(L,P,A)  w,e    F    ′PellST     | (A,P,L)  e,w    F    ′PellST
(C,B,M)  e,ne   BR   ′MottST     | (M,B,C)  ne,e   BL   ′PellST
(P,A,H)  w,ne   FL   ′Bowery     | (H,A,P)  ne,w   FR   ′PellST
(P,A,F)  w,sw   BR   ′Bowery     | (F,A,P)  sw,w   BL   ′PellST
(K,G,E)  nw,sw  R    ′ChathamSQ  | (E,G,K)  sw,nw  L    ′DoyerST
(K,G,F)  nw,ne  L    ′Bowery     | (F,G,K)  ne,nw  R    ′DoyerST
(L,K,G)  n,se   FL   ′DoyerST    | (G,K,L)  se,n   FR   ′DoyerST
(O,L,K)  w,s    R    ′DoyerST    | (K,L,O)  s,w    L    ′PellST
(P,L,K)  e,s    L    ′DoyerST    | (K,L,P)  s,e    R    ′PellST

From Table 3, we identify three bounding circles (according to Definition 5), namely BC1: L–O–C–B–M–I–H–A–P, BC2: L–O–C–B–D–J–E–G–K and BC3: L–K–G–F–A–P. The BCs are presented in Table 4, where the edges of the bounding circles are given in the form [[X,Y], RDN]. Using Algorithm 1, we assign an RDN to all edges in the bounding circles.

Table 4: Three identified BCs

BC1        BC2        BC3
[[a,p],0]  [[l,o],2]  [[a,p],0]
[[p,l],0]  [[o,c],2]  [[p,l],0]
[[l,o],0]  [[c,b],2]  [[l,k],2]
[[o,c],0]  [[b,d],5]  [[k,g],3]
[[c,b],0]  [[d,j],7]  [[g,f],5]
[[b,m],5]  [[j,e],7]  [[f,a],5]
[[m,i],3]  [[e,g],7]
[[i,h],3]  [[g,k],1]
[[h,a],1]  [[k,l],0]

The visualization of the bounding circles from Table 4 is presented in Figure 2.

Figure 2: Three unrotated BCs sketched according to Table 4
Source: own.

Because there are bounding circles that share some nodes, we can merge them. First, we merge the BCs that share the most nodes: in this case, BC1 and BC2 (L–O–C–B is shared). To do so, we need to unify the RDNs of the same edges in both BCs.
The edge L–O in BC1 has RDN = 0 and the same edge in BC2 has RDN = 2. Therefore, it is necessary to adjust (rotate) BC2 until the RDNs are the same. The rotation is visualized in Figure 3.

Figure 3: Three BCs sketched according to Table 4
Source: own.

Now, all bounding circles are oriented in the same direction. However, as seen from the visualizations of BC1 and BC2 in Figure 3, the distances between the shared nodes L and O are not equal. It is necessary to adjust the nodes of one of the BCs to match the nodes of the other BC. In Figure 4, we can see that in BC1 we have adjusted the coordinates of node L, so the distance from node L to node O is the same in both BCs. The adjustment of the coordinates of node L compromised the RDNs of the other nodes in BC1. Therefore, the coordinates of the nodes P, A, H, I, and M were adjusted, respectively, so that the original RDNs of the edges remained unchanged.

Figure 4: Node adjustment of BC1
Source: own.

Since the sequence L–O–C–B is proportionally the same in both bounding circles 1 and 2, we can proceed to the merging process. The visualization of the merged BCs is presented in Figure 5.

Figure 5: Merge of BC1 and BC2
Source: own.

The same process is applied when merging BC3 into the merged BC1 and BC2. In this case, no rotation or adjustment is needed. Figure 6 presents the merged BCs 1, 2, and 3.

Figure 6: Merge of BC1, BC2 and BC3
Source: own.

The resulting image represents a sketch map, where the nodes are positioned according to the directions in which they lie from each other. The distances between the nodes, as mentioned in the introduction, are not relevant; what matters is their relative position on the sketch map.

7 Conclusion

This paper presents an outline of an algorithm that accepts tuples describing the individual edges of a topological graph. The tuples are in the form [from where (place), via what (place), to where (place), change of direction (direction)]. The output of the algorithm is a sketch map of the topological graph. From the input data, we identify bounding circles, which are modified and merged into the sketch map. This outline of the algorithm is a first attempt to solve the problem of drawing a sketch map, i.e., the computation is costly and not optimized. The algorithm was implemented in the PROLOG language.

Acknowledgements

This research has been supported by Grant SGS No. SP2023/065, VSB - Technical University of Ostrava, Czech Republic, "Application of Formal Methods in Knowledge Modelling and Software Engineering VI".

References

[1] Maria Vasardani, Sabine Timpf, Stephan Winter, and Martin Tomko. From descriptions to depictions: A conceptual framework. In Thora Tenbrink, John Stell, Antony Galton, and Zena Wood, editors, Spatial Information Theory, pages 299-319, Cham, 2013. Springer International Publishing.
[2] Lamia Belouaer, David Brosset, and Christophe Claramunt. From verbal route descriptions to sketch maps in natural environments. SIGSPATIAL '16, New York, NY, USA, 2016. Association for Computing Machinery.
[3] Marek Menšík, Adam Albert, Petr Rapant, and Tomáš Michalovský.
Heuristics for spatial data descriptions in a multi-agent system. Frontiers in Artificial Intelligence and Applications, 364:68-80, 2023.
[4] Marie Duží, Bjørn Jespersen, and Pavel Materna. Procedural semantics for hyperintensional logic. Springer, New York, 2010.
[5] Marie Duží. Extensional Logic of Hyperintensions, pages 268-290. Springer Berlin Heidelberg, Berlin, Heidelberg, 2012.
[6] Marie Duží. Communication in a multi-cultural world. Organon F, 21:198-218, 01 2014.
[7] Marek Menšík, Marie Duží, Adam Albert, Vojtěch Patschka, and Miroslav Pajr. Refining concepts by machine learning. Computación y Sistemas, 23(3):943-958, 2019.
[8] Jarmila Panevová. Valency frames and the meaning of the sentence. The Prague School of Structural and Functional Linguistics, 41:223, 1994.
[9] Bernd Heine, Heiko Narrog, Vilmos Ágel, and Klaus Fischer. Dependency grammar and valency theory. The Oxford Handbook of Linguistic Analysis, 2015.
[10] Thomas Herbst, David Heath, Ian F. Roe, and Dieter Götz. A Valency Dictionary of English: A Corpus-Based Analysis of the Complementation Patterns of English Verbs, Nouns and Adjectives. De Gruyter Mouton, 2013.
[11] Marek Menšík, Marie Duží, Adam Albert, Vojtěch Patschka, and Miroslav Pajr. Machine learning using TIL. In Frontiers in Artificial Intelligence and Applications, pages 344-362, Amsterdam, 2019. IOS Press.

7 Spatial and Temporal

A RISK-RESILIENCE CALCULATION METHOD FOR ENVIRONMENTAL CHANGE AND DISASTER ANALYSIS WITH 5D WORLD MAP SYSTEM VISUALIZATION

SHIORI SASAKI,1,2 YASUSHI KIYOKI,1,2 AMANE HAMANO2
1 Keio University, Tokyo, Japan
2 Musashino University, Tokyo, Japan
{ssasaki, y-kiyoki}@musashino-u.ac.jp, s2122067@stu.musashino-u.ac.jp

This paper presents an important application of 5D World Map System, which realizes analytical computing and visualization for expressing environmental phenomena, causalities and influences, with a "Time-series Multilayer Risk-Resilience Calculation" method for global environmental change and disaster analysis, to make appropriate and urgent solutions to global and local environmental phenomena in terms of short- and long-term changes. This method enables the calculation of the current risk and resilience of a target region or city to disasters based on the history of past time-series changes and the overlap of multidimensional factors, in order to predict them in the near future. This method calculates the total risk and resilience to disaster as a total aggregate value that reflects the amount of change in each variable in the past, by transforming multidimensional and heterogeneous variables into a form that allows comparative and arithmetic operations through normalization.
As an implementation and experiments, we apply our method to assessing the role of forests in urban disaster resilience by analysing the relationships between time-series changes in forest distribution and urban disaster occurrence, specifically using satellite data, demographic data, urban infrastructure data, and disaster data, and calculate "urban-forest-disaster risk/resilience".

Keywords: CPS, cyber-physical-system, sensing-processing-actuation, GIS, open data, visualization, knowledge bases, SDGs, SDG9, SDG11, SDG13, SDG15, deforestation, disaster risk, disaster resilience, urban GIS, development, global environment

DOI https://doi.org/10.18690/um.feri.5.2023.16 ISBN 978-961-286-745-4

1 Introduction

As the United Nations Office for Disaster Risk Reduction (UNDRR) points out [19], it is essentially important to collect disaster information over a wide area in real time, to make it accessible and public through open data, and to detect vulnerable areas at an early stage and take countermeasures for the realization of Sustainable Cities (SDG9, SDG11) and Disaster Resilience (Sendai Framework [20]). In fact, many cases have been reported where it is difficult to accurately assess the current situation and identify disaster-risk hotspots due to a lack of information and data. In addition, not only detecting disaster-risk hotspots but also estimating disaster resilience is an important and urgent task for building capacity. As the Sendai Framework [20] indicates, it is important to take "action to prevent new and reduce existing disaster risks: (i) Understanding disaster risk; (ii) Strengthening disaster risk governance to manage disaster risk; (iii) Investing in disaster reduction for resilience and (iv) Enhancing disaster preparedness for effective response, and to 'Build Back Better' in recovery, rehabilitation and reconstruction."

One possible solution to these issues is the use of satellite multispectral imagery and open socioeconomic data to detect the risk and vulnerability of specific areas of countries undergoing rapid environmental changes and disasters. The objective of our method is not only to visualize but also to calculate, as values, the risk/vulnerability and the resilience of society to disaster risks in a target area, to make appropriate and urgent solutions to global and local environmental phenomena in terms of short- and long-term changes, in a time-series and multilayered manner.

In this study, we describe a method of "Time-series Multilayered Risk-Resilience Calculation" for estimating the risk or resilience of a (possible) disaster-affected area using open satellite data and remote sensing technology. This method enables the calculation of the current risk and resilience of a target region or city to disasters based on the history of past time-series changes and the overlap of multidimensional factors, and the prediction of them in the near future.

In the implementation, we focus on deforestation and landslide phenomena, which are considered to be caused by a combination of natural disasters, such as heavy rainfall, floods and earthquakes, and human socio-economic activities, such as land-use development, logging, farming, building, etc.
We describe the feasibility and effectiveness of our method through several experiments with the data of (a) landslide and flood risks, (b) population growth, (c) infrastructure development and (d) forest distribution in Japan (Ibaraki Prefecture, 2015 and 2020).

The Time-series Multilayered Risk-Resilience Calculation method realized by this study is designed to be applied to the 5D World Map System [1]-[8]. "SPA-based 5D World Map System" [1]-[8] is a global and environmental knowledge-integrating and processing system for memorizing, searching, analysing and visualizing "Global and Environmental Knowledge and Information Resources" related to natural phenomena and disasters in global and local environments. This system analyses environmental situations and phenomena with "environmental multimedia data sharing," as a new global system architecture of collaborative and global environment analysis. This system realizes a remote, interactive and real-time environmental research exchange among different areas.

On the other hand, in the field of remote sensing, many studies have been conducted to estimate environmental changes on the earth's surface and the land-use status using environmental indices such as the Normalized Difference Vegetation Index (NDVI), Normalized Difference Water Index (NDWI) and Normalized Difference Snow Index (NDSI) computed from satellite multispectral imagery. We utilize these remote sensing techniques and apply them to create knowledge bases for our objectives in this study. In addition, our method utilizes open-data satellite multispectral imagery to estimate the size of a disaster-affected area with relatively high accuracy using an inexpensive and uncomplicated estimation method. This feature makes the method widely applicable to LDCs and small local governments. In particular, the method is effective for early assessment of the situation, such as rapid confirmation of the disaster situation in wide-area disasters.

In this paper, we focus on deforestation and landslide disasters as an example and conduct an implementation and experiments using open satellite data and open-source GIS software. Furthermore, the applicability of our method to a multidimensional world map system is also discussed.

2 Overview of SPA-based 5D World Map System

5D World Map System [1]-[8] is a knowledge representation system that enables semantic, temporal and spatial analysis of multimedia data and integrates the analysed results as a 5-dimensional dynamic historical atlas (5D World Map). The composition elements of 5D World Map are a spatial dimension (3D), a temporal dimension (4D) and a semantic dimension (5D).

Figure 1: System Structure of 5D World Map System with AI-Sensing
Source: [33].

A semantic associative search method [9][10] is applied to this system to realize the concept that the "semantics" of words, documents, multimedia, events and phenomena vary according to the "context". 5D World Map System [1]-[8] has been providing various functionalities to calculate and express the semantics and context of various types of multimedia data [4][5][7][8].
Also, the functions for monitoring, analysing and warning with the multi-dimensional and multi-layered visualization of 5D World Map System with AI-Sensing have been utilized for monitoring SDG14, SDG9 and SDG11 (Sustainable Ocean and Disaster Resilience) in the United Nations ESCAP [7][31][32][33]. The SPA (Sensing-Processing-Actuation) process of 5D World Map System is shown in Figure 1. Currently, the 5D World Map System is globally utilized as a Global Environmental Semantic Computing System, for SDG14, in United Nations ESCAP's Closing-The-Loop project [32] for observing ocean-environment situations with local and global multimedia data resources [33]. In this project of plastic garbage detection and reduction, we include a new function of AI-Sensing in 5D World Map System and apply the analytical visualization functions in the SPA (Sensing-Processing-Actuation) process, as shown in Figure 1.

3 Related Studies

3.1 Multilayer Visualization with 5D World Map System

The multilayer visualization function of 5D World Map System and its applications to ocean environmental analysis [7] and disaster-resilience monitoring [8] have been presented. The method described in this paper is based on these research results on methods for analyzing and predicting disaster resilience from the interaction of global and local views [7][8].

3.2 Multispectral Image Analysis with 5D World Map System

SPA-based environmental-semantic computing for global and local environment analysis with multispectral image analysis [2] and its application to coral health monitoring [11][13] have been proposed. Also, deforestation analyses with satellite multispectral images and SAR data analysis [12][14][17] applied to 5D World Map System, and multispectral imaging with UAVs for agricultural monitoring and analysis [16][18], have been proposed.

3.3 Disaster Risk Analysis

There are many disaster risk visualization systems and disaster prevention maps using multispectral satellite imagery and remote sensing, and there are also open data platforms on disasters provided by national and local governments [21][22][23][24]. On the other hand, individual studies on estimating and visualizing actual disaster areas using these technologies are scattered locally across the fields of GIS, disaster prevention, and environmental engineering. A globally aggregated information platform with disaster data analysis, visualization, and sharing systems is still in the research and development stage [25].

Also, there are many studies on disaster risk detection using satellite multispectral imagery and remote sensing. For example, a study using high-resolution satellite images to predict the risk of landslides by evaluating the predisposition to landslides from satellite images has been proposed [26]. Another study proposed the extraction of reflectance characteristics that are highly relevant to landslide risk using the visible and infrared bands of multispectral imagery, as our method does [27]. In their research paper, they presented disaster damage prediction at the time of image capture using highly accurate satellite multispectral imagery.

Regarding the causes and effects of deforestation, there is a study which discusses the causes of deforestation in tropical areas from the aspects of slash-and-burn agriculture, population growth, poverty, and road construction [29].
Based on this reference, we focus on population growth and road construction in the implementation, which are also cited as causes of deforestation even in non-tropical areas. We visualize their relationship with forests in the Kanto region (Ibaraki prefecture) in Japan to show the importance of forests and their effects and impacts. There is another study which analyzes the relationship between the development of urbanization on steep slopes and landslides [30]. In this study, we focus on the increasing risk of landslides, including rainfall factors, because urban residential development is approaching steep slopes and the damage often occurs in residential areas adjacent to these slopes. From a technical aspect, satellite image analysis methods using the open-source GIS software QGIS are widely introduced [28][42]. In this study, we refer to these methods of utilizing QGIS and show the possibility of integrated use with various open data. In this study, we design and implement a method to analyze and visualize the relationship between disasters and forests in urban areas in a multilayered manner, focusing on the geographical system characteristics, land-use change and landslide characteristics in urban areas.

4 Time-series Multilayered Risk-Resilience Calculation Method for Environmental Changes and Disasters

Our method of Time-series Multilayered Risk-Resilience Calculation is assumed to be applied to the 5D World Map System [1]-[8]. Figure 2 shows the overall configuration of a multi-dimensional map visualization system (5D World Map System) for disaster data to which this method is applied.

Figure 2: System structure of Multidimensional World Map System (5D World Map System) [8] to which our Time-series Multilayered Risk-Resilience Calculation method is applied Source: [8].

By applying our Time-series Multilayered Risk-Resilience Calculation method to the 5D World Map System, disaster risk calculation and prediction by disaster type can be realized using the Geo Database, including the Socioeconomic Parameter DB, Infrastructure DB and Natural Parameter DB shown in Figure 2 and Figure 3.

4.1 System Architecture

Our Time-series Multilayered Risk-Resilience Calculation method is planned to be implemented as a sub-system (Sub-system 1) of a multidimensional world map system called "5D World Map System". Figure 3 shows the total design of the sub-systems and the connection to 5D World Map System.
Figure 3: System structure applying our method as a sub-system of 5D World Map System: Sub-system 1 (Total Risk Calculator with Context-dependent Risk Calculator, Time-series Calculation and Multilayer Calculation over an Index Calculator for satellite images), Sub-system 2 (Disaster-affected Area Calculator/Visualizer with an Affected-Area Estimator), and the Disaster & Environment Database Manager over the Satellite Image DB (open data, e.g. USGS, Copernicus) and Disaster-related DB (open data, e.g. GAR) Source: own.

This system consists of the following functions.

1) Disaster database management function: this function registers and manages metadata such as images and multispectral images collected for the target area, environmental sensing data, statistical data, and infrastructure geographic data.
2) Time-series Multilayered Risk-Resilience Calculation (Sub-system 1)
3) Disaster-affected area estimation & visualization function [34] (Sub-system 2)
4) Multi-layer visualization function: based on the results obtained by functions 2) and 3), the other data stored in 1), such as environmental sensing data, statistical data and infrastructure geographic data, are projected on the map as multiple layers.

4.2 Multilayer Risk-Resilience Calculation

The multilayer-calculation part of the Time-series Multilayered Risk-Resilience Calculation method is defined by the following six steps. Figure 4 shows a visual image of the steps. This process normalizes multidimensional, distributed and heterogeneous variables in a grid format and calculates the total disaster risk/resilience of a specific region as an aggregate value.

STEP 1: Set a phenomenon with spatiotemporal information as an objective variable y and set multiple phenomena with spatiotemporal information as explanatory variables (x_1, x_2, …, x_n).
STEP 2: Collect geographical information data of y and x_1, x_2, …, x_n. Commonly used geographical information data are vector data (point/line/polygon, shape file), raster data (image, GeoTIFF file), mesh data (AAIGrid file), point cloud data, CSV text and KML/KMZ.
STEP 3: Set each geographical information data of each variable as a layer l_i for a geographical information system (GIS). The set of layers is defined as l_y and l_x1, l_x2, …, l_xn.
STEP 4: To enable a calculation of risk or resilience among layers with various granularity, set a common grid G := {g_1, g_2, …, g_m}. The values of each layer's grid cells (e.g. density of points, pixel value, etc.) are expressed as a matrix or a vector such as l_y = (g_1^y, g_2^y, …, g_m^y), l_x1 = (g_1^x1, g_2^x1, …, g_m^x1), l_x2 = (g_1^x2, g_2^x2, …, g_m^x2) and so on.
STEP 5: Normalize each layer l_y and l_x1, l_x2, …, l_xn.
STEP 6: Create an integrated layer of explanatory variables l_X by calculating a total risk or resilience value g_j^X with arithmetic operators (+, -, *, /) over each grid value. For example, when the accumulation of values is performed, the created layer l_X and each grid value g_j^X of the layer are expressed as:

g_j^X = Σ_{i=1}^{n} g_j^{x_i}

l_X = (g_1^X, g_2^X, …, g_m^X)

Figure 4: Multilayer Calculation of Risk-Resilience: the layer l_y of an objective variable, the layers l_x1, l_x2 of explanatory variables and the integrated layer l_X of explanatory variables on the common grid G, with l_y checked against l_X for estimation and prediction Source: own.
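To make STEPs 4-6 concrete, the following is a minimal sketch in Python, assuming each layer has already been converted to values on the common grid G (as described in Section 5.2). The layer contents and the min-max normalization are illustrative assumptions, since the paper leaves the normalization function open.

```python
import numpy as np

def normalize(layer):
    # STEP 5: normalize one grid layer to [0, 1] (min-max is an
    # illustrative choice; the normalization function is left open here).
    lo, hi = float(layer.min()), float(layer.max())
    if hi == lo:
        return np.zeros_like(layer, dtype=float)
    return (layer - lo) / (hi - lo)

def integrate_layers(explanatory_layers):
    # STEP 6: integrated layer l_X, where each grid value is
    # g_j^X = sum over i of g_j^{x_i} of the normalized layers.
    return np.sum([normalize(l) for l in explanatory_layers], axis=0)

# Hypothetical 1 km grid layers for one small area (m = 4 cells, flattened):
l_x1 = np.array([3.0, 0.0, 1.0, 5.0])        # e.g. precaution polygons per cell
l_x2 = np.array([120.0, 40.0, 800.0, 10.0])  # e.g. population per cell
l_X = integrate_layers([l_x1, l_x2])         # total risk value per grid cell
```

Because every layer is reduced to the same grid before the arithmetic, any mix of vector, raster and mesh sources can be accumulated this way.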
4.3 Time-series Risk-Resilience Calculation

The time-series-calculation part of the Time-series Multilayered Risk-Resilience Calculation method is defined by the following five steps. Figure 5 shows a visual image of the steps. The process calculates the time-series change of each variable in a normalized grid format among a multidimensional, distributed and heterogeneous set of variables of disaster risk/resilience in the target area.

STEP 1: Set a phenomenon with spatiotemporal information as an objective variable y and set multiple phenomena with spatiotemporal information as explanatory variables (x_1, x_2, …, x_n).
STEP 2: Collect geographical information data of y and x_1, x_2, …, x_n before (t_i - 1) and after (t_i + 1) the time (t_i) when a disaster or environmental change occurs in the target area.
STEP 3: Set each geographical information data of each variable as a layer l_i for a geographical information system (GIS). The set of layers is defined as l_y and l_x1, l_x2, …, l_xn.
STEP 4: Calculate the difference between the layers before (t_i - 1) and after (t_i + 1).
STEP 5: If there are many times at which disasters or environmental changes occur (t_i | i = 1, 2, …, q), STEP 4 is repeated among (t_1 - 1), (t_1 + 1), (t_2 + 1), …, (t_q + 1).

For the detection of disaster effects or environmental changes using raster data (image, GeoTIFF file), this method uses "Normalized Difference Environmental Indices" such as the Normalized Difference Vegetation Index (NDVI), Normalized Difference Water Index (NDWI), Normalized Difference Snow Index (NDSI), Normalized Burn Ratio (NBR), Normalized Difference Built-up Index (NDBI), etc., especially to estimate the size of the disaster-affected area before and after the phenomenon happens. Each of these indices is used to detect environmental changes such as landslides, floods, avalanches, wide-area forest fires and the reduction of cultivated land due to buildings.

Figure 5: Time-series Calculation of Risk-Resilience: the amount of change Δ between (t - 1) and (t + 1), fed into the Multilayer Calculation Source: own.

5 Implementation

To realize our method of Risk-Resilience Calculation, we implement the following process with distributed time-series geographic information data, such as socioeconomic indicator data, demographic data, natural disaster data, urban infrastructure data and satellite image data. In this implementation, we apply our method to assessing the role of forests in urban disaster resilience by analyzing time-series changes in vegetation and forest distribution and their relationships in urban areas. Specifically, using GIS, satellite data, demographic, urban infrastructure, and disaster data, we analyze the relationship among disaster occurrence and 1) population density, 2) urban infrastructure development, and 3) forest distribution. From the results of this analysis, we evaluate the relationship among forest, urban development and natural disasters (hereinafter referred to as "forest-urban-disaster resilience"). In this implementation, the Time-series Multilayered Risk-Resilience Calculation is specifically realized by the following steps using GIS.

STEP 1: Visualize the base near-infrared band (NIR) using satellite multispectral imagery of the target area.
STEP 2: Calculate and visualize vegetation indices (NDVI) using satellite multispectral imagery.
STEP 3: Create virtual layers by incorporating open geographic information data such as demographics, urban infrastructure, and disaster data for the target area.
STEP 4: By switching virtual layers and overlaying them with the base map, the time-series multilayer risk-resilience is calculated to analyze "forest-urban-disaster resilience".
STEP 5: Zoom in on a part of the target area where a major change is observed to confirm the details of the relationship among forests, urban development, and natural disasters.

5.1 Data and Tools

In this implementation, the Time-series Multilayered Risk-Resilience Calculation is concretely realized for the following four types of data.

1. Using satellite multispectral images, the vegetation distribution is calculated, and time-series changes (differences) are calculated.
2. The population growth rate is measured from time-series population data as demographic data.
3. As urban infrastructure data, geographic data of highway construction is used to overlay the vegetation distribution base map.
4. Data on sediment, flood, and inundation hazard zones are used as disaster risk data to overlay the vegetation distribution base map.

QGIS [42], an open-source GIS, is used. The concrete data used for the experiments are introduced in Section 6.

5.2 Data Processing

5.2.1 Grid creation

The grid data (1 km²) is created using the investigation tool of QGIS. The number of grid cells is 13,392 in this implementation.

5.2.2 Line and Point (Vector data)

Line and point data (e.g. transportation data) are processed into a calculable form by the following step with shape files.
Step 1: Count the numbers of lines and points and convert them to the value of each grid cell.

5.2.3 Polygon (Vector data)

Polygon data (e.g. disaster occurrence) is processed into a calculable form by the following steps with shape files.
Step 1: Calculate the centroid (point) of each polygon.
Step 2: Count the numbers of centroids (points) and convert them to the value of each grid cell.

5.2.4 Image data (Raster data)

Satellite image data (e.g. forest distribution) is processed into a calculable form by the following steps.
Step 1: Extract Bands 2, 3, 4 and 5 with cloud cover of 10% or less in the area to be analyzed from Landsat 8 in LandBrowser [9].
Step 2: Calculate the Normalized Difference Vegetation Index (NDVI) from Band 5 (near-infrared, NIR) and Band 4 (visible red, R) with the Raster Calculator, using the formula NDVI = (NIR - R)/(NIR + R).
Step 3: Construct a virtual raster in QGIS using Bands 2, 3, 4 and 5.
Step 4: Change the virtual raster to false color using color ramps.
Step 5: Convert the raster data to polygon data (polygonization).
Step 6: Calculate the centroid (point) of each polygon.
Step 7: Count the numbers of centroids (points) and convert them to the value of each grid cell.

5.2.5 Total Risk-Resilience Calculation

Step 1: Time-series grid layers of each variable are integrated by a "spatial join of attributes" using the data id. In this implementation, we select "equals" from geometric relations such as "intersects", "overlaps", "contains", "within", "crosses" and "touches" [35].
Step 2: For the Time-series Calculation, the difference between two input layers is calculated, and for the Multilayer Calculation, the accumulation among multiple layers is performed.

6 Experiments

To examine the feasibility of the proposed method, we conducted several experiments using the time-series data of forest-related disasters in Ibaraki prefecture in the Kanto region of Japan (2015-2020) as an example. For the purpose of evaluating the role of forests in urban disaster resilience, we define the relationship between forests, urban development, and natural disasters as "forest-urban-disaster resilience," and describe a method to analyze and visualize the time series of vegetation and forest distribution in urban areas using unevenly distributed time-series geographic data, socio-economic indicator data, and natural disaster data. First, we set vulnerability to disaster as the objective variable y, and 1) disaster risk/hazard, 2) population density, 3) transportation (highway) density and 4) forest distribution as explanatory variables x_1, x_2, x_3 and x_4. Second, we conduct experiments to examine the Multilayer Calculation function using the data of 2015 and the Time-series Calculation function using the data of 2015 and 2020. Third, we examine the total risk values to analyze the relation among vulnerability to disaster and 1) disaster risk/hazard, 2) population density, 3) urban infrastructure development and 4) forest distribution. Finally, we discuss the feasibility of our method to accurately assess the importance of forests and their effects and impacts.

Experiment 1: Examination of the Multilayer Calculation function
Experiment 2: Examination of the Time-series Calculation function
Experiment 3: Examination of the Total Risk-Resilience Calculation function

Figure 6: Target area for analysis (Ibaraki prefecture, Japan) Source: own.

Data for the experiments are:
− Disaster data 1: 2015 & 2020 Landslide Disaster Precaution Area Data (polygon, shape file) obtained from National Land Numerical Data [40].
− Disaster data 2: 2015 & 2020 Flood Inundation Assumed Inundation Area Data (polygon, shape file) obtained from National Land Information [39], Ministry of Land, Infrastructure, Transport and Tourism.
− Demographic data: population distribution data of the Tokyo metropolitan area in 2015 & 2020 obtained from the Ministry of Land, Infrastructure, Transport and Tourism's National Land Survey Data [37] (grid, population projection by 1 km mesh of the National Land Survey Data (H30 National Bureau Estimates)).
− Urban infrastructure data: 2020 expressway time-series data (line and points, shape file) obtained from National Land Information [38], Ministry of Land, Infrastructure, Transport and Tourism.
− Forest distribution data: Landsat 8 satellite multispectral images (GeoTIFF data) of the Kanto region in 2015 and 2020 obtained from USGS [36] and Copernicus Open Access Hub [41].

6.1 Experiment 1: Examination of the Multilayer Calculation of the Risk-Resilience of Environmental Phenomena

Figure 7 shows the original geographical information data of the Ibaraki area in 2015.
(a) Disaster risk (polygon vector data), (b) population (grid vector data), (c) highway (line and point vector data) and (d) forest distribution/vegetation (NDVI raster data) are shown. Figure 8 shows the same data shown in Figure 7 converted to a grid form by the data processing method described in Section 5.2. By this process, distributed and heterogeneous data are normalized and become calculable. We can grasp that the disaster risk is high in the northern part, a mountainous area close to Tochigi prefecture; the population density is high in the middle part, with relatively big cities (Tsukuba-city, Mito-city, Hitachinaka-city and Hitachi-city); the transportation density is high in the southern part, close to Tokyo; and the forest density is high in the western part. Figure 9 shows the total risk calculation result by the multilayer calculation method described in Section 4.2. From this result, we can observe the total risk distribution, which cannot be figured out from Figure 8. From Figure 8, each independent layer's values and distribution can be observed, but the total risk of each grid cell is difficult to estimate by eye. Only by this calculation can we evaluate the total risk distribution.

Figure 7: Original data of the Ibaraki area in 2015: (a) disaster risk (polygon, vector data), (b) population (grid, vector data), (c) highway (line and point, vector data) and (d) forest distribution/vegetation (NDVI, raster data; legend from -0.5 to 0.5) Source: own.

The results of Experiment 1 in Figure 7 - Figure 9 show that our method makes it possible to calculate the total risk/resilience for disasters of the target area as aggregated values over multidimensional, distributed and heterogeneous variables in a normalized grid format.

Figure 8: Data of 2015 in Figure 7 converted to grid data: (a) disaster risk, (b) population, (c) highway and (d) forest distribution (NDVI) Source: own.

Figure 9: Total risk calculation result (without population) in 2015 (grid values from 0 to 119) Source: own.

6.2 Experiment 2: Examination of the Time-series Change Calculation of the Risk-Resilience of Environmental Phenomena

Figure 10, Figure 11, Figure 12 and Figure 13 show the results of the time-series change calculation for the geographical distribution of disaster risk, population, transportation and forest, respectively, from 2015 to 2020. From panels (a) 2015 and (b) 2020 of each figure, it is difficult to find the difference by eye. These results show that the time-series change becomes clearly notable as numerical values through our time-series change calculation. Also, by using a grid form, distributed and heterogeneous data are normalized and become calculable and comparable.
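The change maps in Figures 10-13 combine the index computation of Section 5.2.4 with the STEP 4 differencing of Section 4.3. The following is a minimal sketch with hypothetical band values; the arrays stand in for gridded rasters and carry no real observations.

```python
import numpy as np

def ndvi(nir, red):
    # Section 5.2.4, STEP 2: NDVI = (NIR - R) / (NIR + R);
    # the small epsilon only guards against zero denominators.
    return (nir - red) / np.maximum(nir + red, 1e-9)

def timeseries_change(layer_before, layer_after):
    # Section 4.3, STEP 4: per-grid amount of change between the
    # layers before (t - 1) and after (t + 1) the event.
    return layer_after - layer_before

# Hypothetical Landsat band values for the same cells in 2015 and 2020:
nir_2015, red_2015 = np.array([0.5, 0.6]), np.array([0.1, 0.1])
nir_2020, red_2020 = np.array([0.3, 0.6]), np.array([0.2, 0.1])
delta = timeseries_change(ndvi(nir_2015, red_2015), ndvi(nir_2020, red_2020))
# Negative values indicate vegetation loss, coloured per the figure legends.
```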
From Figure 10, we can grasp that the disaster risk increased in the northern part, a mountainous area close to Tochigi prefecture. Figure 11 indicates that the population density decreased in the northern coastal area around Hitachinaka-city, where a big tsunami hit in 2011, and increased in the southern part around Tsukuba-city. Figure 12 shows that highway junctions increased slightly in the southern part close to Tokyo. Figure 13 shows that forest and vegetation decreased seriously in the western part, close to Saitama prefecture and newly developing cities (Shimotsuma-city and Yachiyo-city). These changes might result from land-use change from agricultural to housing areas, because the western part of Ibaraki is a large field of rice and vegetables. The results of Experiment 2, shown in Figures 10 through 13, indicate that this method can be used to calculate the time-series change of each variable in a normalized grid format among a multidimensional, distributed and heterogeneous set of variables of disaster risk/resilience in the target area.

Figure 10: Time-series change calculation of disaster risk (2020-2015): (a) disaster risk in 2015, (b) disaster risk in 2020, (c) increase or decrease in disaster risk value (grid expression, from -117 to 110; increase: red, decrease: blue) Source: own.

Figure 11: Time-series change calculation of population (2020-2015): (a) population in 2015, (b) population in 2020, (c) increase or decrease in population value (grid expression, from -336 to 568; increase: red, decrease: blue) Source: own.

Figure 12: Time-series change calculation of transportation (2020-2015): (a) highway junctions in 2015, (b) highway junctions in 2020, (c) increase or decrease in the number of highway junctions (grid expression, from -1 to 1; increase: red, decrease: blue) Source: own.

Figure 13: Time-series change calculation of forest distribution (2020-2015): (a) forest distribution (NDVI) in 2015, (b) forest distribution (NDVI) in 2020, (c) increase or decrease of forest areas (increase: green, decrease: red), (d) forest distribution (grid expression) in 2015, (e) forest distribution (grid expression) in 2020, (f) increase or decrease of forest areas (grid expression, from -14 to 12; increase: red, decrease: blue) Source: own.

6.3 Experiment 3: Examination of the Time-series Multilayered Calculation of the Risk-Resilience of Environmental Phenomena

Experiment 3 is a combination of the results of Experiments 1 and 2. Figure 14 shows the result of the total risk-resilience calculation by the Time-series Multilayered Calculation method described in Section 4.2 and Section 4.3. In this calculation, we did not include population density data because we judged that it is difficult to determine whether population growth contributes to disaster resilience or not. Figure 15 shows an overlay of the Ibaraki-pref. base map [43] on the result shown in Figure 14 for reference.
Figure 16 shows the visualization and sharing of the result of Figure 14 on 5D World Map System, and Figure 17 shows the same result on Google Earth. From these results, we can observe that the vulnerability to disasters increased in the red part (the northern part of Ibaraki) and decreased in the blue part (the western part of Ibaraki). Conversely, the result can be interpreted as showing that the resilience to disasters increased in the blue part (the western part of Ibaraki) and decreased in the red part (the northern part of Ibaraki). The results of Experiment 3 in Figure 14 and Figure 15 show that our method makes it possible to transform multidimensional, distributed and heterogeneous variables into a form that allows comparative and arithmetic operations through a normalization process, by reflecting the amount of change in each variable in the past to calculate a total aggregate value of risk/resilience to disaster in a specific target area. Figure 16 and Figure 17 show that the results of our method can be shared on common Web applications such as Google Earth or 5D World Map System, to be utilized for designing disaster countermeasures and policies in local governments. To increase the degree of accuracy and precision, it seems important to add more variables, such as land use for houses, agriculture, manufacturing, commercial malls, power plants and dams as infrastructure and socioeconomic parameters. Also, natural parameters such as the amount of rainfall, humidity and snowfall and the frequency of serious earthquakes and forest fires should be added.

Figure 14: Total risk-resilience calculation result by the Time-series Multilayered Calculation (Ibaraki prefecture in Japan, 2015-2020; grid values from -7 to 21) Source: own.

Figure 15: Overlay of the Ibaraki-pref. base map [43] on the result shown in Figure 14 Source: own & [43].

Figure 16: Mapping of the total risk-resilience calculation result as a KML/KMZ file for visualization and sharing on 5D World Map System Source: own.

Figure 17: Mapping of the total risk-resilience calculation result for visualization and sharing on Google Earth [44]: (a) mapping of a KML/KMZ file on Google Earth, (b) mapping of a shape file on QGIS with a Google Earth base map Source: own.
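As a rough illustration of this sharing step, the sketch below writes per-grid totals as KML placemarks using only the Python standard library. The cell coordinates, values and file name are invented for illustration, and the actual export format used by 5D World Map System is not specified here.

```python
def grid_to_kml(cells, path="risk_resilience.kml"):
    # cells: iterable of (lon, lat, value) for grid-cell centers.
    placemarks = []
    for lon, lat, value in cells:
        placemarks.append(
            f"<Placemark><name>{value:+.1f}</name>"
            f"<Point><coordinates>{lon},{lat},0</coordinates></Point></Placemark>"
        )
    kml = ('<?xml version="1.0" encoding="UTF-8"?>'
           '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
           + "".join(placemarks) + "</Document></kml>")
    with open(path, "w", encoding="utf-8") as f:
        f.write(kml)

# Two hypothetical Ibaraki grid cells with their total risk-resilience values:
grid_to_kml([(140.45, 36.37, 21.0), (139.98, 36.15, -7.0)])
```

A file produced this way can be loaded directly into Google Earth or any other KML-aware viewer.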
7 Conclusion and Future Direction

In this paper, the "Time-series Multilayer Risk-Resilience Calculation" method for global environmental change and disaster analysis has been presented. Through the implementation and experiments, we have examined how our method transforms multidimensional, distributed and heterogeneous variables into a form that allows comparative and arithmetic operations through a normalization process, by reflecting the amount of change in each variable in the past to calculate a total aggregate value of risk/resilience to disaster in a specific target area. As a future development, we will implement an automatic disaster estimation and prediction system using open data to realize a disaster-resilience improvement system integrated with 5D World Map System. Future issues include the addition of explanatory variables, such as land use for houses, agriculture, manufacturing, commercial malls, power plants and dams as infrastructure and socioeconomic parameters, and natural parameters such as the amount of rainfall, humidity and snowfall and the frequency of serious earthquakes and forest fires, to increase the degree of accuracy and precision of our method. The goal of our research is to support the realization of SDG9, SDG13, SDG11 and SDG15 in countries and regions around the world, especially in the Least Developed Countries (LDCs) that lack advanced observation equipment, technology, and financial resources. Specifically, the project will be developed with the evaluation of relationships among multiple variables, and disaster prediction. We will develop our method to build a system that can predict potential disaster locations and vulnerable locations by integrating elements divided into specific research fields.

Acknowledgement

We appreciate the members of the ICT & Disaster Risk Reduction Division (IDD), UN ESCAP and the Asia AI Institute (AAII), Musashino University and Thammasat University for the significant discussions on this study.

References

[1] Kiyoki, Y. and Chen, X., "Contextual and Differential Computing for the Multi-Dimensional World Map with Context-Specific Spatial-Temporal and Semantic Axes," Information Modelling and Knowledge Bases XXV, 260 (2014): 82.
[2] Kiyoki, Y., Chen, X., Sasaki, S., Koopipat, C., "A Globally-Integrated Environmental Analysis and Visualization System with Multi-Spectral & Semantic Computing in "Multi-Dimensional World Map"," Information Modelling and Knowledge Bases XXVIII, pp. 106-122, 2017.
[3] Sasaki, S., Takahashi, Y., Kiyoki, Y., "The 4D World Map System with Semantic and Spatiotemporal Analyzers," Information Modelling and Knowledge Bases, Vol. XXI, IOS Press, pp. 1-18, 2010.
[4] Sasaki, S. and Kiyoki, Y., "Real-time Sensing, Processing and Actuation Functions of 5D World Map System: A Collaborative Knowledge Sharing System for Environmental Analysis," Information Modelling and Knowledge Bases, Vol. XXVIII, IOS Press, pp. 220-239, May 2016.
[5] Sasaki, S. and Kiyoki, Y., "Analytical Visualization Functions of 5D World Map System for Multi-Dimensional Sensing Data," Information Modelling and Knowledge Bases XXIX, IOS Press, pp. 71-89, May 2017.
[6] Kiyoki, Y., Chen, X., Rachmawan, I. E. W., Chawakitchareon, P., "A SPA-based Semantic Computing System for Global & Environmental Analysis and Visualization with "5-Dimensional World-Map": "Towards Environmental Artificial Intelligence"," Information Modelling and Knowledge Bases XXXI, Vol. 321, pp. 285-305, DOI 10.3233/FAIA200021, IOS Press, 2020.
[7] Sasaki, S., Kiyoki, Y., Sarkar-Swaisgood, M., Wijitdechakul, J., Rachmawan, I. E. W., Srivastava, S., Shaw, R. and Veesommai, C., "5D World Map System for Disaster-Resilience Monitoring from Global to Local: Environmental AI System for Piloting SDG 9 and 11," Information Modelling and Knowledge Bases XXXI, Vol. 321, pp. 306-323, DOI 10.3233/FAIA200022, IOS Press, 2020.
[8] Uraki, A., Sasaki, S.
and Kiyoki, Y., "A Multi-dimensional Visualization Method for Disaster Analysis on 5D World Map System," 2018 Int'l Electronics Symposium (IES-KCIC), 139-145, 2018.
[9] Kiyoki, Y., Chen, X., "A Semantic Associative Computation Method for Automatic Decorative-Multimedia Creation with "Kansei" Information" (invited paper), The Sixth Asia-Pacific Conferences on Conceptual Modelling (APCCM 2009), 9 pages, January 20-23, 2009.
[10] Chen, X., Kiyoki, Y., "A Semantic Orthogonal Mapping Method through Deep-learning for Semantic Computing," Information Modelling and Knowledge Bases XXX, Vol. 312, pp. 39-60, DOI 10.3233/978-1-61499-933-1-39, IOS Press, 2019.
[11] Kiyoki, Y., Chawakitchareon, P., Rungsupa, S., Chen, X., Samlansin, K., "A Global & Environmental Coral Analysis System with SPA-Based Semantic Computing for Integrating and Visualizing Ocean-Phenomena with "5-Dimensional World-Map"," Information Modelling and Knowledge Bases XXXII, Frontiers in Artificial Intelligence and Applications 333, IOS Press, pp. 76-91, Dec 2020.
[12] Rachmawan, I. E. W. and Kiyoki, Y., "A New Approach of Semantic Computing with Interval Matrix Decomposition for Interpreting Deforestation Phenomenon," Information Modelling and Knowledge Bases XXX, Vol. 312, pp. 353-368, DOI 10.3233/978-1-61499-933-1-353, IOS Press, 2019.
[13] Wijitdechakul, J., Kiyoki, Y., Koopipat, C., "An environmental-semantic computing system of multispectral imagery for coral health monitoring and analysis," Information Modelling and Knowledge Bases XXX, Vol. 312, pp. 293-311, DOI 10.3233/978-1-61499-933-1-293, IOS Press, 2019.
[14] Rachmawan, I. E. W. and Kiyoki, Y., "Semantic Multi-Valued Logic for Deforestation Phenomena Interpretation," Information Modelling and Knowledge Bases XXXI, Vol. 321, pp. 401-418, DOI 10.3233/FAIA200027, IOS Press, 2020.
[15] Veesommai, C., Kiyoki, Y. and Sasaki, S., "A Multi-Dimensional River-Water Quality Analysis System for Interpreting Environmental Situations," Information Modelling and Knowledge Bases XXVIII, pp. 43-62, 2017.
[16] Wijitdechakul, J., Kiyoki, Y., Sasaki, S., Koopipat, C., "A Multispectral Imaging and Semantic Computing System for Agricultural Monitoring and Analysis," Information Modelling and Knowledge Bases XXVIII, pp. 314-333, 2017.
[17] Rachmawan, I. E. W. and Kiyoki, Y., "Semantic Spatial Weighted Regression for Realizing Spatial Correlation of Deforestation Effect on Soil Degradation," International Electronics Symposium on Knowledge Creation and Intelligent Computing (IES-KCIC), September 26, 2017, Surabaya, Indonesia.
[18] Wijitdechakul, J., Kiyoki, Y., Sasaki, S. and Koopipat, C., "UAV-based Multispectral Aerial Image Retrieval using Spectral Feature and Semantic Computing," International Electronics Symposium on Knowledge Creation and Intelligent Computing (IES-KCIC), September 26, 2017, Surabaya, Indonesia.
[19] UNDRR, "Implementing the Sendai Framework," SF and the SDGs, Accessed: Jan. 30, 2023. [Online]. Available: https://www.undrr.org/implementing-sendai-framework/sf-and-sdgs
[20] UNDRR, Sendai Framework, Accessed: Jan. 30, 2023. [Online]. Available: https://www.undrr.org/publication/sendai-framework-disaster-risk-reduction-2015-2030
[21] NIED, J-SHIS Map, Accessed: Jan. 30, 2022. [Online]. Available: https://www.j-shis.bosai.go.jp/map/
[22] UNEP, Global Risk Data Platform, Accessed: Jan. 30, 2022. [Online].
Available: https://preview.grid.unep.ch/
[23] UNDRR, PreventionWeb, Accessed: Jan. 30, 2022. [Online]. Available: https://www.preventionweb.net/
[24] UNDRR, Global Assessment Report on Disaster Risk Reduction, Accessed: Jan. 30, 2022. [Online]. Available: https://gar.undrr.org/
[25] NICT, ARIA project, Accessed: Jan. 30, 2022. [Online]. Available: https://testbed.nict.go.jp/interview/007_1.html
[26] M. Kawamura, K. Tsujino, Y. Ohtsuji, "Investigation of Sediment Disaster Mitigation GIS by Using Results of Large Area Disaster Characteristic Analysis," Journal of Disaster Science and Management, Vol. 25(1), 2006, pp. 35-50.
[27] H. Kasa, M. Kurodai, S. Obayashi, H. Kojima, "On the Applicability of Remote Sensing Data for Landslide Prediction Model," Journal of the Remote Sensing Society of Japan, Vol. 12(1), 1992, pp. 5-15.
[28] R. Furuda, GIS and satellite image analysis with QGIS (Part 3: Basic functions, Part 2), Information Geology, vol. 29, no. 4, pp. 141-149, 2018. (Japanese)
[29] M. Miyamoto, Causes of tropical deforestation: Reconsidering slash-and-burn, population growth, poverty, and road construction, Jirinshi (2010) 92: 226-234. (Japanese)
[30] Katsuhide Yokoyama, Hiroto Tauchi, Hideo Amaguchi, Akira Kawamura, Study on the relationship between urbanization on steep slopes and landslide disasters in Japan. (Japanese)
[31] ESCAP SDGHELPDESK: https://sdghelpdesk.unescap.org/
[32] Closing-the-Loop - ESCAP: https://www.unescap.org/projects/ctl
[33] Kiyoki, Y., Sasaki, S. and Barakbah, A. R., "AI-Sensing Functions with SPA-based 5D World Map System for Ocean Plastic Garbage Detection and Reduction," Information Modelling and Knowledge Bases XXXIV, Jan. 2023. DOI: 10.3233/FAIA220489
[34] Nakamura, Y. and Sasaki, S., "Disaster-Affected Area Estimation Method with Open Multispectral-Image Data Analysis for Multidimensional World Map System," ICBIR 2022 - 2022 7th International Conference on Business and Industrial Research, Proceedings, 616-621, Jun. 2022.
[35] Max J. Egenhofer, David M. Mark, Modeling conceptual neighborhoods of topological relations, Geographical Information Systems 9(5):555-565, DBLP, September 1995.
[36] USGS: https://www.usgs.gov/
[37] Ministry of Land, Infrastructure, Transport and Tourism, National Land Numerical Data, Future Population Estimates by 1km Mesh (H30 National Bureau Estimates) (shape format version), in Japan. (Japanese) https://nlftp.mlit.go.jp/ksj/gml/datalist/KsjTmplt-mesh1000h30.html
[38] Ministry of Land, Infrastructure, Transport and Tourism, National Land Numerical Data, Expressway Time Series Data, in Japan. (Japanese) https://nlftp.mlit.go.jp/ksj/gml/datalist/KsjTmplt-N06-v1_2.html
[39] Ministry of Land, Infrastructure, Transport and Tourism, National Land Numerical Data, Flood Inundation Assumption Area Data, in Japan. (Japanese) https://nlftp.mlit.go.jp/ksj/gml/datalist/KsjTmplt-A31-v2_1.html
[40] Ministry of Land, Infrastructure, Transport and Tourism, National Land Numerical Data, Landslide Disaster Precaution Area Data, in Japan. (Japanese) https://nlftp.mlit.go.jp/ksj/gml/datalist/KsjTmplt-A33-v1_4.html
[41] Copernicus Open Access Hub, Open Hub: https://scihub.copernicus.eu/dhus/#/home
[42] QGIS: https://qgis.org/ja/site/
[43] City, town, and village offices in Ibaraki prefecture: https://www.pref.ibaraki.jp/bugai/kokusai/tabunka/en/administration/level.html
[44] Google Earth: https://earth.google.com/

ON THE PARALLEL SPACES OF KNOWLEDGE AND EXPERIENCE BASED ON THE CONCEPT OF "DARK-MATTER"

XING CHEN,1 YASUSHI KIYOKI2
1 Kanagawa Institute of Technology, Department of Information & Computer Sciences, Kanagawa, Japan chen@ic.kanagawa-it.ac.jp
2 Keio University, Graduate School of Media and Governance, Tokyo, Japan kiyoki@sfc.keio.ac.jp

This paper explores the phenomenon of inapplicability of experience, which occurs when we make mistakes by applying past experiences to current problems. The paper aims to create a knowledge model to analyze this phenomenon based on the concept of "dark-matter," which represents time-related data. This model uses two-dimensional matrixes to represent both time-related and non-time-related data. In this paper, the authors propose a concept of "parallel spaces" for the expression and processing of knowledge based on the concept of "dark-matter." Case studies are used to illustrate how knowledge is generated and expressed in this model, including examples of the phenomenon of inapplicability of experience. The contribution of this paper is the presentation of a new concept of "parallel spaces" based on the concept of "dark-matter," and the exploration of the relationship between the parallel spaces and knowledge and experience. Based on the case studies, the reason for the phenomenon of inapplicability of experience is revealed, providing insight into how we can more effectively use our past experiences to solve current problems.

Keywords: knowledge, knowledge presentation, knowledge generation, machine learning, semantic space, spatiotemporal space

DOI https://doi.org/10.18690/um.feri.5.2023.17 ISBN 978-961-286-745-4

1 Introduction

People make judgments and decisions based on experience and knowledge to solve problems. However, it often happens that wrong judgments and decisions are made based on experience and knowledge. For example, in the stock trading process, people will determine the current trade based on the stock's past ups and downs and trading experience.
However, decisions based on experience are not always correct. After you decide to buy, the stock may fall, or after you decide to sell, the stock may rise. This phenomenon is referred to as "the phenomenon of inapplicability of experiences" in this paper. The aim of this paper is to create a knowledge model to analyze this phenomenon. It is known that knowledge and experience build up over time. In other words, if we want to express knowledge and experience in computers, we need to create a time-related knowledge model. In our previous research, we proposed a knowledge model based on the concept of "dark-matter" [1]. The concept of "dark-matter" stems from existing research on semantic computing models [2-5]. These models use semantic spaces to represent the meaning of data. By mapping data into target semantic spaces and presenting them as points, these models calculate Euclidean distances between the points to perform semantic calculations. For instance, to conduct semantic queries, query data or keywords are mapped into a semantic space and summarized as a point. The same is done for retrieval candidate data, before computing the Euclidean distance between the point of the query and the points of the retrieval candidates to determine which retrieval candidate should be a retrieval result. In our previous research, two methods, the Mathematical Model of Meaning (MMM) [4, 5] and the Semantic Feature Extracting Model (SFEM) [2, 3], are used in creating semantic spaces. In MMM, a semantic dataset such as an English-English dictionary is used to create the semantic space. On the other hand, SFEM creates semantic spaces based on defined data sets relevant to specific applications. Mapping matrixes are required to map input data to the semantic space. SFEM mapping matrixes are defined according to the model's application [6, 7]. Various techniques have been developed to construct mapping matrixes of MMM for semantic information retrieval, classification, extraction, and analysis of cause and effect [8-14]. Moreover, a technique has been developed to construct the mapping matrixes through deep learning [15]. In MMM and SFEM, the elements of the matrixes that represent semantic spaces are predefined, which distinguishes them from other models, such as the artificial neural network model and the deep-learning artificial neural network model [16-19]. A mechanism based on the semantic space model has been developed to implement basic logic computation to determine true and false judgments [20], which is a fundamental mechanism required for machine learning. This mechanism is applied to simulate unmanned ground vehicle control [21]. Case studies are used to illustrate why the phenomenon of inapplicability occurred. A model is presented for temporal data processing, with the word "matter" used to represent non-temporal elements and "dark-matter" for temporally changing elements [1]. Exploratory research is presented on knowledge expression and generation processes based on the concept of "dark-matter" [22]. This research creates a new knowledge model to analyze the phenomenon of inapplicability and introduces the concept of "parallel spaces" based on previous studies.
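As a concrete picture of the semantic-space computation reviewed above, the following minimal sketch maps a query and retrieval candidates into a space with a mapping matrix and ranks the candidates by Euclidean distance. The matrix and vectors are hypothetical placeholders standing in for an MMM- or SFEM-style space.

```python
import numpy as np

def rank_by_distance(query_vec, candidate_vecs, mapping):
    # Map the query and the candidates into the semantic space with the
    # mapping matrix, then rank the candidates by Euclidean distance.
    q = query_vec @ mapping
    points = candidate_vecs @ mapping
    return np.argsort(np.linalg.norm(points - q, axis=1))  # nearest first

M = np.random.rand(4, 3)                       # hypothetical mapping matrix
query = np.array([1.0, 0.0, 1.0, 0.0])         # hypothetical query features
candidates = np.array([[1.0, 0.0, 1.0, 0.0],   # candidate 0: same features
                       [0.0, 1.0, 0.0, 1.0]])  # candidate 1: different features
print(rank_by_distance(query, candidates, M))  # candidate 0 ranks first
```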
The concept of "dark-matter," which is used in the analysis of the phenomenon of inapplicability, is discussed in Section 2. Section 3 describes the relationship between knowledge and dark-matter, and Section 4 analyzes the phenomenon of inapplicability and provides case studies to illustrate the issue. Section 4 also presents the concept of "parallel spaces" and its applications in examples before concluding the study.

2 The machine learning model created based on the concept "dark-matter"

This section offers a brief review of a machine learning model known as the dark-matter learning model, which is based on the concept of dark-matter. In this model, matrix multiplication is referred to as mapping or space mapping. If a two-dimensional matrix X represents a space, matrix X can be decomposed into a matter and a dark-matter matrix. The former refers to the first column of X, which correlates with sensor data and is referred to as visible data, while the latter refers to the matrix's second to last columns, which are randomly filled and defined as chaotic space. Figure 1 shows an example of the space matrix X. According to the paper [1], knowledge expression and knowledge generation are linked to dark-matter.

Figure 1: An example of a space matrix X Source: own.

The learning process in the dark-matter learning model is concerned with transforming chaotic space into ordered space. The learning model resembles a state machine where states transition from one to another. A state transition diagram is employed to illustrate this process. Antimatter space is defined as the inverse of the matrix X, and it is used to create a new matrix C from E, which represents actions taken by agents. The mass-energy equivalence equation explains that mass is visible, correlates with sensor data, and refers to the amount of matter an object contains. Energy is represented by a vector E and is measurable. Equation (1) shows the relationship of these three matrixes.

E = X * C (1)

As mentioned in the paper [22], the creation and development of knowledge is linked with what is called "dark-matter." A "dark-matter learning model" grounded on this concept is also presented in that paper. The goal of the learning process is to shift from a chaotic space to an ordered space, wherein the learning model is viewed as a state machine. The state machine's state changes from one to another, referred to as a state transition, as illustrated by state transition diagrams such as Figure 2. In Figure 2, a circle and a number identify a state, while an arrow signifies the state's transition from one to the next. Numbers beside arrows indicate the condition needed for the transition; for example, state "0.0" transitions to state "0.1" when the input data is "0.0". Figure 3 displays the creation of an "ordered space" from the original "chaotic space" through state transitions, using the example of the state transition diagram in Figure 2. The start state is "0.0" and the first step is the transition from "0.0" to "0.1". The chaotic space is shown in Figure 3 (a) and the first step is shown in Figure 3 (b), with the transition from "0.1" to "0.3" shown in Figure 3 (c).
The full state transition diagram is represented in Figure 3 (d), creating an "ordered space" from the original "chaotic space." The "matter" refers to the first column of the matrix, and the grey-painted second, third, and fourth columns represent the "dark-matter" matrix.

Figure 2: An example of state transition Source: own.

Figure 3: Creating an "ordered space" from a "chaotic space": (a) chaotic space, (b) transition of state from 0.0 to 0.1, (c) transition of state from 0.1 to 0.3, (d) ordered space Source: own.

If X⁻¹ is the inverse matrix of the matrix X, the matrix X⁻¹ is referred to as an "antimatter space." By applying the antimatter space to a matrix E, which represents the actions of agents, a new matrix C is created as shown in equation (2). The calculation presented by equation (2) is referred to as the "learning" or "training" calculation.

C = X⁻¹ * E (2)

In physics, "mass" is the measure of the amount of matter an object contains and can be sensed by sensors. "Matter" is defined as a vector that correlates with sensor data and is similar to the concept of "mass". Energy in physics is a vector E, and the elements of the vector E and the mass in matrix X are measurable. Thus, the measurable values of E correspond to energy in physics, and the measurable values of mass in physics correspond to "matter". To analyze the relationship between equation (1) and the mass-energy equivalence equation, we will take an example. Suppose that space X is a five-by-five matrix, as shown in Figure 4 (a). This means that there are five different types of matter in the space. The elements in the first column of the matrix are the masses of the matter. In the example, the values of the "matter" and "dark-matter", which are the elements of the matrix X, are shown in Figure 4 (a). The elements of the vector E are shown in Figure 4 (b). The inverse matrix of X, X⁻¹, is shown in Figure 4 (c).

Figure 4: An example of a space with five different types of matter: (a) space X created with five types of matter (mass and dark-matter columns), (b) energy vector E, (c) inverse matrix X⁻¹ Source: own.

Another vector, C, is also used to analyze the relationship with the mass-energy equivalence equation. The values of the elements of the vector C are calculated based on equation (2), multiplying X⁻¹ by E. If the value of the first element of vector C is non-zero, and all other elements in vector C are zero, as shown in Figure 5, the elements of vector E can be calculated by multiplying the mass vector by only the first value of vector C; that is, the dark-matter is not utilized in this computation process. The equation e = mc² is appropriate for this scenario, where the first value of vector C is represented as c².
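The example of Figures 4-6 can be reproduced numerically. The sketch below follows equations (1) and (2) with a hypothetical mass vector and randomly filled dark-matter columns; the concrete numbers differ from the figures but the structure of the result is the same.

```python
import numpy as np

rng = np.random.default_rng(0)
mass = np.array([6.0, 3.0, 9.0, 2.0, 7.0])   # "matter": first column, sensor-visible
dark = rng.uniform(0.0, 100.0, size=(5, 4))  # "dark-matter": randomly filled columns
X = np.column_stack([mass, dark])            # space matrix X (5 x 5)
E = 100.0 * mass                             # energy vector with c^2 = 100

C = np.linalg.inv(X) @ E          # equation (2): C = X^-1 * E ("learning")
print(np.round(C, 6))             # ~ [100, 0, 0, 0, 0]: dark-matter unused
print(np.allclose(X @ C, E))      # equation (1): E = X * C holds
```

Because E is proportional to the mass column alone, the computed C has a single non-zero entry, which is exactly the c² scenario discussed above.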
Figure 5: Multiplying X⁻¹ by E to calculate the rule vector C Source: own.

The vector E can be calculated from C using equation (1), as shown in Figure 6 (a). Let c² represent the first value of C. Then, the vector E can be calculated by multiplying the first column of X by c², as shown in Figure 6 (b).

Figure 6: The calculated result of E with the vector C and the scalar value c²: (a) calculating E with the vector C, (b) calculating E with the scalar value c² Source: own.

In this example, if e is the i-th element of the vector E and m is the i-th element of the mass vector, it can be found that e = mc². For example, for the first element of E and the first element of mass, e = 600; m = 6; c² = 100, where c = 10; 600 = 6 * 10². That is, e = mc². To summarize, the dark-matter learning model employs matrix multiplication and defines dark-matter as chaotic space. It uses a state machine to transform chaotic space into ordered space. The concept of antimatter space is introduced, and the mass-energy equivalence analogy relates the visible, sensor-correlated mass to the measurable energy. A comprehensive example illustrates how vector C and vector E are computed. The relationship between mass and energy is described by the equation e = mc².

3 A knowledge representation method with "dark-matter"

The relationship between dark-matter and knowledge is explained in [22]. In the example shown in Figure 7, an agent's state is defined as its position in the maze. There are 16 possible positions, so the state space is represented by a 16-element vector. The agent's actions are defined as the four cardinal directions: up, down, left, and right. The start position is marked with the character "S" and the goal position is marked with the character "G". The agent will go along the path shown by the arrow-mark "→". The agent will go through the points (2, 1), (2, 2), (3, 2), (4, 2), (4, 3), (4, 4) and (3, 4) and reach the goal position (2, 4), as shown in Figure 7 (a). A space matrix can be defined by representing the agent's states as its positions on the maze matrix. There are 16 possible positions for the agent on the maze matrix, so the state space is represented by a 16-element matrix, as shown in Figure 7 (b). There are 16 values from 0.0 to 1.5 which are used as the index of positions.

Figure 7: A maze matrix shown for an agent moving from "S" to "G": (a) a maze matrix, (b) the position index values of the maze matrix Source: own.

If the agent does not have a position sensor but has a laser sensor, the output of the sensor shows in which directions the agent can move.
The states of the agent can be calculated through the output values of the laser sensor. However, the states of the agent cannot always be calculated without dark-matter. Let's take an example to explain this. The output values of the sensor are defined as follows:
− "1000" - Up
− "0100" - Down
− "0010" - Left
− "0001" - Right

The directions in which the agent can move at each position are shown in Figure 8 (a). The sensor's output values for each direction are shown in Figure 8 (b), and the sensor's output values when the agent is in different positions are shown in Figure 8 (c). For example, at the position (3, 2), the agent can move in the "Up" direction to go back from the current position to its previous position, and in the "Down" direction to a new position; therefore, the output value of the sensor is "1100". If the agent is at the point (2, 2), the agent can move to two different positions: it can move back to the start position (2, 1) when it moves "Left", and it can also move "Down" to the next position (3, 2). The output value of the sensor at the position (2, 2) is "0110", which is the sum of the two directions "0010" ("Left") and "0100" ("Down").

Figure 8: The output of the laser sensor of the agent: (a) the maze matrix, (b) the output of the laser sensor for each direction, (c) the output of the laser sensor in the maze Source: own.

As shown in Figure 8 (c), the sensor output values for the agent's positions (3, 2) and (3, 4) are both "1100". This means that the agent can move "Up" or "Down". If a function with the sensor value as its input is used to retrieve the agent's action, two different actions, "Moving Down" and "Moving Up", will be returned. Therefore, without dark-matter, it is impossible to find a unique action for the agent. Dark-matter is used to calculate the best action to take. The path from the start position to the goal position is not always known at the start position. For example, when reinforcement learning is used to find the path, the path is unknown at the start position. The path is found after many trials. Rewards are assigned to the found paths. The path with the shortest length is assigned the highest reward. The agent is trained to obtain the maximum reward during reinforcement learning. In this way, the optimal path from the start position to the goal position can be found. At the same time, the agent's actions at the positions on the path are determined. When the path is unknown at the start position, it is impossible to create a space matrix as shown in Figure 8 (a). Here, a new method is proposed for generating the space matrix. This method records passed positions instead of next positions. For example, when the agent moves from the start position (2, 1) to the position (2, 2), the passed position (2, 1) is recorded. This means that the dark-matter matrix can be created based on events that have occurred, rather than events that will occur in the future. In the following, an example is used to illustrate the method in detail. In the example, the agent has a laser sensor. The agent starts at the position (2, 1). It uses its laser sensor to scan the environment and detects that there are obstacles at the positions (1, 1) and (3, 1). The agent then moves to the position (2, 2).
It again uses its laser sensor to scan the environment and detects that there are obstacles at positions (1, 2) and (2, 4). Then the agent moves to the position (3, 2). The agent continues to move in this way, recording the passed positions as it goes. After a while, the agent has created a space matrix that shows all of the positions that it has passed. This matrix can be used to plan the agent’s next move. For example, if the agent wants to move to the position (3, 4), it can use the space matrix to find the shortest path.

The new method for generating the space matrix is more efficient than the previous method, because it only records the passed positions, while the previous method had to record all of the possible positions. The new method is also more accurate, because it is based on events that have occurred, rather than events that will occur in the future.

The maze is represented by a matrix as shown in Figure 9 (a), where the agent can be in any position marked with a “1” and cannot be in any position marked with a “0”. The output of the sensor at each position is shown in Figure 9 (b), and the index values of the sensor outputs are shown in Figure 9 (c). For example, when the agent is at the position (2, 2), the output of the sensor is “1110”, which indicates that the agent can move up, down, and left. The agent cannot be in the positions marked with a “0” in Figure 9 (a), so the outputs of the sensor at those positions are marked with an “X”.

Figure 9: The output of the laser sensor of the agent: (a) the maze matrix; (b) the output of the laser sensor; (c) the index values of the laser sensor, e.g. 0.9 at (2, 1), 1.4 at (2, 2), and 1.2 at (3, 2) and (3, 4).
Source: own.

In Figure 9 (a), the agent starts at position (2, 1) and can move in two different directions: up and right. The probability of moving up is 0.5 and the probability of moving right is 0.5. When the agent moves to position (2, 2), it can move up, down, or left, and the probability of moving in each direction is 0.33. The target position is (2, 4). The agent receives a reward of 1 for each step it takes to reach the target position. The agent learns to move in the direction that leads to the highest reward. For example, if the agent moves to the right from position (2, 1), it will receive a reward of 1. This will increase the probability of the agent moving right in the future. After 2000 trials, the agent has learned to move from position (2, 1) to position (2, 4) in 7 steps. The probability of the agent moving in the direction that leads to the target position is 0.99.

To record sensor values, a working-memory mechanism is used. The mechanism creates a vector with the same number of elements as the number of steps required for the agent to reach the goal position. For example, if the agent starts at position (2, 1) and moves through positions (2, 2), (3, 2), (4, 2), (4, 3), (4, 4), and (3, 4) to reach the goal position (2, 4), the vector will have 7 elements. The initial values of the elements are randomly assigned. The current sensor value is then added to the vector, creating a new vector with 8 elements.
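The reward-driven probability update can be pictured with a minimal sketch. This is not the paper's algorithm; it assumes a simple tabular policy with an illustrative learning rate, just to show how repeated rewarded trials push one direction's probability toward 1:

```python
# Hypothetical tabular policy: per-position probabilities over allowed directions.
allowed = {(2, 1): ["up", "right"], (2, 2): ["up", "down", "left"]}
policy = {p: {d: 1.0 / len(ds) for d in ds} for p, ds in allowed.items()}

def reinforce(pos, direction, lr=0.1):
    """Shift probability mass toward a rewarded direction (illustrative rule)."""
    for d in policy[pos]:
        target = 1.0 if d == direction else 0.0
        policy[pos][d] += lr * (target - policy[pos][d])

for _ in range(2000):              # repeated rewarded trials, as in the example
    reinforce((2, 1), "right")
print(policy[(2, 1)])              # probability of "right" has converged to ~1.0
```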
The backgrounds of the elements containing recorded sensor values are white, as shown in Figure 10 (a), and the backgrounds of the elements still containing random initial values are grey. For example, at the start position (2, 1), the current sensor value is 0.9, which is recorded in the first element of the vector. The background of the first element is white, and the backgrounds of the other elements are grey. The values of the elements from the second to the eighth are random, because no sensor data has been recorded for them yet.

When the agent moves to position (2, 2), the index value of the sensor at that position, 1.4, is recorded in the first element of the vector. The previously recorded value, 0.9, is moved to the second element, as shown in Figure 10 (b). The backgrounds of the first and second elements are white, and the backgrounds of the other elements are grey. This process continues as the agent moves through the maze, as shown in Figure 10 (c). When the agent reaches the goal position, all of the index values of the sensor outputs are recorded in the vector. The index value of the sensor output at the goal position is recorded in the first element, and the index value of the sensor output at the start position is recorded in the eighth element, as shown in Figure 10 (d).

The working-memory mechanism allows the agent to store a history of its sensor values. This information can be used to help the agent make decisions about where to move next. For example, if the agent has previously encountered a wall at a particular location, it is less likely to move in that direction in the future.

Figure 10: Working-memory and recorded index values of the sensor output: (a) at position (2, 1): (0.9, 94.0, 84.0, 12.0, 78.0, 58.0, 84.0, 23.0); (b) at position (2, 2): (1.4, 0.9, 84.0, 12.0, 78.0, 58.0, 84.0, 23.0); (c) at position (3, 2): (1.2, 1.4, 0.9, 12.0, 78.0, 58.0, 84.0, 23.0); (d) at position (2, 4): (0.4, 1.2, 1.0, 0.3, 0.9, 1.2, 1.4, 0.9).
Source: own.

The agent’s actions are recorded by an action index value. For example, if the agent moves to the right at the start position, the index value 0.1, which is the index value of moving to the right, is recorded. A vector E is used to record all the action index values from the start position to the end position, as shown in Figure 11 (c). As shown in Figure 11 (a), the space X is a collection of the working-memory vectors recorded as the agent moved from the start position to the goal position. Its inverse matrix X⁻¹ is shown in Figure 11 (b). By multiplying X⁻¹ by E, a vector C is generated, as shown in Figure 11 (d).

Figure 11: The space matrix, its inverse and the generated rule vector: (a) X, whose rows are the working-memory vectors; (b) X⁻¹; (c) E = (0.1, 0.0, 0.1, 0.1, 0.8, 0.4, 0.8, 0.4); (d) C = (-0.06, 0.00, 0.00, 0.03, 0.00, -0.01, 0.00, 0.04).
Source: own.
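The shift-register behaviour of the working memory in Figure 10 can be sketched in a few lines (random initial values, and the path's sensor indices as above):

```python
import random

random.seed(1)
memory = [round(random.uniform(10.0, 95.0), 1) for _ in range(8)]  # random initial values

def record(sensor_index):
    """Insert the newest sensor index at the front; older entries shift back."""
    memory.insert(0, sensor_index)
    memory.pop()                       # keep the vector at 8 elements

for idx in [0.9, 1.4, 1.2]:            # sensor indices at (2, 1), (2, 2), (3, 2)
    record(idx)
print(memory)                          # [1.2, 1.4, 0.9, <five random values>]
```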
When the vector C is generated, the agent can calculate its action at each relative position by multiplying the working-memory vector by C. This is expressed by equation (1) in Section 2.

The dark-matter matrix can be used to calculate a unique action index value even if the output values of the sensor are the same. For example, at positions (3, 2) and (3, 4), the index values of the sensor output are both 1.2, as shown in Figure 12 (a) and (b). The index values are recorded in the first elements of the two vectors. In the working memory, the values in the second to the eighth elements of the two vectors, which include the dark-matter values, are different, as shown in Figure 12 (a) and (b).

Figure 12: Working-memory and recorded index values of the sensor output: (a) values of the working memory at position (3, 2): (1.2, 1.4, 0.9, 12.0, 78.0, 58.0, 84.0, 23.0); (b) values of the working memory at position (3, 4): (1.2, 1.0, 0.3, 0.9, 1.2, 1.4, 0.9, 23.0).
Source: own.

The working-memory vectors are multiplied by the rule vector C to produce two action index values, 0.4 and 0.8, as shown in equations (4) and (5). The index value 0.4 indicates that the agent should move down at position (3, 2), and the index value 0.8 indicates that the agent should move up at position (3, 4).

\[ (1.2,\ 1.4,\ 0.9,\ 12.0,\ 78.0,\ 58.0,\ 84.0,\ 23.0) \times (-0.06,\ 0.00,\ 0.00,\ 0.03,\ 0.00,\ -0.01,\ 0.00,\ 0.04)^{T} = 0.4 \qquad (4) \]

\[ (1.2,\ 1.0,\ 0.3,\ 0.9,\ 1.2,\ 1.4,\ 0.9,\ 23.0) \times (-0.06,\ 0.00,\ 0.00,\ 0.03,\ 0.00,\ -0.01,\ 0.00,\ 0.04)^{T} = 0.8 \qquad (5) \]

The agent learns to take appropriate actions at different positions to move from the start position to the goal position. This is done through experience, which is a form of knowledge. In conclusion, knowledge can be expressed in the dark-matter matrix of the space matrix, because knowledge is a form of information, and information can be encoded in the dark-matter matrix. In principle, the dark-matter matrix can contain all of the knowledge that has ever been known.

4 The phenomenon of inapplicability of experience and the parallel spaces

The phenomenon of inapplicability of experience often occurs when we use our experiences to make judgements: wrong judgements and decisions are made based on experience. For example, in the stock trading process, we determine the current trade based on the stock’s past ups and downs and our trading experience. However, it often happens that when we decide to buy, the stock falls, or when we decide to sell, the stock rises.

Figure 13 shows an example of the phenomenon of inapplicability of experience. In the figure, the circles are the states of an agent and the arrows indicate state migration. As shown in Figure 13 (a), at the beginning the agent is in state 1; then it moves to the next state, state 2. Once it reaches state 3, it migrates to the next state, state 4. If an agent is trained and has gained the experience shown in Figure 13 (a), the agent will always move from state 3 to state 4. However, if the next state of state 3 is state 5, the phenomenon of inapplicability of experience happens, as shown in Figure 13 (b).

Figure 13: An example of the phenomenon of inapplicability of experience: (a) the next state of state 3 should be state 4 based on the experience; (b) if the next state of state 3 is not state 4 but state 5, then the experience of (a) is inapplicable.
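A tiny sketch of this phenomenon, assuming experience is stored as a deterministic next-state table (an illustration of the idea, not the paper's matrix formulation):

```python
# Experience learned in the setting of Figure 13 (a): state 3 is followed by state 4.
experience = {1: 2, 2: 3, 3: 4}

# The setting of Figure 13 (b): the true successor of state 3 is state 5.
actual_next = {1: 2, 2: 3, 3: 5}

state = 3
print(experience[state], actual_next[state])   # 4 vs 5: the experience is inapplicable
```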
When the phenomenon of inapplicability of experience occurs, it is impossible to decide which state will be the next state of the current state, as shown in Figure 14. In Figure 14, both state 4 and state 5 are next states after state 3. In this case, it is impossible to decide empirically which state will be the next state of state 3.

Figure 14: Both state 4 and state 5 will be the next state of state 3.
Source: own.

One reason for the phenomenon of inapplicability of experience is the sensor’s lack of detection accuracy. When the sensor’s detection accuracy is insufficient, it is impossible to detect whether the current state should be state 3.1 or state 3.2, as shown in Figure 15 (a). Since the sensor can only detect the current state as state 3, it is impossible to determine whether the next state should be state 4 or state 5. When the sensor has enough detection accuracy to detect whether the current state is state 3.1 or state 3.2, it can be determined whether the next state should be state 4 or state 5, as shown in Figure 15 (b).

Figure 15: The sensor’s lack of detection accuracy causes the phenomenon of inapplicability of experience: (a) the sensor’s lack of detection accuracy; (b) the sensor has enough detection accuracy.
Source: own.

The parallel space model proposed in this paper is a model to illustrate the phenomenon of inapplicability of experience. Figure 16 is an example. In Figure 16, there are an observation space and two parallel spaces, Space1 and Space2. The observation space is a projection space of the parallel spaces; that is, the observation space is the space the sensor can detect. In Figure 16, the sensor can only detect the x and y directions; it cannot detect the z direction. If two agents are trained in Space1 and Space2, respectively, they will move from state 3 to state 4 and state 5, respectively, and the phenomenon of inapplicability of experience will not happen. But in the observation space, as shown in Figure 16, the phenomenon of inapplicability of experience will happen. If the agent moves in Space2 based on the experience obtained from Space1, from state 3 it will move to the wrong state, state 4, which does not exist in Space2. If the agent moves in Space1 based on the experience obtained from Space2, from state 3 it will move to the wrong state, state 5, which does not exist in Space1. From the point of view of the observation space, from state 3 the agent sometimes moves to state 4 and sometimes to state 5. In the observation space, we cannot predict which state the agent will move to from state 3 until it has moved.

Figure 16: An example of parallel spaces.
Source: own.

Suppose an agent is used to control an automated cleaning robot and it will clean two floors. The two floors are represented by Space1 and Space2, respectively. The robot’s position on the floor plane is represented by the x and y coordinates. The floor where the robot is located is represented by the z coordinate.
The robot’s sensors can detect the positions of the robot in the x and y directions, but not in the z direction. The positions of the robot are represented as 1, 2, 3 and 4 when the robot is on the Space1 floor plan. If the robot is on the Space2 floor plan, the positions of the robot are represented as 1, 2, 3 and 5.

When the robot is on the floor plan Space1, as shown in Figure 17 (a), the matrix X is constructed so that the elements of the first column are the positions of the robot and the other columns are experiences and dark-matter. The inverse matrix of X, X⁻¹, is represented in Figure 17 (b). The elements of the vector E are the next positions of the current positions. As shown in Figure 17 (c), the next positions are 2, 3 and 4 when the current positions are 1, 2 and 3, respectively. When the robot has reached position 4, its next position is also 4. The rule vector C, shown in Figure 17 (d), is calculated by multiplying X⁻¹ by E.

Figure 17: Space1: (a) the space matrix X, with rows (1, 94, 84, 12), (2, 1, 84, 12), (3, 2, 1, 12) and (4, 3, 2, 1); (b) its inverse X⁻¹; (c) the next-position vector E = (2, 3, 4, 4); (d) the rule vector C = (0.9781, -0.0002, -0.0003, 0.0889).
Source: own.

Figure 18 shows the matrices when the robot is on the floor plan Space2. In the same way as in Figure 17, the space X, its inverse matrix X⁻¹, the next-position vector E and the rule vector C are represented in Figure 18 (a), (b), (c) and (d), respectively.

Figure 18: Space2: (a) the space matrix X, with rows (1, 94, 84, 12), (2, 1, 84, 12), (3, 2, 1, 12) and (5, 3, 2, 1); (b) its inverse X⁻¹; (c) the next-position vector E = (2, 3, 5, 5); (d) the rule vector C = (0.9701, -0.0003, -0.0124, 0.1752).
Source: own.

As shown in Figure 16, on both floor plans, Space1 and Space2, the agent moves in the same way from position 1 to position 3. Therefore, the experiences of the agent at position 3 are the same. The experience stored in the working memory is shown in Figure 19. It is a row vector: its first element, 3.0000, represents the robot’s current position, the second element, 2.0000, represents its previous position, and the third element, 1.0000, is the start position of the robot. The last element, 12.0000, is the dark-matter.

Figure 19: The experience stored in the working memory: the row vector (3.0000, 2.0000, 1.0000, 12.0000).
Source: own.

Although the same experience is stored in the working memory at position 3, different next positions are obtained using different rule vectors. The rule vector C of Space1 and the rule vector C of Space2 are shown in Figures 17 (d) and 18 (d), respectively. Multiplying the row vector stored in the working memory by each rule vector C, the next position 4 in Space1 and the next position 5 in Space2 are calculated, as shown in Figure 20 (a) and (b):

\[ (3.0000,\ 2.0000,\ 1.0000,\ 12.0000) \times (0.9781,\ -0.0002,\ -0.0003,\ 0.0889)^{T} = 4 \]
\[ (3.0000,\ 2.0000,\ 1.0000,\ 12.0000) \times (0.9701,\ -0.0003,\ -0.0124,\ 0.1752)^{T} = 5 \]

Figure 20: Calculating the next state at state 3: (a) using the rule of Space1; (b) using the rule of Space2.
Source: own.
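The calculation in Figure 20 can be reproduced directly with the values from Figures 17-19:

```python
import numpy as np

memory = np.array([3.0, 2.0, 1.0, 12.0])                 # working memory at position 3 (Figure 19)
C_space1 = np.array([0.9781, -0.0002, -0.0003, 0.0889])  # rule vector of Space1 (Figure 17 d)
C_space2 = np.array([0.9701, -0.0003, -0.0124, 0.1752])  # rule vector of Space2 (Figure 18 d)

print(round(memory @ C_space1))   # 4: next position on the Space1 floor
print(round(memory @ C_space2))   # 5: next position on the Space2 floor
```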
Dimensional expansion is necessary when the phenomenon of inapplicability of experience happens. In the example shown in Figure 16, if only the x and y dimensions are used, experience is not applicable to decide which state, state 4 or state 5, will be the next state when the current state is state 3.

Adding a new sensor is one of the methods for dimensional expansion. For example, we can add a color sensor to the cleaning robot to detect which floor it is on. Painting the floor of Space1 yellow and the floor of Space2 green, the output of the color sensor will be different on the different floors. As a result, it can be found out which floor the robot is on and which will be the next position when the robot is at position 3.

Another way of dimensional expansion is to use sensors which can detect the next possible moving position. For example, if the sensors of the cleaning robot can detect whether it can move to position 4 or position 5 when it is at position 3, it is possible to decide which will be the next position when the robot reaches position 3.

Trial-and-error is also a dimensional expansion method. Suppose the robot is on the floor Space1. If the robot hits an obstacle as it moves from position 3 to the next position, position 4, it can be found that the robot is not on the floor Space1 but on the floor Space2.

5 Conclusion and future work

In this paper, the concept of “dark-matter” and its related concepts of experience and knowledge are reviewed. The concept of “space”, represented as a matrix created based on experience, is also reviewed. In addition, the characteristic of the concept of the space’s “rule” is revealed, indicating that it is useful to transform experiences into the outputs of agents according to the inputs of the agents. Examples are used to illustrate experience and knowledge expression. The concept of the working-memory mechanism, which is used to record the agent’s experience and create the space matrix X, is reviewed. A new model, “parallel spaces”, is presented, which is useful to illustrate the “phenomenon of inapplicability of experience.” This model reveals why mistaken decisions are made based on experiences. The most important contribution of this paper is that we revealed how the “phenomenon of inapplicability of experience” happens and proposed solutions. In the paper, it is also proposed that for each space, only one rule vector is required to calculate the output results for a given input. Therefore, only a two-dimensional matrix, of which each column is a rule vector, is required for the parallel space model. Dimensional expansion is also required to decide which rule vector, or in other words, which space is used. Solutions for the dimensional expansion are presented: adding new sensors and trial-and-error are the two methods for the dimensional expansion. As our future work, application systems based on the proposed methods and the mechanism will be developed.

References

[1] Chen, X. and Kiyoki, Y., “On Semantic Spatiotemporal Space and Knowledge with the Concept of “Dark-Matter”,” Information Modelling and Knowledge Bases XXXIII, IOS Press, pp. 110-128, 2021.
[2] Chen, X. and Kiyoki, Y., “A query-meaning recognition method with a learning mechanism for document information retrieval,” Information Modelling and Knowledge Bases XV, IOS Press, Vol. 105, pp. 37-54, 2004.
[3] Chen, X. and Kiyoki, Y., “A dynamic retrieval space creation method for semantic information retrieval,” Information Modelling and Knowledge Bases XVI, IOS Press, Vol. 121, pp. 46-63, 2005.
[4] Kiyoki, Y. and Kitagawa, T., “A semantic associative search method for knowledge acquisition,” Information Modelling and Knowledge Bases, IOS Press, Vol. VI, pp. 121-130, 1995.
[5] Kitagawa, T. and Kiyoki, Y., “A mathematical model of meaning and its application to multidatabase systems,” Proc. 3rd IEEE International Workshop on Research Issues on Data Engineering: Interoperability in Multidatabase Systems, pp. 130-135, April 1993.
[6] Chen, X., Kiyoki, Y. and Kitagawa, T., “A multi-language oriented intelligent information retrieval system utilizing a semantic associative search method,” Proceedings of the 17th IASTED International Conference on Applied Informatics, pp. 135-140, 1999.
[7] Chen, X., Kiyoki, Y. and Kitagawa, T., “A semantic metadata-translation method for multilingual cross-language information retrieval,” Information Modelling and Knowledge Bases XII, IOS Press, Vol. 67, pp. 299-315, 2001.
[8] Kiyoki, Y., Kitagawa, T. and Hitomi, Y., “A fundamental framework for realizing semantic interoperability in a multidatabase environment,” International Journal of Integrated Computer-Aided Engineering, Vol. 2, No. 1 (Special Issue on Multidatabase and Interoperable Systems), pp. 3-20, John Wiley & Sons, Jan. 1995.
[9] Kiyoki, Y., Kitagawa, T. and Hayama, T., “A metadatabase system for semantic image search by a mathematical model of meaning,” ACM SIGMOD Record, Vol. 23, No. 4, pp. 34-41, Dec. 1994.
[10] Kiyoki, Y., Chen, X. and Kitagawa, T., “A WWW Intelligent Information Retrieval System Utilizing a Semantic Associative Search Method,” APWeb’98, 1st Asia Pacific Web Conference on Web Technologies and Applications, pp. 93-102, 1998.
[11] Ijichi, A. and Kiyoki, Y., “A Kansei metadata generation method for music data dealing with dramatic interpretation,” Information Modelling and Knowledge Bases, Vol. XVI, IOS Press, pp. 170-182, May 2005.
[12] Kiyoki, Y., Chen, X. and Ohashi, H., “A semantic spectrum analyzer for realizing semantic learning in a semantic associative search space,” Information Modelling and Knowledge Bases, Vol. XVII, IOS Press, pp. 50-67, May 2006.
[13] Takano, K. and Kiyoki, Y., “A causality computation retrieval method with context dependent dynamics and causal-route search functions,” Information Modelling and Knowledge Bases, IOS Press, Vol. XVIII, pp. 186-205, May 2007.
[14] Chen, X. and Kiyoki, Y., “A visual and semantic image retrieval method based on similarity computing with query-context recognition,” Information Modelling and Knowledge Bases, IOS Press, Vol. XVIII, pp. 245-252, May 2007.
[15] Nitta, T., “Resolution of singularities introduced by hierarchical structure in deep neural networks,” IEEE Trans. Neural Netw. Learn. Syst., Vol. 28, No. 10, pp. 2282-2293, Oct. 2017.
[16] Wiatowski, T. and Bölcskei, H., “A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction,” IEEE Transactions on Information Theory, Vol. PP, No. 99, Dec. 2015.
[17] Hochreiter, S., Bengio, Y., Frasconi, P. and Schmidhuber, J., “Gradient flow in recurrent nets: the difficulty of learning long-term dependencies,” in Kremer, S. C. and Kolen, J. F. (eds.), A Field Guide to Dynamical Recurrent Neural Networks, IEEE Press, 2001.
[18] Hochreiter, S. and Schmidhuber, J., “Long short-term memory,” Neural Computation, Vol. 9, No. 8, pp. 1735-1780, 1997.
[19] Kalchbrenner, N., Danihelka, I. and Graves, A., “Grid long short-term memory,” CoRR, abs/1507.01526, 2015.
[20] Chen, X. and Kiyoki, Y., “On Logic Calculation with Semantic Space and Machine Learning,” Information Modelling and Knowledge Bases XXXI, IOS Press, Vol. 321, pp. 324-343, 2019.
[21] Chen, X., Prayongrat, M. and Kiyoki, Y., “A Concept for Control and Program Based on the Semantic Space Model,” Information Modelling and Knowledge Bases XXXII, IOS Press, Vol. 333, pp. 26-44, 2020.
[22] Chen, X., “An Exploratory Research on the Expression of Knowledge and Its Generation Process based on the Concept of “Dark-matter”,” Information Modelling and Knowledge Bases XXXIV, IOS Press, Vol. 364, pp. 110-124, 2023.

A SPATIO-TEMPORAL AND CATEGORICAL CORRELATION COMPUTING METHOD FOR INDUCTION AND DEDUCTION ANALYSIS

YASUHIRO HAYASHI,1 YASUSHI KIYOKI,1 YOSHINORI HARADA,2 KAZUKO MAKINO,2 SEIGO KANEOYA2
1 Musashino University, Tokyo, Japan; yhayashi@musashino-u.ac.jp, y-kiyoki@musashino-u.ac.jp
2 Credit Saison Co., Ltd., Tokyo, Japan

We propose a spatio-temporal and categorical correlation computing method for induction and deduction analysis in order to reveal relationships between two sets in past events, thereby finding insights to build new relationships between the two sets in the future. The most significant feature of this method is that it provides a means for inductive and deductive thinking in the cycle of memory recall in which humans unravel the relationship between two entities. Concretely, this method calculates correlations in a ‘hypothesis-to-fact’ and ‘fact-to-hypothesis’ approach based on information such as when, how often, who, where, what, and how, and enables the derivation of the relationship between two sets. For the correlation calculation, this method dynamically creates a multi-dimensional vector space in which the dimensions consist of time, space, and category, and creates a vector consisting of the temporal, spatial, and categorical features of the independent elements in each attribute from past events containing two attributes with a certain relationship. The strength of the relationships between the two sets is calculated as similarity. This method also makes it possible to derive facts from hypotheses by applying context vectors. This paper shows the details of this method, its implementation method, and assumed applications in commerce activities.

Keywords: spatio-temporal & categorical correlation computing, induction and deduction analysis, dynamic multi-dimensional vector space creation, vector composition operator, context-based data mining

DOI https://doi.org/10.18690/um.feri.5.2023.18
ISBN 978-961-286-745-4

1 Introduction

We propose a method that realizes deductive and inductive analysis of spatio-temporal and categorical relationships between entities that are related to each other from a certain set. The most significant feature of this method is that it provides a means for inductive and deductive thinking in the cycle of spatio-temporal and categorical memory recall in which humans unravel the relationship between two entities. Concretely, this method calculates correlations in a ‘hypothesis-to-fact’ and ‘fact-to-hypothesis’ approach based on information such as when, how often, who, where, what, and how, and enables the derivation of the relationship between two sets.
For the correlation calculation, this method dynamically creates a multi-dimensional vector space in which the dimensions consist of time, space, and category. The method dynamically creates a vector consisting of the temporal, spatial, and categorical features of the independent elements in each attribute from past events containing two attributes with a certain relationship. This vector is mapped to the multi-dimensional vector space. Based on the correlation calculation between the vectors mapped to that multi-dimensional vector space, the strength of the relationships between the two sets is calculated as similarity. Consequently, the method can retrieve past events mapped to the multi-dimensional vector space by time, space, and category conditions, and reveals relationships between the two sets. Namely, this method makes it possible to derive hypotheses from facts. The method also enables the setting of context vectors describing spatio-temporal and categorical features as hypotheses. Such a context vector is also mapped to the multi-dimensional vector space, and a distance calculation between the context vector and the other vectors reveals relationships between the two sets. Namely, this method also makes it possible to derive facts from hypotheses. This method was inspired by the following related research.

1.1 Semantic Computing by the Mathematical Model of Meaning & Meta-Level System

The Mathematical Model of Meaning & Meta-Level System is the core method that inspired this research. The mathematical model of meaning proposed by Kiyoki et al. [1,2] is a method for computing semantic associations between data that change dynamically according to context or situation. An orthonormal space called the metadata space is created and media data are mapped onto the space. By calculating distances in the metadata space, this method realizes the retrieval of media data that are semantically similar to the query. If a context is given along with the query at the time of retrieval, dimensionality selection control of the space is dynamically executed, and the retrieval of semantically similar media data is executed according to the context. Furthermore, the meta-level system proposed by Kiyoki et al. [2,3] is a method that enables the integration and linkage of heterogeneous local database systems by setting up a meta-database system in the upper layer of the heterogeneous local database systems. By realizing an integrated semantic space and a mechanism for semantic distance calculation in the meta-database system, the correlation between the temporal, spatial, and semantic features obtained from each local database is weighted. With the proposed mathematical model of meaning and the meta-level system, Kiyoki et al. aim to realize a memory processing mechanism that realizes the interpretation of dynamically changing meanings and sensitivities depending on context or situation [3].

Our method analyzes data in which two attributes have some relationship. The data is input to our method by a table-join process at the meta-level of heterogeneous relational databases. In addition, our method dynamically controls dimensions to calculate correlations between two sets. The meta-level system and the Mathematical Model of Meaning are the fundamental calculation models of this method.
1.2 Image-Query Creation Method

The image-query creation method proposed by Hayashi and Kiyoki [4,5] creates image queries for content-based image retrieval by combining images. In this method, an image-query creation database and image-query creation operators are set up in the query part of the content-based image database system. The combination of the image database and the operators is used to operate on the color and shape features. Based on the color and shape features of the images that the searcher wants to focus on, this method dynamically controls the dimensions of the image query and of the image database to be searched.

In our method, vectors of two sets are created in the integrated space of past events in various fields. The way the image-query creation method creates vectors and calculates contextual correlation quantities in the orthogonal space of color and shape features is one of the methods that inspired this research.

1.3 Emotional MaaS (Mobility as a Service)

The Emotional MaaS, proposed by Kawashima, Hayashi, and Kiyoki [6], is an application of the Mathematical Model of Meaning and the Meta-Level System that calculates travel routes and facilities based on the context of tourists. MaaS provides mobility and related services to tourists across the board by highly integrating real space and information space. In this method, the context of the tourist’s speed of movement, distance in real space, and purpose is set in advance, and means of transportation and related facilities that are highly correlated with that context are weighted. In order to create a variety of traveler contexts, the intention and situation of each traveler are described as vectors, and the context is described by composing these vectors. The correlation calculation and the multi-dimensional vector space creation to describe various contexts are similar to our method in this research. In addition, our method can be applied to the commerce activity field to clarify human behaviors; its research area overlaps with that of data utilization in mobility information services.

Figure 1: The Concept of the Proposed Method.
Source: own.

2 A Spatio-Temporal & Categorical Correlation Computing Method for Induction and Deduction Analysis

2.1 Data Structure & Calculation Method

This method executes spatio-temporal and categorical correlation calculation for induction and deduction analysis. The concept of this model is shown in Figure 1. The concrete calculations are defined as follows. A given set P is expressed as Formula 1. The elements of the set P are expressed as pij, where i := 1, …, q and q is the number of attributes of the set P, and j := 1, …, r, where r is the number of elements of the set P. The attributes of the set P are expressed as ai.

(1) (2) (3)

Based on the given necessary conditions about time, space, and category, selection and projection are executed on the set P. The set P, reduced in its number of elements and attributes by this symbolic filtering, is expressed as the set P'. The set P' is assumed to have two attributes ax and ay that have significant relationships as entities, where 1 <= x < q, 1 <= y < q, and x does not equal y.

(4)

Here, the set P' aggregated by the independent element P[ax] in the attribute ax is defined as a set U.
(5)

The elements in the set U are expressed by uij, where i := 1, …, q and q is the number of attributes in the set U, and j := 1, …, m, where m is the number of elements in the set U.

(6)

When the temporal attributes in the set U are at, the temporal elements are expressed as u[at], where { at | t = 1, …, tt } and tt is an arbitrary number. When the spatial attributes in the set U are at, the spatial elements are expressed as u[at], where { at | t = 1, …, ss } and ss is an arbitrary number. Furthermore, when the categorical attributes in the set U are at, the categorical elements are expressed as u[at], where { at | t = 1, …, cc } and cc is an arbitrary number. With the temporal feature extraction function tf, the spatial feature extraction function sf, and the categorical feature extraction function cf defined below, the temporal feature u[axj, t], the spatial feature u[axj, s], and the categorical feature u[axj, c] are calculated as follows.

(7)

Note that u[ax] = p[ax] and j := 1, …, m, where m is the number of elements in the set U. The spatio-temporal and categorical feature vector v[axj] of the set u[axj] is created by this process.

(8)

In the same way, the set P' aggregated by the independent element P[ay] in the attribute ay is defined as a set W.

(9)

The elements in the set W are expressed as wij, where i := 1, …, q and q is the number of attributes of the set W, and j := 1, …, n, where n is the number of elements in the set W.

(10)

When the temporal attributes in the set W are at, the temporal elements are expressed by w[at], where { at | t = 1, …, tt } and tt is an arbitrary number. When the spatial attributes in the set W are at, the spatial elements are expressed as w[at], where { at | t = 1, …, ss } and ss is an arbitrary number. Furthermore, when the categorical attributes in the set W are at, the categorical elements are expressed as w[at], where { at | t = 1, …, cc } and cc is an arbitrary number. With the temporal feature extraction function tf, the spatial feature extraction function sf, and the categorical feature extraction function cf defined below, the temporal feature w[ayj, t], the spatial feature w[ayj, s], and the categorical feature w[ayj, c] are calculated as follows.

(11)

Note that w[ay] = p[ay] and j := 1, …, n, where n is the number of elements in the set W. The spatio-temporal and categorical feature vector v[ayj] of the set w[ayj] is created by this process.

(12)

The vectors v[axj] and v[ayj] created by Formulas 8 and 12 are mapped to a multi-dimensional vector space V with time, space, and category as dimensions. The distances between vectors dt, ds, dc are calculated for the temporal, spatial, and categorical features by the temporal feature distance function td, the spatial feature distance function sd, and the categorical feature distance function cd defined by Formula 13. In addition, to calculate the total correlation score between the mapped vectors v[axj] and v[ayj], the similarities dt, ds, dc, which are calculated by different methods, are normalized and expressed as dt', ds', dc' (Formula 14).

(13) (14)

The result of each normalized distance calculation is multiplied by the weights wtt, wts, wtc, and the total is calculated as a sum value score (Formula 15):

\[ score = wt_t \, d_t' + wt_s \, d_s' + wt_c \, d_c' \qquad (15) \]
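A minimal sketch of this scoring step follows. The normalization in Formula 14 is not specified in the text, so min-max normalization is assumed here, and the distance values, bounds, and weights are illustrative:

```python
def normalize(d, d_min, d_max):
    """Min-max normalization (an assumption; Formula 14 only requires comparability)."""
    return (d - d_min) / (d_max - d_min) if d_max > d_min else 0.0

def score(dt, ds, dc, bounds, weights):
    """Weighted sum of normalized temporal, spatial and categorical distances (Formula 15)."""
    dt_n = normalize(dt, *bounds["t"])
    ds_n = normalize(ds, *bounds["s"])
    dc_n = normalize(dc, *bounds["c"])
    wt, ws, wc = weights
    return wt * dt_n + ws * ds_n + wc * dc_n

bounds = {"t": (0.0, 30.0), "s": (0.0, 50.0), "c": (0.0, 1.0)}  # illustrative ranges
print(score(dt=12.0, ds=3.5, dc=0.2, bounds=bounds, weights=(0.5, 0.3, 0.2)))
```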
2.2 Context Vector for the Induction and Deduction Data Analysis Approach

To realize the inductive and deductive data analysis approach, this method applies a context vector CX, which is the original feature of this method. The inductive and deductive approaches in this method are defined as follows:

The inductive approach (fact-to-hypothesis): to extract a context vector CX that is similar to the temporal, spatial, and categorical features of the elements of the set U or the set W as model events.

The deductive approach (hypothesis-to-fact): to extract elements of the set U or elements of the set W that are similar to the temporal, spatial, or categorical features of some model events indicated by a context vector CX.

The context vector CX, consisting of the temporal feature cxt, the spatial feature cxs, and the categorical feature cxc, is expressed as follows:

\[ CX = (cx_t,\ cx_s,\ cx_c) \qquad (16) \]

Using Formula 15, the similarity calculation between the elements v[axj] of the set U and the context vector CX, or the elements v[ayj] of the set W and the context vector CX, enables an inductive or deductive data analysis approach.

(17)

2.3 Spatio-Temporal and Categorical Feature Extraction Functions

2.3.1 Temporal Feature Extraction Function

This function extracts the temporal average and variance. The date in the purchase history data has two facets: the usage date and the usage date interval. The extracted temporal feature vector VT consists of the average ta of the usage dates in the n data in the purchase history, its variance tv, the average tia of the usage date intervals, and its variance tiv. Note that the usage date interval is not constant.

\[ V_T = (t_a,\ t_v,\ t_{ia},\ t_{iv}) \qquad (18) \]

When each usage date in the purchase history is expressed as pi, where i := 1, ..., n, the average ta of the usage dates in the n data is calculated as follows. The dtoi function converts a Gregorian date to a standard date integer, and the itod function converts an integer value of a date in standard format to a date in Gregorian format.

\[ t_a = itod\left(\frac{1}{n}\sum_{i=1}^{n} dtoi(p_i)\right) \qquad (19) \]

The variance tv of the usage dates in the n data is calculated by Formula 20. Note that absolute values are used to simplify the calculation.

\[ t_v = \frac{1}{n}\sum_{i=1}^{n} \left| dtoi(p_i) - dtoi(t_a) \right| \qquad (20) \]

When each usage date in the purchase history is expressed as pi, the average tia of the usage date intervals for the n data is calculated as follows, where i := 1, ..., n. When n > 0, the average tia and its variance tiv are obtained; when n = 0, tia and tiv are zero. Additionally, the variance tiv of the usage date intervals in the n data is calculated as follows.

(21) (22)

2.3.2 Spatial Feature Extraction Function

This function calculates the spatial features. The spatial feature vector VS consists of the center position sa and the variance sv of the latitude and longitude of the stores included in the n data in the purchase history.

\[ V_S = (s_a,\ s_v) \qquad (23) \]

The latitude and longitude sa corresponding to the center of gravity of the latitudes and longitudes in the n data is calculated by Formula 24. When the latitude and longitude of two points p, q on the earth are given as p(latitude1, longitude1) and q(latitude2, longitude2), the great circle distance sd between p and q is calculated by Formula 25. This formula is also used for the spatial similarity degree.

\[ s_a = \left(\frac{1}{n}\sum_{i=1}^{n} latitude_i,\ \frac{1}{n}\sum_{i=1}^{n} longitude_i\right) \qquad (24) \]
(25)

When each latitude and longitude datum in the purchase history is expressed as pi, where i := 1, ..., n, the variance sv of the latitudes and longitudes in the n data is calculated as follows, where sd is the function that calculates the great circle distance between pi and the center of gravity sa of the latitudes and longitudes in the n data.

\[ s_v = \frac{1}{n}\sum_{i=1}^{n} sd(p_i,\ s_a) \qquad (26) \]

2.3.3 Categorical Feature Extraction Function

The category feature histogram VC is the sum of the n stores’ category vector data Ci in the purchase history, where each Ci consists of k elements. The category is expressed as tree data consisting of four levels: large, medium, small, and detailed. By converting the tree data format to the vector data format, distance calculation in the vector space can be applied. When the L major categories, M medium categories, S minor categories, and D detailed categories consist of l, m, s, and d elements, respectively, the tree data T is converted to vector data Ci consisting of k := l + m + s + d elements.

\[ V_C = \sum_{i=1}^{n} C_i \qquad (27) \]

2.3.4 Vector Composition Function

Given k vectors Vi := (vi1, vi2, …, vin) of n dimensions, where 2 <= i <= k, this function, named the Vector Creation Operator, composes a new vector by computing the sum of the corresponding elements of those vectors:

\[ V' = \sum_{i} V_i = \left(\sum_{i} v_{i1},\ \sum_{i} v_{i2},\ \ldots,\ \sum_{i} v_{in}\right) \qquad (28) \]

2.3.5 Distance Calculation Function

The inner product ip is used to calculate the semantic distance as the similarity between two context vectors Va, Vb created by this method, where D := 2 + 4 + k. Note that if a semantic distance is required between elements of two vectors, Euclidean distance calculation and geographical distance calculation are also used.

\[ ip(V_a,\ V_b) = \sum_{d=1}^{D} v_{ad}\, v_{bd} \qquad (29) \]

3 Implementation Method and Assumed Applications

This method realizes deductive and inductive analysis of spatio-temporal and categorical relationships between entities that are related to each other from a certain set. One of the implementation fields of this method is purchases between customers and stores in commerce. Customers and stores have a spatio-temporal and categorical relationship. Concretely, the relationship is that on a certain day (time information), a customer purchased a certain product (category information) at a certain store (place information). In order to implement this method, the system requires a computation layer that enables deductive and inductive analysis of the relationship between customers and stores, an input layer that receives parameters from the analyst, and an output layer that visualizes the analysis results.

Furthermore, the following applications of this method in commerce are assumed.

(1) Store recommendation based on inductive and deductive analysis of customers: inductively analyze customers by calculating the similarity between the customer’s spatio-temporal and categorical features and the context vector. Furthermore, store recommendations are performed to approach the ideal happiness by calculating the similarity between the context vector and the customer’s happiness state.

(2) Store recommendation based on deductive and inductive analysis of customers: deductively analyze customers who are similar to the ideal happiness by calculating the similarity between the spatio-temporal and categorical features of the context vector and the customers. Furthermore, store recommendations are performed to approach better happiness by calculating the similarity between the customer and other customers.
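As an illustration of how the computation layer could realize the temporal feature extraction of Section 2.3.1, the following is a minimal sketch. It assumes dtoi/itod are day-ordinal conversions and that intervals are taken between consecutive usage dates (the divisor for Formulas 21-22 is an assumption); the dates are illustrative:

```python
from datetime import date

def dtoi(d):
    """Gregorian date -> standard date integer (day ordinal)."""
    return d.toordinal()

def itod(i):
    """Standard date integer -> Gregorian date."""
    return date.fromordinal(round(i))

history = [date(2023, 1, 5), date(2023, 1, 19), date(2023, 2, 9)]  # illustrative usage dates
ints = [dtoi(d) for d in history]
n = len(ints)

ta = itod(sum(ints) / n)                                   # average usage date (Formula 19)
tv = sum(abs(i - dtoi(ta)) for i in ints) / n              # variance via absolute values (Formula 20)
gaps = [b - a for a, b in zip(ints, ints[1:])]             # usage date intervals
tia = sum(gaps) / len(gaps) if gaps else 0.0               # average interval (cf. Formula 21)
tiv = sum(abs(g - tia) for g in gaps) / len(gaps) if gaps else 0.0  # interval variance (cf. Formula 22)

print(ta, tv, tia, tiv)   # the temporal feature vector VT = (ta, tv, tia, tiv)
```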
4 Conclusion

We described a spatio-temporal and categorical correlation computing method for induction and deduction analysis. The originality of this method is to realize deductive and inductive analysis of spatio-temporal and categorical relationships between entities that are related to each other from a certain set. The introduction of context vectors enables inductive data analysis (fact-to-hypothesis) and deductive data analysis (hypothesis-to-fact) through the spatio-temporal and categorical features between two sets. This corresponds to humans’ logical thinking involving the cycle of temporal, spatial, and categorical memory recall to reveal the relationship between two entities. For this reason, in the calculation process, this method dynamically creates the metric space and queries based on the context, consisting of spatio-temporal and categorical features, and calculates correlations between customers and stores. Additionally, the implementation method and assumed applications were shown in this paper. As the next step, we will develop a prototype system applying the proposed method, conduct experiments to evaluate its effectiveness and feasibility, and pursue business deployment.

References

[1] Yasushi Kiyoki, Xing Chen, “A Semantic Associative Computation Method for Automatic Decorative-Multimedia Creation with “Kansei” Information” (Invited Paper), The Sixth Asia-Pacific Conferences on Conceptual Modelling (APCCM 2009), 9 pages, January 20-23, 2009.
[2] Yasushi Kiyoki and Saeko Ishihara, “A Semantic Search Space Integration Method for Meta-level Knowledge Acquisition from Heterogeneous Databases,” Information Modeling and Knowledge Bases (IOS Press), Vol. 14, pp. 86-103, May 2002.
[3] Kiyoki, Y., Chen, X., Veesommai, C., Wijitdechakul, J., Sasaki, S., Koopipat, C., & Chawakitchareon, P., “A semantic-associative computing system with multi-dimensional world map for ocean-environment analysis,” Information Modelling and Knowledge Bases XXX, pp. 147-168.
[4] Hayashi, Y., Kiyoki, Y., and Chen, X., “An Image-Query Creation Method for Expressing User’s Intentions by Combining Multiple Images,” Information Modelling and Knowledge Bases, Vol. XXI, IOS Press, pp. 188-207, 2010.
[5] Hayashi, Y., Kiyoki, Y., and Chen, X., “A Combined Image-Query Creation Method for Expressing User’s Intentions with Shape and Color Features in Multiple Digital Images,” Information Modelling and Knowledge Bases, Vol. XXII, IOS Press, pp. 258-277, 2011.
[6] Kawashima, K., Hayashi, Y., Kiyoki, Y., Mita, T., “A Mobility and Activity Integration System Supporting Sensitivity to Contexts in Dynamic Routing - Emotional MaaS -,” Information Modelling and Knowledge Bases, Vol. XXXIII, IOS Press, pp. 297-308, 2021.
[7] UN SDGs-3, Ensure healthy lives and promote well-being for all at all ages, https://sdgs.un.org/goals/goal3, accessed 2023/01/28.
[8] Tal Ben-Shahar, “Even Happier: A Gratitude Journal for Daily Joy and Lasting Fulfillment,” McGraw Hill, 2009.
[9] Kahneman, D. and Deaton, A., “High income improves evaluation of life but not emotional well-being,” Proceedings of the National Academy of Sciences, 107 (38), pp. 16489-16493, 2010.

PROCEEDINGS OF THE 33RD INTERNATIONAL CONFERENCE ON INFORMATION MODELLING AND KNOWLEDGE BASES EJC 2023
TATJANA WELZER DRUŽOVEC ET AL. (ED.)
University of Maribor, Faculty of Electrical Engineering and Computer Science, Maribor, Slovenia.
tatjana.welzer@um.si

Keywords: conceptual modelling, knowledge modelling, information modelling, linguistic modelling, cross-cultural communication, social computing, environmental modelling, multimedia data modelling

The proceedings of the 33rd conference EJC 2023 combine the experience and knowledge of the experts working in the different research fields of Information modelling, Conceptual modelling, Knowledge and information modelling and discovery, Linguistic modelling, Cross-cultural communication and social computing, Environmental modelling and engineering, and Multimedia data modelling and systems.

DOI https://doi.org/10.18690/um.feri.5.2023
ISBN 978-961-286-745-4