https://doi.org/10.31449/inf.v48i14.6271
Informatica 48 (2024) 65–82

Real-time Semantic Healthcare System: Visual Risks Identification for Elders and Children

Malak Belkebir 1, Toufik Messaoud Maarouk 2, Brahim Nini 1
1 Research Laboratory on Computer Science's Complex Systems ReLa(CS)2, University of Oum el Bouaghi, Oum el Bouaghi, Algeria
2 ICOSI laboratory, Dept. of Mathematics and Computer Science, Khenchela University, Khenchela, Algeria
E-mail: belkebir.malak@univ-oeb.dz, maarouk.toufik@univ-khenchela.dz, brm.nini@gmail.com

Keywords: deep learning, ontology, healthcare system, risk identification, high-level semantic, reasoning

Received: May 26, 2024

Deep learning and data-driven approaches are commonly used to avoid accidents involving elders and children. However, existing models are limited by a semantic gap, which hinders their ability to infer new risks on which they have not been trained. In this paper, a real-time healthcare system is developed to identify and infer visual risks in surveillance videos for elders and children. The system consists of three main modules: "visual information extraction," which leverages artificial vision techniques such as GCD and IoU for relationship detection, YOLO for object detection, and ResNet18 for scene recognition; "ontology modeling," where a new high-level ontology named "Risks-Identification-Onto" is constructed based on first-order logic (FOL) and description logics (DLs); and "risk identification," where the system infers risks by deducing new knowledge through the reasoning of generated formal rules over the outputs of the data-driven techniques. Additionally, the system generates a high-level semantic description of the risky situation. Four common risk scenarios ("Hurt," "Burn," "Existing-in-dangerous-places," and "Hit") are selected to evaluate the effectiveness of the proposed system. Evaluation is conducted using the Charades and A2D datasets, which include 9,848 and 3,782 indoor activity videos, respectively.
The results demonstrate the system's efficiency in identifying and inferring risks in real time, with an accuracy ranging from 97.61% to 99.43% depending on the scenario.

Povzetek (summary): The study presents a real-time healthcare system for recognizing visual risks for elders and children, which uses deep learning and an ontology.

1 Introduction

Parents today are increasingly distracted by work, outside activities, and other responsibilities, making it difficult to provide constant care for their young children or aging parents. Many statistics [1, 2] indicate an increase in incidents involving elders and children, underscoring the importance of ongoing supervision. Children, in particular, are more prone to accidents due to their inherent curiosity and lack of knowledge about potential risks. Elderly people with dementia or mobility issues, on the other hand, have a higher risk of having an accident. Indoor and outdoor surveillance cameras, such as those found in homes, offices, and public places, are widely used for manually monitoring children and elders, even though manual monitoring is time-consuming and potentially ineffective during periods of inattention.

One solution to automate the monitoring process is artificial vision based on data, also known as "data-driven approaches," which has demonstrated significant breakthroughs and results [3–5]. These methods seek to limit risks by tracking individuals, identifying objects, and recognizing actions and behaviors. However, their reliance solely on existing data and their need for extensive training and testing datasets render them inadequate for inferring and identifying emerging risks in line with evolving safety and care standards for children and the elderly. The primary limitations of data-driven approaches lie in the "semantic gap" and the absence of knowledge-based reasoning regarding new information.
For example, while a data-driven approach can detect "an elder near a table, a knife is on the table," it lacks the ability to infer from this output that "this individual may be at risk of harm from the sharp object (knife)." Moreover, following a training phase with a large dataset, a single model or algorithm is typically employed to detect a specific risk in a given setting.

Another solution used by researchers [3, 4, 6–14] is the integration of data-driven approaches with ontology and logical techniques to minimize resource-intensive requirements, such as data and computational power. An ontology, as defined by Borst [15], is a "formal specification of a shared conceptualization," providing rich meanings and semantics for a specific domain that computers can understand and use in formal ways. This integration effectively reduces the semantic gap and has been applied across various fields, such as image retrieval, object recognition, and risk prediction. However, it is still uncommon for real-time detection of dangers affecting children and the elderly.

Moreover, these studies do not consider the deduction of new risks in real time, which is the most effective preventative measure against unexpected accidents. To this end, this paper introduces a real-time semantic healthcare system that combines formal approaches with artificial vision techniques for the identification and inference of visual risks in surveillance videos for elders and children. The proposed system uses as few resources and as little data as possible; its real-time performance is therefore effective and appropriate for risk identification. Additionally, it presents a newly constructed ontology called Risks-Identification-Onto.
On the one hand, data-driven approaches are used to extract visual data such as objects (YOLOv5), visual relationships (Grounding Consistency Distillation, GCD), spatial-geometric relationships (IoU), and the scene environment (ResNet18). The outputs are represented as sets of triples. On the other hand, a combination of formal approaches, including ontology, FOL, and description logics (DLs), is employed for knowledge representation, along with reasoning-logic rules (i.e., rules generated with a high level of semantics) to detect and infer dangers.

The real-time risk identification process for each scenario requires no training phase. It is accomplished by mapping the set of obtained triples to the developed ontology, which serves as a "Fact Base." If the situation is deemed dangerous, each person in the scene is assigned to the appropriate risk class by applying a reasoner to well-established rules written in the Semantic Web Rule Language (SWRL), which serve as the "Rule Base." Additionally, an auto-description is generated, along with an alert in case of danger.

The contributions of this paper are: 1) the construction of a new ontology called "Risks-Identification-Onto," and 2) the development of a high-level semantic healthcare system that combines logic with artificial vision while using minimal resources and maintaining real-time performance, and that closes the semantic gap between information (i.e., the results of data-driven approaches) and knowledge (i.e., the results of reasoning about information). The effectiveness of the proposal is demonstrated by its use in this critical domain, assisting parents and caregivers in ensuring the safety of elders and children with an accuracy of 97.61% to 99.43%. Notably, four common risk scenarios are identified: "hurt," "burn," "existing in dangerous environments," and "hit."

The rest of this paper is structured as follows: Section 2 summarizes the state-of-the-art work.
Section 3 describes the proposal's architecture and methodology, together with the newly developed ontology. Section 4 illustrates the study cases, experiments, and tests. Finally, Section 5 concludes the paper.

2 Related works

Several studies on risk identification and people-care have been conducted, and various approaches have been proposed. These approaches can be categorized into three main groups: imaging and artificial vision approaches (i.e., data-driven), formal approaches (i.e., knowledge-driven), and hybrid approaches that combine both. A synthesis of each group is outlined below.

2.1 Data-driven approaches for people-care and risk identification

Data-driven approaches for risk identification have demonstrated high accuracy in various fields, including monitoring systems [3] that provide valuable assistance to parents in monitoring infants to prevent accidents and unforeseen injuries. The authors of [4] introduced an improved accident prediction model that combines the temporal pyramid LSTM (TP-LSTM) model, a temporal attention mechanism, and the early exponential loss (EEL) function to anticipate infant accidents seconds or fractions of a second before they occur. Similarly, the work in [3] addresses a monitoring system in which risk detection considers the spatial interactions between each newborn and adjacent objects, such as entering dangerous zones or coming into contact with harmful objects, which must be predefined for each time and case. The system proposed in [16] introduces a wearable device equipped with a fall detection approach. This device takes the form of a wireless bracelet and is designed to aid individuals with vision impairments by detecting obstacles in indoor environments. In contrast, [17] describes "Friendy," a deep learning-based chatbot intended to provide psychotherapy interventions to children with autism.
In addition, [18] proposes a monitoring model that tracks pedestrian flow to prevent crowding and stampedes.

Finally, in [19], a deep learning-based system is proposed for monitoring shared autonomous vehicles. It employs three distinct algorithms: a system for detecting violent actions, a system for detecting violent objects, and a system for detecting lost items.

Despite their significant results, these works still face obstacles in terms of semantic reasoning and the inference of new information or knowledge that differs from the training data.

2.2 Knowledge-driven approaches for people-care and risk identification

Ontologies are becoming increasingly popular in knowledge-driven approaches. For instance, the study in [7] models two ontologies of actions and objects for elders in the home setting. Its purpose is to formally describe the scope domains while providing additional semantic details about them. The work in [6] proposes an ontology for representing and identifying risks during building renovations.

Nonetheless, relying solely on an ontology for risk identification provides semantic modeling of a specific domain without the automatic extraction of real-world information. Consequently, it may be unable to automatically determine hazards in real-world situations without the manual input of data.

2.3 Hybrid approaches for people-care and risk identification

Although "data-driven approaches" have achieved significant results, they are semantically low/medium-level and lack the ability to reason and infer new high-level semantic knowledge and interpretations. State-of-the-art works recommend integrating "knowledge-driven approaches" with them. The authors of [11] have synthesized literature supporting this integration and have demonstrated how formal and logical inferences can bridge the semantic gap.
Furthermore, several works address the extraction of semantic visual relationships in sports images [8], the semantic analysis of human behavior [14], the enhancement of image recognition [13], and risk prediction [12]. In addition, a multimodal approach is proposed in [10] with the aim of identifying hazards at building sites. Finally, [9] proposed a graph-based framework that integrates linguistic Natural Language Processing (NLP), OpenPose, and YOLOv4 with a reasoning approach to process regulatory rule sentences and images for on-site occupational hazards such as "working on height" and "operating a grinder."

2.4 Recap

The majority of the discussed works, which combine "knowledge-driven" and "data-driven" approaches, have effectively bridged the semantic gap. However, their applications in real-time danger recognition for children and elders remain limited (Table 1). To address this gap, this work proposes a semantic system integrating artificial vision techniques with knowledge-driven methods, which is discussed in more detail in the following sections.

3 Architecture and methodology of the proposal

This work combines deductive reasoning from knowledge-driven approaches with inductive reasoning from data-driven methods. The proposed real-time semantic system integrates artificial vision with logic and ontology to detect dangers affecting children and elders in indoor/outdoor environments by combining low/medium-level semantic information with high-level knowledge. The architecture consists of three main modules, as shown in Figure 1:

1. Visual Information Extraction: This module identifies visual elements within a captured scene, including the indoor/outdoor environment, individuals, surrounding objects, and their visual relationships;
2. Ontology Modeling: A formal ontology named "Risks-Identification-Onto" is developed to interpret the results of the first module.
It renders them machine-readable for formal interpretations and prepares them for semantic-risk reasoning and querying. Additionally, the outputs of the visual information extraction module are used to instantiate individuals in the ontology;
3. Risk Identification: First-order logic (FOL) and description logics (DLs) are used to define common risk scenarios, which serve as inputs for the risk inference process, along with the ontology and its instantiation outputs. Risky scenarios posing threats to children and/or elders are inferred by applying reasoning-logic rules to the extracted visual information.

3.1 Visual information extraction

This module aims to extract the minimum amount of information essential for effective risk detection while maintaining real-time performance. This information is used to generate a low/medium-level semantic description of a particular scene in indoor/outdoor environments. The following models are used to detect visual content: deep learning for object detection (YOLOv5), scene recognition (ResNet18), and visual relationship detection (GCD), with the IoU metric for spatial-geometric extraction. The outputs are represented as sets of triples (a computational format) and then stored in JSON files for use as inputs to subsequent modules.

3.1.1 Objects detection and scene recognition

The YOLO algorithm, a convolutional neural network (CNN) model, excels at speed and precision for object recognition and is widely used in action recognition and risk analysis. Since its introduction in [20], YOLO has evolved through versions from YOLOv2 to YOLOv8 [21].

The authors of YOLO reframed object identification from classification to regression, replacing two-stage algorithms with a one-stage method. Unlike earlier methods, which required hundreds or thousands of passes per image, YOLO conducts detection in a single pass by dividing the image into grid regions.
Each region predicts bounding boxes (BBOXes) and probabilities, indicating object classes, locations, and scores. The YOLO network consists of three main phases [22, 23]:

– Backbone: a convolutional neural network that extracts features from images of various sizes;
– Neck: a series of network layers that aggregate the extracted image features to enrich semantic information, serving as input to the prediction layer;
– Head: predicts from the neck's outputs, providing object classes with coordinates, probabilities, and scores.

Table 1: Recap of related works

| Ref. | Proposed | Methods | Dataset | Type, time, and accuracy rate of inference (respectively) | Limitation |
|---|---|---|---|---|---|
| [3] | Monitoring system for detecting accidents involving infants in rooms | OpenPose for individual detection, background subtraction technique | The article does not specify a dataset, but it uses an infant doll measuring 58 cm tall | Induction (no reasoning about knowledge). Unspecified time and accuracy | Preliminary experiments with no knowledge inference |
| [4] | Early accident prediction model for infants and children | TP-LSTM, early exponential loss (EEL) function, two-stream ConvNet | Baby Video Dataset (BVD) | Induction. 4.196 seconds. 61.13% | The model prediction is limited to trained cases |
| [16] | Fall detection wireless bracelet for vision-impaired individuals, based on the detection of obstacles in indoor environments | Firebase database, NodeMCU WiFi, HC-SR04 ultrasonic distance sensor | Not specified | Induction (for training of obstacles). 0.3 seconds. Accuracy not mentioned (demonstrates real-world experiments) | Several devices are used for the detection of environmental objects, which must be worn constantly, with no inference on risks |
| [17] | "Friendy," a therapy enhancement framework for autistic children, based on deep learning with a contextual chatbot | LSTM, Gated Recurrent Unit (GRU) topology | Newly constructed dataset | Induction (based on information provided by experts). Mentioned as real-time but not calculated. Accuracy of 80.5% | The lack of data makes the system challenging to scale and integrate into diverse real-world settings |
| [19] | Violence monitoring system for shared autonomous vehicles | YOLOv5 for object detection; 3D ConvNet, SlowFast, and Temporal Segment/Shift Networks for video action recognition | TAO, COCO, MoLa InCar | Induction. Real-time (170-330). Accuracy of 94.32% | Limited to the trained samples |
| [7] | Construction of two formal ontologies of home actions and objects for elders | Standard ontology construction | Charades, Home-Ontology | Deduction (logical inference). Unspecified time. Accuracy cannot be estimated; logic-based results are always true | The mere use of an ontology without evaluation in real-world scenarios is insufficient |
| [6] | Risk ontology for building renovations | Standard ontology construction | Deep renovation projects data (RINNO; Europe 2020 research project) | Deduction. Unspecified time. Accuracy cannot be estimated | Effectiveness has not been demonstrated in real-world projects |
| [8] | Semantic extraction and interpretation of visual relationships in sports images | Standard ontology construction, VGG-16 for object detection, VRD for relationship detection | HCVRD | Induction for object detection, deduction for false-positive filtering. Unspecified time. Accuracy depends on each "Concept" | It lacks the inference of new knowledge |
| [14] | Chatbot for personality disorder assistance through semantic analysis | NLP, standard ontology construction | Twitter | Inductive, deductive. Unspecified time. Accuracy of 72% | High potential for misinterpretation of disorder intricacies |
| [13] | Image recognition enhancement with ConSE (a new ontology of digital images in construction sites) | CNN-LSTM, Graph Neural Network (GNN), standard ontology construction | Construction site images produced by authors | Induction for ontology development and deduction for ontology validation. Unspecified time and accuracy | Manual low-level information determination in the images, and lack of system evaluation |
| [12] | Disease prediction model | LSTM, Bidirectional Gated Recurrent Unit (Bi-GRU) for the prediction | Not mentioned | Induction for feature extraction and training, deduction for disease prediction. Unspecified time and accuracy | Training such models effectively requires large and diverse datasets |
| [10] | Hazard identification at building sites; multimodal approach | Standard ontology construction, SWI-Prolog, NLP | VRD | Induction (information detection), deduction (logical semantic reasoning with SWI-Prolog). Unspecified time. Accuracy of 49.91% | The majority of the computational capacities and resources are concentrated on safe objects, which limits real-time risk identification |
| [9] | Graph-based hazard identification framework for occupational sites that integrates linguistic and visual information | OpenPose, YOLOv4, spaCy (feature generation), NetworkX (graph structure) | Created by authors; presents "working on height" and "operating a grinder" | Induction (detection of visual and linguistic information), deduction (hazard reasoning). Unspecified time and accuracy | High computational requirements; not real-time results |

Figure 1: Architecture of the proposed approach.

In this work, YOLOv5 [24] is used for object detection in indoor/outdoor environments. It is pretrained on the COCO dataset and implemented using the PyTorch framework. YOLOv5 incorporates the AutoAnchor algorithm by Ultralytics, which adjusts anchor boxes to achieve a better fit for the dataset and the training parameters. The architecture incorporates a modified CSPDarknet53 backbone, a stem, and a convolutional layer with a large window size to save memory and compute during feature extraction.
A spatial pyramid pooling fast (SPPF) layer accelerates computation by combining features into a fixed-size map. Each convolution is followed by batch normalization (BN) and SiLU activation. The neck makes use of SPPF and a modified CSP-PAN. This model outperforms existing classifier-based techniques. Its advantage lies in its exceptional speed, enabling real-time results with high precision, which makes it particularly suitable for risk identification.

In addition, the environment in which the detected objects are located is recognized using a ResNet18 model pretrained on the Places365 [25] dataset for scene recognition. This model accurately identifies the environment in which a person is situated within a scene.

3.1.2 Relationships identification

Two types of relationships are used to recognize interactions between each (person, object) pair: (1) spatial-geometric relationships and (2) visual relationships detected by the GCD model.

Spatial-geometric relationships

The Intersection over Union (IoU) metric is the evaluation standard for quantifying the degree or ratio of overlap between two BBOXes [26], which means that it operates directly and instantaneously on the BBOXes. As shown in Equation 1, the IoU is calculated by dividing the intersection of the two bounding boxes by their union. The metric used in this work considers three possible cases: a value of 0, a value strictly between 0 and 1, and a value of 1. These cases represent three types of spatial-geometric interactions between individuals and scene objects: "far," "overlap," and "complete overlap," respectively (see Figure 2).

$$\mathrm{IoU}(X, Y) = \frac{\text{Intersection of the two BBOXes}}{\text{Union of the two BBOXes}} = \frac{|X \cap Y|}{|X \cup Y|} \tag{1}$$

Visual Relationships Detection (VRD)

Several approaches and models have demonstrated remarkable outcomes in visual relationship detection (VRD) [27]. These approaches reveal the visual interactions between each pair of detected objects.
Formally, in an image IMG, with Objects representing the number of detected objects and P denoting the total number of potential pairs of objects (each referred to as a Pair), the relationship is defined in Equation 2:

$$P = \mathit{Objects} \times (\mathit{Objects} - 1), \qquad \mathit{Pair} = \langle \mathit{Subject} - \mathit{Object} \rangle \tag{2}$$

Figure 2: Results of applying IoU; the bounding box of "person1" overlaps the bounding box of "motorcycle2."

The objective of VRD is to generate <Subject - Predicate - Object> triples and/or scene graphs, with the predicate representing the relationship between the subject and the object (e.g., <Person - next_to - knife>). This work adopts the Grounding Consistency Distillation (GCD) model [28] to identify visual relationships between persons and nearby objects, particularly those that may pose a danger. To streamline the process and maintain real-time performance, only triples involving individuals and potentially hazardous items are selected, with benign objects filtered out during the information extraction phase.

The GCD model, a semi-supervised distillation training approach, addresses a key weakness in traditional Scene Graph Generator (SGG) and VRD models. SGG models often prioritize high recall at the expense of spatial and visual evidence, relying heavily on datasets biased toward common relationships, a problem termed "bias on relationships." Consequently, they may generate inaccurate predictions by disregarding visual information, spatial coordinates, and genuine object connections (see Figure 3.a). For example, if the training dataset contains numerous instances of a person carrying a knife, the model will assume that every person can hold a knife. As a result, if a knife is detected alongside multiple individuals, the model may incorrectly identify the entire group as wielding the knife, regardless of their actual proximity to the object.
This can result in false-positive risk identification alarms, which are exacerbated by a lack of "negative examples" that highlight such false facts, for instance that "not everyone in the scene is necessarily holding the detected knife."

For this reason, GCD is a suitable model for this proposal. It can accurately identify which object is in relation to the subject based on its position, geometric coordinates, and visual information, while achieving high accuracy and recall (see Figure 3.b). To achieve this, three networks are used: the Grounder, the teacher-SGG, and the student-SGG. More specifically, using a pretrained grounding network, the teacher-SGG is constrained to predict the most pertinent and grounded relationships in the scene to create spatial common-sense knowledge, which is subsequently distilled into the student-SGG model. The latter considers unlabeled data and provides out-of-distribution cases that challenge the network's perception of the dominant classes. Additionally, there is a generation phase of negative labels for the unlabeled data (see Figure 3.c).

Figure 3: a. Training results using standard "bias on relationships" datasets, demonstrating "overfitting." b. Training results of GCD, demonstrating that this issue has been addressed. c. Excerpt from the generation of negative labels.

3.1.3 Representation format of visual information (triples)

The extracted visual information is heterogeneous and textual, which poses challenges for its integration with the ontology for real-time formal manipulation and reasoning. To enable computer comprehension, a structured and unified format is necessary. A three-element triple structure (<Subject - Relationship - Object>) is adopted to encapsulate this information, which is then output in JSON format. This JSON file serves as an intermediary representation and a mapper between the modules of the proposed system.
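As a minimal sketch of how a spatial-geometric relationship could be turned into a triple and written to JSON, the following Python fragment computes the IoU of two bounding boxes, maps it to "far," "overlap," or "complete overlap," and serializes the resulting triple. The box layout `(x1, y1, x2, y2)`, the function names, and the JSON keys (`scene`, `triples`, `subject`, `relationship`, `object`) are assumptions made for illustration; the actual schema of the system's JSON files is not specified here.

```python
import json


def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the intersection rectangle (0 if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def spatial_relation(value):
    """Map an IoU value to one of the three spatial-geometric cases."""
    if value == 0.0:
        return "far"
    if value == 1.0:  # only reached when the two boxes coincide
        return "complete_overlap"
    return "overlap"


def to_triple(subject, relationship, obj):
    """Build one <Subject - Relationship - Object> triple (hypothetical keys)."""
    return {"subject": subject, "relationship": relationship, "object": obj}


def scene_to_json(scene, triples):
    """Serialize a scene label and its triples (hypothetical layout)."""
    return json.dumps({"scene": scene, "triples": triples}, indent=2)


if __name__ == "__main__":
    person_box = (0, 0, 4, 4)
    knife_box = (2, 2, 6, 6)
    relation = spatial_relation(iou(person_box, knife_box))
    print(scene_to_json("kitchen", [to_triple("person1", relation, "knife1")]))
```

Keeping the relationship as a plain string makes it straightforward to mix IoU-derived relations with predicates produced by a relationship detector in the same list of triples.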
Figure 4 showcases a sample output from the "visual information extraction" module.

Figure 4: Results of the visual information extraction module.

3.2 Ontology modeling

The challenges of data-driven approaches include constructing logical systems and generating high-level semantic knowledge in real time with high precision. One solution to these challenges is to integrate ontology, first-order logic (FOL), and description logics (DLs). Researchers [11] confirm that this is effective for describing semantic content and reasoning formally about it.

Therefore, a new ontology named "Risks-Identification-Onto" is developed and integrated with the Visual Information Extraction module. The idea is to minimize the semantic gap, provide a rich context and deep meaning for the module's findings, and automatically generate high-level semantic descriptions of the given scene. The construction of this ontology is grounded in the foundational principles articulated by Gruber (an "explicit specification of a conceptualization") and Borst (a "formal specification of a shared conceptualization") [29].

From a formal description perspective, the present ontology is constructed with a concise approach: conceptualizing aspects of interest and their intra/inter-relationships, formalizing these concepts using an appropriate language, and achieving a "shared conceptualization" stage in which primitives are understandable to ontology users. Four kinds of background knowledge are chosen: (1) the subsumption of classes, defining the connection where one class is a subclass of another; (2) the domain/range restrictions, which specify the domain or range of object classes for a relation class; (3) the cardinality restrictions, limiting the maximum number of relations of a certain relation class that an object may have; and (4) object collections, referring to groups of image objects that fall under the same object class.
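To make the first two kinds of background knowledge concrete, the following Python fragment sketches a subsumption check and a domain/range check over a small class hierarchy. It is only an illustration under stated assumptions: the hierarchy fragment, the `hold` restriction, and all function names are hypothetical and do not reproduce the actual content of the ontology.

```python
# Hypothetical fragment of a class hierarchy: class -> direct parent class.
SUBCLASS_OF = {
    "Elder": "Person",
    "Child": "Person",
    "Adult": "Person",
    "Person": "Thing",
    "Knife": "Sharp_tool",
    "Sharp_tool": "Thing",
}

# Hypothetical domain/range restriction: relation -> (domain class, range class).
DOMAIN_RANGE = {
    "hold": ("Person", "Thing"),
}


def is_subclass(cls, ancestor):
    """Subsumption: True if cls equals ancestor or is (transitively) below it."""
    current = cls
    while current is not None:
        if current == ancestor:
            return True
        current = SUBCLASS_OF.get(current)  # None once the root is passed
    return False


def is_valid_triple(subject_cls, relation, object_cls):
    """Domain/range check for the classes of a <Subject - Relation - Object> triple."""
    restriction = DOMAIN_RANGE.get(relation)
    if restriction is None:
        return False  # unknown relations are rejected in this sketch
    domain, rng = restriction
    return is_subclass(subject_cls, domain) and is_subclass(object_cls, rng)


if __name__ == "__main__":
    print(is_valid_triple("Elder", "hold", "Knife"))  # a person may hold a knife
    print(is_valid_triple("Knife", "hold", "Elder"))  # a knife may not hold a person
```

In an ontology language these checks are expressed declaratively and enforced by a reasoner; the dictionaries above only mimic that behavior for a handful of classes.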
From a logical implementation perspective, this ontology uses DLs and FOL to generate a machine-readable structure for a particular domain. It abstractly (explicitly or implicitly) conceptualizes all Concepts (Cp) with their Properties and the Relationships (R) between them. Axioms (ϕ), which impose constraints on these entities, are also integrated, along with Individuals serving as instances (I) (Equation 3). This formalization aims to unify domain knowledge, derive new knowledge through logical inference, and facilitate automated reasoning and querying processes.

$$O = \left\{ \begin{array}{l} \Sigma\,\phi \\ Cp = \{cp_1, cp_2, \ldots, cp_n\} \\ R = \{r_1, r_2, \ldots, r_n\} \\ I = \{i_1, i_2, \ldots, i_n\} \end{array} \right\} \tag{3}$$

The "Risks-Identification-Onto" is constructed through the following steps:

– A: Define the domain of the ontology.
– B: Search for existing ontologies to reuse.
– C: Select the taxonomy of the chosen domain.
– D: Define the top concepts, then categorize the taxonomy of step (C) into "concepts" and "relationships, i.e., properties."
– E: Instantiate individuals based on the results of the visual information extraction module, using the intermediate JSON files as input.

Since no relevant ontology exists for risk identification in indoor/outdoor environments, the "top-down" approach [30] is used for the construction of the "Risks-Identification-Onto." It entails starting with the top-level concepts and gradually refining them to establish a hierarchical structure. These concepts are conceptualized based on the definition of the formal extensional and intentional concepts: A = (D, R, C, S, A), with:

$$\forall\, d_i.\top \sqcap d_j.\top \models D \;\sqcap\; D(d_i) \models \Xi \;\sqcap\; D(d_j.\top) \models \Xi \;\sqcap\; d_i.\top \equiv \neg\, d_j.\top \tag{4}$$

$$\exists\, r_{i,j} \sqcap r_{j,i} \models R \;\sqcap\; R(r_{i,j}) \sqcap R(r_{j,i}) \models \Xi \;\sqcap\; r_{i,j} \equiv \neg\, r_{j,i} \tag{5}$$

$$\forall\, r_{i,j} \sqcap r_{j,i} \models R \;\sqcap\; R(r_{i,j}) \sqcap R(r_{j,i}) \models \Xi \;\sqcap\; d_i.\top \xrightarrow{r_{i,j}} d_j.\top \equiv \neg\, d_j.\top \xrightarrow{r_{j,i}} d_i.\top \tag{6}$$

Noting that:

(1) D represents the set of defined aspects/concepts.
The YOLO and VRD bounding-box labels, together with the Places365 labels, are chosen as the ontology's taxonomy and play the roles of concepts representing <Subject> and <Object>. Figure 5 depicts the ten classes of the top layer: 1) Thing, 2) Be_alive, 3) Environment, 4) Food, 5) Furniture, 6) Mean_of_transport, 7) Object_to_use, 8) Positioning, 9) Traffic_lights, and 10) is_in_Danger. Figure 6 shows their properties, while Figure 7 illustrates additional classes derived from the top layer, among which are elder, child, adult, Animal, Plant, Safe_Outdoor, Unsafe_Outdoor, Safe_Indoor, Unsafe_Indoor, ground_transportation, Maritime_transportation, Air_transportation, Sport_equipment, General_things, Electric_device, Electromechanical_device, Electronic_device, Kitchen_tool, Sharp_tool, Hot_tool, being_in_dangerous_place, Burn, Hurt, and Hit.

Figure 5: The top-layer concepts of the Risks-Identification-Onto; OntoGraph.

Figure 6: An excerpt from the object properties classification of the Risks-Identification-Onto.

Additionally, following the National Institutes of Health [31], "age" is defined with a "restriction and reasoning on numbers" (a Python module that includes functions for managing numerical constraints and performing reasoning tasks with numerical data) to automatically classify a person as "elder," "adult," or "child," as shown in Table 2.

Table 2: The "restriction and reasoning on numbers" applied to automatically classify a person as "elder," "adult," or "child"

    class Elder(Person):
        equivalent_to = [Person & age.some(ConstrainedDatatype(int, min_inclusive=65))]

    class Child(Person):
        equivalent_to = [Person & age.some(ConstrainedDatatype(int, max_inclusive=12))]

    class Adult(Person):
        equivalent_to = [Person & age.some(ConstrainedDatatype(int, min_inclusive=13, max_inclusive=64))]

(2) C is the constraint between D and R.
For instance, taking Figure 4, let the concepts used for axiom generation be $d_1, d_2, d_3 \models D$ and the relationships be $r_{d_1,d_2}, r_{d_1,d_3}, r_{d_3,d_1}, r_{d_3,d_2} \models R$, where $d_1$ = (Person1, ..., Person n), $d_2$ = (oven1, ..., oven n), and $d_3$ = (knife1, ..., knife n). The relationships that exist only between these concepts, i.e., person, oven, and knife, are:

– $r_{d_1,d_2}$ = (hold, far, overlap, next_to, on, ...)
– $r_{d_1,d_3}$ = (near, on, next, ...)
– $r_{d_3,d_1}$ = (far, overlap, next_to, ...)
– $r_{d_2,d_1}$ = (far, overlap, next_to, on, ...)
– $r_{d_3,d_2}$ = (next_to, on, far, overlap, ...)

In addition, the conceptualization should remain unchanged when the world instantiation changes [32]. The verb "hold" serves as an example: the axioms and rules that define "hold" should not change when the environment changes, and "hold" is always understood through the same axioms and rules (for example, a "knife" cannot "hold" a "person"). The Risks-Identification-Onto is recorded under these restrictions, which constrain the aspects and relationships and provide extensive background knowledge about them and between them.

(3) S is the ontology universe, defined as $S = \{\xi', \xi'', \xi''', \ldots\}$. This means that all the ontology entities are built in accordance with the time evolution of each existence of the world ontology. It is defined by the following formal description:

\[
\xi' \rightsquigarrow t \models T \;\sqcap\; \xi'' \rightsquigarrow t \models T, \quad \text{if } \xi' \equiv \neg \xi''
\]

\[
\exists R.\top \sqsubseteq A \sqcap \exists\, d_i.\top \sqcap d_j.\top \models D \sqcap \exists\, r_{i,j} \models R \wedge \xi'(d_i.\top \xrightarrow{r_{i,j}} d_j.\top) \equiv \neg\, \xi''(d_i.\top \xrightarrow{r_{i,j}} d_j.\top)
\]

(4) A restricts and defines the conceptualization between the aspect set D and the ontology universe S. It is preferable to treat the unary conceptualization of aspects and the binary intra/inter-relationships as rigid in order to build a straightforward formal extensional of aspects, e.g., Person1, Person2, oven1, oven2, knife1, knife2, and likewise for overlap, far, hold, next_to, on, near.
It is constructed and mapped to the same extensions as the ontology universe for this reason. Similar assumptions were used to build the formal intensional of aspects:

\[
\begin{aligned}
&\exists\, \xi' \sqcap \xi'' \sqcap \xi''' \sqcap \ldots \models S : Person1(\xi') \equiv d_1 \wedge Person1(\xi''') \equiv d_1 \wedge Person1(\ldots) \equiv d_1 \\
&\exists\, \xi' \sqcap \xi'' \sqcap \xi''' \sqcap \ldots \models S : oven1(\xi') \equiv d_2 \wedge oven1(\xi''') \equiv d_2 \wedge oven1(\ldots) \equiv d_2 \\
&\exists\, \xi' \sqcap \xi'' \sqcap \xi''' \sqcap \ldots \models S : knife1(\xi') \equiv d_3 \wedge knife1(\xi''') \equiv d_3 \wedge knife1(\ldots) \equiv d_3
\end{aligned}
\]

\[
\exists\, overlap(u_{d_1,d_2}) \models A \sqcap (\xi' \sqcap \xi'' \sqcap \ldots) \models S \equiv \left\{ (Person1(\xi') \sqcap oven1(\xi')) \wedge (Person1(\xi'') \sqcap oven1(\xi'')) \wedge (Person1(\ldots) \sqcap oven1(\ldots)) \right.
\]

\[
\exists\, hold(u_{d_1,d_3}) \models A \sqcap (\xi' \sqcap \xi'' \sqcap \ldots) \models S \equiv \left\{ (Person1(\xi') \sqcap knife1(\xi')) \wedge (Person1(\xi'') \sqcap knife1(\xi'')) \wedge (Person1(\ldots) \sqcap knife1(\ldots)) \right.
\]

The Risks-Identification-Onto includes 504 classes, 88 properties, 1307 axioms, and 710 logical axioms. Its top layers are shown in Figure 7.

Figure 7: Part of the Risks-Identification-Onto concepts hierarchy; OntoGraf.

After completing steps A to D, step E automatically instantiates individuals and relationships for each captured scene. This process gives a uniform representation to the results obtained from the "visual information extraction" module. The instantiation is depicted in Figure 8, and it serves as input for the "risk inference" module. Consequently, the proposal can deduce risks from sets of triples without requiring a separate training phase for each scenario. Moreover, the inference outcomes are used to auto-generate descriptions and trigger alarms, providing users with semantically descriptive information to facilitate quick understanding and intervention.

3.3 Risks identification

The module consists of two primary steps: defining risk scenarios and performing risk inference.
In the initial step, potential risk situations are outlined and described using FOL-based DLs, which are well suited to integration with the ontology because they leverage its logical inference and reasoning capabilities. In the subsequent step, the system automatically deduces and identifies situations that may endanger elders and children in both indoor and outdoor environments through logical reasoning.

DLs are formal languages that focus on knowledge representation, inference, and reasoning. They employ FOL to formalize and describe Knowledge Bases (KB), which include, in this case, concepts Cp, relationships R, individuals I, and axioms ϕ [33]. A KB contains three types of entities [34]:

1. Constants: a set of individuals {c1, c2, ..., cn}, e.g., "person1," "knife1."
2. Unary relations: a set of concepts {cp1, cp2, ..., cpn}, e.g., "Person," "knife."
3. Binary relations: roles and properties, e.g., age, overlap, In_Contact.

A DL knowledge base is composed of two groups of axioms (denoted ϕ), which form the Fact Base specifying the entities of a given knowledge domain with their constraints, KB = ⟨A, T⟩ [34]:

1. Assertional axioms A: named the ABox, sets of assertions about individuals I, e.g., "Person(person1), age(person1, 70)."
2. Terminological axioms T: named the TBox, complex descriptions of relationships R between concepts Cp and collections of inclusion assertions, e.g., Elder ⊑ Person, Elder ≡ Person ⊓ age ⩾ 65.

In this work, the ABox and the TBox are generated as follows:

1. The set of triples ⟨Subject, Relationship, Object⟩ is mapped to binary relations Relationship(Subject, Object) according to D, R ⊨ Ξ, and A in S.
2. Constants are asserted to their parent concepts using the unary relation Cp(C) and/or inclusion assertions, according to C in terms of A and the time evolution of S.

For instance, "knife1" is instantiated as a constant C of the concept Cp "knife," denoted by the axiom $knife_{Cp}(knife1_{C})$. The concept knife is considered both a Sharp_tool and a Kitchen_tool, symbolized by knife ⊑ Sharp_tool, knife ⊑ Kitchen_tool.
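The two generation steps above can be illustrated with a minimal plain-Python sketch. It is not the authors' owlready2 pipeline: the names `TBOX`, `ancestors`, and `build_abox` are hypothetical, and the inclusion Sharp_tool ⊑ Object_to_use is an assumption made only to show a second level of subsumption. The inclusions knife ⊑ Sharp_tool, knife ⊑ Kitchen_tool, and oven ⊑ Hot_tool follow the text.

```python
# TBox excerpt as "concept -> direct parents" (inclusion assertions).
# Sharp_tool -> Object_to_use is an assumption made for illustration only.
TBOX = {
    "knife": {"Sharp_tool", "Kitchen_tool"},
    "oven": {"Hot_tool"},
    "Sharp_tool": {"Object_to_use"},
}


def ancestors(concept):
    """Return every concept that subsumes `concept` (transitive closure)."""
    seen, stack = set(), [concept]
    while stack:
        for parent in TBOX.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def build_abox(triples, labels):
    """Map detections to ABox assertions.

    triples: iterable of (subject, relationship, object) tuples
    labels:  individual -> detected concept label, e.g. "knife1" -> "knife"
    Returns (concept assertions Cp(C), role assertions R(S, O)).
    """
    concept_assertions = {
        individual: {label} | ancestors(label)
        for individual, label in labels.items()
    }
    role_assertions = {(rel, subj, obj) for subj, rel, obj in triples}
    return concept_assertions, role_assertions


# Usage on a hypothetical scene: a person holding a knife.
labels = {"person1": "Person", "knife1": "knife"}
triples = [("person1", "hold", "knife1")]
concepts, roles = build_abox(triples, labels)
```

Here `concepts["knife1"]` contains both knife and the concepts that subsume it, which is the information a rule such as "contact with any Sharp_tool" needs, while `roles` holds the binary assertion hold(person1, knife1).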
Rules of danger, referred to as the Rule Base, are defined and generated by integrating and formalizing the knowledge of the ABox and the TBox, as depicted in Figure 9.

Figure 8: Results of the instantiation of the Risks-Identification-Onto ontology for the frame in Figure 4; OntoGraf.

Figure 9: Sample of a defined rule using FOL.

3.3.1 Risk scenarios

The paper outlines four primary risk scenarios, highlighting the most common potential dangers faced by "elderly" or "child" individuals (we assume that adults can protect themselves under normal circumstances):

– Hurt: A person P1 aged ⩾ 65 or ⩽ 12, i.e., an Elder or a Child, is susceptible to injuries from sharp tools such as a knife or scissors under the following conditions: 1) Direct contact with sharp tools, indicated by spatial relationships R ⊨ Ξ such as "overlap" or "completelyoverlap," or by a VRD relationship indicating a Contact relationship between Person(P1) and a Sharp_tool. 2) Indirect contact with Sharp_tools, such as when they are placed on furniture that the individual comes into contact with.

– Burn: Elders and Children may suffer burns from Hot_tools such as a microwave or an oven in the following situations: 1) Direct contact with hot tools, as indicated by spatial relationships R ⊨ Ξ such as "overlap" or "completelyoverlap," or by any "Contact" relationship with a Hot_tool. 2) Proximity to hot tools, as indicated by the "Is_Around" relationship.

– Hit: Individuals may be struck by ground_transportation ⊑ Mean_of_transport under the following circumstances: 1) A detected Elder or Child has the relationship R ⊨ Ξ "on" with the street and a contact relationship with any Mean_of_transport, but is not inside or on it. 2) An Elder with dementia, who requires close supervision, is riding a motorcycle, bicycle, or bike.
– Existing in dangerous places: An Elder or a Child should not be present in hazardous outdoor or indoor environments such as a cliff or a physics laboratory, as these are only suitable for adults and experts.

3.3.2 Risk inference

This module defines the logic-based rules that take the outputs of the data-driven approaches as input for deductive reasoning. It provides high-level semantic understanding and reasoning capabilities that closely resemble human deduction in the field of real-time risk identification.

The "Rule Base" that formalizes the risk scenarios is generated using the Semantic Web Rule Language (SWRL). SWRL is both an extension of OWL and a rule language. Used in conjunction with an ontology, it can express further kinds of axioms and constraints thanks to its strong deductive inference and reasoning abilities [35].

The SWRL rules are expressed at a high level of abstraction in terms of OWL concepts Cp, properties/relationships R, and instances/individuals I. Each rule consists of two parts: an antecedent called the Body, which is a conjunction of atomic formulas, and a consequent called the Head, which is a set of atomic formulas that serves as the logical conclusion of the reasoning. Equation 7 presents a standard SWRL rule (→ separates the Body from the Head):

\[
A(?i_1) \wedge B(?i_1, ?i_2) \rightarrow C(?i_1)
\tag{7}
\]

A and C are OWL classes; A(?i1) and C(?i1) are atoms; B is a property; i1 and i2 are OWL individuals; and ?i1 and ?i2 are SWRL variables. The body A(?i1) ∧ B(?i1, ?i2) holds only if both A(?i1) and B(?i1, ?i2) are true. In that case, the logical conclusion of the inference over the body is reached, and the fact base is expanded to include the newly deduced fact.

For example, the scenario "Hurt," presented in FOL (Equation 8), can be generated as Rule 1:

\[
Hurt(X) \equiv \forall X.\; Person(X) \sqcap age \geq 65 \sqcap \exists Y.\; (Y \sqsubseteq Sharp\_tool) \sqcap On\_Contact(X, Y)
\tag{8}
\]

Rule 1: Person(?x) ∧ age(?x, ?b) ∧ greaterThan(?b, 64) ∧ Sharp_tool(?c) ∧ On_Contact(?x, ?c) → Hurt(?x)

If a person x in the given scene is 65 years old or more, i.e., an elder, and a detected sharp tool c comes into contact with x, then x is in danger of being hurt and is reclassified into the Hurt class; the Hurt class is expanded to include this individual.

Following that, the proposal uses the SPARQL Protocol and RDF Query Language (SPARQL) [36] to enable automatic responses to the query "Is anyone in danger?" Table 3 exemplifies the query "Is there anyone who could be hurt?"

Table 3: The SPARQL query "Is there anyone who could be hurt?"

    SELECT ?b WHERE { ?b .}

Applying similar techniques, the subsequent rules are formalized based on the descriptions outlined in the Risk scenarios section:

Rule 2: Person(?x) ∧ age(?x, ?b) ∧ greaterThan(?b, 64) ∧ kitchen(?z) ∧ exist_in(?x, ?z) ∧ Sharp_tool(?c) ∧ Furniture(?d) ∧ On_Contact(?x, ?d) ∧ On_Contact(?c, ?d) → Hurt(?x)

Rule 3: Person(?x) ∧ Child(?x) ∧ Sharp_tool(?c) ∧ On_Contact(?x, ?c) → Hurt(?x)

Rule 2 states the conditions under which an individual x, classified as Elder or Child, is likely to get hurt and is subsequently reclassified into the Hurt class when the rule body is satisfied. The body expresses an indirect association between the dangerous tool and x. Conversely, Rule 3 expresses a direct relationship.

Rule 4: Unsafe_Outdoor(?a) ∧ Person(?x) ∧ age(?x, ?b) ∧ greaterThan(?b, 64) ∧ exist_in(?x, ?a) → being_in_dangerous_place(?x)

Rule 5: Unsafe_Outdoor(?a) ∧ Person(?x) ∧ age(?x, ?b) ∧ lessThan(?b, 13) ∧ exist_in(?x, ?a) → being_in_dangerous_place(?x)

Rule 4 and Rule 5 specify whether the individual x, identified as an Elder or a Child, is susceptible to the risk being_in_dangerous_place, indicating that x is situated in a hazardous outdoor or indoor environment.
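To make the deductive step concrete, the following is a minimal plain-Python sketch of how Rules 1, 3, 4, and 5 could be evaluated over a small fact base. It is only an analogue of the SWRL rules, not the reasoner used in the paper: the functions `classify_age` and `infer_risks` and the sample individuals are hypothetical, and the age bands are taken from Table 2.

```python
def classify_age(age):
    """Age bands of Table 2: Child <= 12, Adult 13-64, Elder >= 65."""
    if age >= 65:
        return "Elder"
    if age <= 12:
        return "Child"
    return "Adult"


def infer_risks(ages, concepts, roles):
    """Apply plain-Python analogues of Rules 1, 3, 4 and 5 to one scene.

    ages:     individual -> age in years (the `age` data property)
    concepts: individual -> set of concepts the individual belongs to
    roles:    set of (relationship, subject, object) assertions
    Returns individual -> set of inferred risk classes.
    """
    risks = {}
    for person, age in ages.items():
        if classify_age(age) == "Adult":
            continue  # adults are assumed able to protect themselves
        found = set()
        for rel, subj, obj in roles:
            if subj != person:
                continue
            obj_concepts = concepts.get(obj, set())
            # Rules 1 and 3: direct contact with a sharp tool -> Hurt
            if rel == "On_Contact" and "Sharp_tool" in obj_concepts:
                found.add("Hurt")
            # Rules 4 and 5: presence in an unsafe outdoor place
            if rel == "exist_in" and "Unsafe_Outdoor" in obj_concepts:
                found.add("being_in_dangerous_place")
        if found:
            risks[person] = found
    return risks


# Hypothetical scene: an elder and an adult touching a knife, a child on a cliff.
ages = {"person1": 70, "person2": 30, "person3": 8}
concepts = {
    "knife1": {"knife", "Sharp_tool"},
    "cliff1": {"cliff", "Unsafe_Outdoor"},
}
roles = {
    ("On_Contact", "person1", "knife1"),
    ("On_Contact", "person2", "knife1"),
    ("exist_in", "person3", "cliff1"),
}
risks = infer_risks(ages, concepts, roles)
```

In this sketch the adult in contact with the knife is not flagged, which mirrors the age conditions in the rule bodies. Adding a new risk amounts to adding one more condition on the same facts, with no retraining of the visual models.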
Rule 6: Person(?x) ∧ age(?x, ?b) ∧ greaterThan(?b, 64) ∧ Mean_of_transport(?c) ∧ near(?x, ?c) ∧ street(?d) ∧ under(?d, ?x) → Hit(?x)

Rule 7: Person(?x) ∧ age(?x, ?b) ∧ greaterThan(?b, 64) ∧ Mean_of_transport(?c) ∧ ride(?x, ?c) → Hit(?x)

Rule 6 states that if x is identified as an Elder or a Child and is situated near or around a Mean_of_transport, specifically on the street but not inside the means of transportation, then x is reclassified into the Hit class when the body is satisfied. Rule 7, on the other hand, addresses the scenario where an Elder is riding a Mean_of_transport.

Rule 8: Person(?x) ∧ age(?x, ?b) ∧ greaterThan(?b, 64) ∧ kitchen(?z) ∧ exist_in(?x, ?z) ∧ Hot_tool(?c) ∧ On_Contact(?x, ?c) → Burn(?x)

Finally, Rule 8 outlines the Burn risk scenario: if x is classified as an Elder and is either in contact with or in proximity to a Hot_tool, then x is reclassified into the Burn class.

4 Study cases, experiments and tests

The motivation for this real-time healthcare system is twofold. Firstly, statistics [1, 2] highlight a rise in accidents involving both the elderly and the young. Secondly, the system addresses the challenge of caring for elders and children amid busy schedules, where continuous supervision may be lacking. To evaluate its effectiveness, four typical risk scenarios are chosen: "Hurt," "Burn," "Existing in Dangerous Places," and "Hit."

4.1 Visual information extraction

The visual information extraction module uses YOLOv5, IoU, GCD, and ResNet18 to generate low/medium-level semantic descriptions in the ⟨Subject, Relationship, Object⟩ triple format. These descriptions are then used to instantiate individuals in the Risks-Identification-Onto, which in turn feeds the high-level semantic risk identification.

The datasets used include the Charades dataset [37], the Actor-Action Dataset [38], and surveillance videos collected from YouTube.
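As an illustration of how a detector's bounding boxes can be turned into such triples, the sketch below computes the standard Intersection over Union of two boxes and maps it to a spatial relationship. The function names and the threshold are hypothetical, and the rule "any positive IoU means overlap, otherwise far" is a simplifying assumption; the paper's module also relies on GCD and VRD, which are not reproduced here.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the intersection rectangle (0 if boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def spatial_triple(subj, subj_box, obj, obj_box, overlap_thr=0.0):
    """Build a (subject, relationship, object) triple from two detections.

    overlap_thr is an assumed threshold: IoU above it yields "overlap",
    anything else yields "far".
    """
    relation = "overlap" if iou(subj_box, obj_box) > overlap_thr else "far"
    return (subj, relation, obj)


# Hypothetical detections in pixel coordinates.
near_triple = spatial_triple("person1", (0, 0, 2, 2), "knife1", (1, 1, 3, 3))
far_triple = spatial_triple("person1", (0, 0, 2, 2), "oven1", (5, 5, 6, 6))
```

Triples produced this way are the kind of intermediate output that is mapped to binary relations when the ontology is instantiated.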
4.1.1 Charades dataset

The Charades dataset [37] comprises 9,848 videos of typical indoor activities, with an average runtime of 30 seconds and interactions with 46 object classes across 15 different interior environments. The dataset also includes a vocabulary of 30 verbs, combined into 157 action classes. Each video has several free-text annotations, action labels, action intervals, and classes of interacting objects. Additionally, the dataset contains 27,847 textual descriptions of the videos and 41,104 labels for the 46 object classes, and it is divided into 7,985 training videos and 1,863 validation videos.

4.1.2 Actor-action dataset

The Actor-Action Dataset (A2D) [38] is used to identify actors and actions in videos simultaneously. A2D includes 8 action classes (climb, crawl, eat, fly, leap, roll, run, and walk) and 7 actor classes (adult, baby, ball, bird, car, cat, and dog). It contains 3,782 videos, with 99 occurrences of each valid "actor-action" tuple.

4.2 Risks-identification-onto

The ontology is constructed using Python 3.8 with the "owlready2" module and Protégé 5.5.0 with the OntoGraf plugin. "owlready2" is an ontology-oriented programming module with robust capabilities for expressing and manipulating formal ontologies, combined with the flexibility of object-oriented programming, which ontology editors alone do not offer. The module includes parsers for the Web Ontology Language (OWL) and a quadstore for the Resource Description Framework (RDF) format (subject, property, object) [33]. Protégé is a free and open-source ontology editor, while OntoGraf is a Protégé plugin that visualizes the ontology entities.
The HermiT and Pellet [39] reasoners are used to check the consistency of the constructed Risks-Identification-Onto and to infer new knowledge regarding concepts, data properties, object properties, and individuals (Figure 10).

4.3 Risk identification

SWRL is used to generate rules based on the stated risk scenarios, using the Python module "owlready2," so that risks can be derived and detected by reasoning with this "Rule Base" over the "Fact Base," as shown in Figure 11. The results show that the Elder (Cp) has an indirect relationship (R) with "knife2" (C) and is located near (R) the oven/stove (C); therefore, this person could get hurt and burned.

Figure 10: Pellet inferences; the new information is highlighted in yellow.

Figure 11: Result of risk identification over the given scene (Figure 4 and Figure 10).

Figure 12 illustrates the entire process on the example case of Figure 4. It shows that the proposed real-time system can assist busy parents and caregivers in safeguarding elders and children by deducing potential risks in various scenarios with minimal inputs. Additionally, as depicted in the "Scene description generation" step, the system provides contextualized information with auto-generated semantic descriptions and alarms to highlight potential dangers.

Figure 12: Risk identification process over Figure 4.

The evaluation metrics used for the "Real-time Healthcare system" with both the Charades and A2D datasets are accuracy, precision, recall, and F1 score. Table 4 presents the evaluation with the Charades dataset, while Table 5 presents it for A2D. The results show the efficiency of the proposed system in terms of risk identification, with an accuracy (Charades/A2D) of 98.29%/99.43% for Hurt, 97.61%/97.61% for Hit, 97.34%/98.40% for Burn, and 97.41%/97.25% for Dangerous place. Precision (Charades/A2D) is 97.78%/98.89% for Hurt, 97.12%/96.15% for Hit, 97.89%/98.95% for Burn, and 96.36%/98.15% for Dangerous place. Finally, recall ranges from 96.88% to 98.88% for Charades and from 96.36% to 100% for A2D, while the F1 score ranges from 97.25% to 98.32% for Charades and from 97.39% to 99.44% for A2D.

Table 4: Performance of the proposal using Charades [37].

Risk      | Hurt   | Hit    | Burn   | Dangerous_Place
----------|--------|--------|--------|----------------
Accuracy  | 0.9829 | 0.9761 | 0.9734 | 0.9741
Precision | 0.9778 | 0.9712 | 0.9789 | 0.9636
Recall    | 0.9888 | 0.9806 | 0.9688 | 0.9815
F1 score  | 0.9832 | 0.9758 | 0.9738 | 0.9725
Samples   | 175    | 209    | 188    | 116

Table 5: Performance of the proposal using A2D [38].

Risk      | Hurt   | Hit    | Burn   | Dangerous_Place
----------|--------|--------|--------|----------------
Accuracy  | 0.9943 | 0.9761 | 0.9840 | 0.9725
Precision | 0.9889 | 0.9615 | 0.9895 | 0.9815
Recall    | 1.0000 | 0.9901 | 0.9792 | 0.9636
F1 score  | 0.9944 | 0.9756 | 0.9843 | 0.9739
Samples   | 175    | 209    | 188    | 116

The accuracy and calculated error rate of risk assignments according to each Cp and C are presented in the chart in Figure 13. The system can identify four types of risks, "Hit," "Hurt," "Burn," and "Dangerous Place," with high accuracy and a low error rate. Furthermore, the confusion matrix (Figure 14) shows the predicted labels against the true risk labels.

Figure 13: Performance of the proposal in terms of accuracy and error rate.

Figure 14: Confusion matrix of Hurt, Hit, Burn, and Dangerous_Place.

Table 6 compares the proposed approach to the works presented in [19], [4], and [10] across the following criteria: (1) the ability to identify risks proactively, (2) the inference type, deductive or inductive, (3) the requirement for additional training or new datasets when adding new risks, and (4) the inference time. The proposal stands out for its proactive risk identification using a combination of deductive and inductive reasoning, its ability to adapt quickly to new risks without additional training data, and its real-time inference capabilities.
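The four evaluation metrics follow the standard definitions over confusion-matrix counts. As a minimal sketch, the code below implements those definitions and cross-checks the F1 scores of Table 4 against its precision and recall values. The function names and the example counts are hypothetical; only the Table 4 values come from the paper, and a small tolerance is used because the published figures are rounded to four decimals.

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


def f1_from(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) per risk, copied from Table 4 (Charades).
TABLE_4 = {
    "Hurt": (0.9778, 0.9888, 0.9832),
    "Hit": (0.9712, 0.9806, 0.9758),
    "Burn": (0.9789, 0.9688, 0.9738),
    "Dangerous_Place": (0.9636, 0.9815, 0.9725),
}

# Largest gap between the recomputed and the reported F1 score.
max_gap = max(abs(f1_from(p, r) - f1) for p, r, f1 in TABLE_4.values())

# Hypothetical counts, used only to exercise the definitions.
example = metrics(tp=8, fp=2, fn=1, tn=9)
```

With the Table 4 inputs, the recomputed F1 scores agree with the reported ones to within rounding, so the table is internally consistent.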
Table 7 compares the semantic-level capabilities of the proposed system to those of the previously mentioned approaches [19], [4], and [10]. The comparison criteria are: (1) the ability to achieve a high level of semantic understanding, (2) the provision of information-level semantic understanding, (3) the ability to deduce and reach a semantic knowledge level, (4) the capability to infer new knowledge distinct from the input data, and (5) the ability to provide an auto-generated semantic description. The proposed system satisfies all of these criteria.

Table 6: Comparison of the proposal with the approaches proposed in [19], [4], and [10], where "R" denotes "Required."

Criteria | Risk identification | Deduction | Induction | Training | New data | Inference time
---------|---------------------|-----------|-----------|----------|----------|---------------
Proposal | √                   | √         | √         | ×        | ×        | real-time
[4]      | ×                   | ×         | √         | R        | R        | 2.5∼5 s
[10]     | √                   | √         | √         | R        | R        | Not defined
[19]     | √                   | ×         | √         | R        | R        | real-time

Figure 15 showcases various inferences of risk and no-risk scenarios: a) an elder on a cliff; b) an adult in the kitchen surrounded by several electromechanical tools; c) an elder in the street and around means of transportation; and d) a child in the kitchen holding a knife. The results demonstrate the efficiency and consistency of the proposal in detecting elders and children, identifying and deducing risks before their occurrence, and generating a high-level semantic description that presents the risk type in real time, with processing times ranging from 0.13 to 0.29 seconds per frame. The risk identification system achieves real-time performance through a combination of optimization techniques. For instance, it takes advantage of frameworks such as PyTorch, which is optimized for performance on both CPU and GPU architectures, as well as specific deep learning models such as YOLOv5 and GCD, along with logical reasoning integrated with ontology.
These optimizations ensure efficient processing, making the system appropriate for real-time identification in a variety of scenarios.

Figure 15: Four examples of the proposal's results. a) Being in a dangerous place: an elder on a cliff, i.e., an unsafe environment. b) Safe: an adult can manage in the kitchen on his own. c) Hit: an elder on the street/floor and around a means of transportation (a car). d) Hurt: a child in the kitchen is in contact with a sharp tool, i.e., holding a knife.

5 Conclusion

This paper proposes a real-time semantic healthcare system that identifies visual dangers in surveillance videos for elders and children. The idea is to combine formal techniques and artificial vision. The approach consists of three modules: visual information extraction, ontology modeling, and risk identification. Each module is further subdivided into two bases: the "Fact Base" and the "Rule Base." The Fact Base is generated using both the extracted visual information and the newly constructed Risks-Identification-Onto together with its instantiations. Accordingly, the Rule Base is constructed using FOL and DLs for the four frequent risk scenarios: "Hurt," "Burn," "Existing in dangerous places," and "Hit."

The risk identification process is achieved by reasoning with formal rules over the low/medium-level semantic outputs of data-driven approaches, which are mapped to the ⟨Subject, Relationship, Object⟩ triple format. The Pellet and HermiT reasoners are used to perform the reasoning that identifies and infers high-level semantic knowledge about risky situations, as well as to check the coherence of the ontology.

The proposed real-time system was tested on a variety of risky and safe cases. The results demonstrate the efficiency of the proposed system: risks were successfully identified for each person in the scene in real time with minimal use of resources and information. Moreover, it can automatically generate a semantic description.
The efficiency of the system was tested using well-known datasets and newly collected data, i.e., the Charades dataset, the Actor-Action Dataset, and collected surveillance videos. The system achieves an accuracy (Charades/A2D) of 98.29%/99.43% for Hurt, 97.61%/97.61% for Hit, 97.34%/98.40% for Burn, and 97.41%/97.25% for Dangerous place, with a low error rate of 0.57% to 2.75%. Finally, compared with other approaches, the proposal can infer risks proactively in real time, as well as deduce high-level semantic knowledge that differs from the input data.

Table 7: Comparison of the semantic level of the proposal with the semantic level of the approaches [19], [4], and [10].

Criteria     | High-level semantic | Information | Knowledge | Deduction | Semantic description
-------------|---------------------|-------------|-----------|-----------|---------------------
The proposal | √                   | √           | √         | √         | √
[4]          | ×                   | √           | ×         | ×         | ×
[10]         | √                   | √           | √         | ×         | ×
[19]         | ×                   | √           | ×         | ×         | ×

Future expansions of this work include considering probabilistic scenarios to treat uncertainty in the generation of formal rules, incorporating advanced Machine Learning and Deep Learning techniques (e.g., OpenPose), refining and expanding the ontology to cover a wider spectrum of concepts and domains, and extending the applicability of the system to diverse populations and application domains (e.g., risk identification in civil engineering and on construction sites).

Acknowledgment

The authors acknowledge the financial support and encouragement of the Research Laboratory on Computer Science's Complex Systems (ReLaCS2).

References

[1] Congxing Shi, Xiao Lin, Tingyuan Huang, Kai Zhang, Yanan Liu, Tian Tian, Pengyu Wang, Shimin Chen, Tong Guo, Zhiqiang Li, et al. “The association between wind speed and the risk of injuries among preschool children: New insight from a sentinel-surveillance-based study”. In: Science of the total environment 856 (2023), p. 159005. https://doi.org/10.1016/j.scitotenv.2022.159005.
[2] Rik Dawson, Annie Feng, Juliana S Oliveira, Leanne Hassett, Catherine Sherrington, and Marina B Pin- heiro. “Monitoring falls in residential aged care facil- ities: Agreement between falls incident reports and progress notes”. In: Australasian journal on ageing (2024). https : / / doi . org / 10 . 1111 / ajag . 13276. [3] Takumi Ohnuki, Toru Abe, and Takuo Suganuma. “A Visual Monitoring Method for Infants in a Room”. In: 2020 IEEE 9th Global Confer ence on Consumer Electr onics (GCCE). IEEE. 2020, pp. 258–259. https : / / doi . org / 10 . 1109 / gcce50665.2020.9292074. [4] Peng-Jie Wang, Shao-Fu Lien, and Ming-Sui Lee. “A learning-based prediction model for baby acci- dents”. In: 2019 IEEE International Confer ence on Image Pr ocessing (ICIP). IEEE. 2019, pp. 629–633. https : / / doi . org / 10 . 1109 / icip . 2019 . 8803820. [5] Shajulin Benedict. “IoT-Enabled Remote Moni- toring Techniques for Healthcare Applications–An Overview”. In: Informatica 46.2 (2022). https : / / doi.org/10.31449/inf.v46i2.3912. [6] Omar Doukari, James Wakefield, Pablo Martinez, and Mohamad Kassem. “An ontology-based tool for safety management in building renovation projects”. In: Journal of Building Engineering 84 (2024), p. 108609. https : / / doi . org / 10 . 1016 / j . jobe . 2024.108609. [7] Satoshi Nishimura, Shusaku Egami, Takanori Ugai, Mikiko Oono, Koji Kitamura, and Ken Fukuda. “Ontologies of action and object in home environ- ment towards injury prevention”. In: The 10th In- ternational Joint Confer ence on Knowledge Graphs. 2021, pp. 126–130. https : / / doi . org / 10 . 1145 / 3502223.3502239. [8] Adel Zga and Brahim Nini. “Visual relationship extraction in images and a semantic interpretation with ontologies”. In: International Journal of Intelli- gent Information and Database Systems 15.2 (2022), pp. 223–247. https : / / doi . org / 10 . 1504 / ijiids.2021.10041280. [9] Shi Chen, Kazuyuki Demachi, and Feiyan Dong. 
“Graph-based linguistic and visual information inte- gration for on-site occupational hazards identifica- tion”. In: Automation in Construction 137 (2022), p. 104191. https : / / doi . org / 10 . 1016 / j . autcon.2022.104191. [10] Yange Li, Han Wei, Zheng Han, Nan Jiang, Wei- dong Wang, and Jianling Huang. “Computer Vision- Based Hazard Identification of Construction Site Us- ing Visual Relationship Detection and Ontology”. In: Buildings 12.6 (2022), p. 857. https : / / doi . org / 10.3390/buildings12060857. [11] Malak Belkebir, Toufik Messaoud Maarouk, and Brahim Nini. “Integrating Ontology with Imaging and Artificial Vision for a High-Level Semantic: A Review”. In: Pr oceedings of the 2nd International Real-time Semantic Healthcare System… Informatica 48 (2024) 65–82 81 Confer ence on Emer ging T echnologies and Intel- ligent Systems: ICETIS 2022, V olume 2. Springer. 2022, pp. 32–41. https : / / doi . org / 10 . 1007 / 978- 3- 031- 20429- 6_4. [12] Huma Parveen, Syed Wajahat Abbas Rizvi, and Raja Sarath Kumar Boddu. “Fuzzy-Ontology Based Knowledge Driven Disease Risk Level Predic- tion with Optimization Assisted Ensemble Classi- fier”. In: Data & Knowledge Engineering (2024), p. 102278. https : / / doi . org / 10 . 1016 / j . datak.2024.102278. [13] Cheng Zeng, Timo Hartmann, and Leyuan Ma. “ConSE: An ontology for visual representation and semantic enrichment of digital images in construc- tion sites”. In: Advanced Engineering Informatics 60 (2024), p. 102446. https : / / doi . org / 10 . 1016 / j.aei.2024.102446. [14] Mourad Ellouze and Lamia Hadrich Belguith. “Se- mantic analysis based on ontology and deep learn- ing for a chatbot to assist persons with personality disorders on Twitter”. In: Behaviour & Information T echnology (2023), pp. 1–20. https : / / doi . org / 10.1080/0144929x.2023.2272757. [15] Pim Borst and Hans Akkermans. “An ontology ap- proach to product disassembly”. 
In: Knowledge Ac- quisition, Modeling and Management: 10th Eur o- pean W orkshop, EKA W’97 Sant Feliu de Guixols, Catalonia, Spain October 15–18, 1997 Pr oceedings 10. Springer. 1997, pp. 33–48. https : / / doi . org / 10.1007/bfb0026776. [16] Ahmad Abusukhon. “IOT Bracelets for Guiding Blind People in an Indoor Environment”. In: Jour - nal of Communications Softwar e and Systems 19.2 (2023), pp. 114–125. https : / / doi . org / 10 . 24138/jcomss- 2022- 0160. [17] Sid Ahmed Hadri and Abdelkrim Bouramoul. “Friendy: A Deep Learning based Framework for Assisting in Young Autistic Children Psychotherapy Interventions”. In: Journal of Communications Soft- war e and Systems 19.1 (2023), pp. 30–38. https : //doi.org/10.24138/jcomss- 2022- 0074. [18] Ming Li, Hui Dong, Fei Zhang, and Xiaoxiao Liu. “A Method for Top View Pedestrian Flow Detection Based on Small Target Tracking”. In: Informatica 48.11 (2024). https://doi.org/10.31449/inf. v48i11.6033. [19] Nelson RP Rodrigues, Nuno MC da Costa, César Melo, Ali Abbasi, Jaime C Fonseca, Paulo Cardoso, and João Borges. “Fusion Object Detection and Ac- tion Recognition to Predict Violent Action”. In: Sen- sors 23.12 (2023), p. 5610. https://doi.org/10. 20944/preprints202304.1242.v1. [20] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. “You only look once: Unified, real- time object detection”. In: Pr oceedings of the IEEE confer ence on computer vision and pattern r ecogni- tion. 2016, pp. 779–788. https : / / doi . org / 10 . 1109/cvpr.2016.91. [21] Peiyuan Jiang, Daji Ergu, Fangyao Liu, Ying Cai, and Bo Ma. “A Review of Yolo algorithm develop- ments”. In: Pr ocedia Computer Science 199 (2022), pp. 1066–1073. https : / / doi . org / 10 . 1016 / j . procs.2022.01.135. [22] Xiaohang Shi, Jun Hu, Xueyue Lei, and Shiyou Xu. “Detection of flying birds in airport monitoring based on improved YOLOv5”. In: 2021 6th Interna- tional Confer ence on Intelligent Computing and Sig- nal Pr ocessing (ICSP). IEEE. 2021, pp. 
1446–1451. https : / / doi . org / 10 . 1109 / icsp51882 . 2021 . 9408797. [23] Zexuan Guo, Chensheng Wang, Guang Yang, Zeyuan Huang, and Guo Li. “Msft-yolo: Improved yolov5 based on transformer for detecting defects of steel surface”. In: Sensors 22.9 (2022), p. 3467. https://doi.org/10.3390/s22093467. [24] Glenn Jocher, Alex Stoken, Jirka Borovec, Liu Changyu, Adam Hogan, Laurentiu Diaconu, Fran- cisco Ingham, Jake Poznanski, Jiacong Fang, Li- jun Yu, et al. “ultralytics/yolov5: v3. 1-bug fixes and performance improvements”. In: Zenodo (2020). https://doi.org/10.5281/zenodo.4154370. [25] Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. “Places: A 10 mil- lion image database for scene recognition”. In: IEEE transactions on pattern analysis and machine intelli- gence 40.6 (2017), pp. 1452–1464. https : / / doi . org/10.1109/tpami.2017.2723009. [26] Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, and Silvio Savarese. “Generalized intersection over union: A metric and a loss for bounding box regression”. In: Pr oceedings of the IEEE/CVF confer ence on computer vision and pattern r ecognition. 2019, pp. 658–666. https : / / doi.org/10.1109/cvpr.2019.00075. [27] Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A Shamma, et al. “Visual genome: Connecting lan- guage and vision using crowdsourced dense image annotations”. In: International journal of computer vision 123 (2017), pp. 32–73. https : / / doi . org / 10.1007/s11263- 016- 0981- 7. [28] Markos Diomataris, Nikolaos Gkanatsios, Vassilis Pitsikalis, and Petros Maragos. “Grounding consis- tency: Distilling spatial common sense for precise visual relationship detection”. In: Pr oceedings of the IEEE/CVF International Confer ence on Computer 82 Informatica 48 (2024) 65–82 M. Belkebir et al. V ision. 2021, pp. 15911–15920. https : / / doi . org/10.1109/iccv48922.2021.01561. 
[29] Tom Gruber. “Toward principles for the design of ontologies used for knowledge sharing”. In: Int. Workshop on Formal Ontology, 1993. 1993. https://doi.org/10.1006/ijhc.1995.1081.
[30] Mike Uschold and Michael Gruninger. “Ontologies: Principles, methods and applications”. In: The knowledge engineering review 11.2 (1996), pp. 93–136. https://doi.org/10.1017/s0269888900007797.
[31] Hannah A Valantine and Francis S Collins. “National Institutes of Health addresses the science of diversity”. In: Proceedings of the National Academy of Sciences 112.40 (2015), pp. 12240–12242. https://doi.org/10.1073/pnas.1515612112.
[32] Michael R Genesereth and Nils J Nilsson. “Logical foundations of Artificial Intelligence”. In: New York: Morgan Kaufmann Publishers (1987). https://doi.org/10.1016/c2009-0-27551-9.
[33] Jean-Baptiste Lamy. “The Python language: Adopt a snake!” In: Ontologies with Python: Programming OWL 2.0 Ontologies with Python and Owlready2 (2021), pp. 9–48. https://doi.org/10.1007/978-1-4842-6552-9_2.
[34] Markus Krötzsch, Frantisek Simancik, and Ian Horrocks. “A description logic primer”. In: arXiv preprint arXiv:1201.4089 (2012). https://doi.org/10.48550/arXiv.1201.4089.
[35] Martin J O’Connor, Ravi D Shankar, Mark A Musen, Amar K Das, and Csongor Nyulas. “The SWRLAPI: A Development Environment for Working with SWRL Rules.” In: OWLED. 2008.
[36] Martin J O’Connor and Amar K Das. “SQWRL: a query language for OWL.” In: OWLED. Vol. 529. 2009, pp. 1–8.
[37] Gunnar A Sigurdsson, Gül Varol, Xiaolong Wang, Ali Farhadi, Ivan Laptev, and Abhinav Gupta. “Hollywood in homes: Crowdsourcing data collection for activity understanding”. In: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14. Springer. 2016, pp. 510–526. https://doi.org/10.1007/978-3-319-46448-0_31.
[38] Chenliang Xu, Shao-Hang Hsieh, Caiming Xiong, and Jason J Corso. “Can humans fly? action understanding with multiple classes of actors”. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2015, pp. 2264–2273. https://doi.org/10.1109/cvpr.2015.7298839.
[39] Evren Sirin, Bijan Parsia, Bernardo Cuenca Grau, Aditya Kalyanpur, and Yarden Katz. “Pellet: A practical owl-dl reasoner”. In: Journal of Web Semantics 5.2 (2007), pp. 51–53. https://doi.org/10.2139/ssrn.3199351.