https://doi.org/10.31449/inf.v47i2.4933
Informatica 47 (2023) 295–296

Detecting Temporal and Spatial Anomalies in Users’ Activities for Security Provisioning in Computer Networks

Aleks Huč
University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia
aleks.huc@fri.uni-lj.si

Thesis Summary

Keywords: anomaly detection, incremental learning, unsupervised learning, clustering, adaptive windowing, profiling, network security, network flows

Received: June 1, 2023

The paper summarizes a Doctoral Thesis that focuses on two new approaches for detecting anomalies in computer networks based on network flows. The approaches use incremental hierarchical clustering algorithms and monitor changes in the data structures to detect anomalies. Both approaches achieved prediction performance comparable to state-of-the-art supervised approaches (F1 score over 0.90), even though our approaches see every data point only once before discarding it and operate without a prerequisite learning phase on labeled data.

Povzetek (Slovenian abstract): The article summarizes the content of a doctoral dissertation in which we focus on two new approaches for detecting anomalies in computer networks. The approaches are based on network flows, incremental hierarchical clustering, and monitoring changes in the data structures in order to detect anomalies. Both approaches achieve a detection rate (F1 measure over 0.90) comparable to the latest supervised methods, even when we take into account that our approaches see each data item only once and then forget it, and that they operate without a prior learning phase with labeled data.

1 Introduction

The goal of computer network security is to provide a secure environment for a computer network, its resources, its data in storage and in transit, and all its users [1].
Network security starts with intrusion detection, where an intrusion is defined as a deliberate unauthorized attempt (successful or not) by an intruder to gain access to, manipulate or misuse a computer system or network [1]. Examples include Trojans, viruses, malware, and denial of service, brute force and probe attacks. Over years of active development, two main categories of intrusion detection approaches have emerged: signature-based and anomaly-based [2]. Signature-based approaches detect intrusions on the basis of signature databases, while anomaly-based approaches detect intrusions on the basis of deviations from models of normal activity. Various anomaly detection approaches have already been proposed, but because they rely on supervised and batch learning, they struggle with today’s dynamic computer networks, which are characterized by large volume and high velocity, variety and variability. Newer methods have switched to unsupervised, incremental and adaptable learning to improve upon and augment traditional approaches and provide better overall anomaly detection.

This paper summarizes a Doctoral Thesis [3] that provides two new approaches for improving on current state-of-the-art anomaly detection using unsupervised, incremental, adaptable and hierarchical clustering.

2 PHICAD

PHICAD (Profile- and Hierarchical Incremental Clustering-based Anomaly Detection) is a single-layer, unsupervised and incremental algorithm that detects network activity anomalies in real time. The input is a stream of chronologically ordered flows. The algorithm receives a new flow and routes it to two profiles, one selected by the flow's source IP address and one by its destination IP address. A profile models the incoming and outgoing activity of an individual network entity. The algorithm then extracts, transforms, and normalizes the features from the flow into a real-valued vector. The vector is then clustered inside the hierarchical clustering tree structure of the appropriate profile.
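The routing and feature-preparation steps described above can be pictured with a minimal sketch. It is not the thesis implementation: the `Flow` fields, the log transform, the running min-max normalization and the `Profile` and `process` names are all assumptions chosen for illustration, and the clustering tree itself is left out.

```python
from collections import defaultdict
from dataclasses import dataclass
import math


@dataclass
class Flow:
    # Hypothetical flow record; the actual feature set is not listed in this summary.
    src_ip: str
    dst_ip: str
    duration_s: float
    n_bytes: int
    n_packets: int


class Profile:
    """Per-entity state: running feature ranges and direction counters only."""

    def __init__(self):
        self.lo = None   # running per-feature minimum
        self.hi = None   # running per-feature maximum
        self.n_out = 0   # flows where this entity was the source
        self.n_in = 0    # flows where this entity was the destination

    def add(self, flow, outgoing):
        if outgoing:
            self.n_out += 1
        else:
            self.n_in += 1
        # Extract and transform: log-scale heavy-tailed values (an assumption).
        raw = [math.log1p(flow.duration_s),
               math.log1p(flow.n_bytes),
               math.log1p(flow.n_packets)]
        # Update the running ranges, then min-max normalize into [0, 1].
        self.lo = raw if self.lo is None else [min(a, b) for a, b in zip(self.lo, raw)]
        self.hi = raw if self.hi is None else [max(a, b) for a, b in zip(self.hi, raw)]
        # The flow is not stored; only the normalized vector is handed on.
        return [(x - l) / (h - l) if h > l else 0.0
                for x, l, h in zip(raw, self.lo, self.hi)]


profiles = defaultdict(Profile)


def process(flow):
    """Route one flow to its source profile and its destination profile."""
    out_vec = profiles[flow.src_ip].add(flow, outgoing=True)
    in_vec = profiles[flow.dst_ip].add(flow, outgoing=False)
    return out_vec, in_vec
```

Each call to `process` touches exactly two profiles and keeps no copy of the flow, which mirrors the single-pass constraint; in a fuller version each returned vector would be passed to the clustering tree of its profile.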
Anomalies are determined in the leaf nodes. If the new vector is merged with an existing leaf, we track the distance between the new vector and the leaf, the change in the leaf centroid, and the leaf size; if the new vector becomes a new leaf, we track the distance between the new leaf and the centroid of the neighboring leaves. The predictions from all detection mechanisms are fed into a short-term model that discards mechanisms that trigger too often and reports the final predictions.

3 PHI2CAD

PHI2CAD (Profile- and Hierarchical Incremental Two-layer Clustering-based Anomaly Detection) builds upon our single-layer PHICAD by adding a second-layer unsupervised and incremental clustering algorithm that detects anomalies in profiles and groups of profiles.

The input into our approach is again a stream of chronologically ordered flows. First, the flow is sent to the first layer, where the PHICAD algorithm creates and updates profiles of network entities and detects network anomalies in each individual profile separately. For each flow, PHICAD produces two updated profiles with predictions of possible anomalies, one for the source and one for the destination IP address, which are then sent to the PHI2CAD algorithm on the second layer.

For each updated profile, PHI2CAD first checks whether the profile has already been clustered into its tree data structure and, if so, whether the updated profile still lies inside its leaf. If it does, we check for possible anomalies caused by the updated profile and produce anomaly predictions by tracking the distance between the updated profile and the leaf, the change in the leaf centroid, and the leaf size.
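The three quantities tracked when an item is merged into a leaf (distance to the leaf, centroid change, leaf size) can be sketched as follows. This is an illustration under assumptions, not the thesis code: the `Leaf` class, the Euclidean distance, the incremental-mean centroid update and the fixed thresholds in `flag` are all hypothetical choices.

```python
import math


class Leaf:
    """Hypothetical leaf summary: only a centroid and a member count are kept."""

    def __init__(self, vector):
        self.centroid = list(vector)
        self.size = 1

    def merge(self, vector):
        """Absorb a vector and return the three tracked quantities."""
        # Distance between the incoming vector and the current centroid.
        distance = math.dist(vector, self.centroid)
        # Incremental mean update: c_new = c + (x - c) / n, with n the new size.
        self.size += 1
        new_centroid = [c + (x - c) / self.size
                        for c, x in zip(self.centroid, vector)]
        # How far the centroid moved because of this merge.
        centroid_shift = math.dist(new_centroid, self.centroid)
        self.centroid = new_centroid
        return {"distance": distance,
                "centroid_shift": centroid_shift,
                "size": self.size}


def flag(stats, max_distance=0.5, max_shift=0.2):
    # Fixed thresholds are an illustrative assumption; the summary does not
    # state how the detection mechanisms decide what counts as anomalous.
    return stats["distance"] > max_distance or stats["centroid_shift"] > max_shift
```

Because the centroid is updated from the previous centroid and the count alone, the merged vector never needs to be stored, which is consistent with the requirement that each data point is seen only once.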
If instead the updated profile has not been clustered yet, or it falls outside the leaf to which it was previously clustered, we cluster the updated profile into the PHI2CAD tree data structure and remove its previous version, if one exists, from the tree. Finally, possible anomaly predictions are determined in the leaf to which the updated profile has been clustered. If the updated profile is merged with an existing leaf, we track the distance between the updated profile and the leaf, the change in the leaf centroid, and the leaf size; if the updated profile becomes a new leaf, we track the distance between the new leaf and the centroid of the neighboring leaves. The predictions from all detection mechanisms are fed into a short-term model that discards mechanisms that trigger too often and reports the final predictions.

4 Conclusion

The goal of this dissertation was to investigate whether we can devise an anomaly detection approach that satisfies the following operational constraints: incremental execution, unsupervised learning, real-time response, the ability to analyze large data sets, a lightweight design and the ability to adapt to changes over time, while still providing performance comparable to classic approaches and/or additional new insights.

We evaluated our two approaches on the state-of-the-art CICIDS2017 data set [4], which comprises the most common network anomalies. To measure predictive performance we used standard machine learning metrics (precision, recall and F1 score), and we also compared execution time against the supervised approaches. To further explain the achieved prediction performance, we analyzed the influence of individual features on the predictions and performed a sensitivity analysis of the main parameters.
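For reference, the three metrics named above follow the standard definitions and can be computed from binary labels as in the short sketch below. The function name and the toy labels in the usage note are illustrative only and are not results from the thesis.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary labels (1 = anomaly)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against empty denominators when nothing is predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

With the toy input `y_true = [1, 1, 1, 0, 0]` and `y_pred = [1, 1, 0, 1, 0]` there are two true positives, one false positive and one false negative, so precision, recall and F1 all equal 2/3.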
Our approaches can successfully detect Denial of Service, Distributed Denial of Service, Port Scan and Web attacks when each anomaly is analyzed separately, and they can also detect anomalies when analyzing entire data sets that contain multiple types of anomalies. Performance is good where anomalous patterns clearly differ from normal activity (F1 score over 0.90). However, the approaches have problems detecting attacks whose flows resemble normal flows, attacks that are executed on higher layers of the network stack, and attacks that are carried in packet payloads. That said, the importance of packet-payload analysis is diminishing because of the increasing use of packet-payload encryption. The results were also published in a peer-reviewed journal paper [5].

References

[1] Kizza, J. M. (2020) Guide to computer network security, Springer.
[2] Thakkar, A. and Lohiya, R. (2021) A survey on intrusion detection system: feature selection, model, performance measures, application perspective, challenges, and future research directions, Artificial Intelligence Review, Springer, pp. 1–111.
[3] Huč, A. (2022) Detecting temporal and spatial anomalies in users’ activities for security provisioning in computer networks, doctoral dissertation, Ljubljana, https://repozitorij.uni-lj.si/IzpisGradiva.php?id=137562.
[4] Sharafaldin, I., Lashkari, A. H. and Ghorbani, A. A. (2018) Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization, 4th International Conference on Information Systems Security and Privacy (ICISSP), pp. 108–116.
[5] Huč, A. and Trček, D. (2021) Anomaly detection in IoT networks: From architectures to machine learning transparency, IEEE Access, IEEE, pp. 60607–60616.